feat: single-pass streaming Rust bigWig write path #235
Merged
Phase 4 (bigWig slice) of the Rust migration: replace the count+read double-decode write path with a single-pass streaming Rust writer behind a byte-identical parity gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…itch
Flips GVL_RUST_BIGWIG_WRITE default on and removes it. Legacy orchestration retained only for non-bigWig (Table) tracks. Roadmap Phase 4 bigWig slice updated with baseline + after numbers.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…or instead of panic
Convert four .expect() panic paths in write_track (open_file, exactly_one contig match, get_interval, per-interval read) to anyhow error returns using .with_context() and .map_err(). Failures now propagate as Err through the rayon closure and surface to Python as RuntimeError (an Exception subclass) instead of PanicException (a BaseException subclass), consistent with the existing max_mem bail! path. Happy-path byte output is unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lder Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Replaces the count-then-read double-decode bigWig write path with a single Rust entry point (`bigwig::write_track`) that decodes each `(region, sample)` exactly once, streams memory-bounded batches to disk, and writes `intervals.npy` / `offsets.npy` directly. Proven byte-identical to the legacy path before the old code was deleted.

This is the Phase 4 bigWig slice of the Rust migration roadmap.
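A minimal Python sketch of the single-pass idea, for illustration only; it is not the Rust implementation in this PR. The `stream_write` function, the `decode` callback, the byte-budget flush rule, and the reading of offsets as cumulative interval counts are all assumptions made for the example. Only the packed 12-byte `(i32 start, i32 end, f32 value)` record and the region-major / sample-minor order come from the PR description, and no file header is modeled.

```python
import io
import struct

# One interval record: i32 start, i32 end, f32 value (native byte order, no padding).
RECORD = struct.Struct("=iif")
assert RECORD.size == 12


def stream_write(regions, samples, decode, intervals_out, max_batch_bytes):
    """Decode each (region, sample) once and stream records in bounded batches.

    `decode(region, sample)` is a hypothetical callback returning a list of
    (start, end, value) tuples. Returns cumulative interval counts, one entry
    per (region, sample) pair plus a leading 0 (an assumed offsets convention).
    """
    offsets = [0]
    batch = bytearray()
    for region in regions:        # region-major
        for sample in samples:    # sample-minor
            intervals = decode(region, sample)  # the only decode for this pair
            for start, end, value in intervals:
                batch += RECORD.pack(start, end, value)
            offsets.append(offsets[-1] + len(intervals))
            if len(batch) >= max_batch_bytes:   # flush once the budget is reached
                intervals_out.write(batch)
                batch = bytearray()
    if batch:
        intervals_out.write(batch)
    return offsets


# Toy data standing in for decoded bigWig intervals.
data = {
    ("r0", "s0"): [(0, 5, 1.0), (5, 9, 2.0)],
    ("r0", "s1"): [],
    ("r1", "s0"): [(100, 120, 0.5)],
    ("r1", "s1"): [(100, 110, 0.25), (110, 120, 0.75), (120, 130, 1.25)],
}
calls = []


def toy_decode(region, sample):
    calls.append((region, sample))
    return data[(region, sample)]


regions, samples = ["r0", "r1"], ["s0", "s1"]
buf = io.BytesIO()
offsets = stream_write(regions, samples, toy_decode, buf, max_batch_bytes=24)
payload = buf.getvalue()

# Offsets would be stored as i64; packed here only to show the element type.
offsets_bytes = struct.pack(f"={len(offsets)}q", *offsets)

# Read back the records for (r1, s1): pair index = region_idx * n_samples + sample_idx.
pair = 1 * len(samples) + 1
lo, hi = offsets[pair], offsets[pair + 1]
records = list(RECORD.iter_unpack(payload[lo * RECORD.size : hi * RECORD.size]))
```

Because offsets accumulate as records are emitted, no separate counting pass is needed to size the output, which is the property the single-pass design relies on.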
Approach (strangler-fig)
- `bigwig::write_track` opens each bigWig once per worker thread (thread-local cache), parallelizes over regions with rayon, decodes once, and writes the raw on-disk byte layout: `intervals.npy` holds packed 12-byte `(i32 start, i32 end, f32 value)` records, `offsets.npy` holds `i64` values, and data is ordered region-major / sample-minor. Intervals keep their native coordinates; only the query range is clamped.
- `bigwig_write_track` + Python dispatch behind env switch `GVL_RUST_BIGWIG_WRITE`.
- … `IntervalTrack`s (e.g. Table).
Performance (synthetic chr21/chr22, 8 samples, density 0.05, 2000 regions × 5000 bp)
`gvl.write()` bigWig wall-clock: …
Behavior change worth noting
- … `ValueError` at the Python layer.
- … `RuntimeError` (a catchable `Exception`), consistent with the existing `max_mem` error, not a `PanicException` (a `BaseException` subclass that `except Exception` does not catch).
Public API
Unchanged.
`gvl.write` / `gvl.update` signatures and defaults are identical; `skills/genvarloader/SKILL.md` is intentionally not updated (no public-API change).
Testing
- `cargo test --release`: 6/6 (oracle proves happy-path bytes match the existing `count_intervals` / `intervals` functions).
- `test_write_tracks_e2e.py`, `test_update.py`, `test_write_annot_bigwig.py` (retains a legacy-vs-rust byte comparison for the annotation path), and `test_bigwig_write_binding.py`.
Follow-ups (non-blocking, deferred)
- `scripts/profile_bigwig_write.sh` still A/B on the now-removed `GVL_RUST_BIGWIG_WRITE` / `--impl` (baselines were captured pre-flip); simplify to profile-only.
- `_write_annot_track_rust` constructs a throwaway `BigWigs` just to read `.contigs` (one extra header open).
- `write_track`'s `_sample_less` param is unread (annotation collapsing relies on the caller passing a single pseudo-sample).

🤖 Generated with Claude Code
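
On the behavior change noted above, the following sketch shows why the exception's base class matters to callers. It does not call into the extension: `FakePanicException` is a hypothetical stand-in that only mirrors the fact that `PanicException` derives from `BaseException`, and `run_guarded` and the error messages are invented for the example.

```python
class FakePanicException(BaseException):
    """Hypothetical stand-in for a panic surfaced from Rust (BaseException subclass)."""


def run_guarded(fn):
    """Call fn the way typical caller code does, guarding with `except Exception`."""
    try:
        fn()
    except Exception as exc:
        return f"handled {type(exc).__name__}"
    return "ok"


def fail_with_runtime_error():
    # New behavior: the failure arrives as an ordinary RuntimeError.
    raise RuntimeError("failed to open bigWig")  # illustrative message


def fail_with_panic():
    # Old behavior: a panic arrives as a BaseException subclass.
    raise FakePanicException("panicked in write_track")  # illustrative message


handled = run_guarded(fail_with_runtime_error)

try:
    run_guarded(fail_with_panic)
    escaped = False
except BaseException:
    # The `except Exception` guard inside run_guarded did not catch it.
    escaped = True
```

Here `handled` records that the `RuntimeError` was caught by the ordinary guard, while `escaped` records that the panic-style exception passed straight through it.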