
feat: single-pass streaming Rust bigWig write path #235

Merged: d-laub merged 15 commits into main from bigwig-impl on Jun 20, 2026

d-laub (Collaborator) commented Jun 19, 2026

Summary

Replaces the count-then-read double-decode bigWig write path with a single Rust entry point (bigwig::write_track) that decodes each (region, sample) exactly once, streams memory-bounded batches to disk, and writes intervals.npy / offsets.npy directly. Proven byte-identical to the legacy path before the old code was deleted.
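To make the difference between the two write paths concrete, here is a minimal Python sketch, not the actual Rust code. `two_pass`, `single_pass`, `make_counting_decoder`, and the `max_batch` parameter are hypothetical names introduced only for illustration; the real writer bounds batches by memory rather than by interval count.

```python
from typing import Callable, Iterator

Interval = tuple[int, int, float]   # (start, end, value)
Key = tuple[int, int]               # (region index, sample index)

def two_pass(keys: list[Key], decode: Callable[[Key], list[Interval]]):
    """Legacy-style shape: decode once to count, then decode again to read."""
    counts = [len(decode(k)) for k in keys]
    data = [iv for k in keys for iv in decode(k)]
    return counts, data

def single_pass(
    keys: list[Key], decode: Callable[[Key], list[Interval]], max_batch: int
) -> Iterator[tuple[list[int], list[Interval]]]:
    """Decode each key exactly once and yield bounded batches for writing."""
    counts: list[int] = []
    batch: list[Interval] = []
    for k in keys:
        ivs = decode(k)
        counts.append(len(ivs))
        batch.extend(ivs)
        if len(batch) >= max_batch:
            # Hand the batch to the writer, then start a fresh one.
            yield counts, batch
            counts, batch = [], []
    if counts:
        yield counts, batch

def make_counting_decoder():
    """Fake decoder that records every call so decode counts can be compared."""
    calls: list[Key] = []
    def decode(key: Key) -> list[Interval]:
        calls.append(key)
        region, sample = key
        return [(region * 10 + i, region * 10 + i + 1, float(sample))
                for i in range(region + 1)]
    return decode, calls

# Region-major, sample-minor key order: 3 regions x 2 samples.
keys = [(r, s) for r in range(3) for s in range(2)]
```

With the fake decoder, `two_pass` calls `decode` twice per key while `single_pass` calls it once, and concatenating the yielded batches reproduces the same counts and intervals.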

This is the Phase 4 bigWig slice of the Rust migration roadmap.

Approach (strangler-fig)

  1. Rust writer: bigwig::write_track opens each bigWig once per worker thread (thread-local cache), parallelizes over regions with rayon, decodes once, and writes the raw on-disk byte layout (intervals.npy = packed 12-byte (i32 start, i32 end, f32 value) records; offsets.npy = i64; region-major / sample-minor; native interval coords, only the query range clamped).
  2. PyO3 binding bigwig_write_track + Python dispatch behind env switch GVL_RUST_BIGWIG_WRITE.
  3. Byte-identical parity gate — differential tests proved the new path produces output byte-for-byte identical to legacy for both per-sample tracks and annotation tracks on a synthetic chr21/chr22 corpus.
  4. Flip — made Rust the unconditional default and deleted the legacy bigWig orchestration, the env switch, and the transitional parity tests in the same PR. Legacy orchestration is retained only for non-BigWigs IntervalTracks (e.g. Table).
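The record layout described in step 1 can be sketched with the standard-library `struct` module. This is an illustration of the payload bytes only: it ignores the `.npy` header, and it assumes `offsets` holds one more entry than there are (region, sample) pairs so that consecutive entries delimit each pair's records. `pack_track` and `read_key` are hypothetical helper names.

```python
import struct

# "=" means native byte order with no padding: i32 start, i32 end, f32 value.
RECORD = struct.Struct("=iif")
assert RECORD.size == 12

def pack_track(per_key_intervals):
    """Pack intervals given in region-major, sample-minor order.

    Returns (interval payload bytes, offsets payload bytes as i64).
    """
    payload = bytearray()
    offsets = [0]
    for ivs in per_key_intervals:
        for start, end, value in ivs:
            payload += RECORD.pack(start, end, value)
        offsets.append(offsets[-1] + len(ivs))
    return bytes(payload), struct.pack(f"={len(offsets)}q", *offsets)

def read_key(payload, offsets_bytes, region, sample, n_samples):
    """Return the records for one (region, sample) pair."""
    n = len(offsets_bytes) // 8
    offsets = struct.unpack(f"={n}q", offsets_bytes)
    k = region * n_samples + sample  # region-major, sample-minor index
    lo, hi = offsets[k], offsets[k + 1]
    return [RECORD.unpack_from(payload, i * RECORD.size) for i in range(lo, hi)]

# 2 regions x 2 samples; values chosen to be exactly representable as f32.
example = [[(0, 5, 0.5)], [], [(10, 20, 2.0), (20, 30, 4.0)], [(7, 9, 1.0)]]
payload, offsets_bytes = pack_track(example)
```

Because every record is exactly 12 bytes, a byte-for-byte comparison of two such payloads is enough to establish parity between writers, which is what the parity gate in step 3 relies on.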

Performance (synthetic chr21/chr22, 8 samples, density 0.05, 2000 regions × 5000 bp)

| Metric | Legacy | Rust | Δ |
|---|---|---|---|
| gvl.write() bigWig wall-clock | 1.502 s | 0.801 s | ~1.88× faster |
| Peak RSS | 3.538 GB | 3.386 GB | −4% (dominated by numba/llvmlite JIT ~3.2 GB) |
| Total allocated | 8.380 GB | 6.004 GB | ~28% less |
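The Δ column follows directly from the Legacy and Rust figures. A short check, using only the numbers reported above (the dictionary keys and `pct_change` are illustrative names):

```python
# Figures copied from the benchmark above (seconds and GB).
legacy = {"wall_s": 1.502, "peak_rss_gb": 3.538, "alloc_gb": 8.380}
rust = {"wall_s": 0.801, "peak_rss_gb": 3.386, "alloc_gb": 6.004}

def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means less)."""
    return (new - old) / old * 100.0

speedup = legacy["wall_s"] / rust["wall_s"]                        # about 1.88
rss_pct = pct_change(legacy["peak_rss_gb"], rust["peak_rss_gb"])   # about -4.3
alloc_pct = pct_change(legacy["alloc_gb"], rust["alloc_gb"])       # about -28.4

print(f"{speedup:.2f}x faster, RSS {rss_pct:.1f}%, allocated {alloc_pct:.1f}%")
```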

Behavior change worth noting

  • Missing-contig handling improved. The legacy per-sample path silently skipped regions whose contig wasn't found (a latent partial-output bug); the new path raises a clean ValueError at the Python layer.
  • Error surface unified. bigWig I/O / contig-match failures inside the Rust writer now surface as RuntimeError (a catchable Exception), consistent with the existing max_mem error — not an uncatchable PanicException.
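Why the exception type matters to callers: a typical `except Exception` handler catches `ValueError` and `RuntimeError` but lets `BaseException` subclasses through. The sketch below uses `FakePanicException`, a hypothetical stand-in for PyO3's `PanicException`, and `call_writer`, a hypothetical wrapper that simply raises the error it is given; neither is part of the library.

```python
class FakePanicException(BaseException):
    """Hypothetical stand-in for PyO3's PanicException (a BaseException subclass)."""

def call_writer(error: BaseException) -> str:
    """Simulate a caller that guards the write with `except Exception`."""
    try:
        raise error
    except Exception as exc:
        # ValueError and RuntimeError both land here.
        return f"handled: {type(exc).__name__}"

def escapes_handler(error: BaseException) -> bool:
    """Return True if the error propagates past the `except Exception` guard."""
    try:
        call_writer(error)
    except BaseException:
        return True
    return False
```

Under these assumptions, `call_writer(RuntimeError("bad contig"))` returns a handled message, while a panic-style exception bypasses the handler entirely.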

Public API

Unchanged. gvl.write / gvl.update signatures and defaults are identical; skills/genvarloader/SKILL.md is intentionally not updated (no public-API change).

Testing

  • Rust cargo test --release: 6/6 (oracle proves happy-path bytes match the existing count_intervals/intervals functions).
  • Full Python suite: 765 passed / 28 skipped / 4 xfailed.
  • Durable coverage of the now-default Rust path: the annot-track readback snapshot from #233 ("annot bigWig readback collapses wide intervals to value/span_length (realign_tracks=False)"), test_write_tracks_e2e.py, test_update.py, test_write_annot_bigwig.py (retains a legacy-vs-rust byte comparison for the annotation path), and test_bigwig_write_binding.py.

Follow-ups (non-blocking, deferred)

  • Bench driver / scripts/profile_bigwig_write.sh still A/B on the now-removed GVL_RUST_BIGWIG_WRITE / --impl (baselines were captured pre-flip) — simplify to profile-only.
  • _write_annot_track_rust constructs a throwaway BigWigs just to read .contigs (one extra header open).
  • write_track's _sample_less param is unread (annotation collapsing relies on the caller passing a single pseudo-sample).
  • Thread-local readers persist for the process lifetime (intentional cross-call cache).
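The last follow-up concerns a per-thread reader cache that outlives any single call. As a rough Python analogue of that pattern (the actual cache lives in the Rust writer; `get_reader` and the `opener` callback are hypothetical names), `threading.local` gives each thread its own dictionary of open readers:

```python
import threading

_local = threading.local()

def get_reader(path, opener):
    """Return this thread's cached reader for `path`, opening it on first use."""
    cache = getattr(_local, "readers", None)
    if cache is None:
        # First call on this thread: create its private cache.
        cache = _local.readers = {}
    if path not in cache:
        cache[path] = opener(path)
    return cache[path]
```

Each thread opens a given file at most once, and the cached reader stays alive until the thread (or the process) exits, which is the trade-off the follow-up item notes.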

🤖 Generated with Claude Code

d-laub and others added 15 commits June 19, 2026 11:51
Phase 4 (bigWig slice) of the Rust migration: replace the count+read
double-decode write path with a single-pass streaming Rust writer behind
a byte-identical parity gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…itch

Flips GVL_RUST_BIGWIG_WRITE default on and removes it. Legacy orchestration
retained only for non-bigWig (Table) tracks. Roadmap Phase 4 bigWig slice updated
with baseline + after numbers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…or instead of panic

Convert four .expect() panic paths in write_track (open_file, exactly_one contig
match, get_interval, per-interval read) to anyhow error returns using .with_context()
and .map_err(). Failures now propagate as Err through the rayon closure and surface
to Python as RuntimeError (an Exception subclass) instead of PanicException
(a BaseException subclass) — consistent with the existing max_mem bail! path.
Happy-path byte output is unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lder

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@d-laub d-laub merged commit 3bcb3c6 into main Jun 20, 2026
13 of 14 checks passed
@d-laub d-laub deleted the bigwig-impl branch June 20, 2026 02:33