feat(engine): static-frame dedup default-on + render telemetry#1549
Conversation
f8b6408 to
3c33e13
Compare
Skip re-seeking + re-screenshotting frames byte-identical to their predecessor. A frame is dedupable iff no GSAP tween or clip cut is active in it or its predecessor (predicted from window.__timelines + clip schedule) AND an empirical anchor-compare confirms it. Opt-in HF_STATIC_DEDUP=true, default off. Correctness (designed for the multi-worker / distributed render paths): - Reuse is keyed by the ABSOLUTE composition frame (derived from the frame's time), NOT the captureFrameCore frameIndex arg — chunked/parallel callers pass a chunk- relative index. Validated lossless (PSNR=inf) on both single- and multi-worker renders of a static-hold comp. - verifyStaticFramesSafe checks EVERY run (no longest-first budget truncation that left runs armed-but-unverified), and samples each run's FIRST reused frame, its END, and interior points at a stride; a hard cap disables dedup rather than trust an unverified set. - Conservative arming: skipped when capture mode != screenshot (BeginFrame tick semantics + the verifier's screenshot path wouldn't transfer), when a before-capture hook is set (per-frame video injection), when page-side compositing is active (shader / drawElement composite the plain verification screenshot can't reproduce), and when any data-start is a non-numeric reference expression the clip-boundary parser can't protect, or duration is unknown/zero. - Session reuse (prepareCaptureSessionForReuse) resets lastFrameBuffer + dedup counter so a probe/prior-render buffer can't bleed into the first static frame; the armed set is kept (same-composition reuse). Cost calibration bypasses dedup for its sparse, non-contiguous sample sweep, then restores the armed set. - HF_STATIC_DEDUP_SAMPLES is NaN-guarded. Disqualifies on signals the GSAP predictor can't see: video, canvas/webgl, zero tweens, running CSS/WAAPI animation. Pays on static-hold content (title cards, slideshow/kiosk loops, data-viz pauses); no-op on continuously-animated comps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
3c33e13 to
6db3c07
Compare
Flip dedup from opt-in (HF_STATIC_DEDUP=true) to default-on (opt-out
HF_STATIC_DEDUP=false). Verification (verifyStaticFramesSafe) is the
safety net that keeps reuse sound at scale.
Add end-to-end dedup observability. The capture session records
enabled / armed / skipReason / predicted; these surface via
CapturePerfSummary -> a dedupPerfs accumulator (disk sequential +
parallel AND streaming sequential + parallel) -> aggregated into
RenderPerfSummary.staticDedup (OR armed, SUM frames across workers) ->
render_complete props static_dedup_{enabled,armed,skip_reason,
predicted_frames,reused_frames}. skip_reason is a low-cardinality code:
capture_mode | video_injection | page_composite | ineligible |
verification_failed.
Distributed chunks run on Linux/beginframe where dedup never arms, so
they pass a throwaway dedupPerfs sink (no per-chunk reporting).
Tests: aggregation logic (OR/SUM/skip-reason) + opt-out passthrough.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
miga-heygen
left a comment
There was a problem hiding this comment.
Review: feat(engine): static-frame dedup default-on + render telemetry
Vance, this is a really well-structured performance optimization. The layered safety model (prediction from GSAP timeline walker -> clip-boundary padding -> eligibility gates -> empirical anchor verification -> runtime reuse) is the kind of defense-in-depth that makes me confident shipping a default-on optimization. The telemetry plumbing is thorough — every capture path feeds the accumulator and the aggregation semantics (OR for booleans, SUM for counts, first-reported skipReason) are clean. Two new test files, both targeted at the right invariants (absolute vs relative index keying, aggregation algebra). Solid work.
That said, I found a few things worth discussing before merge:
1. Stale "opt-in" comments (low severity, but confusing)
The PR description and code title say "default-on" — the actual runtime code agrees (process.env.HF_STATIC_DEDUP !== "false"). But several JSDoc comments in the diff still say "opt-in HF_STATIC_DEDUP=true":
CaptureSession.staticFramesJSDoc: "Static-frame dedup (opt-inHF_STATIC_DEDUP=true)"armStaticDedupfunction JSDoc: "Arm static-frame dedup for this render (opt-in HF_STATIC_DEDUP=true, default off)."- The
// ── Static-frame dedup (opt-in HF_STATIC_DEDUP=true)section banner CapturePerfSummary.staticDedupReusedJSDoc: "(opt-in HF_STATIC_DEDUP)"
These read as contradicting the actual behavior. Someone reading the code in six months will be confused about whether they need to opt in or opt out. Quick find-replace to "opt-out HF_STATIC_DEDUP=false" would clean this up.
2. Math.round vs Math.floor rounding mismatch (no bug today, but fragile)
The dedup check in captureFrameCore uses:
const absFrameIndex = Math.round(time * fpsToNumber(options.fps));
Meanwhile quantizeTimeToFrame (used in prepareFrameForCapture) uses:
Math.floor(safeTime * safeFps + 1e-9)
And computeStaticFrameSet builds the animated-frame set using Math.floor(start * fps) / Math.ceil(end * fps).
Today this is safe because callers compute time = (absoluteIdx * fps.den) / fps.num, making the round-trip Math.round(i * den/num * num/den) = Math.round(i) = i exactly. But if any caller ever feeds a time that isn't exactly frame / fps — say, from a user-specified seek time or a floating-point-intermediate path — Math.round and Math.floor+epsilon would disagree on boundary values, causing the dedup check to hit when the actual seek lands on a different frame, or vice versa.
Consider using the same Math.floor(time * fps + 1e-9) idiom as quantizeTimeToFrame for consistency, so the dedup lookup is guaranteed to agree with the frame the seek actually lands on. Alternatively, a shared timeToAbsoluteFrame(time, fps) helper used by both paths would make the contract explicit.
3. Verification budget escape hatch is conservative but unobservable
When verifyStaticFramesSafe exceeds its hardCap budget, it returns the first frame of the current run as a "bad frame," which causes armStaticDedup to log verification_failed and set skipReason = "verification_failed". That's the safe direction, but the operator has no way to distinguish "verification actually failed (content drifted)" from "verification budget exhausted (too many runs to fully check)." If you see verification_failed spiking in PostHog, you won't know whether compositions are genuinely non-static or the sample budget just needs tuning.
Suggestion: use a distinct skip reason like "verification_budget_exhausted" for the hard-cap path, or include the distinction in the log line.
4. calibration bypass — lastFrameBuffer cleared but staticDedupCount is not
In captureCost.ts, the calibration sweep correctly saves/restores staticFrames and clears lastFrameBuffer after, but if any calibration frames somehow incremented staticDedupCount (they shouldn't, because staticFrames is set to undefined — but just for belt-and-suspenders), the count would carry into the real render's telemetry. Consider also resetting session.staticDedupCount = 0 in the finally block for symmetry, especially since prepareCaptureSessionForReuse already does exactly this.
5. getAnimations() timing assumption
computeStaticFrameSet calls document.getAnimations() during initializeSession to detect running CSS/WAAPI animations. But GSAP-driven compositions that add CSS animations dynamically during playback (e.g., a tween callback that adds a CSS class with @keyframes) won't have those animations registered at init time. The PR description already flags this as a known limitation ("the eligibility predictor reads getAnimations() at init"), and the verification step provides a safety net, so this is more of a "known risk to monitor" than a blocker. The follow-up PostHog dashboard should include arm-rate breakdowns by composition type to catch this in the wild.
6. Distributed chunk sink: empty dedupPerfs array
In renderChunk.ts, the distributed chunk path passes dedupPerfs: [] as a throwaway sink. This is fine since dedup never arms on Linux/beginframe, but the comment should probably note that this array is never read — it exists purely to satisfy the CaptureStageInput type. Currently the code reads like the array might be consumed downstream. Minor clarity nit.
Summary
The architecture is sound, the safety gates are well-layered, and the telemetry plumbing is complete. The main actionable items are (1) fixing the stale opt-in comments to match the actual default-on behavior, (2) aligning the Math.round vs Math.floor+epsilon frame-index computation for long-term robustness, and (3) adding a distinct skip reason for budget exhaustion vs. actual verification failure. None are ship-blockers given the verification safety net, but (2) is the kind of subtle invariant that bites when a future caller doesn't produce exact frame/fps times.
LGTM — ready for stamp.
— Miga
miguel-heygen
left a comment
There was a problem hiding this comment.
Verified the 1079-line diff end-to-end. Multi-gate safety chain (capture_mode → video_injection → page_composite → eligibility → anchor verification) is clean and fails closed at every step. Key correctness checks: absolute-frame indexing via time (not chunk-relative frameIndex) is locked by the regression test; calibration correctly bypasses dedup with save/restore; worker aggregation ORs armed + SUMs predicted/reused; session reuse clears lastFrameBuffer to prevent cross-render bleed. Telemetry flows cleanly from CaptureSession through RenderPerfSummary into PostHog. LGTM.
Review feedback (miga-heygen) + self-review fixes:
- Retry double-count: executeDiskCaptureWithAdaptiveRetry pushed worker
dedup perf inside the retry loop, so an adaptive retry counted frames
twice (reused/predicted could exceed totalFrames). Reset dedupPerfs at
the start of each attempt — retry now REPLACES rather than accumulates;
common no-retry path is unchanged.
- Opt-out parsing: HF_STATIC_DEDUP now disables on {false,0,off}
case/space-insensitive (was strict !== "false", so `False`/`0` silently
kept dedup on — the kill-switch could no-op).
- Verification budget vs drift: verifyStaticFramesSafe returns
{badFrame, budgetExhausted}; armStaticDedup reports a distinct
`verification_budget` skip reason so a telemetry spike means "raise
HF_STATIC_DEDUP_SAMPLES", not "compositions are non-static".
- Index idiom: captureFrameCore now uses Math.floor(time*fps + 1e-9)
(matches quantizeTimeToFrame) so the dedup lookup agrees with the frame
the seek lands on even for non-exact times.
- Stale "opt-in HF_STATIC_DEDUP=true" comments -> "opt-out
HF_STATIC_DEDUP=false" across frameCapture.ts + types.ts.
- Extract pushWorkerDedupPerfs helper (perfSummary.ts), used by the disk
and streaming parallel paths — removes the duplicated push loop and
drops captureStreamingStage back under the complexity threshold.
- dedupPerfs is now required (not optional) on
executeDiskCaptureWithAdaptiveRetry — a missing arg silently dropped
telemetry.
- Test: captureStreamingStage createInput() now provides the required
dedupPerfs field.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks @miga-heygen — all three addressed in 1. Stale "opt-in" comments → fixed. Find-replaced to "opt-out 2. 3. Verification budget unobservable → fixed. Bonus — retry double-count (found while re-reviewing): the worker-perf push sat inside the adaptive-retry loop in One thing I did not change: the per-render verification cost (computeStaticFrameSet + sample screenshots) now runs on every screenshot render by default. That's the deliberate cost of default-on; the new |
- Derivable state: drop session.staticDedupArmed/staticDedupPredicted;
derive both from session.staticFrames in getCapturePerfSummary
(armed ⟺ non-empty set, predicted === size) so they can't desync.
- Config altitude: HF_STATIC_DEDUP now resolves into
EngineConfig.staticFrameDedup (resolveConfig, opt-out on {false,0,off}),
alongside forceScreenshot/browserGpuMode — armStaticDedup reads config
instead of process.env. Default-on preserved (missing config → enabled).
- Lossy aggregation: aggregateDedup now reports DISTINCT skip reasons
(sorted, `|`-joined) across diverging unarmed workers instead of just
the first.
- discardWarmupCapture: also snapshot/restore staticDedupCount and
lastFrameBuffer so a warmup capture can't leak a phantom reuse or a
stale buffer anchor into the real summary.
- Convention: perfSummary-dedup.test builds its job via createRenderJob
instead of `as unknown as RenderJob`.
- Docs: verification_budget added to skip-reason lists.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

What
Static-frame dedup reuses byte-identical frames (no GSAP tween / clip cut active, predicted from
window.__timelinesand anchor-verified) instead of re-seeking + re-screenshotting them — skipping the seek + JPEG/PNG encode for those frames.This PR makes dedup default-on and adds end-to-end observability.
Changes
Default-on (opt-out)
HF_STATIC_DEDUP=true); now on by default, opt out withHF_STATIC_DEDUP=false.verifyStaticFramesSafe(anchor-compare sampling) is the safety net: dedup only arms after the predicted-static set is empirically verified against the plain screenshot path.screenshotcapture mode (i.e. macOS/Windows local renders). Linux/docker uses BeginFrame → dedup never arms there.Render telemetry
The capture session records
enabled/armed/skipReason/predicted. These flow:CaptureSession→CapturePerfSummary→ adedupPerfsaccumulator → aggregated intoRenderPerfSummary.staticDedup→render_completePostHog event.Coverage — all local capture paths feed the accumulator:
Aggregation across workers:
enabled/armed= OR;predictedFrames/reusedFrames= SUM;skipReason= first reported when unarmed.New
render_completeproperties:static_dedup_enabled(bool) — adoptionstatic_dedup_armed(bool) — passed every gate + verificationstatic_dedup_skip_reason(string) —capture_mode|video_injection|page_composite|ineligible|verification_failedstatic_dedup_predicted_frames(number)static_dedup_reused_frames(number) — reuse % = reused /total_framesRisk
The eligibility predictor reads
getAnimations()at init, so it can't see animations driven by playback-time CSS or rAF not yet registered — those frames could be mis-marked static → silent visual damage.verifyStaticFramesSafe(frame sampling) is the only guard, and sampling is not exhaustive; production has no PSNR check. Recommend watching arm-rates + reuse via the new telemetry and running a broad PSNR eval post-merge.Testing
perfSummary-dedup.test.tsframeCapture-staticDedupIndex.test.tsFollow-ups
🤖 Generated with Claude Code