Skip to content

feat(engine): static-frame dedup default-on + render telemetry#1549

Merged
vanceingalls merged 4 commits into
mainfrom
06-17-feat_engine_static-frame_dedup_for_screenshot_capture_opt-in_
Jun 18, 2026
Merged

feat(engine): static-frame dedup default-on + render telemetry#1549
vanceingalls merged 4 commits into
mainfrom
06-17-feat_engine_static-frame_dedup_for_screenshot_capture_opt-in_

Conversation

@vanceingalls

@vanceingalls vanceingalls commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

What

Static-frame dedup reuses byte-identical frames (no GSAP tween / clip cut active, predicted from window.__timelines and anchor-verified) instead of re-seeking + re-screenshotting them — skipping the seek + JPEG/PNG encode for those frames.

This PR makes dedup default-on and adds end-to-end observability.

Changes

Default-on (opt-out)

  • Was opt-in (HF_STATIC_DEDUP=true); now on by default, opt out with HF_STATIC_DEDUP=false.
  • verifyStaticFramesSafe (anchor-compare sampling) is the safety net: dedup only arms after the predicted-static set is empirically verified against the plain screenshot path.
  • Arms only in screenshot capture mode (i.e. macOS/Windows local renders). Linux/docker uses BeginFrame → dedup never arms there.

Render telemetry

The capture session records enabled / armed / skipReason / predicted. These flow:

CaptureSessionCapturePerfSummary → a dedupPerfs accumulator → aggregated into RenderPerfSummary.staticDeduprender_complete PostHog event.

Coverage — all local capture paths feed the accumulator:

  • disk capture: sequential + parallel
  • streaming encode: sequential + parallel
  • distributed chunks (Linux/beginframe, never arms): throwaway sink, no per-chunk reporting

Aggregation across workers: enabled/armed = OR; predictedFrames/reusedFrames = SUM; skipReason = first reported when unarmed.

New render_complete properties:

  • static_dedup_enabled (bool) — adoption
  • static_dedup_armed (bool) — passed every gate + verification
  • static_dedup_skip_reason (string) — capture_mode | video_injection | page_composite | ineligible | verification_failed
  • static_dedup_predicted_frames (number)
  • static_dedup_reused_frames (number) — reuse % = reused / total_frames

Risk

The eligibility predictor reads getAnimations() at init, so it can't see animations driven by playback-time CSS or rAF not yet registered — those frames could be mis-marked static → silent visual damage. verifyStaticFramesSafe (frame sampling) is the only guard, and sampling is not exhaustive; production has no PSNR check. Recommend watching arm-rates + reuse via the new telemetry and running a broad PSNR eval post-merge.

Testing

  • tsc clean across engine / producer / cli
  • Aggregation logic (OR/SUM/skip-reason + opt-out passthrough) — perfSummary-dedup.test.ts
  • Reuse-index regression (absolute-frame keying) — frameCapture-staticDedupIndex.test.ts
  • lint + format clean

Follow-ups

  • Dashboard dedup tiles in PostHog (adoption, arm-rate, skip-reason breakdown, reuse-% distribution, speed armed-vs-not) — buildable once the props land in PostHog after ship.

🤖 Generated with Claude Code

Copy link
Copy Markdown
Collaborator Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@vanceingalls vanceingalls force-pushed the 06-17-feat_engine_static-frame_dedup_for_screenshot_capture_opt-in_ branch 2 times, most recently from f8b6408 to 3c33e13 Compare June 18, 2026 06:12
Skip re-seeking + re-screenshotting frames byte-identical to their predecessor. A
frame is dedupable iff no GSAP tween or clip cut is active in it or its predecessor
(predicted from window.__timelines + clip schedule) AND an empirical anchor-compare
confirms it. Opt-in HF_STATIC_DEDUP=true, default off.

Correctness (designed for the multi-worker / distributed render paths):
- Reuse is keyed by the ABSOLUTE composition frame (derived from the frame's time),
  NOT the captureFrameCore frameIndex arg — chunked/parallel callers pass a chunk-
  relative index. Validated lossless (PSNR=inf) on both single- and multi-worker
  renders of a static-hold comp.
- verifyStaticFramesSafe checks EVERY run (no longest-first budget truncation that
  left runs armed-but-unverified), and samples each run's FIRST reused frame, its END,
  and interior points at a stride; a hard cap disables dedup rather than trust an
  unverified set.
- Conservative arming: skipped when capture mode != screenshot (BeginFrame tick
  semantics + the verifier's screenshot path wouldn't transfer), when a before-capture
  hook is set (per-frame video injection), when page-side compositing is active (shader
  / drawElement composite the plain verification screenshot can't reproduce), and when
  any data-start is a non-numeric reference expression the clip-boundary parser can't
  protect, or duration is unknown/zero.
- Session reuse (prepareCaptureSessionForReuse) resets lastFrameBuffer + dedup counter
  so a probe/prior-render buffer can't bleed into the first static frame; the armed set
  is kept (same-composition reuse). Cost calibration bypasses dedup for its sparse,
  non-contiguous sample sweep, then restores the armed set.
- HF_STATIC_DEDUP_SAMPLES is NaN-guarded.

Disqualifies on signals the GSAP predictor can't see: video, canvas/webgl, zero
tweens, running CSS/WAAPI animation. Pays on static-hold content (title cards,
slideshow/kiosk loops, data-viz pauses); no-op on continuously-animated comps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vanceingalls vanceingalls force-pushed the 06-17-feat_engine_static-frame_dedup_for_screenshot_capture_opt-in_ branch from 3c33e13 to 6db3c07 Compare June 18, 2026 06:20
Flip dedup from opt-in (HF_STATIC_DEDUP=true) to default-on (opt-out
HF_STATIC_DEDUP=false). Verification (verifyStaticFramesSafe) is the
safety net that keeps reuse sound at scale.

Add end-to-end dedup observability. The capture session records
enabled / armed / skipReason / predicted; these surface via
CapturePerfSummary -> a dedupPerfs accumulator (disk sequential +
parallel AND streaming sequential + parallel) -> aggregated into
RenderPerfSummary.staticDedup (OR armed, SUM frames across workers) ->
render_complete props static_dedup_{enabled,armed,skip_reason,
predicted_frames,reused_frames}. skip_reason is a low-cardinality code:
capture_mode | video_injection | page_composite | ineligible |
verification_failed.

Distributed chunks run on Linux/beginframe where dedup never arms, so
they pass a throwaway dedupPerfs sink (no per-chunk reporting).

Tests: aggregation logic (OR/SUM/skip-reason) + opt-out passthrough.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vanceingalls vanceingalls changed the title feat(engine): static-frame dedup for screenshot capture (opt-in) feat(engine): static-frame dedup default-on + render telemetry Jun 18, 2026

@miga-heygen miga-heygen left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat(engine): static-frame dedup default-on + render telemetry

Vance, this is a really well-structured performance optimization. The layered safety model (prediction from GSAP timeline walker -> clip-boundary padding -> eligibility gates -> empirical anchor verification -> runtime reuse) is the kind of defense-in-depth that makes me confident shipping a default-on optimization. The telemetry plumbing is thorough — every capture path feeds the accumulator and the aggregation semantics (OR for booleans, SUM for counts, first-reported skipReason) are clean. Two new test files, both targeted at the right invariants (absolute vs relative index keying, aggregation algebra). Solid work.

That said, I found a few things worth discussing before merge:


1. Stale "opt-in" comments (low severity, but confusing)

The PR description and code title say "default-on" — the actual runtime code agrees (process.env.HF_STATIC_DEDUP !== "false"). But several JSDoc comments in the diff still say "opt-in HF_STATIC_DEDUP=true":

  • CaptureSession.staticFrames JSDoc: "Static-frame dedup (opt-in HF_STATIC_DEDUP=true)"
  • armStaticDedup function JSDoc: "Arm static-frame dedup for this render (opt-in HF_STATIC_DEDUP=true, default off)."
  • The // ── Static-frame dedup (opt-in HF_STATIC_DEDUP=true) section banner
  • CapturePerfSummary.staticDedupReused JSDoc: "(opt-in HF_STATIC_DEDUP)"

These read as contradicting the actual behavior. Someone reading the code in six months will be confused about whether they need to opt in or opt out. Quick find-replace to "opt-out HF_STATIC_DEDUP=false" would clean this up.

2. Math.round vs Math.floor rounding mismatch (no bug today, but fragile)

The dedup check in captureFrameCore uses:

const absFrameIndex = Math.round(time * fpsToNumber(options.fps));

Meanwhile quantizeTimeToFrame (used in prepareFrameForCapture) uses:

Math.floor(safeTime * safeFps + 1e-9)

And computeStaticFrameSet builds the animated-frame set using Math.floor(start * fps) / Math.ceil(end * fps).

Today this is safe because callers compute time = (absoluteIdx * fps.den) / fps.num, making the round-trip Math.round(i * den/num * num/den) = Math.round(i) = i exactly. But if any caller ever feeds a time that isn't exactly frame / fps — say, from a user-specified seek time or a floating-point-intermediate path — Math.round and Math.floor+epsilon would disagree on boundary values, causing the dedup check to hit when the actual seek lands on a different frame, or vice versa.

Consider using the same Math.floor(time * fps + 1e-9) idiom as quantizeTimeToFrame for consistency, so the dedup lookup is guaranteed to agree with the frame the seek actually lands on. Alternatively, a shared timeToAbsoluteFrame(time, fps) helper used by both paths would make the contract explicit.

3. Verification budget escape hatch is conservative but unobservable

When verifyStaticFramesSafe exceeds its hardCap budget, it returns the first frame of the current run as a "bad frame," which causes armStaticDedup to log verification_failed and set skipReason = "verification_failed". That's the safe direction, but the operator has no way to distinguish "verification actually failed (content drifted)" from "verification budget exhausted (too many runs to fully check)." If you see verification_failed spiking in PostHog, you won't know whether compositions are genuinely non-static or the sample budget just needs tuning.

Suggestion: use a distinct skip reason like "verification_budget_exhausted" for the hard-cap path, or include the distinction in the log line.

4. calibration bypass — lastFrameBuffer cleared but staticDedupCount is not

In captureCost.ts, the calibration sweep correctly saves/restores staticFrames and clears lastFrameBuffer after, but if any calibration frames somehow incremented staticDedupCount (they shouldn't, because staticFrames is set to undefined — but just for belt-and-suspenders), the count would carry into the real render's telemetry. Consider also resetting session.staticDedupCount = 0 in the finally block for symmetry, especially since prepareCaptureSessionForReuse already does exactly this.

5. getAnimations() timing assumption

computeStaticFrameSet calls document.getAnimations() during initializeSession to detect running CSS/WAAPI animations. But GSAP-driven compositions that add CSS animations dynamically during playback (e.g., a tween callback that adds a CSS class with @keyframes) won't have those animations registered at init time. The PR description already flags this as a known limitation ("the eligibility predictor reads getAnimations() at init"), and the verification step provides a safety net, so this is more of a "known risk to monitor" than a blocker. The follow-up PostHog dashboard should include arm-rate breakdowns by composition type to catch this in the wild.

6. Distributed chunk sink: empty dedupPerfs array

In renderChunk.ts, the distributed chunk path passes dedupPerfs: [] as a throwaway sink. This is fine since dedup never arms on Linux/beginframe, but the comment should probably note that this array is never read — it exists purely to satisfy the CaptureStageInput type. Currently the code reads like the array might be consumed downstream. Minor clarity nit.


Summary

The architecture is sound, the safety gates are well-layered, and the telemetry plumbing is complete. The main actionable items are (1) fixing the stale opt-in comments to match the actual default-on behavior, (2) aligning the Math.round vs Math.floor+epsilon frame-index computation for long-term robustness, and (3) adding a distinct skip reason for budget exhaustion vs. actual verification failure. None are ship-blockers given the verification safety net, but (2) is the kind of subtle invariant that bites when a future caller doesn't produce exact frame/fps times.

LGTM — ready for stamp.

— Miga

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified the 1079-line diff end-to-end. Multi-gate safety chain (capture_mode → video_injection → page_composite → eligibility → anchor verification) is clean and fails closed at every step. Key correctness checks: absolute-frame indexing via time (not chunk-relative frameIndex) is locked by the regression test; calibration correctly bypasses dedup with save/restore; worker aggregation ORs armed + SUMs predicted/reused; session reuse clears lastFrameBuffer to prevent cross-render bleed. Telemetry flows cleanly from CaptureSession through RenderPerfSummary into PostHog. LGTM.

Review feedback (miga-heygen) + self-review fixes:

- Retry double-count: executeDiskCaptureWithAdaptiveRetry pushed worker
  dedup perf inside the retry loop, so an adaptive retry counted frames
  twice (reused/predicted could exceed totalFrames). Reset dedupPerfs at
  the start of each attempt — retry now REPLACES rather than accumulates;
  common no-retry path is unchanged.
- Opt-out parsing: HF_STATIC_DEDUP now disables on {false,0,off}
  case/space-insensitive (was strict !== "false", so `False`/`0` silently
  kept dedup on — the kill-switch could no-op).
- Verification budget vs drift: verifyStaticFramesSafe returns
  {badFrame, budgetExhausted}; armStaticDedup reports a distinct
  `verification_budget` skip reason so a telemetry spike means "raise
  HF_STATIC_DEDUP_SAMPLES", not "compositions are non-static".
- Index idiom: captureFrameCore now uses Math.floor(time*fps + 1e-9)
  (matches quantizeTimeToFrame) so the dedup lookup agrees with the frame
  the seek lands on even for non-exact times.
- Stale "opt-in HF_STATIC_DEDUP=true" comments -> "opt-out
  HF_STATIC_DEDUP=false" across frameCapture.ts + types.ts.
- Extract pushWorkerDedupPerfs helper (perfSummary.ts), used by the disk
  and streaming parallel paths — removes the duplicated push loop and
  drops captureStreamingStage back under the complexity threshold.
- dedupPerfs is now required (not optional) on
  executeDiskCaptureWithAdaptiveRetry — a missing arg silently dropped
  telemetry.
- Test: captureStreamingStage createInput() now provides the required
  dedupPerfs field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vanceingalls

Copy link
Copy Markdown
Collaborator Author

Thanks @miga-heygen — all three addressed in a55cb7480, plus a real bug the review prompted me to find:

1. Stale "opt-in" comments → fixed. Find-replaced to "opt-out HF_STATIC_DEDUP=false" across CaptureSession.staticFrames, armStaticDedup JSDoc, the section banner, and CapturePerfSummary.staticDedupReused.

2. Math.round vs Math.floor+ε → fixed. captureFrameCore now uses Math.floor(time * fps + 1e-9), matching quantizeTimeToFrame, so the dedup lookup agrees with the frame the seek actually lands on even if time isn't exactly i/fps. (Regression test still green — exact-frame times floor identically.)

3. Verification budget unobservable → fixed. verifyStaticFramesSafe now returns { badFrame, budgetExhausted }, and armStaticDedup records a distinct verification_budget skip reason (vs verification_failed). A spike in PostHog now tells you "raise HF_STATIC_DEDUP_SAMPLES" rather than "compositions are non-static."

Bonus — retry double-count (found while re-reviewing): the worker-perf push sat inside the adaptive-retry loop in executeDiskCaptureWithAdaptiveRetry, so a retry counted frames twice and could push reused/predicted past totalFrames. Now resets per attempt (retry replaces; no-retry path unchanged). Also: opt-out parsing now accepts {false,0,off} case-insensitively, dedupPerfs made required (was silently droppable), and the duplicated push loop extracted to pushWorkerDedupPerfs (which also drops runCaptureStreamingStage back under the complexity threshold).

One thing I did not change: the per-render verification cost (computeStaticFrameSet + sample screenshots) now runs on every screenshot render by default. That's the deliberate cost of default-on; the new static_dedup_armed / skip_reason telemetry is there to watch it, and I'd suggest a broad PSNR eval before we lean on it heavily.

- Derivable state: drop session.staticDedupArmed/staticDedupPredicted;
  derive both from session.staticFrames in getCapturePerfSummary
  (armed ⟺ non-empty set, predicted === size) so they can't desync.
- Config altitude: HF_STATIC_DEDUP now resolves into
  EngineConfig.staticFrameDedup (resolveConfig, opt-out on {false,0,off}),
  alongside forceScreenshot/browserGpuMode — armStaticDedup reads config
  instead of process.env. Default-on preserved (missing config → enabled).
- Lossy aggregation: aggregateDedup now reports DISTINCT skip reasons
  (sorted, `|`-joined) across diverging unarmed workers instead of just
  the first.
- discardWarmupCapture: also snapshot/restore staticDedupCount and
  lastFrameBuffer so a warmup capture can't leak a phantom reuse or a
  stale buffer anchor into the real summary.
- Convention: perfSummary-dedup.test builds its job via createRenderJob
  instead of `as unknown as RenderJob`.
- Docs: verification_budget added to skip-reason lists.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vanceingalls vanceingalls merged commit 7310223 into main Jun 18, 2026
43 checks passed
@vanceingalls vanceingalls deleted the 06-17-feat_engine_static-frame_dedup_for_screenshot_capture_opt-in_ branch June 18, 2026 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants