Skip to content

feat: configurable SetCloseTimeout for the shutdown flush#24

Merged
acoshift merged 1 commit into
mainfrom
feat/close-timeout
Jun 13, 2026
Merged

feat: configurable SetCloseTimeout for the shutdown flush#24
acoshift merged 1 commit into
mainfrom
feat/close-timeout

Conversation

@acoshift

Copy link
Copy Markdown
Member

"With care" item #5 from the review: cut shutdown data loss.

Problem

Close retried the final flush a fixed 5 times (~0.8s) then discarded the buffer tail. That's too short to ride out a rolling-restart 5xx blip, so records are silently lost on shutdown (fire-and-forget) or ReasonClosed-Nacked (tracked).

Change

The closing retry loop is now bounded by a deadline instead of a retry count. New SetCloseTimeout(d) (default CloseTimeout = 5s):

  • Close keeps re-attempting the final flush until it succeeds or the window elapses.
  • A permanent 4xx still exits immediately via the failure taxonomy (fix: classify permanent ingest failures; cut a per-flush allocation #21) — only transient failures consume the window.
  • Backoff stays jittered; Retry-After is not honored on this path so a server-requested delay can't stretch shutdown.
  • Records still undelivered when the window elapses are dropped: OnDiscard for fire-and-forget, settled ReasonClosed for tracked.

Honest bound

It bounds the retry window, not total Close time: each attempt makes a real request bounded by SetIngestTimeout, so an in-flight flush can overrun the window by up to that, and a record already mid-retry when Close is called can take up to ~2× the window. Documented on SetCloseTimeout/Close.

Default change

The effective close budget goes from ~0.8s → 5s, so Close blocks longer when the server is genuinely down (the point — it now rides out blips). Set a smaller value for fast shutdown.

Scrutinize

/scrutinize caught that the first draft's tests used IngestBatch (tracked), which idle-flushes before Close — so they exercised the indefinite-retry path, not the close deadline. Rewrote them with fire-and-forget Ingest (which sits in the buffer until Close), and tightened the doc that overpromised a hard time bound. Both fixed before this PR.

Tests

3 close tests: rides out a transient blip → delivered, not dropped; gives up bounded near the timeout → OnDiscard; tracked → ReasonClosed (existing, updated to a short timeout). Full suite green, go vet clean, race-clean ×2, close tests held ×5.

🤖 Generated with Claude Code

Close previously retried the final flush a fixed 5 times (~0.8s) then
discarded the buffer tail — too short to ride out a rolling-restart 5xx blip,
so records were silently lost on shutdown.

The closing retry loop is now bounded by a deadline instead of a retry count.
Add SetCloseTimeout (default CloseTimeout = 5s): Close keeps re-attempting the
final flush until it succeeds or the window elapses, then discards what's left
(OnDiscard for fire-and-forget, ReasonClosed for tracked). A permanent 4xx
still exits immediately via the failure taxonomy; only transient failures
consume the window. Backoff stays jittered and Retry-After is not honored on
this path so a server delay can't stretch shutdown.

It bounds the retry window, not total Close time: each attempt is a real
request bounded by SetIngestTimeout, documented accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@acoshift acoshift merged commit ddbc73d into main Jun 13, 2026
1 check passed
@acoshift acoshift deleted the feat/close-timeout branch June 13, 2026 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant