feat: configurable SetCloseTimeout for the shutdown flush #24
Merged
Close previously retried the final flush a fixed 5 times (~0.8s), then discarded the buffer tail. That is too short to ride out a rolling-restart 5xx blip, so records were silently lost on shutdown.

The closing retry loop is now bounded by a deadline instead of a retry count. Add SetCloseTimeout (default CloseTimeout = 5s): Close keeps re-attempting the final flush until it succeeds or the window elapses, then discards what's left (OnDiscard for fire-and-forget, ReasonClosed for tracked).

A permanent 4xx still exits immediately via the failure taxonomy; only transient failures consume the window. Backoff stays jittered, and Retry-After is not honored on this path, so a server-requested delay can't stretch shutdown.

It bounds the retry window, not total Close time: each attempt is a real request bounded by SetIngestTimeout, and the docs say so.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
"With care" item #5 from the review: cut shutdown data loss.
Problem
`Close` retried the final flush a fixed 5 times (~0.8s), then discarded the buffer tail. That's too short to ride out a rolling-restart 5xx blip, so on shutdown records are silently lost (fire-and-forget) or Nacked with `ReasonClosed` (tracked).
Change
The closing retry loop is now bounded by a deadline instead of a retry count.
- New `SetCloseTimeout(d)` (default `CloseTimeout = 5s`): `Close` keeps re-attempting the final flush until it succeeds or the window elapses.
- `Retry-After` is not honored on this path, so a server-requested delay can't stretch shutdown.
- When the window elapses, what's left is discarded: `OnDiscard` for fire-and-forget, settled `ReasonClosed` for tracked.
Honest bound
It bounds the retry window, not total `Close` time: each attempt makes a real request bounded by `SetIngestTimeout`, so an in-flight flush can overrun the window by up to that timeout, and a record already mid-retry when `Close` is called can take up to ~2× the window. Documented on `SetCloseTimeout`/`Close`.
Default change
The effective close budget goes from ~0.8s → 5s, so `Close` blocks longer when the server is genuinely down (that's the point: it now rides out blips). Set a smaller value for faster shutdown.
Scrutinize
`/scrutinize` caught that the first draft's tests used `IngestBatch` (tracked), which idle-flushes before `Close`, so they exercised the indefinite-retry path, not the close deadline. Rewrote them with fire-and-forget `Ingest` (which sits in the buffer until `Close`) and tightened the doc that overpromised a hard time bound. Both were fixed before this PR.
Tests
3 close tests:
- rides out a transient blip → delivered, not dropped;
- gives up, bounded, near the timeout → `OnDiscard`;
- tracked → `ReasonClosed` (existing, updated to a short timeout).

Full suite green, `go vet` clean, race-clean ×2, close tests held ×5.

🤖 Generated with Claude Code