Fix flaky TestQueryFrontendResponseSizeLimit: wait for ring convergence #7490

Open
yeya24 wants to merge 1 commit into cortexproject:master from yeya24:fix-flaky-test-56

yeya24 (Contributor) commented May 8, 2026

What this PR does:

Wait for the distributor to discover the ingester in the ring before pushing samples in TestQueryFrontendResponseSizeLimit.

Why:

The test was flaky because StartAndWaitReady only ensures that the HTTP endpoint is healthy, not that the distributor has discovered the ingester's tokens in the hash ring. When a push ran before that discovery completed, it failed intermittently with DoBatch: InstancesCount <= 0 (HTTP 500 on push).

Fix:

Added distributor.WaitSumMetrics(e2e.Equals(512), "cortex_ring_tokens_total") before the push loop, consistent with other integration tests in the same file (e.g., lines 323, 490, 583).

The test was flaky because it pushed samples before the distributor
discovered the ingester in the ring, causing 'InstancesCount <= 0'
errors. Add WaitSumMetrics on cortex_ring_tokens_total to ensure the
distributor sees the ingester's tokens before pushing.

Signed-off-by: Ben Ye <benye@amazon.com>