feat(ui): embed-vs-store latency breakdown + warm HTTP API on startup by Neverdecel · Pull Request #60 · Neverdecel/CodeRAG

Neverdecel · 2026-06-19T10:31:24Z

Context

Follow-up to the LanceDB latency investigation (#59). A user reported 207 ms retrieval over only 906 chunks. Measured on normal hardware the full warm path at that scale is ~17 ms (≈3 ms query embedding + ≈14 ms LanceDB store), so the high number is host-bound (a shared/throttled demo box), not a code regression. To make that provable from the UI instead of guessed, this adds a per-phase latency breakdown — and fixes one real warm-up gap found along the way.

Changes

1. Embed-vs-store breakdown in the demo speed badge

- HybridSearcher.search() now takes an optional timings dict and records per-phase latency (embed_ms / dense_ms / lexical_ms / hydrate_ms / rerank_ms). The default of None keeps every existing caller unchanged.
- The demo UI splits the ⚡ badge into embed (model inference) vs store (vector + BM25 + hydrate), so a slow query is immediately attributable. On a busy host the embedding usually dominates, and the single number hid that.
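The optional-timings pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in coderag/retrieval/search.py: the `phase` helper, the stand-in `search` function, the placeholder phase bodies, and `badge_text` are all hypothetical names introduced here. Only the timing keys and the "embed X · store Y ms" badge shape come from this PR. Leaving rerank_ms out of the store total is an assumption based on the badge description (vector + BM25 + hydrate).

```python
import time
from contextlib import contextmanager


@contextmanager
def phase(timings, key):
    """Add elapsed milliseconds to timings[key]; does nothing when timings is None."""
    start = time.perf_counter()
    try:
        yield
    finally:
        if timings is not None:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            timings[key] = timings.get(key, 0.0) + elapsed_ms


def search(query, timings=None):
    """Hypothetical stand-in for a hybrid search; every phase body is a placeholder."""
    with phase(timings, "embed_ms"):
        _vector = [float(len(query))]           # placeholder for model inference
    with phase(timings, "dense_ms"):
        dense_hits = ["chunk-1", "chunk-2"]     # placeholder for the vector search
    with phase(timings, "lexical_ms"):
        lexical_hits = ["chunk-2", "chunk-3"]   # placeholder for the BM25 search
    with phase(timings, "hydrate_ms"):
        results = sorted(set(dense_hits) | set(lexical_hits))
    return results


def badge_text(timings):
    """Collapse per-phase timings into the two numbers shown in the badge."""
    embed = timings.get("embed_ms", 0.0)
    # Assumption: "store" is dense + lexical + hydrate, matching the badge description.
    store = sum(timings.get(k, 0.0) for k in ("dense_ms", "lexical_ms", "hydrate_ms"))
    return f"embed {embed:.0f} · store {store:.0f} ms"
```

Callers that do not care about timings keep calling `search(query)`; a caller that wants the breakdown passes an empty dict and reads it afterwards, for example `t = {}; search("q", timings=t); badge_text(t)`.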

2. Warm the embedding model on HTTP API startup (separate commit)

run_server() called cr.status(), which builds the provider/store but never embeds — so the first HTTP /search paid the full cold model load + ONNX JIT. run_ui() already calls cr.warm(); this does the same for the HTTP surface.
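As a rough illustration of the startup change, the sketch below uses a fake object in place of the real CodeRAG instance. `FakeCodeRAG`, the `serve` callback, and the `model_loaded` flag are hypothetical; the real run_server() signature is not shown in this PR. The point is only the ordering: warm before the first request can arrive.

```python
class FakeCodeRAG:
    """Hypothetical stand-in that records which startup calls were made."""

    def __init__(self):
        self.calls = []
        self.model_loaded = False

    def status(self):
        # Builds provider/store in the real project, but never embeds.
        self.calls.append("status")
        return {"chunks": 0}

    def warm(self):
        # Stands in for status() plus one throwaway embedding that loads the model.
        self.calls.append("warm")
        self.status()
        self.model_loaded = True


def run_server(cr, serve):
    """Warm the instance first so the first /search does not pay the cold load."""
    cr.warm()
    serve(cr)
```

With `cr.status()` alone in that position, `model_loaded` would still be False when `serve` starts handling requests.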

Verification

- Rendered the demo UI end-to-end (TestClient + fake provider): badge shows embed X · store Y ms.
- All retrieval / store / indexer / webui / surfaces tests pass; lint + format clean.
- Measurements backing the diagnosis: bge-small warm embed ≈2.5 ms; full LanceDB store path at 906 rows ≈14 ms (median).

🤖 Generated with Claude Code


codecov-commenter · 2026-06-19T10:32:49Z


Codecov Report

❌ Patch coverage is 48.00000% with 13 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
coderag/surfaces/webui.py	0.00%	6 Missing ⚠️
coderag/retrieval/search.py	70.58%	5 Missing ⚠️
coderag/surfaces/http_api.py	0.00%	2 Missing ⚠️


The ⚡ badge timed the whole search() call, so a slow result couldn't be attributed. On a busy/throttled host the query embedding (model inference) usually dominates, not the LanceDB retrieval, but the single number hid that.

- HybridSearcher.search() takes an optional timings dict and records per-phase latency (embed_ms / dense_ms / lexical_ms / hydrate_ms / rerank_ms). Default None keeps every existing caller unchanged.
- The demo UI splits the badge into "embed" (model) vs "store" (vector + BM25 + hydrate) so a slow query is immediately attributable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y1DfHPqxHppXF6zEYgFKi3

run_server() called cr.status(), which builds the provider/store but never embeds, so the first HTTP /search paid the full cold model load + ONNX JIT. run_ui() already calls cr.warm(); do the same here so the first request is fast.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y1DfHPqxHppXF6zEYgFKi3

…61) warm() ran status() + embed_query(), so the store's vector/FTS/scalar indexes and LanceDB's query path stayed cold until the first real query — which then paid the full index-load cost (visible via #60's badge as a large store_ms: embed 26ms vs store 363ms over 548 chunks on the demo). Run one representative search() in warm() so the retrieval indexes are resident before the first user query. Local repro (~550 chunks): first-query store drops from ~35ms to ~14ms. Best-effort and guarded so warm-up can't block startup.
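A best-effort warm-up of the kind this commit message describes might look like the sketch below. It is an assumption-laden illustration, not the code from #61: the function signature, the injected `status` / `embed_query` / `search` callables, the probe string, and the logger name are hypothetical. "Guarded" is interpreted here as swallowing any exception from the warm-up search and logging it, and guarding only the search step is likewise an assumption.

```python
import logging

logger = logging.getLogger("warmup_sketch")  # hypothetical logger name


def warm(status, embed_query, search, probe="warm-up query"):
    """Run the existing warm-up steps, then one representative search.

    Returns True if the search warm-up succeeded and False if it failed.
    A failure is logged and swallowed so startup can continue.
    """
    status()            # builds provider/store
    embed_query(probe)  # loads the embedding model
    try:
        search(probe)   # touches the retrieval indexes before the first user query
    except Exception:
        logger.warning("search warm-up failed; continuing startup", exc_info=True)
        return False
    return True
```

If the store is empty or an index is missing, the warm-up search may raise; in this sketch that only costs a log line, and the first real query pays the cold cost instead.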

claude added 2 commits June 19, 2026 10:36

Neverdecel force-pushed the claude/retrieval-timing-breakdown branch from e090396 to 3dd014d Compare June 19, 2026 10:36

Neverdecel merged commit 895f9d4 into master Jun 19, 2026
13 checks passed

Neverdecel mentioned this pull request Jun 19, 2026

perf(warm): warm the full search path at startup, not just the model #61

Merged

Neverdecel merged 2 commits into master from claude/retrieval-timing-breakdown
