feat(ui): embed-vs-store latency breakdown + warm HTTP API on startup #60
Merged
The ⚡ badge timed the whole search() call, so a slow result couldn't be attributed. On a busy/throttled host the query embedding (model inference) usually dominates, not the LanceDB retrieval, but the single number hid that.

- HybridSearcher.search() takes an optional timings dict and records per-phase latency (embed_ms / dense_ms / lexical_ms / hydrate_ms / rerank_ms). Default None keeps every existing caller unchanged.
- The demo UI splits the badge into "embed" (model) vs "store" (vector + BM25 + hydrate) so a slow query is immediately attributable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y1DfHPqxHppXF6zEYgFKi3
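The optional-timings pattern described in this commit can be sketched roughly as follows. This is not the project's code: the `search` and `format_badge` functions, their parameters, and the `embed`/`retrieve` callables are hypothetical stand-ins. Only the key names (`embed_ms`, `dense_ms`, `lexical_ms`, `hydrate_ms`) and the "embed X · store Y ms" badge shape come from the PR text.

```python
import time
from typing import Callable, Dict, List, Optional


def search(
    query: str,
    embed: Callable[[str], List[float]],
    retrieve: Callable[[List[float]], List[str]],
    timings: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Embed, then retrieve; fill `timings` with per-phase milliseconds if given.

    Hypothetical two-phase stand-in for HybridSearcher.search(); the real
    method also has lexical, hydrate and rerank phases.
    """
    t0 = time.perf_counter()
    vector = embed(query)
    t1 = time.perf_counter()
    hits = retrieve(vector)
    t2 = time.perf_counter()
    # Callers that pass nothing (timings=None) see no behaviour change.
    if timings is not None:
        timings["embed_ms"] = (t1 - t0) * 1000.0
        timings["dense_ms"] = (t2 - t1) * 1000.0
    return hits


def format_badge(timings: Dict[str, float]) -> str:
    """Collapse per-phase timings into the two-number badge text."""
    # "store" groups vector + BM25 + hydrate, as the commit message describes.
    store = sum(timings.get(k, 0.0) for k in ("dense_ms", "lexical_ms", "hydrate_ms"))
    return f"embed {timings.get('embed_ms', 0.0):.0f} · store {store:.0f} ms"
```

Passing a caller-owned dict keeps the return type of the search call unchanged, which is presumably why a default of None leaves existing callers untouched.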
run_server() called cr.status(), which builds the provider/store but never embeds, so the first HTTP /search paid the full cold model load + ONNX JIT. run_ui() already calls cr.warm(); do the same here so the first request is fast.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y1DfHPqxHppXF6zEYgFKi3
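A toy model of the gap this commit closes, under loud assumptions: the `Runtime` class, its lazy `_ensure_model` step, and the `serve` callback are invented for illustration and are not the project's API. Only the method names `status()`, `warm()`, `embed_query()` and `run_server()` appear in the PR.

```python
class Runtime:
    """Hypothetical stand-in for the object the PR calls `cr`."""

    def __init__(self):
        self.model_loaded = False
        self.load_count = 0

    def _ensure_model(self):
        # Stands in for the expensive cold model load + JIT step.
        if not self.model_loaded:
            self.load_count += 1
            self.model_loaded = True

    def status(self):
        # Reports state but never embeds, so the model stays cold.
        return {"model_loaded": self.model_loaded}

    def embed_query(self, text):
        self._ensure_model()
        return [float(len(text))]

    def warm(self):
        # One throwaway embedding forces the model load before traffic arrives.
        self.embed_query("warm-up")


def run_server(cr, serve):
    """Warm first, then hand control to the (hypothetical) serve callback."""
    cr.warm()
    serve(cr)
```

In this sketch, calling only `status()` at startup would leave `model_loaded` false, so the first request would trigger the load; calling `warm()` moves that cost to startup.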
e090396 to 3dd014d
Neverdecel added a commit that referenced this pull request Jun 19, 2026
…61) warm() ran status() + embed_query(), so the store's vector/FTS/scalar indexes and LanceDB's query path stayed cold until the first real query — which then paid the full index-load cost (visible via #60's badge as a large store_ms: embed 26ms vs store 363ms over 548 chunks on the demo). Run one representative search() in warm() so the retrieval indexes are resident before the first user query. Local repro (~550 chunks): first-query store drops from ~35ms to ~14ms. Best-effort and guarded so warm-up can't block startup.
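One way to read "best-effort and guarded" is that each warm-up step is wrapped so a failure is logged rather than raised. The sketch below assumes that reading; the `warm` signature, the `probe` query string, and the logger name are hypothetical and not taken from the referenced commit.

```python
import logging

log = logging.getLogger("warmup")


def warm(embed_query, search, probe="warm-up"):
    """Run one embedding and one representative search; never raise.

    Returns True if every step succeeded, False otherwise.
    """
    ok = True
    steps = (
        ("embed", lambda: embed_query(probe)),
        ("search", lambda: search(probe)),
    )
    for name, step in steps:
        try:
            step()
        except Exception:
            # Best-effort: record the failure and keep starting up.
            log.warning("warm-up step %r failed; continuing", name, exc_info=True)
            ok = False
    return ok
```

Catching `Exception` only guards against errors; if a warm-up step could hang, a timeout would be needed as well, which this sketch does not attempt.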
Context
Follow-up to the LanceDB latency investigation (#59). A user reported 207 ms retrieval over only 906 chunks. Measured on normal hardware the full warm path at that scale is ~17 ms (≈3 ms query embedding + ≈14 ms LanceDB store), so the high number is host-bound (a shared/throttled demo box), not a code regression. To make that provable from the UI instead of guessed, this adds a per-phase latency breakdown — and fixes one real warm-up gap found along the way.
Changes
1. Embed-vs-store breakdown in the demo speed badge
HybridSearcher.search() now takes an optional timings dict and records per-phase latency (embed_ms / dense_ms / lexical_ms / hydrate_ms / rerank_ms). Default None keeps every existing caller unchanged.
2. Warm the embedding model on HTTP API startup (separate commit)
run_server() called cr.status(), which builds the provider/store but never embeds, so the first HTTP /search paid the full cold model load + ONNX JIT. run_ui() already calls cr.warm(); this does the same for the HTTP surface.
Verification
embed X · store Y ms.
🤖 Generated with Claude Code