fix(store): keep LanceDB vector ANN index fresh to stop brute-force latency regression #59
Merged
Incremental indexing only flushed new rows and never rebuilt the indexes, so rows
piled into an unindexed tail that every vector query had to brute-force. Brute-force
cost scales linearly with corpus size (~130ms at 20k chunks vs ~20ms with the ANN
index), turning sub-50ms retrieval into hundreds of ms once a watcher/MCP session
grew the index; and if optimize() never ran, there was no ANN index at all.
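To make the linear cost concrete, here is a minimal Python sketch of what a query over the unindexed tail amounts to: an exact k-nearest-neighbour scan that computes one distance per row. It assumes plain Python lists for vectors and L2 distance; `brute_force_topk` and `search_with_tail` are hypothetical helpers, not LanceDB internals, and the merge step is only a conceptual model of combining ANN candidates from the indexed rows with a flat scan of the tail.

```python
import heapq
import math
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[str, Sequence[float]]  # (row id, embedding)
Hit = Tuple[float, str]            # (distance, row id)


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_topk(query: Sequence[float], rows: Iterable[Row], k: int) -> List[Hit]:
    """Exact k-NN: one distance per row, so cost grows as O(rows * dim)."""
    return heapq.nsmallest(k, ((l2(query, vec), row_id) for row_id, vec in rows))


def search_with_tail(query: Sequence[float], ann_hits: Iterable[Hit],
                     tail_rows: Iterable[Row], k: int) -> List[Hit]:
    """Conceptual model (hypothetical): ANN candidates from the indexed rows
    merged with an exact scan of the unindexed tail. Only the tail scan grows
    with every flush that is not followed by a reindex."""
    tail_hits = brute_force_topk(query, tail_rows, k)
    return heapq.nsmallest(k, list(ann_hits) + tail_hits)
```

The ANN part costs roughly the same regardless of how many rows it covers, while `brute_force_topk` touches every tail row, which is why bounding the tail bounds the latency.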
- Add LanceStore.maybe_reindex(): rebuilds the FTS + scalar + vector ANN indexes
when the unindexed tail grows past _ANN_REINDEX_TAIL (or no ANN index exists yet
at scale); cheap no-op otherwise. The indexer calls it on incremental passes so
the brute-forced tail can't grow unbounded.
- Track _ann_built from what is actually on disk (detected on open via
_vector_index_stats) and on every build, so a silently swallowed index-build
failure is observable via index_kind ("lancedb" vs "lancedb-ann") instead of
masquerading as an ANN index.
- Build scalar indexes on id/path so hydrate's `id IN (...)` and path deletes are
index lookups, not full-table scans that grow with the corpus.
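The reindex policy and the on-disk `_ann_built` tracking can be sketched as below. This is a hypothetical illustration, not the PR's code: `VectorIndexStats`, `StoreSketch`, `_ANN_MIN_ROWS`, and both constant values are assumptions (the PR names `_ANN_REINDEX_TAIL` but not its value), and the stats reader and rebuild function are injected callables standing in for `_vector_index_stats` and the real FTS + scalar + ANN rebuild.

```python
import logging
from dataclasses import dataclass
from typing import Callable

# ASSUMED values: the PR names _ANN_REINDEX_TAIL but not its value, and
# "at scale" is modelled with a hypothetical _ANN_MIN_ROWS threshold.
_ANN_REINDEX_TAIL = 5_000
_ANN_MIN_ROWS = 1_000

log = logging.getLogger(__name__)


@dataclass
class VectorIndexStats:
    """Hypothetical shape of what an on-disk index probe might report."""
    has_ann: bool
    num_unindexed_rows: int
    total_rows: int


class StoreSketch:
    """Illustrative stand-in for LanceStore; not the real class."""

    def __init__(self, read_stats: Callable[[], VectorIndexStats],
                 rebuild: Callable[[], None]) -> None:
        self._read_stats = read_stats  # stands in for _vector_index_stats
        self._rebuild = rebuild        # stands in for the FTS + scalar + ANN rebuild
        # Detect the ANN index from what is actually on disk when opened.
        self._ann_built = read_stats().has_ann

    @property
    def index_kind(self) -> str:
        return "lancedb-ann" if self._ann_built else "lancedb"

    def maybe_reindex(self) -> bool:
        """Return True if a rebuild was attempted."""
        stats = self._read_stats()
        tail_too_long = stats.has_ann and stats.num_unindexed_rows > _ANN_REINDEX_TAIL
        missing_at_scale = not stats.has_ann and stats.total_rows >= _ANN_MIN_ROWS
        if not (tail_too_long or missing_at_scale):
            return False  # cheap no-op: one stats read, no rebuild
        try:
            self._rebuild()
        except Exception:
            # ASSUMPTION: build errors are logged and swallowed so indexing continues.
            log.exception("index rebuild failed")
        # Re-read the disk state so a failed build shows up as index_kind == "lancedb".
        self._ann_built = self._read_stats().has_ann
        return True
```

Setting the flag from a fresh stats read after every build, rather than when the build call returns, is what keeps a swallowed failure from masquerading as an ANN index.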
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y1DfHPqxHppXF6zEYgFKi3
Neverdecel added a commit that referenced this pull request on Jun 19, 2026
…#60)
Adds a per-phase latency breakdown so a slow result is attributable, and warms the embedding model on HTTP API startup.
- HybridSearcher.search() takes an optional timings dict (embed_ms/dense_ms/lexical_ms/hydrate_ms/rerank_ms); the default None keeps existing callers unchanged.
- The demo UI splits the speed badge into embed (model) vs store (vector + BM25 + hydrate).
- run_server() now calls cr.warm() instead of cr.status(), so the first HTTP /search doesn't pay the cold model load.
Follow-up to #59. The reported 207ms over 906 chunks is host-bound (shared demo CPU): warm embed ~3ms + store ~14ms on normal hardware.
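An optional `timings` dict that is only written when the caller passes one can be sketched with a small context manager. The `search` function and its phase bodies below are placeholders standing in for `HybridSearcher.search()`; only the five key names come from the commit message, and measuring with `time.perf_counter()` is an assumption.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional


@contextmanager
def _phase(timings: Optional[Dict[str, float]], key: str) -> Iterator[None]:
    """Store elapsed wall-clock milliseconds under `key`, if timings were requested."""
    if timings is None:
        yield  # default path: callers that pass nothing see no change
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[key] = (time.perf_counter() - start) * 1000.0


def search(query: str, timings: Optional[Dict[str, float]] = None) -> List[str]:
    """Hypothetical placeholder for a hybrid search; each phase body is a stand-in."""
    with _phase(timings, "embed_ms"):
        _ = [float(ord(c)) for c in query]  # stand-in for the embedding model
    with _phase(timings, "dense_ms"):
        dense = ["a", "b"]                  # stand-in for the vector search
    with _phase(timings, "lexical_ms"):
        lexical = ["b", "c"]                # stand-in for BM25
    with _phase(timings, "hydrate_ms"):
        hits = list(dict.fromkeys(dense + lexical))  # dedupe, keep first-seen order
    with _phase(timings, "rerank_ms"):
        hits.sort()                         # stand-in for reranking
    return hits
```

A caller such as the demo UI could then report `t["embed_ms"]` as the model cost and the sum of `dense_ms`, `lexical_ms`, and `hydrate_ms` as the store cost, after calling `search(query, timings=t)` with an empty dict `t`.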