fix(store): keep LanceDB vector ANN index fresh to stop brute-force latency regression #59

Merged: Neverdecel merged 2 commits into master from claude/lancedb-latency-regression-hb57dw on Jun 19, 2026

@Neverdecel (Owner)

Incremental indexing only flushed, so rows piled into an unindexed tail that
every vector query brute-forced. Brute-force scales linearly with corpus size
(~130ms at 20k chunks vs ~20ms with the ANN index), turning sub-50ms retrieval
into hundreds of ms once a watcher/MCP session grew the index — and if optimize()
never ran, there was no ANN index at all.

  • Add LanceStore.maybe_reindex(): rebuilds the FTS + scalar + vector ANN indexes
    when the unindexed tail grows past _ANN_REINDEX_TAIL (or no ANN index exists yet
    at scale); cheap no-op otherwise. The indexer calls it on incremental passes so
    the brute-forced tail can't grow unbounded.
  • Track _ann_built from what is actually on disk (detected on open via
    _vector_index_stats) and on every build, so a silently swallowed index-build
    failure is observable via index_kind ("lancedb" vs "lancedb-ann") instead of
    masquerading as an ANN index.
  • Build scalar indexes on id/path so hydrate's id IN (...) and path deletes are
    index lookups, not full-table scans that grow with the corpus.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y1DfHPqxHppXF6zEYgFKi3
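
The reindex decision and the `index_kind` reporting described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the threshold values, the `VectorIndexStats` shape, and the `ANN_MIN_ROWS` cutoff for "at scale" are assumptions, since the description names `_ANN_REINDEX_TAIL` and `_vector_index_stats` without giving their values or return types.

```python
from dataclasses import dataclass

# Hypothetical values: the PR names _ANN_REINDEX_TAIL but does not state it.
ANN_REINDEX_TAIL = 2_000
# Assumed reading of "at scale": below this many rows an ANN index is not worth building.
ANN_MIN_ROWS = 5_000


@dataclass
class VectorIndexStats:
    """Stand-in for whatever _vector_index_stats reports from disk (assumed shape)."""
    exists: bool          # is a vector ANN index actually present on disk?
    indexed_rows: int     # rows covered by the ANN index
    unindexed_rows: int   # rows in the brute-forced tail


def should_reindex(stats: VectorIndexStats) -> bool:
    """Return True when a rebuild is warranted; a cheap check otherwise."""
    total = stats.indexed_rows + stats.unindexed_rows
    if not stats.exists:
        # No ANN index yet: only build one once the corpus is large enough.
        return total >= ANN_MIN_ROWS
    # ANN index exists: rebuild once the unindexed tail grows past the threshold.
    return stats.unindexed_rows > ANN_REINDEX_TAIL


def index_kind(stats: VectorIndexStats) -> str:
    """Report what is really on disk, so a failed build is visible to callers."""
    return "lancedb-ann" if stats.exists else "lancedb"
```

Deriving both answers from the on-disk stats, rather than from a flag set when a build was attempted, is what keeps a swallowed build failure from being reported as `"lancedb-ann"`.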

@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 98.21429% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| coderag/store/lance_store.py | 98.14% | 1 Missing ⚠️ |

@Neverdecel Neverdecel merged commit daa545d into master Jun 19, 2026
13 checks passed
Neverdecel added a commit that referenced this pull request Jun 19, 2026
…#60)

Adds a per-phase latency breakdown so a slow result is attributable, and warms the embedding model on HTTP API startup.

- HybridSearcher.search() takes an optional timings dict (embed_ms/dense_ms/lexical_ms/hydrate_ms/rerank_ms); default None keeps existing callers unchanged.
- The demo UI splits the speed badge into embed (model) vs store (vector + BM25 + hydrate).
- run_server() now calls cr.warm() instead of cr.status(), so the first HTTP /search doesn't pay the cold model load.

Follow-up to #59. The reported 207ms over 906 chunks is host-bound (shared demo CPU): warm embed ~3ms + store ~14ms on normal hardware.
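
The optional timings dict could look roughly like the sketch below. It is an assumption-laden illustration rather than the actual `HybridSearcher.search()`: the `timed` helper and the placeholder phase bodies are hypothetical, and only the key names (`embed_ms`, `dense_ms`, `lexical_ms`, `hydrate_ms`, `rerank_ms`) and the `None` default come from the commit message.

```python
import time
from typing import Callable, Optional


def timed(timings: Optional[dict], key: str, fn: Callable[[], object]) -> object:
    """Run fn(); if the caller supplied a timings dict, record elapsed ms under key."""
    start = time.perf_counter()
    result = fn()
    if timings is not None:
        timings[key] = (time.perf_counter() - start) * 1000.0
    return result


def search(query: str, timings: Optional[dict] = None) -> list:
    """Hypothetical hybrid search; each phase body is a placeholder."""
    # Embedding is timed separately from the store phases.
    vector = timed(timings, "embed_ms", lambda: [float(len(query))])
    dense = timed(timings, "dense_ms", lambda: ["a", "b"] if vector else [])
    lexical = timed(timings, "lexical_ms", lambda: ["b", "c"])
    merged = sorted(set(dense) | set(lexical))
    hits = timed(timings, "hydrate_ms", lambda: [{"id": i} for i in merged])
    return timed(timings, "rerank_ms", lambda: hits)


# Existing callers are unaffected; callers that want a breakdown pass a dict.
breakdown: dict = {}
results = search("vector index", breakdown)
```

Because the dict is filled in place and defaults to `None`, callers that do not ask for a breakdown see the same signature and return value as before.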

3 participants