diff --git a/AGENTS.md b/AGENTS.md
index 15950e3..24756bb 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -5,7 +5,7 @@
 - `coderag/config.py`, `coderag/types.py`: Immutable `Config` and shared dataclasses.
 - `coderag/embeddings/`: `EmbeddingProvider` protocol + `fastembed` (default), `openai`, `fake`.
 - `coderag/chunking/`: Symbol-aware chunking (`python_ast.py`, `treesitter.py`, line-window `base.py`).
-- `coderag/store/`: `sqlite_store.py` (source of truth + FTS5) and `vector_index.py` (FAISS Flat/IVF cache).
+- `coderag/store/`: `lance_store.py` — a single embedded LanceDB store (chunk metadata, BM25, and vectors).
 - `coderag/retrieval/`: Hybrid dense + BM25 search fused with RRF.
 - `coderag/indexer.py`, `coderag/watch.py`: Incremental indexing and the debounced watcher.
 - `coderag/_ignore.py`: Shared ignore-glob matching used by both the indexer and `fs_search`.
@@ -29,10 +29,10 @@
 - First-party module is `coderag`; surfaces must stay thin — no engine logic in `surfaces/`.
 
 ## Architecture Invariants
-- SQLite is the source of truth; the FAISS index is a rebuildable cache (`rebuild_from_store`).
-- `chunks.id` is the FAISS id and is `AUTOINCREMENT` (ids never reused).
-- Incremental indexing is delete-before-add (no duplicate/stale vectors); unchanged files skip via content hash.
-- Embedding dimension comes from the provider, not a constant; a model change triggers a rebuild.
+- One embedded LanceDB store holds metadata + BM25 + vectors; it's rebuildable by re-indexing from source.
+- `chunks.id` is a store-managed integer id used as the fusion/hydrate key.
+- Incremental indexing is delete-before-add (no duplicate/stale rows); unchanged files skip via size+mtime then content hash.
+- Embedding dimension comes from the provider, not a constant; a model change clears the store for a clean re-index.
 
 ## Testing Guidelines
 - Place tests in `tests/` as `test_*.py`; keep them deterministic and offline (use the `fake` provider fixture).
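Review note (commentary between file diffs, not part of the patch): the incremental-indexing invariant above can be sketched as follows. This is a minimal illustration, assuming an in-memory `ToyStore` and an `index_file` helper that are both hypothetical names; it is not the `LanceStore` or `Indexer` API, and chunking is reduced to splitting on lines.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Tuple

FileMeta = Tuple[int, int, str]  # (size, mtime_ns, sha256 of content)


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for the store (not the LanceStore API)."""

    files: Dict[str, FileMeta] = field(default_factory=dict)
    rows: Dict[str, List[str]] = field(default_factory=dict)

    def write_file(self, rel: str, meta: FileMeta, chunks: List[str]) -> None:
        # Delete-before-add: drop every old row for the file, then add the new ones.
        self.rows.pop(rel, None)
        self.rows[rel] = list(chunks)
        self.files[rel] = meta


def index_file(store: ToyStore, root: Path, rel: str) -> str:
    """Index one file and return "skipped" or "written"."""
    path = root / rel
    st = path.stat()
    known = store.files.get(rel)
    # Cheap check first: same size and mtime means the file is not even read.
    if known is not None and known[:2] == (st.st_size, st.st_mtime_ns):
        return "skipped"
    data = path.read_bytes()
    sha = hashlib.sha256(data).hexdigest()
    # Metadata changed but content did not (e.g. `touch`): refresh metadata only.
    if known is not None and known[2] == sha:
        store.files[rel] = (st.st_size, st.st_mtime_ns, sha)
        return "skipped"
    chunks = data.decode("utf-8", errors="replace").splitlines()
    store.write_file(rel, (st.st_size, st.st_mtime_ns, sha), chunks)
    return "written"
```

Running `index_file` twice on an unchanged file returns `"written"` then `"skipped"`; after an edit the old rows are replaced rather than appended to.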
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index 3be3ecc..3106627 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -26,29 +26,28 @@ coderag/
 ├── llm.py              # Optional streamed LLM answer over retrieved chunks
 ├── embeddings/         # EmbeddingProvider protocol + fastembed / openai / fake
 ├── chunking/           # Symbol-aware chunking: python_ast, treesitter, line-window base
-├── store/              # SQLite source of truth + pluggable FAISS vector index
-│   ├── sqlite_store.py #   files/chunks/vectors + FTS5 lexical search
-│   └── vector_index.py #   FaissVectorIndex: Flat (exact) / IVF (scale)
+├── store/              # Single embedded LanceDB store
+│   └── lance_store.py  #   files/chunks + BM25 (FTS) + vectors (ANN) in one place
 ├── retrieval/          # Hybrid search: dense + BM25, fused with RRF
 └── surfaces/           # cli.py · http_api.py (FastAPI) · webui.py · mcp_server.py (MCP)
 ```
 
 ### Design invariants (don't break these)
 
-- **SQLite is the source of truth; FAISS is a rebuildable cache.** Vectors are stored as
-  BLOBs in SQLite, so `FaissVectorIndex.rebuild_from_store()` can always reconstruct the
-  index. `ensure_consistent()` does this automatically when counts disagree.
-- **`chunks.id` is the FAISS id and is `AUTOINCREMENT`** — ids are never reused, which keeps
-  a stale cache from resurrecting deleted content.
-- **Delete-before-add.** A changed file's old chunks are removed from both SQLite and FAISS
-  before new ones are added (`Indexer._write`). This is the bug the old `monitor.py` had.
+- **One LanceDB store holds everything** (chunk metadata, text/BM25, and vectors/ANN). It is
+  rebuildable from source: re-indexing recreates it, and a `--full` pass clears and rebuilds.
+- **`chunks.id` is a store-managed integer id** used as the fusion/hydrate key; ids are not
+  reused within a run.
+- **Delete-before-add.** A changed file's old rows are removed before new ones are added
+  (`Indexer._write` → `LanceStore.write_file(replace=True)`), so editing never accumulates
+  stale or duplicate rows.
 - **The embedding dimension comes from the provider**, never a hard-coded constant. A model
-  change is detected via `meta.embed_dim` and triggers a clean rebuild.
+  change is detected via the store's `meta.json` and clears the store for a clean re-index.
 - **Writes serialize; reads don't block.** All indexing/deletion goes through one lock on the
-  `CodeRAG` facade (`_index_lock`), and `FaissVectorIndex` guards its own add/remove/search/
-  rebuild — so the MCP server's background index and live watcher run safely alongside
+  `CodeRAG` facade (`_index_lock`); the store buffers writes on the writer and reads query
+  committed data — so the MCP server's background index and live watcher run safely alongside
   concurrent agent searches. Indexing may parallelize chunk+embed across `index_workers`
-  threads, but the SQLite/FAISS writes stay single-writer (`Indexer._write`).
+  threads, but the store writes stay single-writer (`Indexer._write`).
 
 ## Quality gate
 
diff --git a/README.md b/README.md
index cfaa8f2..24956a4 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ Coding agents like Claude Code and Codex locate code by *running searches* — g
 repeat — which burns tokens and round-trips and reduces to literal keyword matching. CodeRAG
 turns the workspace into a **warm, pre-indexed** engine: a single query returns the right
 functions and files ranked by **meaning *and* keyword**, with exact `path:line` citations. The
-embedding model loads once, so each query is one in-process lookup (FAISS + BM25 + fusion), not
+embedding model loads once, so each query is one in-process lookup (vector ANN + BM25 + fusion), not
 a multi-round shell loop — and over MCP (`coderag mcp`, below) it becomes the agent's search tool.
 
 **Proof from the eval harness** — this repo's 24 natural-language → file queries (90 files /
@@ -72,7 +72,7 @@ and the honest caveats — is in [`docs/eval.md`](docs/eval.md).
 - **Drop-in for AI coding agents — one command.** `coderag install` wires the **MCP server** into **Claude Code**, **Hermes**, and **Codex** (auto-detect or an interactive wizard, idempotent, with backups) so they search a warm, pre-indexed workspace instead of slow grep/glob/read loops — ranked `path:line` results from a single call, index kept live as you edit. Works on a plain file directory too, not just code.
 - **Measured, not guessed.** A built-in **evaluation harness** (`coderag eval`) scores retrieval quality — recall@k, MRR, nDCG@k at file *or* symbol level — and can mine a benchmark straight from your git history. Every default (1:1 hybrid, reranker opt-in, adaptive fusion off) is the choice the harness validated, including across an external repo.
 - **Incremental & live.** Content-hashed indexing only re-embeds files that changed; a debounced watcher keeps the index current as you code. No duplicate or stale vectors.
-- **Built to scale.** Exact `Flat` search for small repos, automatic switch to approximate `IVF` past a threshold so it stays fast at 100k+ chunks.
+- **Built to scale.** An embedded [LanceDB](https://github.com/lancedb/lancedb) store: brute-force exact search for small repos, automatic ANN indexing past a threshold so it stays fast at 100k+ chunks.
 - **Five surfaces, one engine.** CLI · Python library · HTTP/REST · web UI · MCP server — all thin wrappers over the same `CodeRAG` object.
 
 ### ⚡ One line: install + wire into your agent
@@ -158,7 +158,7 @@ from coderag import CodeRAG, Config
 cr = CodeRAG(Config.from_env(watched_dir="/path/to/repo"))
 cr.index()
 
-for hit in cr.search("how is the FAISS index persisted?"):
+for hit in cr.search("how is the vector index persisted?"):
     print(f"{hit.location}  {hit.symbol}  (sim={hit.similarity:.2f})")
     print(hit.text)
 ```
@@ -197,7 +197,7 @@ See it live (read-only, indexing this repo): **<https://coderag-ui.neverdecel.co
 Tools like Claude Code and Codex locate code with iterative `grep`/`glob`/read loops. CodeRAG
 exposes the same workspace as a **Model Context Protocol** server, so an agent gets fast,
 ranked `path:line` results from a single call against a **warm, pre-indexed** workspace — the
-embedding model loads once and every query is then one in-process lookup (FAISS + BM25 +
+embedding model loads once and every query is then one in-process lookup (vector ANN + BM25 +
 fusion), not a multi-round shell search.
 
 ```bash
@@ -327,21 +327,20 @@ scheduled reindex — in [`deploy/README.md`](deploy/README.md).
 graph LR
     A[Source files] --> B[Symbol-aware chunking<br/>ast / tree-sitter]
     B --> C[Embeddings<br/>fastembed · OpenAI · self-hosted]
-    C --> D[(SQLite store<br/>chunks + vectors + FTS5)]
-    D --> E[FAISS index<br/>Flat → IVF]
+    C --> D[(LanceDB store<br/>chunks + vectors + BM25)]
     Q[Query] --> F[Dense + BM25]
-    E --> F
     D --> F
     F --> G[Reciprocal Rank Fusion]
     G --> H[Ranked hits<br/>path:line + score]
 ```
 
-- **SQLite is the source of truth** (chunk text, line ranges, symbols, content hashes, and the
-  raw vectors). The **FAISS index is a rebuildable cache** — it can always be reconstructed
-  from SQLite, so switching models or index types never corrupts your data.
-- Each file's content is **hashed**; unchanged files are skipped on re-index. A changed file's
-  old chunks are removed from *both* the store and the vector index **before** new ones are
-  added — so editing never accumulates stale or duplicate vectors.
+- **One embedded LanceDB store** holds everything — chunk text, line ranges, symbols, content
+  hashes, the vectors (ANN), and the BM25 index — so there is no separate cache to keep in
+  sync. The store is also a rebuildable view of your code: it can always be re-indexed from
+  source, so switching embedding models never corrupts your data.
+- Each file's content is **hashed**; unchanged files are skipped on re-index (a cheap
+  size+mtime check avoids even reading them). A changed file's old chunks are removed
+  **before** new ones are added — so editing never accumulates stale or duplicate vectors.
 
 ## ⚙️ Configuration
 
@@ -372,7 +371,7 @@ no API key needed for a local server:
 ollama serve && ollama pull llama3.1
 export OPENAI_BASE_URL=http://localhost:11434/v1   # Ollama's OpenAI-compatible endpoint
 export CODERAG_CHAT_MODEL=llama3.1
-coderag search "how is the FAISS index persisted" --answer   # answer written locally
+coderag search "how is the vector index persisted" --answer   # answer written locally
 ```
 
 ### Common settings
@@ -385,9 +384,7 @@ table is in [`docs/configuration.md`](docs/configuration.md).
 | `CODERAG_PROVIDER` | `fastembed` | Embedding backend: `fastembed` (local) · `openai` (OpenAI API **or** any OpenAI-compatible/local server) · `fake` |
 | `CODERAG_MODEL` | `BAAI/bge-small-en-v1.5` | Local embedding model (`coderag eval --list-models`) |
 | `CODERAG_WATCHED_DIR` | cwd | Codebase to index |
-| `CODERAG_STORE_DIR` | `./.coderag` | Where the DB + index live |
-| `CODERAG_INDEX_TYPE` | `auto` | `auto` · `flat` · `ivf` |
-| `CODERAG_IVF_THRESHOLD` | `50000` | Vectors before switching Flat → IVF |
+| `CODERAG_STORE_DIR` | `./.coderag` | Where the LanceDB store lives |
 | `CODERAG_TOP_K` | `8` | Results returned |
 | `OPENAI_BASE_URL` | – | Point at a self-hosted / local OpenAI-compatible server (Ollama, vLLM, LM Studio, LocalAI) — enables local embeddings **and** local answers |
 | `OPENAI_API_KEY` | – | OpenAI **cloud** embeddings / answers (optional for a local server) |
@@ -431,7 +428,7 @@ Apache License 2.0 — see [LICENSE](LICENSE).
 
 ## 🙏 Acknowledgments
 
-[FAISS](https://github.com/facebookresearch/faiss) · [fastembed](https://github.com/qdrant/fastembed) ·
+[LanceDB](https://github.com/lancedb/lancedb) · [fastembed](https://github.com/qdrant/fastembed) ·
 [tree-sitter](https://tree-sitter.github.io/tree-sitter/) · [FastAPI](https://fastapi.tiangolo.com/) ·
 [Jinja](https://jinja.palletsprojects.com/) · [Pygments](https://pygments.org/) · [watchdog](https://github.com/gorakhargosh/watchdog)
 
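Review note (commentary between file diffs, not part of the patch): the "Dense + BM25" to "Reciprocal Rank Fusion" step in the README diagram can be sketched as below. This is the textbook RRF formula, not necessarily the project's exact implementation; the function name `rrf_fuse` and the default `k=60` are assumptions (the project reads its own value from `rrf_k`), while `top_k=8` mirrors the documented `CODERAG_TOP_K` default.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_fuse(
    rankings: Sequence[Sequence[Hashable]], k: int = 60, top_k: int = 8
) -> List[Hashable]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per id."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; the sort is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)[:top_k]


dense_ids = [1, 2, 3]  # ids ranked by vector similarity
bm25_ids = [3, 1, 4]   # ids ranked by BM25
fused = rrf_fuse([dense_ids, bm25_ids])
```

Because only ranks enter the score, the dense and BM25 retrievers never need their raw scores normalized against each other.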
diff --git a/coderag/__init__.py b/coderag/__init__.py
index 3214f04..69a927c 100644
--- a/coderag/__init__.py
+++ b/coderag/__init__.py
@@ -18,7 +18,7 @@
 
 if TYPE_CHECKING:
     # Re-exported lazily at runtime via __getattr__ below (keeps ``import coderag``
-    # light — no faiss/fastembed pulled in at import). Declared here only so type
+    # light — no lancedb/fastembed pulled in at import). Declared here only so type
     # checkers and static analysis see ``CodeRAG`` as a defined export of __all__.
     from coderag.api import CodeRAG
 
@@ -28,7 +28,7 @@
 
 
 def __getattr__(name: str) -> object:
-    # Lazy re-export so ``import coderag`` stays light (no faiss/fastembed at import).
+    # Lazy re-export so ``import coderag`` stays light (no lancedb/fastembed at import).
     if name == "CodeRAG":
         from coderag.api import CodeRAG
 
diff --git a/coderag/_ignore.py b/coderag/_ignore.py
index 14314f8..83983ca 100644
--- a/coderag/_ignore.py
+++ b/coderag/_ignore.py
@@ -1,16 +1,23 @@
-"""Shared ignore-glob matching for indexing and exact filesystem search.
+"""Shared file-walking + ignore matching for indexing and exact filesystem search.
 
 Both the :class:`~coderag.indexer.Indexer` and the exact filesystem search
-(:mod:`coderag.fs_search`) must skip the *same* set of paths — vendored deps, VCS
-directories, build output — or the two would disagree about what "the workspace" is.
-The matching rule lives here so both callers stay in lock-step instead of each
-re-implementing it.
+(:mod:`coderag.fs_search`) must enumerate the *same* set of paths — skipping vendored
+deps, VCS directories, build output, and (optionally) anything matched by ``.gitignore`` —
+or the two would disagree about what "the workspace" is. The single :func:`walk_files`
+generator below is the one place that decision is made, so both callers stay in lock-step.
 """
 
 from __future__ import annotations
 
 import fnmatch
-from typing import Iterable, Set
+import logging
+import os
+from pathlib import Path
+from typing import Iterable, Iterator, List, Optional, Set, Tuple
+
+logger = logging.getLogger(__name__)
+
+GITIGNORE_FILE = ".gitignore"
 
 
 def ignore_dir_names(ignore_globs: Iterable[str]) -> Set[str]:
@@ -33,3 +40,117 @@ def is_ignored(rel: str, ignore_globs: Iterable[str], ignore_dirs: Set[str]) ->
     if ignore_dirs.intersection(parts):
         return True
     return any(fnmatch.fnmatch(rel, g) for g in ignore_globs)
+
+
+def _is_ancestor(base: str, dir_rel: str) -> bool:
+    """Whether a ``.gitignore`` at ``base`` still applies at ``dir_rel`` (``""`` = root)."""
+    if base == "":
+        return True
+    return dir_rel == base or dir_rel.startswith(base + "/")
+
+
+class _GitignoreMatcher:
+    """Honor nested ``.gitignore`` files during a top-down walk (nearest rule wins).
+
+    A ``.gitignore`` at directory ``B`` scopes its patterns to paths under ``B``; the
+    closest file's rules take precedence and may re-include via ``!``. We keep a stack of
+    ``(base_rel, spec)`` ordered root→leaf, trimmed to the current directory's ancestors as
+    the (DFS pre-order) walk moves, and test a path nearest-first using pathspec's
+    tri-state ``check_file`` (ignore / negated-include / no-match). Degrades to a no-op if
+    pathspec cannot be imported, so indexing never hard-fails on a missing dependency.
+    """
+
+    def __init__(self) -> None:
+        try:
+            from pathspec import GitIgnoreSpec
+        except ImportError:  # pragma: no cover - pathspec is a declared dependency
+            logger.warning(
+                "pathspec not installed; .gitignore files will not be honored."
+            )
+            self._spec_cls = None
+        else:
+            self._spec_cls = GitIgnoreSpec
+        self._stack: List[Tuple[str, object]] = []
+
+    @property
+    def enabled(self) -> bool:
+        return self._spec_cls is not None
+
+    def enter(self, dir_rel: str, dir_abs: Path) -> None:
+        """Refresh the active-rule stack for ``dir_rel`` and load its ``.gitignore``."""
+        if self._spec_cls is None:
+            return
+        # Drop rules from sibling subtrees we've left; keep only ancestors of dir_rel.
+        self._stack = [
+            (base, spec) for base, spec in self._stack if _is_ancestor(base, dir_rel)
+        ]
+        try:
+            text = (dir_abs / GITIGNORE_FILE).read_text(
+                encoding="utf-8", errors="replace"
+            )
+        except OSError:
+            return  # no .gitignore here (or unreadable)
+        self._stack.append((dir_rel, self._spec_cls.from_lines(text.splitlines())))
+
+    def match(self, rel: str, *, is_dir: bool) -> bool:
+        """True if ``rel`` (root-relative POSIX) is ignored by the active rules."""
+        if not self._stack:
+            return False
+        suffix = "/" if is_dir else ""
+        for base, spec in reversed(self._stack):
+            sub = rel if base == "" else rel[len(base) + 1 :]
+            result = spec.check_file(sub + suffix)  # type: ignore[attr-defined]
+            if result.include is not None:
+                return bool(result.include)
+        return False
+
+
+def walk_files(
+    start: Path,
+    ignore_globs: Iterable[str],
+    *,
+    root: Optional[Path] = None,
+    use_gitignore: bool = True,
+) -> Iterator[Tuple[Path, str]]:
+    """Yield ``(absolute_path, posix_rel)`` for every non-ignored file under ``start``.
+
+    ``posix_rel`` is relative to ``root`` (defaults to ``start``) so every caller shares
+    one notion of the workspace. Ignored directories are pruned *before descending* (the
+    big win at ``/home`` scale), honoring ``ignore_globs`` (dir-name prune + path globs)
+    and, when ``use_gitignore``, nested ``.gitignore`` files.
+    """
+    start = Path(start)
+    root = Path(root) if root is not None else start
+    globs = tuple(ignore_globs)
+    ignore_dirs = ignore_dir_names(globs)
+    matcher = _GitignoreMatcher() if use_gitignore else None
+    active = matcher if (matcher is not None and matcher.enabled) else None
+
+    for dirpath, dirnames, filenames in os.walk(start):
+        d_abs = Path(dirpath)
+        try:
+            d_rel = "" if d_abs == root else d_abs.relative_to(root).as_posix()
+        except ValueError:  # pragma: no cover - start outside root
+            continue
+        if active is not None:
+            active.enter(d_rel, d_abs)
+
+        kept: List[str] = []
+        for name in dirnames:
+            if name in ignore_dirs:
+                continue
+            rel = name if d_rel == "" else f"{d_rel}/{name}"
+            if is_ignored(rel, globs, ignore_dirs):
+                continue
+            if active is not None and active.match(rel, is_dir=True):
+                continue
+            kept.append(name)
+        dirnames[:] = kept
+
+        for name in filenames:
+            rel = name if d_rel == "" else f"{d_rel}/{name}"
+            if is_ignored(rel, globs, ignore_dirs):
+                continue
+            if active is not None and active.match(rel, is_dir=False):
+                continue
+            yield d_abs / name, rel
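Review note (commentary between file diffs, not part of the patch): the "pruned before descending" behaviour comes from assigning to `dirnames[:]` in place during a top-down `os.walk`. The sketch below isolates that technique using only the standard library; `walk_pruned` is a hypothetical, simplified cousin of `walk_files` that skips the `.gitignore` handling, and the sorting is added here only to make the output order deterministic.

```python
import fnmatch
import os
from pathlib import Path
from typing import Iterator, Sequence, Tuple


def walk_pruned(
    start: Path, skip_dirs: Sequence[str], skip_globs: Sequence[str] = ()
) -> Iterator[Tuple[Path, str]]:
    """Yield (absolute_path, posix_rel) for files under start, pruning skip_dirs."""
    skip = set(skip_dirs)
    for dirpath, dirnames, filenames in os.walk(start):
        # In-place slice assignment is what stops os.walk from entering a subtree;
        # rebinding the name (dirnames = [...]) would have no effect on the walk.
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            rel = path.relative_to(start).as_posix()
            if any(fnmatch.fnmatch(rel, g) for g in skip_globs):
                continue
            yield path, rel
```

For example, `walk_pruned(root, ["node_modules"], ["*.log"])` never lists the contents of any `node_modules` directory, rather than listing them and filtering afterwards.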
diff --git a/coderag/api.py b/coderag/api.py
index 8cf1cce..ab19c78 100644
--- a/coderag/api.py
+++ b/coderag/api.py
@@ -1,8 +1,9 @@
 """The public CodeRAG facade — the one object every surface (CLI, HTTP, UI) routes through.
 
-Holds the wired-together engine: embedding provider, SQLite store, FAISS vector index,
-indexer, and hybrid searcher. Collaborators are built lazily so constructing a ``CodeRAG``
-is cheap and importing this module pulls in no heavy dependencies.
+Holds the wired-together engine: embedding provider, the LanceDB store (chunk metadata +
+text/BM25 + vectors/ANN in one place), the indexer, and the hybrid searcher. Collaborators
+are built lazily so constructing a ``CodeRAG`` is cheap and importing this module pulls in no
+heavy dependencies.
 """
 
 from __future__ import annotations
@@ -20,8 +21,7 @@
     from coderag.embeddings import EmbeddingProvider
     from coderag.indexer import Indexer
     from coderag.retrieval.search import HybridSearcher
-    from coderag.store.sqlite_store import SQLiteStore
-    from coderag.store.vector_index import FaissVectorIndex
+    from coderag.store.lance_store import LanceStore
 
 logger = logging.getLogger(__name__)
 
@@ -32,13 +32,9 @@ class CodeRAG:
     def __init__(self, config: Optional[Config] = None) -> None:
         self.config = config or Config.from_env()
         self._provider: Optional["EmbeddingProvider"] = None
-        self._store: Optional["SQLiteStore"] = None
-        self._vectors: Optional["FaissVectorIndex"] = None
+        self._store: Optional["LanceStore"] = None
         self._indexer: Optional["Indexer"] = None
         self._searcher: Optional["HybridSearcher"] = None
-        # Set when the store's embedding model/dim changed and the FAISS cache must
-        # be rebuilt from scratch (consumed when the vector index is first opened).
-        self._rebuild_required: bool = False
         # Serializes all indexing/deletion so concurrent writers (the CLI, the HTTP
         # surface, the MCP server's background index, and the live watcher) can't
         # interleave a file's delete-before-add sequence. Reads (search) are unaffected.
@@ -55,44 +51,23 @@ def provider(self) -> "EmbeddingProvider":
         return self._provider
 
     @property
-    def store(self) -> "SQLiteStore":
+    def store(self) -> "LanceStore":
         if self._store is None:
-            from coderag.store.sqlite_store import SQLiteStore
+            from coderag.store.lance_store import LanceStore
 
             self.config.store_dir.mkdir(parents=True, exist_ok=True)
-            self._store = SQLiteStore(self.config.db_path)
-            # bootstrap() returns True when the embedding model/dim changed and the
-            # store was cleared — the vector cache must then be fully rebuilt.
-            self._rebuild_required = self._store.bootstrap(
-                self.provider.dim, self.provider.model_id
-            )
+            self._store = LanceStore(self.config.store_dir, self.provider.dim)
+            # Clears the store when the embedding model/dim changed; a re-index then
+            # repopulates the now-empty tables (there is no separate cache to rebuild).
+            self._store.bootstrap(self.provider.dim, self.provider.model_id)
         return self._store
 
-    @property
-    def vectors(self) -> "FaissVectorIndex":
-        if self._vectors is None:
-            from coderag.store.vector_index import FaissVectorIndex
-
-            # Access the store first so its bootstrap() runs and sets the rebuild flag.
-            store = self.store
-            self._vectors = FaissVectorIndex.open(self.config, self.provider.dim)
-            # FAISS is a rebuildable cache; reconcile with the source of truth on open.
-            # An explicit rebuild signal (model/dim changed) forces a clean rebuild
-            # rather than relying on a chunk-count mismatch as a proxy.
-            if self._rebuild_required:
-                self._vectors.rebuild_from_store(store)
-            else:
-                self._vectors.ensure_consistent(store)
-        return self._vectors
-
     @property
     def indexer(self) -> "Indexer":
         if self._indexer is None:
             from coderag.indexer import Indexer
 
-            self._indexer = Indexer(
-                self.config, self.provider, self.store, self.vectors
-            )
+            self._indexer = Indexer(self.config, self.provider, self.store)
         return self._indexer
 
     @property
@@ -105,7 +80,6 @@ def searcher(self) -> "HybridSearcher":
                 self.config,
                 self.provider,
                 self.store,
-                self.vectors,
                 reranker=get_reranker(self.config),
             )
         return self._searcher
@@ -141,6 +115,7 @@ def search_files(self, pattern: str, **kwargs: Any) -> dict:
             self.config.watched_dir,
             pattern,
             ignore_globs=self.config.ignore_globs,
+            use_gitignore=self.config.use_gitignore,
             **kwargs,
         )
 
@@ -175,7 +150,7 @@ def get_file(
         if root not in full.parents and full != root:
             raise ValueError(f"Path escapes the indexed root: {path}")
         rel = full.relative_to(root).as_posix()
-        if self.store.get_file(rel) is None:
+        if self.store.get_file_meta(rel) is None:
             raise FileNotFoundError(f"Not an indexed file: {path}")
         # Decode raw bytes exactly as the indexer does — no universal-newline
         # translation — so line numbers line up with the chunker (Path.read_text
@@ -198,19 +173,15 @@ def delete_path(self, path: Union[str, Path]) -> int:
         except ValueError:
             return 0
         with self._index_lock:
-            removed = self.store.delete_file(rel)
-            if removed:
-                self.vectors.remove(removed)
-                self.vectors.save()
-        return len(removed)
+            return self.store.delete_file(rel)
 
     def warm(self) -> None:
-        """Eagerly load the provider, store, vectors, and embedding model.
+        """Eagerly load the provider, store, and embedding model.
 
         Done at server startup so the first query — and the demo UI's search-speed
         badge — reflect warm performance, not the one-off lazy model load.
         """
-        self.status()  # builds provider/store/vectors
+        self.status()  # builds provider/store
         self.provider.embed_query("warm up")  # loads the model + JITs the query path
 
     def status(self) -> dict:
@@ -227,7 +198,7 @@ def status(self) -> dict:
                 else self.config.chat_model
             ),
             "llm_base_url": self.config.openai_base_url or "",
-            "index_type": self.vectors.kind,
+            "index_type": self.store.index_kind,
             "rerank": self.config.rerank,
             "rerank_model": self.config.rerank_model if self.config.rerank else "",
             "adaptive_fusion": self.config.adaptive_fusion,
@@ -236,7 +207,7 @@ def status(self) -> dict:
             "watched_dir": str(self.config.watched_dir),
             "total_files": stats.total_files,
             "total_chunks": stats.total_chunks,
-            "vectors": self.vectors.ntotal,
+            "vectors": stats.total_chunks,
         }
 
     def close(self) -> None:
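Review note (commentary between file diffs, not part of the patch): the `store.bootstrap(dim, model_id)` call above is described as clearing the store when the embedding model or dimension changes. A minimal sketch of that idea follows, assuming a `meta.json` with hypothetical keys `embed_dim` and `model_id` and a store directory whose other children are all disposable data; the real `LanceStore.bootstrap` layout and return value are not shown in this diff.

```python
import json
import shutil
from pathlib import Path


def bootstrap(store_dir: Path, dim: int, model_id: str) -> bool:
    """Return True when existing data was cleared because the embedding setup changed."""
    store_dir.mkdir(parents=True, exist_ok=True)
    meta_path = store_dir / "meta.json"
    wanted = {"embed_dim": dim, "model_id": model_id}
    try:
        current = json.loads(meta_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        current = None  # first run, or unreadable metadata
    if current == wanted:
        return False
    cleared = False
    if current is not None:
        # Vectors from a different model or dimension are not comparable: drop all data.
        for child in list(store_dir.iterdir()):
            if child.is_dir():
                shutil.rmtree(child)
            else:
                child.unlink()
        cleared = True
    meta_path.write_text(json.dumps(wanted), encoding="utf-8")
    return cleared
```

After a `True` result the caller re-indexes from source, which matches the "clean re-index" wording in the updated invariants.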
diff --git a/coderag/config.py b/coderag/config.py
index 17b115f..049777d 100644
--- a/coderag/config.py
+++ b/coderag/config.py
@@ -28,22 +28,49 @@
 )
 
 # Directories/globs never worth indexing. Note we deliberately do NOT ignore ``tests`` —
-# people search their tests too.
+# people search their tests too. The dependency/cache entries matter most at home/system
+# scale (e.g. indexing ``/home``), where they are the bulk of the file count; each
+# ``<name>/*`` entry prunes that directory wholesale anywhere in the tree.
 DEFAULT_IGNORE_GLOBS: Tuple[str, ...] = (
+    # VCS
     ".git/*",
     ".hg/*",
     ".svn/*",
-    "node_modules/*",
-    ".venv/*",
-    "venv/*",
-    "env/*",
-    "__pycache__/*",
-    "*.egg-info/*",
+    # build / packaging output
     "build/*",
     "dist/*",
+    "target/*",
+    "*.egg-info/*",
+    ".next/*",
+    ".nuxt/*",
+    # language / tool caches
+    "__pycache__/*",
     ".mypy_cache/*",
     ".pytest_cache/*",
+    ".ruff_cache/*",
+    ".tox/*",
+    ".ipynb_checkpoints/*",
+    ".gradle/*",
+    ".terraform/*",
     ".coderag/*",
+    # virtualenvs / vendored dependencies
+    ".venv/*",
+    "venv/*",
+    "env/*",
+    "node_modules/*",
+    "site-packages/*",
+    "vendor/*",
+    # user/home caches (dominant at /home scale)
+    ".cache/*",
+    ".local/*",
+    ".npm/*",
+    ".cargo/*",
+    ".rustup/*",
+    ".m2/*",
+    ".nuget/*",
+    # editor metadata
+    ".idea/*",
+    ".vscode/*",
 )
 
 
@@ -117,6 +144,10 @@ class Config:
     # --- What to index ---
     languages: Tuple[str, ...] = DEFAULT_LANGUAGES
     ignore_globs: Tuple[str, ...] = DEFAULT_IGNORE_GLOBS
+    # Honor .gitignore files while walking (in addition to ignore_globs), so a repo's own
+    # build/output exclusions are respected. On by default; disable with
+    # CODERAG_GITIGNORE=0 or `--no-gitignore`.
+    use_gitignore: bool = True
     # Index any UTF-8-decodable file as plain text, even with an unknown/absent extension
     # (Dockerfile, Makefile, LICENSE, .log, ...). Off by default so code repos aren't
     # polluted; turn on (CODERAG_INDEX_ALL_TEXT / `coderag mcp --all-text`) to make
@@ -128,12 +159,6 @@ class Config:
     window_lines: int = 60  # fallback line-window size
     window_overlap: int = 10
 
-    # --- Vector index ---
-    index_type: str = "auto"  # "auto" | "flat" | "ivf"
-    ivf_threshold: int = 50_000  # switch flat->ivf above this many vectors
-    ivf_nlist: int = 0  # 0 => derived from corpus size
-    ivf_nprobe: int = 16
-
     # --- Retrieval ---
     top_k: int = 8
     fetch_k: int = 50  # candidates pulled from each retriever before fusion
@@ -182,6 +207,11 @@ class Config:
     # --- Indexing throughput ---
     embed_batch_size: int = 64
     index_workers: int = 4
+    # Embedding device for the local (fastembed) provider. "auto" uses a CUDA GPU when
+    # onnxruntime exposes one (10-50x faster indexing) and falls back to CPU otherwise;
+    # "cpu"/"cuda" force it. A missing/broken GPU always degrades to CPU rather than failing.
+    embed_device: str = "auto"  # "auto" | "cpu" | "cuda"
+    embed_threads: int = 0  # ONNX CPU threads (0 => library default)
 
     # --- Optional LLM answer surface ---
     # Which backend turns retrieved chunks into a grounded answer.
@@ -224,14 +254,6 @@ class Config:
     demo_max_answers: int = 5  # LLM answers allowed per browser session
     demo_cooldown_seconds: int = 20  # minimum seconds between answers in a session
 
-    @property
-    def db_path(self) -> Path:
-        return self.store_dir / "coderag.db"
-
-    @property
-    def faiss_path(self) -> Path:
-        return self.store_dir / "index.faiss"
-
     def with_overrides(self, **kwargs: object) -> "Config":
         """Return a copy with the given fields replaced (config stays immutable)."""
         return replace(self, **kwargs)  # type: ignore[arg-type]
@@ -251,8 +273,6 @@ def from_env(cls, **overrides: object) -> "Config":
             ),
             watched_dir=_env_path("CODERAG_WATCHED_DIR", Path.cwd()),
             store_dir=_env_path("CODERAG_STORE_DIR", Path.cwd() / ".coderag"),
-            index_type=_env_str("CODERAG_INDEX_TYPE", cls.index_type),
-            ivf_threshold=_env_int("CODERAG_IVF_THRESHOLD", cls.ivf_threshold),
             top_k=_env_int("CODERAG_TOP_K", cls.top_k),
             fetch_k=_env_int("CODERAG_FETCH_K", cls.fetch_k),
             rrf_k=_env_int("CODERAG_RRF_K", cls.rrf_k),
@@ -280,6 +300,8 @@ def from_env(cls, **overrides: object) -> "Config":
             ),
             embed_batch_size=_env_int("CODERAG_EMBED_BATCH", cls.embed_batch_size),
             index_workers=_env_int("CODERAG_WORKERS", cls.index_workers),
+            embed_device=_env_str("CODERAG_EMBED_DEVICE", cls.embed_device),
+            embed_threads=_env_int("CODERAG_EMBED_THREADS", cls.embed_threads),
             llm_provider=_env_str("CODERAG_LLM_PROVIDER", cls.llm_provider),
             chat_model=_env_str("CODERAG_CHAT_MODEL", cls.chat_model),
             anthropic_api_key=os.getenv("ANTHROPIC_API_KEY"),
@@ -290,6 +312,9 @@ def from_env(cls, **overrides: object) -> "Config":
             api_key=os.getenv("CODERAG_API_KEY"),
             cors_origins=_env_tuple("CODERAG_CORS_ORIGINS", cls.cors_origins),
             index_all_text=_env_bool("CODERAG_INDEX_ALL_TEXT", cls.index_all_text),
+            # CODERAG_IGNORE_GLOBS *appends* extra excludes to the built-in defaults.
+            ignore_globs=DEFAULT_IGNORE_GLOBS + _env_tuple("CODERAG_IGNORE_GLOBS", ()),
+            use_gitignore=_env_bool("CODERAG_GITIGNORE", cls.use_gitignore),
             mcp_auto_index=_env_bool("CODERAG_MCP_AUTO_INDEX", cls.mcp_auto_index),
             mcp_watch=_env_bool("CODERAG_MCP_WATCH", cls.mcp_watch),
             mcp_snippet_lines=_env_int(
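Review note on the `ignore_globs` change above: the environment variable now extends the built-in excludes instead of replacing them. A minimal sketch of that append behaviour follows. The default globs, the `env_tuple` helper, and the comma-separated format are assumptions for illustration, since the real `DEFAULT_IGNORE_GLOBS` and `_env_tuple` are not shown in this hunk.

```python
import os
from typing import Tuple

# Hypothetical stand-in for coderag.config.DEFAULT_IGNORE_GLOBS (real values not shown here).
DEFAULT_IGNORE_GLOBS: Tuple[str, ...] = (".git/*", "node_modules/*")


def env_tuple(name: str, default: Tuple[str, ...]) -> Tuple[str, ...]:
    """Parse a comma-separated env var into a tuple (the format is an assumption)."""
    raw = os.getenv(name)
    if not raw:
        return tuple(default)
    return tuple(part.strip() for part in raw.split(",") if part.strip())


def resolve_ignore_globs() -> Tuple[str, ...]:
    """Built-in defaults first, then any extras from CODERAG_IGNORE_GLOBS."""
    return DEFAULT_IGNORE_GLOBS + env_tuple("CODERAG_IGNORE_GLOBS", ())


os.environ["CODERAG_IGNORE_GLOBS"] = "dist/*, *.min.js"
print(resolve_ignore_globs())
```

With the variable unset, `resolve_ignore_globs()` returns the defaults unchanged, so a user cannot accidentally drop the built-in excludes by setting an extra glob.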
diff --git a/coderag/embeddings/__init__.py b/coderag/embeddings/__init__.py
index 27585eb..e4fa418 100644
--- a/coderag/embeddings/__init__.py
+++ b/coderag/embeddings/__init__.py
@@ -50,7 +50,13 @@ def get_provider(config: Config) -> EmbeddingProvider:
     if provider == "fastembed":
         from coderag.embeddings.fastembed_provider import FastEmbedProvider
 
-        return FastEmbedProvider(config.model, cache_dir=config.cache_dir)
+        return FastEmbedProvider(
+            config.model,
+            cache_dir=config.cache_dir,
+            device=config.embed_device,
+            threads=config.embed_threads,
+            batch_size=config.embed_batch_size,
+        )
     if provider == "openai":
         from coderag.embeddings.openai_provider import OpenAIEmbeddingProvider
 
diff --git a/coderag/embeddings/fastembed_provider.py b/coderag/embeddings/fastembed_provider.py
index d9a81ac..851f174 100644
--- a/coderag/embeddings/fastembed_provider.py
+++ b/coderag/embeddings/fastembed_provider.py
@@ -22,9 +22,20 @@
 class FastEmbedProvider:
     name = "fastembed"
 
-    def __init__(self, model: str = DEFAULT_MODEL, cache_dir: Optional[Path] = None):
+    def __init__(
+        self,
+        model: str = DEFAULT_MODEL,
+        cache_dir: Optional[Path] = None,
+        *,
+        device: str = "auto",
+        threads: int = 0,
+        batch_size: int = 64,
+    ):
         self._model_name = model
         self._cache_dir = str(cache_dir) if cache_dir else None
+        self._device = device
+        self._threads = threads
+        self._batch_size = max(1, batch_size)
         self._dim = self._lookup_dim(model)
 
     @staticmethod
@@ -39,12 +50,50 @@ def _lookup_dim(model: str) -> Optional[int]:
             pass
         return None
 
+    def _providers(self) -> Optional[list[str]]:
+        """ONNX execution providers for the chosen device, or None for the library default.
+
+        CUDA is listed with a CPU fallback so onnxruntime degrades gracefully at runtime;
+        ``auto`` only requests CUDA when onnxruntime actually exposes it.
+        """
+        if self._device == "cpu":
+            return ["CPUExecutionProvider"]
+        if self._device == "cuda":
+            return ["CUDAExecutionProvider", "CPUExecutionProvider"]
+        try:  # auto: prefer a GPU only if one is really available
+            import onnxruntime as ort
+
+            if "CUDAExecutionProvider" in ort.get_available_providers():
+                return ["CUDAExecutionProvider", "CPUExecutionProvider"]
+        except Exception:  # pragma: no cover - onnxruntime optional / probe best-effort
+            pass
+        return None
+
     @cached_property
     def _model(self) -> Any:
         from fastembed import TextEmbedding
 
-        logger.info("Loading fastembed model %s ...", self._model_name)
-        return TextEmbedding(self._model_name, cache_dir=self._cache_dir)
+        kwargs: dict[str, Any] = {"cache_dir": self._cache_dir}
+        providers = self._providers()
+        if providers is not None:
+            kwargs["providers"] = providers
+        if self._threads > 0:
+            kwargs["threads"] = self._threads
+        logger.info(
+            "Loading fastembed model %s (device=%s)…", self._model_name, self._device
+        )
+        try:
+            return TextEmbedding(self._model_name, **kwargs)
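+        except Exception as exc:  # pragma: no cover - GPU init can fail on broken drivers
+            # Retry without an explicit provider list only when a non-CPU provider was
+            # requested; a failure on an explicit CPU request is re-raised below.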
+            if providers and providers[0] != "CPUExecutionProvider":
+                logger.warning(
+                    "GPU embedding init failed (%s); falling back to CPU.", exc
+                )
+                kwargs.pop("providers", None)
+                return TextEmbedding(self._model_name, **kwargs)
+            raise
 
     @property
     def model_id(self) -> str:
@@ -60,7 +109,7 @@ def dim(self) -> int:
     def embed_documents(self, texts: Sequence[str]) -> np.ndarray:
         if not texts:
             return np.zeros((0, self.dim), dtype="float32")
-        vecs = list(self._model.passage_embed(list(texts)))
+        vecs = list(self._model.passage_embed(list(texts), batch_size=self._batch_size))
         return np.vstack(vecs).astype("float32")
 
     def embed_query(self, text: str) -> np.ndarray:
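Review note on `_providers` above: the device-to-provider mapping can be exercised in isolation. The sketch below mirrors that logic but takes the list of available providers as a parameter, so it runs without onnxruntime installed. The function name `choose_providers` and the `available` parameter are hypothetical and stand in for the `ort.get_available_providers()` probe.

```python
from typing import List, Optional, Sequence


def choose_providers(device: str, available: Sequence[str]) -> Optional[List[str]]:
    """Map a device setting to ONNX execution providers, or None for the library default.

    ``available`` stands in for onnxruntime.get_available_providers() (illustration only).
    """
    if device == "cpu":
        return ["CPUExecutionProvider"]
    if device == "cuda":
        # CPU is listed second so the runtime can degrade if CUDA cannot be used.
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    # "auto" (or anything else): request CUDA only when it is actually exposed.
    if "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return None


for device in ("cpu", "cuda", "auto"):
    print(device, choose_providers(device, ["CPUExecutionProvider"]))
```

Returning `None` rather than an explicit CPU list for `auto` keeps the library's own default selection in play, which appears to be the intent of the `providers is not None` check in `_model`.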
diff --git a/coderag/eval/datasets/coderag_self.jsonl b/coderag/eval/datasets/coderag_self.jsonl
index 4575746..0dce946 100644
--- a/coderag/eval/datasets/coderag_self.jsonl
+++ b/coderag/eval/datasets/coderag_self.jsonl
@@ -1,23 +1,23 @@
 {"query": "where are duplicate or stale vectors removed when a file changes", "relevant_files": ["coderag/indexer.py"], "source": "curated"}
-{"query": "how is the FAISS index rebuilt from the SQLite source of truth", "relevant_files": ["coderag/store/vector_index.py"], "source": "curated"}
+{"query": "how is the FAISS index rebuilt from the SQLite source of truth", "relevant_files": ["coderag/store/lance_store.py"], "source": "curated"}
 {"query": "where is reciprocal rank fusion implemented", "relevant_files": ["coderag/retrieval/fusion.py"], "source": "curated"}
 {"query": "how are dense and lexical search results combined into one ranking", "relevant_files": ["coderag/retrieval/search.py"], "source": "curated"}
 {"query": "how does the debounced filesystem watcher trigger reindexing", "relevant_files": ["coderag/watch.py"], "source": "curated"}
 {"query": "where is symbol-aware chunking for Python using the ast module", "relevant_files": ["coderag/chunking/python_ast.py"], "source": "curated"}
 {"query": "how are functions and classes chunked for Go and Rust via tree-sitter", "relevant_files": ["coderag/chunking/treesitter.py"], "source": "curated"}
-{"query": "where is BM25 keyword search over SQLite FTS5 implemented", "relevant_files": ["coderag/store/sqlite_store.py"], "source": "curated"}
+{"query": "where is BM25 keyword search over the LanceDB full-text index implemented", "relevant_files": ["coderag/store/lance_store.py"], "source": "curated"}
 {"query": "how does the HTTP API require an API key for authentication", "relevant_files": ["coderag/surfaces/http_api.py"], "source": "curated"}
 {"query": "how is an LLM answer streamed over the retrieved code chunks", "relevant_files": ["coderag/llm.py"], "source": "curated"}
 {"query": "where is the OpenAI-compatible embedding provider implemented", "relevant_files": ["coderag/embeddings/openai_provider.py"], "source": "curated"}
 {"query": "how does configuration load from environment variables and a dotenv file", "relevant_files": ["coderag/config.py"], "source": "curated"}
 {"query": "where is the command line search subcommand defined", "relevant_files": ["coderag/surfaces/cli.py"], "source": "curated"}
-{"query": "how does the vector index switch from flat to IVF as the corpus grows", "relevant_files": ["coderag/store/vector_index.py"], "source": "curated"}
+{"query": "how does the vector index switch from flat to IVF as the corpus grows", "relevant_files": ["coderag/store/lance_store.py"], "source": "curated"}
 {"query": "where is content hashing used to skip unchanged files on reindex", "relevant_files": ["coderag/indexer.py"], "source": "curated"}
 {"query": "how are file contents served safely for only indexed files", "relevant_files": ["coderag/api.py"], "source": "curated"}
 {"query": "where does the web UI render results with syntax highlighting", "relevant_files": ["coderag/surfaces/webui.py"], "source": "curated"}
 {"query": "how is an oversized function split into smaller line windows", "relevant_files": ["coderag/chunking/base.py"], "source": "curated"}
-{"query": "where is the database table schema for chunks and files defined", "relevant_files": ["coderag/store/schema.py"], "source": "curated"}
-{"query": "how does a model or embedding dimension change get detected and trigger a rebuild", "relevant_files": ["coderag/store/sqlite_store.py", "coderag/api.py"], "source": "curated"}
+{"query": "where is the database table schema for chunks and files defined", "relevant_files": ["coderag/store/lance_store.py"], "source": "curated"}
+{"query": "how does a model or embedding dimension change get detected and trigger a rebuild", "relevant_files": ["coderag/store/lance_store.py", "coderag/api.py"], "source": "curated"}
 {"query": "where is the deterministic offline fake embedding provider for tests", "relevant_files": ["coderag/embeddings/fake_provider.py"], "source": "curated"}
 {"query": "how are file extensions mapped to programming languages for chunking", "relevant_files": ["coderag/chunking/languages.py"], "source": "curated"}
 {"query": "where is text split into lines without collapsing carriage returns", "relevant_files": ["coderag/_lines.py"], "source": "curated"}
diff --git a/coderag/eval/datasets/coderag_self_identifiers.jsonl b/coderag/eval/datasets/coderag_self_identifiers.jsonl
index e68a227..fd3312f 100644
--- a/coderag/eval/datasets/coderag_self_identifiers.jsonl
+++ b/coderag/eval/datasets/coderag_self_identifiers.jsonl
@@ -1,14 +1,14 @@
 {"query": "reciprocal_rank_fusion", "relevant_files": ["coderag/retrieval/fusion.py"], "relevant_symbols": ["reciprocal_rank_fusion"], "source": "curated-id"}
 {"query": "search", "relevant_files": ["coderag/retrieval/search.py"], "relevant_symbols": ["HybridSearcher.search"], "source": "curated-id"}
 {"query": "_index_file", "relevant_files": ["coderag/indexer.py"], "relevant_symbols": ["Indexer._index_file"], "source": "curated-id"}
-{"query": "rebuild_from_store", "relevant_files": ["coderag/store/vector_index.py"], "relevant_symbols": ["FaissVectorIndex.rebuild_from_store"], "source": "curated-id"}
-{"query": "_choose_kind", "relevant_files": ["coderag/store/vector_index.py"], "relevant_symbols": ["FaissVectorIndex._choose_kind"], "source": "curated-id"}
-{"query": "search", "relevant_files": ["coderag/store/vector_index.py"], "relevant_symbols": ["FaissVectorIndex.search"], "source": "curated-id"}
-{"query": "_derive_nlist", "relevant_files": ["coderag/store/vector_index.py"], "relevant_symbols": ["_derive_nlist"], "source": "curated-id"}
-{"query": "fts_search", "relevant_files": ["coderag/store/sqlite_store.py"], "relevant_symbols": ["SQLiteStore.fts_search"], "source": "curated-id"}
-{"query": "bootstrap", "relevant_files": ["coderag/store/sqlite_store.py"], "relevant_symbols": ["SQLiteStore.bootstrap"], "source": "curated-id"}
-{"query": "hydrate", "relevant_files": ["coderag/store/sqlite_store.py"], "relevant_symbols": ["SQLiteStore.hydrate"], "source": "curated-id"}
-{"query": "_sanitize_fts", "relevant_files": ["coderag/store/sqlite_store.py"], "relevant_symbols": ["_sanitize_fts"], "source": "curated-id"}
+{"query": "rebuild_from_store", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["FaissVectorIndex.rebuild_from_store"], "source": "curated-id"}
+{"query": "_choose_kind", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["FaissVectorIndex._choose_kind"], "source": "curated-id"}
+{"query": "search", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["FaissVectorIndex.search"], "source": "curated-id"}
+{"query": "_derive_nlist", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["_derive_nlist"], "source": "curated-id"}
+{"query": "fts_search", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["SQLiteStore.fts_search"], "source": "curated-id"}
+{"query": "bootstrap", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["SQLiteStore.bootstrap"], "source": "curated-id"}
+{"query": "hydrate", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["SQLiteStore.hydrate"], "source": "curated-id"}
+{"query": "_sanitize_fts", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["_sanitize_fts"], "source": "curated-id"}
 {"query": "watch", "relevant_files": ["coderag/watch.py"], "relevant_symbols": ["watch"], "source": "curated-id"}
 {"query": "extract_spans", "relevant_files": ["coderag/chunking/python_ast.py"], "relevant_symbols": ["extract_spans"], "source": "curated-id"}
 {"query": "stream_answer", "relevant_files": ["coderag/llm.py"], "relevant_symbols": ["stream_answer"], "source": "curated-id"}
diff --git a/coderag/eval/datasets/coderag_self_symbols.jsonl b/coderag/eval/datasets/coderag_self_symbols.jsonl
index 4c484c3..1409e6c 100644
--- a/coderag/eval/datasets/coderag_self_symbols.jsonl
+++ b/coderag/eval/datasets/coderag_self_symbols.jsonl
@@ -1,14 +1,14 @@
 {"query": "where is reciprocal rank fusion implemented", "relevant_files": ["coderag/retrieval/fusion.py"], "relevant_symbols": ["reciprocal_rank_fusion"], "source": "curated"}
 {"query": "how are dense and lexical search results combined into one ranking", "relevant_files": ["coderag/retrieval/search.py"], "relevant_symbols": ["HybridSearcher.search"], "source": "curated"}
 {"query": "where are a changed file's old chunks removed before new ones are added", "relevant_files": ["coderag/indexer.py"], "relevant_symbols": ["Indexer._index_file"], "source": "curated"}
-{"query": "how is the FAISS index rebuilt from the SQLite store", "relevant_files": ["coderag/store/vector_index.py"], "relevant_symbols": ["FaissVectorIndex.rebuild_from_store"], "source": "curated"}
-{"query": "where does the vector index choose between flat and IVF", "relevant_files": ["coderag/store/vector_index.py"], "relevant_symbols": ["FaissVectorIndex._choose_kind"], "source": "curated"}
-{"query": "how are query vectors searched in the FAISS index", "relevant_files": ["coderag/store/vector_index.py"], "relevant_symbols": ["FaissVectorIndex.search"], "source": "curated"}
-{"query": "how is the number of IVF clusters derived from corpus size", "relevant_files": ["coderag/store/vector_index.py"], "relevant_symbols": ["_derive_nlist"], "source": "curated"}
-{"query": "where is BM25 keyword search over the full text index", "relevant_files": ["coderag/store/sqlite_store.py"], "relevant_symbols": ["SQLiteStore.fts_search"], "source": "curated"}
-{"query": "how does the store detect a model or embedding dimension change on startup", "relevant_files": ["coderag/store/sqlite_store.py"], "relevant_symbols": ["SQLiteStore.bootstrap"], "source": "curated"}
-{"query": "where are search results hydrated from the database by chunk id", "relevant_files": ["coderag/store/sqlite_store.py"], "relevant_symbols": ["SQLiteStore.hydrate"], "source": "curated"}
-{"query": "how are full text search query strings sanitized", "relevant_files": ["coderag/store/sqlite_store.py"], "relevant_symbols": ["_sanitize_fts"], "source": "curated"}
+{"query": "how is the FAISS index rebuilt from the SQLite store", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["FaissVectorIndex.rebuild_from_store"], "source": "curated"}
+{"query": "where does the vector index choose between flat and IVF", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["FaissVectorIndex._choose_kind"], "source": "curated"}
+{"query": "how are query vectors searched in the FAISS index", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["FaissVectorIndex.search"], "source": "curated"}
+{"query": "how is the number of IVF clusters derived from corpus size", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["_derive_nlist"], "source": "curated"}
+{"query": "where is BM25 keyword search over the full text index", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["SQLiteStore.fts_search"], "source": "curated"}
+{"query": "how does the store detect a model or embedding dimension change on startup", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["SQLiteStore.bootstrap"], "source": "curated"}
+{"query": "where are search results hydrated from the database by chunk id", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["SQLiteStore.hydrate"], "source": "curated"}
+{"query": "how are full text search query strings sanitized", "relevant_files": ["coderag/store/lance_store.py"], "relevant_symbols": ["_sanitize_fts"], "source": "curated"}
 {"query": "how does the filesystem watcher start watching and applying changes", "relevant_files": ["coderag/watch.py"], "relevant_symbols": ["watch"], "source": "curated"}
 {"query": "where are python functions and classes extracted as symbol spans", "relevant_files": ["coderag/chunking/python_ast.py"], "relevant_symbols": ["extract_spans"], "source": "curated"}
 {"query": "how is an LLM answer streamed over retrieved code chunks", "relevant_files": ["coderag/llm.py"], "relevant_symbols": ["stream_answer"], "source": "curated"}
diff --git a/coderag/eval/harness.py b/coderag/eval/harness.py
index 4ec5db9..1a942d2 100644
--- a/coderag/eval/harness.py
+++ b/coderag/eval/harness.py
@@ -145,13 +145,13 @@ def compare_modes(
             adaptive_fusion=False,
             graph_expansion=False,
         )
-        searcher = HybridSearcher(cfg, cr.provider, cr.store, cr.vectors)
+        searcher = HybridSearcher(cfg, cr.provider, cr.store)
         results.append(
             evaluate(searcher.search, cases, label=label, ks=ks, level=level)
         )
     if adaptive:
         cfg = cr.config.with_overrides(adaptive_fusion=True, graph_expansion=False)
-        searcher = HybridSearcher(cfg, cr.provider, cr.store, cr.vectors)
+        searcher = HybridSearcher(cfg, cr.provider, cr.store)
         results.append(
             evaluate(searcher.search, cases, label="adaptive", ks=ks, level=level)
         )
@@ -162,7 +162,7 @@ def compare_modes(
             adaptive_fusion=False,
             graph_expansion=True,
         )
-        searcher = HybridSearcher(cfg, cr.provider, cr.store, cr.vectors)
+        searcher = HybridSearcher(cfg, cr.provider, cr.store)
         results.append(
             evaluate(searcher.search, cases, label="hybrid+graph", ks=ks, level=level)
         )
@@ -173,9 +173,7 @@ def compare_modes(
             adaptive_fusion=False,
             graph_expansion=False,
         )
-        searcher = HybridSearcher(
-            cfg, cr.provider, cr.store, cr.vectors, reranker=reranker
-        )
+        searcher = HybridSearcher(cfg, cr.provider, cr.store, reranker=reranker)
         results.append(
             evaluate(searcher.search, cases, label="hybrid+rerank", ks=ks, level=level)
         )
diff --git a/coderag/fs_search.py b/coderag/fs_search.py
index 7225b97..67cf9cf 100644
--- a/coderag/fs_search.py
+++ b/coderag/fs_search.py
@@ -27,9 +27,9 @@
 from dataclasses import dataclass, field
 from fnmatch import fnmatch
 from pathlib import Path
-from typing import Dict, Iterator, List, Optional, Sequence, Tuple
+from typing import Dict, List, Optional, Sequence, Tuple
 
-from coderag._ignore import ignore_dir_names, is_ignored
+from coderag._ignore import walk_files
 from coderag._lines import split_lines
 from coderag.config import DEFAULT_IGNORE_GLOBS
 
@@ -73,22 +73,6 @@ def _rg_available() -> bool:
     return shutil.which("rg") is not None
 
 
-def _iter_files(root: Path, ignore_globs: Sequence[str]) -> Iterator[Tuple[Path, str]]:
-    """Yield ``(absolute_path, posix_rel)`` for every non-ignored file under ``root``."""
-    ignore_dirs = ignore_dir_names(ignore_globs)
-    for dirpath, dirnames, filenames in os.walk(root):
-        dirnames[:] = [d for d in dirnames if d not in ignore_dirs]
-        for name in filenames:
-            abs_path = Path(dirpath) / name
-            try:
-                rel = abs_path.relative_to(root).as_posix()
-            except ValueError:  # pragma: no cover - defensive
-                continue
-            if is_ignored(rel, ignore_globs, ignore_dirs):
-                continue
-            yield abs_path, rel
-
-
 def _glob_matches(rel: str, glob: str) -> bool:
     """Match a glob against the full relative path or just the basename (``*.py``)."""
     return fnmatch(rel, glob) or fnmatch(rel.rsplit("/", 1)[-1], glob)
@@ -226,6 +210,7 @@ def search_files(
     limit: int = DEFAULT_LIMIT,
     offset: int = 0,
     ignore_globs: Sequence[str] = DEFAULT_IGNORE_GLOBS,
+    use_gitignore: bool = True,
     ignore_case: bool = False,
     max_file_bytes: int = _MAX_FILE_BYTES,
     redact: bool = True,
@@ -261,7 +246,9 @@ def search_files(
     if target == "files":
         rels = sorted(
             rel
-            for _, rel in _iter_files(root_path, ignore_globs)
+            for _, rel in walk_files(
+                root_path, ignore_globs, use_gitignore=use_gitignore
+            )
             if _glob_matches(rel, pattern)
         )
         page, truncated, next_offset = _paginate(rels, offset, limit)
@@ -285,7 +272,9 @@ def search_files(
 
     files = [
         (abs_path, rel)
-        for abs_path, rel in _iter_files(root_path, ignore_globs)
+        for abs_path, rel in walk_files(
+            root_path, ignore_globs, use_gitignore=use_gitignore
+        )
         if file_glob is None or _glob_matches(rel, file_glob)
     ]
 
diff --git a/coderag/indexer.py b/coderag/indexer.py
index 9039645..d4442ed 100644
--- a/coderag/indexer.py
+++ b/coderag/indexer.py
@@ -1,34 +1,72 @@
 """Incremental indexing orchestration.
 
-Ties chunking -> embedding -> SQLite -> FAISS together with content-hash change detection.
-The critical correctness property (which the old ``monitor.py`` got wrong): a changed file's
-*old* chunks are removed from both the store and the vector index **before** the new ones are
-added, so re-saving a file never accumulates duplicate or stale vectors.
+Ties chunking -> embedding -> the LanceDB store together with content-hash change detection.
+The critical correctness property: a changed file's *old* chunks are removed from the store
+**before** the new ones are added (``write_file(..., replace=True)``), so re-saving a file
+never accumulates duplicate or stale rows.
 """
 
 from __future__ import annotations
 
 import hashlib
 import logging
-import os
+import sys
+import time
 from dataclasses import dataclass
 from pathlib import Path
-from typing import Iterator, List, Optional, Tuple
+from typing import TYPE_CHECKING, Any, Dict, Iterator, List, Optional, Tuple
 
 import numpy as np
 
-from coderag._ignore import ignore_dir_names, is_ignored
+from coderag._ignore import ignore_dir_names, is_ignored, walk_files
 from coderag.chunking import chunk_file
 from coderag.chunking.languages import detect_language
 from coderag.config import Config
 from coderag.embeddings import EmbeddingProvider
-from coderag.store.sqlite_store import SQLiteStore
-from coderag.store.vector_index import FaissVectorIndex
 from coderag.types import Chunk, IndexStats
 
+if TYPE_CHECKING:
+    from coderag.store.lance_store import LanceStore
+
 logger = logging.getLogger(__name__)
 
 
+class _ProgressReporter:
+    """Live, human-facing indexing progress, written to stderr (stdout stays clean).
+
+    A large index is otherwise a silent wait — the very problem behind an agent sitting at
+    "Working… 10 min" while an over-broad root is crawled. This narrates *both* phases: the
+    discovery walk (which stats, and where needed hashes, every candidate before embedding) and the
+    embedding pass. On a TTY it redraws one line in place; otherwise (agent terminals,
+    captured logs) it prints throttled newline updates so output stays readable. It is a
+    no-op unless ``enabled`` — the library facade and the MCP background index pass
+    ``progress=False`` and stay quiet.
+    """
+
+    def __init__(self, enabled: bool) -> None:
+        self.enabled = enabled
+        self._tty = bool(getattr(sys.stderr, "isatty", lambda: False)())
+        self._next = 0.0  # monotonic time of the next allowed (unforced) update
+
+    def update(self, msg: str, *, force: bool = False) -> None:
+        """Show ``msg``, throttled so per-file calls don't flood the terminal/logs."""
+        if not self.enabled:
+            return
+        now = time.monotonic()
+        if not force and now < self._next:
+            return
+        self._next = now + (0.1 if self._tty else 2.0)
+        sys.stderr.write(f"\r\x1b[2K{msg}" if self._tty else msg + "\n")
+        sys.stderr.flush()
+
+    def done(self, msg: str) -> None:
+        """Emit a final line and stop redrawing (clears the in-place line on a TTY)."""
+        if not self.enabled:
+            return
+        sys.stderr.write(f"\r\x1b[2K{msg}\n" if self._tty else msg + "\n")
+        sys.stderr.flush()
+
+
 @dataclass(slots=True)
 class _Work:
     rel: str
@@ -36,6 +74,8 @@ class _Work:
     text: str
     content_hash: str
     mtime: float
+    size: int
+    existed: bool  # whether the file already had rows (→ replace, delete-before-add)
 
 
 class Indexer:
@@ -43,13 +83,11 @@ def __init__(
         self,
         config: Config,
         provider: EmbeddingProvider,
-        store: SQLiteStore,
-        vectors: FaissVectorIndex,
+        store: "LanceStore",
     ) -> None:
         self.config = config
         self.provider = provider
         self.store = store
-        self.vectors = vectors
         self._ignore_dirs = ignore_dir_names(config.ignore_globs)
 
     # --- public ---
@@ -64,27 +102,44 @@ def index(
         root = self.config.watched_dir.resolve()
         target = (target or self.config.watched_dir).resolve()
         prune = target == root  # only a full-root pass removes vanished files
+        rep = _ProgressReporter(progress)
 
         stats = IndexStats()
         if full:
-            self._reset()
+            self.store.clear()
 
-        # 1. Discover candidates and detect what actually changed (cheap hash check).
+        # 1. Discover candidates and detect what actually changed (cheap stat/hash check).
+        #    Preload all file metadata once (one scan) so discovery does no per-file query.
+        metas = self.store.all_file_metas()
+        rep.update(f"Scanning {target} for files to index…", force=True)
         walked: set[str] = set()
         work: List[_Work] = []
         for abs_path, rel, language in self._walk(target, root):
             walked.add(rel)
-            item = self._maybe_work(abs_path, rel, language)
+            item = self._maybe_work(abs_path, rel, language, metas)
             if item is None:
                 stats.files_skipped += 1
             else:
                 work.append(item)
+            rep.update(
+                f"Scanning {target} — {len(walked)} file(s) seen, "
+                f"{len(work)} to index, {stats.files_skipped} unchanged/skipped…"
+            )
+        if work:
+            rep.update(
+                f"Embedding {len(work)} changed file(s) "
+                f"({stats.files_skipped} unchanged/skipped)…",
+                force=True,
+            )
+        else:
+            rep.update(
+                f"Up to date — {stats.files_skipped} file(s) unchanged.", force=True
+            )
 
         # 2. (Re)index changed files. Chunking + embedding (the CPU/network cost) may run
-        #    in parallel across files (config.index_workers); the SQLite + FAISS writes
-        #    stay on this single thread to preserve the delete-before-add invariant and
-        #    the single-connection store.
-        for added, removed in self._embed_and_write(work, progress=progress):
+        #    in parallel across files (config.index_workers); the store writes stay on this
+        #    single thread to preserve the delete-before-add invariant and single writer.
+        for added, removed in self._embed_and_write(work, reporter=rep):
             stats.chunks_added += added
             stats.chunks_removed += removed
             stats.files_indexed += 1
@@ -92,28 +147,54 @@ def index(
         # 3. Prune files that disappeared from disk (full-root passes only).
         if prune:
             for rel in set(self.store.all_file_paths()) - walked:
-                removed_ids = self.store.delete_file(rel)
-                self.vectors.remove(removed_ids)
+                removed = self.store.delete_file(rel)
                 stats.files_removed += 1
-                stats.chunks_removed += len(removed_ids)
-
-        # 4. Persist FAISS (rebuilding to IVF if we crossed the scale threshold).
-        if not self.vectors.maybe_upgrade(self.store):
-            self.vectors.save()
+                stats.chunks_removed += removed
+
+        # 4. Persist. A full-root pass that changed something rebuilds the FTS/vector indexes
+        #    and compacts; an incremental/single-file pass just flushes (new rows are
+        #    searchable via LanceDB's flat scan of the unindexed tail) so a watcher edit
+        #    never triggers a whole-index rebuild.
+        changed = stats.files_indexed > 0 or stats.files_removed > 0
+        if prune and changed:
+            self.store.optimize()
+        else:
+            self.store.flush()
 
         final = self.store.stats()
         stats.total_files = final.total_files
         stats.total_chunks = final.total_chunks
+        rep.done(
+            f"✓ Indexed {stats.files_indexed} file(s) — "
+            f"{stats.total_files} total / {stats.total_chunks} chunks."
+        )
         return stats
 
     # --- internals ---
 
-    def _reset(self) -> None:
-        for rel in list(self.store.all_file_paths()):
-            self.store.delete_file(rel)
-        self.vectors.rebuild_from_store(self.store)  # -> empty
-
-    def _maybe_work(self, abs_path: Path, rel: str, language: str) -> Optional[_Work]:
+    def _maybe_work(
+        self,
+        abs_path: Path,
+        rel: str,
+        language: str,
+        metas: Dict[str, Dict[str, Any]],
+    ) -> Optional[_Work]:
+        existing = metas.get(rel)
+        try:
+            st = abs_path.stat()
+        except OSError as exc:
+            logger.warning("Cannot stat %s: %s", abs_path, exc)
+            return None
+        # Cheap fast-path: if size and mtime are unchanged, skip the read+hash entirely.
+        # The hash stays the authority on "did content change" — this only avoids the read
+        # for the common untouched case (the dominant cost of re-indexing a large tree).
+        if (
+            existing is not None
+            and existing.get("size") is not None
+            and int(existing["size"]) == st.st_size
+            and abs(float(existing.get("mtime") or 0.0) - st.st_mtime) < 1e-6
+        ):
+            return None
         try:
             data = abs_path.read_bytes()
         except OSError as exc:
@@ -124,61 +205,54 @@ def _maybe_work(self, abs_path: Path, rel: str, language: str) -> Optional[_Work
         if b"\x00" in data[:8192]:
             return None  # binary file (NUL byte in the head) — never index as text
         content_hash = hashlib.sha256(data).hexdigest()
-        existing = self.store.get_file(rel)
-        if existing is not None and existing["content_hash"] == content_hash:
-            return None  # unchanged -> no embedding cost
+        if existing is not None and existing.get("content_hash") == content_hash:
+            return None  # content unchanged (e.g. touched) -> no embedding cost
         text = data.decode("utf-8", errors="replace")
-        return _Work(rel, language, text, content_hash, abs_path.stat().st_mtime)
+        return _Work(
+            rel,
+            language,
+            text,
+            content_hash,
+            st.st_mtime,
+            st.st_size,
+            existing is not None,
+        )
 
     def _embed_and_write(
-        self, work: List[_Work], *, progress: bool
+        self, work: List[_Work], *, reporter: _ProgressReporter
     ) -> Iterator[Tuple[int, int]]:
         """Chunk+embed each file (optionally across worker threads) and apply the writes.
 
         Embedding is the expensive, parallelizable step and touches no shared mutable
-        state, so it runs in a thread pool when ``index_workers > 1``. The store/FAISS
-        writes are drained here on the single calling thread, so the no-duplicate
-        (delete-before-add) invariant and the single-writer store are preserved.
+        state, so it runs in a thread pool when ``index_workers > 1``. The store writes are
+        drained here on the single calling thread, so the no-duplicate (delete-before-add)
+        invariant and the single-writer store are preserved.
         """
         if not work:
             return
         workers = max(1, self.config.index_workers)
-        bar = self._progress_bar(len(work), progress)
-        try:
-            if workers > 1 and len(work) > 1:
-                from concurrent.futures import ThreadPoolExecutor, as_completed
-
-                with ThreadPoolExecutor(max_workers=workers) as pool:
-                    futures = {pool.submit(self._prepare, item): item for item in work}
-                    for fut in as_completed(futures):
-                        chunks, vectors = fut.result()
-                        yield self._write(futures[fut], chunks, vectors)
-                        if bar is not None:
-                            bar.update(1)
-            else:
-                for item in work:
-                    chunks, vectors = self._prepare(item)
-                    yield self._write(item, chunks, vectors)
-                    if bar is not None:
-                        bar.update(1)
-        finally:
-            if bar is not None:
-                bar.close()
-
-    @staticmethod
-    def _progress_bar(total: int, progress: bool):  # type: ignore[no-untyped-def]
-        if not progress:
-            return None
-        try:
-            from tqdm import tqdm
-
-            return tqdm(total=total, desc="Indexing", unit="file")
-        except Exception:  # pragma: no cover
-            return None
+        total = len(work)
+        done = 0
+        if workers > 1 and len(work) > 1:
+            from concurrent.futures import ThreadPoolExecutor, as_completed
+
+            with ThreadPoolExecutor(max_workers=workers) as pool:
+                futures = {pool.submit(self._prepare, item): item for item in work}
+                for fut in as_completed(futures):
+                    chunks, vectors = fut.result()
+                    yield self._write(futures[fut], chunks, vectors)
+                    done += 1
+                    reporter.update(f"Embedding {done}/{total} file(s)…")
+        else:
+            for item in work:
+                chunks, vectors = self._prepare(item)
+                yield self._write(item, chunks, vectors)
+                done += 1
+                reporter.update(f"Embedding {done}/{total} file(s)…")
 
     def _prepare(self, item: _Work) -> Tuple[List[Chunk], Optional[np.ndarray]]:
-        """Chunk and embed a file. Pure with respect to the store/FAISS, so it is safe to
-        run in a worker thread; the resulting writes are applied by :meth:`_write`."""
+        """Chunk and embed a file. Pure with respect to the store, so it is safe to run in
+        a worker thread; the resulting writes are applied by :meth:`_write`."""
         chunks = chunk_file(item.text, item.language, self.config)
         if not chunks:
             return [], None
@@ -188,27 +262,20 @@ def _prepare(self, item: _Work) -> Tuple[List[Chunk], Optional[np.ndarray]]:
     def _write(
         self, item: _Work, chunks: List[Chunk], vectors: Optional[np.ndarray]
     ) -> Tuple[int, int]:
-        """Apply a prepared file: remove its old chunks (store + FAISS) before adding the
-        new ones. Must run single-threaded — it is the only writer."""
-        removed = 0
-        existing = self.store.get_file(item.rel)
-        if existing is not None:
-            old_ids = self.store.delete_chunks_for_file(int(existing["id"]))
-            self.vectors.remove(old_ids)
-            removed = len(old_ids)
-
-        file_id = self.store.upsert_file(
-            item.rel, item.language, item.content_hash, item.mtime
-        )
-
-        if not chunks or vectors is None:
-            return 0, removed
+        """Apply a prepared file to the store (delete-before-add for a replacement).
 
-        new_ids = self.store.add_chunks(
-            file_id, chunks, vectors, self.provider.model_id
+        Must run single-threaded — it is the only writer.
+        """
+        return self.store.write_file(
+            item.rel,
+            item.language,
+            item.content_hash,
+            item.mtime,
+            item.size,
+            chunks,
+            vectors,
+            replace=item.existed,
         )
-        self.vectors.add(np.array(new_ids, dtype="int64"), vectors)
-        return len(new_ids), removed
 
     def _walk(self, target: Path, root: Path) -> Iterator[Tuple[Path, str, str]]:
         if target.is_file():
@@ -218,17 +285,19 @@ def _walk(self, target: Path, root: Path) -> Iterator[Tuple[Path, str, str]]:
                 yield target, rel, language
             return
 
-        for dirpath, dirnames, filenames in os.walk(target):
-            # prune ignored directories in place for speed
-            dirnames[:] = [d for d in dirnames if d not in self._ignore_dirs]
-            for name in filenames:
-                abs_path = Path(dirpath) / name
-                rel = self._rel(abs_path, root)
-                if not rel or self._ignored(rel):
-                    continue
-                language = detect_language(name, all_text=self.config.index_all_text)
-                if language:
-                    yield abs_path, rel, language
+        # walk_files owns dir-pruning + ignore-glob + .gitignore matching, shared with
+        # fs_search so semantic and exact search see exactly the same files.
+        for abs_path, rel in walk_files(
+            target,
+            self.config.ignore_globs,
+            root=root,
+            use_gitignore=self.config.use_gitignore,
+        ):
+            language = detect_language(
+                abs_path.name, all_text=self.config.index_all_text
+            )
+            if language:
+                yield abs_path, rel, language
 
     @staticmethod
     def _rel(abs_path: Path, root: Path) -> Optional[str]:
diff --git a/coderag/install.py b/coderag/install.py
index a421a11..c544e34 100644
--- a/coderag/install.py
+++ b/coderag/install.py
@@ -21,6 +21,7 @@
 import json
 import shutil
 import sys
+import textwrap
 import tomllib
 from dataclasses import dataclass, field
 from pathlib import Path
@@ -149,6 +150,49 @@ def detect_targets() -> List[str]:
     return found
 
 
+# Whole-home / whole-system locations, far larger than a single project or repo (the
+# natural unit): the first pass takes longer, and "/" also descends into
+# pseudo-filesystems. The wizard sets expectations before committing to one.
+_BROAD_ROOTS = {
+    "/home",
+    "/usr",
+    "/etc",
+    "/var",
+    "/opt",
+    "/mnt",
+    "/srv",
+    "/root",
+}
+
+
+def default_workspace(start: Optional[Path] = None) -> Path:
+    """Best default for the workspace prompt: the enclosing git repo root, else cwd.
+
+    Running the installer from inside a project should index that whole project, not a
+    stray subdirectory (too narrow to be useful) and not the user's whole home (slow to
+    index on the first pass). Walking up to the nearest ``.git`` finds the natural per-repo
+    root, which is the scope CodeRAG is tuned for.
+    """
+    start = (start or Path.cwd()).resolve()
+    for d in (start, *start.parents):
+        if (d / ".git").exists():
+            return d
+    return start
+
+
+def _is_broad_root(path: Path) -> bool:
+    """Heuristic: is ``path`` a whole-home/whole-system location rather than a project?"""
+    try:
+        rp = path.expanduser().resolve()
+    except OSError:  # pragma: no cover - resolve() rarely raises here
+        rp = path.expanduser()
+    if rp == Path(rp.anchor):  # the filesystem root, e.g. "/"
+        return True
+    if rp == Path.home().resolve():
+        return True
+    return str(rp) in _BROAD_ROOTS
+
+
 # --- per-target writers ---------------------------------------------------------------
 
 
@@ -336,13 +380,46 @@ def _ask_tools() -> List[str]:
     return picked or list(DEFAULT_TOOLS)
 
 
+def _describe_indexing(watched: Path) -> None:
+    """Tell the user what indexing this path entails — large trees are supported.
+
+    A broad root (``/home``, ``/``) is a legitimate choice — CodeRAG is meant to handle it.
+    We set expectations (the first pass takes longer and runs in the background) rather than
+    discourage it, and flag the one genuine footgun: ``/`` descends into pseudo-filesystems.
+    """
+    print(f"\n  → CodeRAG will index: {watched}")
+    if _is_broad_root(watched):
+        print(
+            textwrap.indent(
+                textwrap.dedent(
+                    """\
+                    This is a large tree (e.g. /home can be ~125k files). CodeRAG indexes it
+                    in the background and streams results as it goes — the first pass just
+                    takes longer. It skips version-control, build, and dependency directories
+                    (node_modules, .venv, __pycache__, …) automatically. For "/" specifically,
+                    exclude pseudo-filesystems like /proc and /sys."""
+                ),
+                "    ",
+            )
+        )
+    print(
+        "    It is indexed in the background the first time the agent's server starts, so\n"
+        "    search works right away and fills in as it goes. Check it anytime with\n"
+        "    `coderag status` or the index_status tool."
+    )
+
+
 def run_wizard(detected: List[str], default_watched: Path) -> List[Plan]:
     """Collect install choices interactively. Returns one :class:`Plan` per chosen target."""
     print("CodeRAG install wizard\n----------------------")
     targets = _ask_targets(detected)
     watched = Path(
-        _ask("Workspace directory to index", str(default_watched))
+        _ask(
+            "Workspace directory to index (a repo root, or a larger tree like ~/projects)",
+            str(default_watched),
+        )
     ).expanduser()
+    _describe_indexing(watched)
     plans: List[Plan] = []
     for t in targets:
         # Only Hermes supports per-server tool filtering in its config.
diff --git a/coderag/retrieval/graph.py b/coderag/retrieval/graph.py
index 296f6b6..7da2333 100644
--- a/coderag/retrieval/graph.py
+++ b/coderag/retrieval/graph.py
@@ -21,7 +21,7 @@
 from typing import TYPE_CHECKING, List, Mapping, Sequence
 
 if TYPE_CHECKING:
-    from coderag.store.sqlite_store import SQLiteStore
+    from coderag.store.lance_store import LanceStore
 
 # A bare identifier used as a call target, e.g. ``do_thing(`` (≥3 chars). The store's symbol
 # index only holds names the repo defines, so language builtins (len, str, …) never resolve.
@@ -44,7 +44,7 @@ def called_names(text: str) -> List[str]:
 
 
 def neighbor_ids(
-    store: "SQLiteStore",
+    store: "LanceStore",
     seed_ids: Sequence[int],
     seed_texts: Mapping[int, str],
     *,
diff --git a/coderag/retrieval/search.py b/coderag/retrieval/search.py
index c9ce924..13717ce 100644
--- a/coderag/retrieval/search.py
+++ b/coderag/retrieval/search.py
@@ -10,12 +10,11 @@
 from coderag.retrieval.fusion import reciprocal_rank_fusion
 from coderag.retrieval.graph import neighbor_ids
 from coderag.retrieval.query_type import fusion_weights
-from coderag.store.sqlite_store import SQLiteStore
-from coderag.store.vector_index import FaissVectorIndex
 from coderag.types import SearchHit
 
 if TYPE_CHECKING:
     from coderag.retrieval.rerank import Reranker
+    from coderag.store.lance_store import LanceStore
 
 logger = logging.getLogger(__name__)
 
@@ -25,14 +24,12 @@ def __init__(
         self,
         config: Config,
         provider: EmbeddingProvider,
-        store: SQLiteStore,
-        vectors: FaissVectorIndex,
+        store: "LanceStore",
         reranker: Optional["Reranker"] = None,
     ) -> None:
         self.config = config
         self.provider = provider
         self.store = store
-        self.vectors = vectors
         self.reranker = reranker
 
     def search(self, query: str, top_k: int) -> List[SearchHit]:
@@ -45,17 +42,16 @@ def search(self, query: str, top_k: int) -> List[SearchHit]:
             pool = max(self.config.rerank_candidates, top_k)
         fetch_k = max(self.config.fetch_k, pool)
 
-        # Dense retrieval.
+        # Dense retrieval (vector ANN over the store).
         qvec = self.provider.embed_query(query)
-        dense_ids, dense_scores = self.vectors.search(qvec, fetch_k)
+        dense = self.store.vector_search(qvec, fetch_k)
         similarity: Dict[int, float] = {
-            int(i): float(max(0.0, min(1.0, s)))
-            for i, s in zip(dense_ids, dense_scores, strict=False)
+            cid: float(max(0.0, min(1.0, s))) for cid, s in dense
         }
-        dense_ranked = [int(i) for i in dense_ids]
+        dense_ranked = [cid for cid, _ in dense]
 
-        # Lexical retrieval (BM25 over FTS5).
-        lexical_ranked = [cid for cid, _ in self.store.fts_search(query, fetch_k)]
+        # Lexical retrieval (BM25 over the store).
+        lexical_ranked = [cid for cid, _ in self.store.lexical_search(query, fetch_k)]
 
         # Fuse, then trim to the candidate pool (top_k, or deeper when reranking).
         # Weights may adapt to the query type (dense-up for NL, BM25-up for identifiers).
@@ -94,7 +90,7 @@ def search(self, query: str, top_k: int) -> List[SearchHit]:
                 SearchHit(
                     chunk_id=cid,
                     path=row["path"],
-                    symbol=row["symbol"],
+                    symbol=row["symbol"] or None,
                     kind=row["kind"],
                     language=row["language"],
                     start_line=int(row["start_line"]),
diff --git a/coderag/store/__init__.py b/coderag/store/__init__.py
index 5dc00ca..a816b55 100644
--- a/coderag/store/__init__.py
+++ b/coderag/store/__init__.py
@@ -1 +1 @@
-"""Persistent storage: SQLite as the source of truth, FAISS as a rebuildable cache."""
+"""Persistent storage: a single embedded LanceDB store (metadata + BM25 + vectors)."""
diff --git a/coderag/store/lance_store.py b/coderag/store/lance_store.py
new file mode 100644
index 0000000..dd21d6b
--- /dev/null
+++ b/coderag/store/lance_store.py
@@ -0,0 +1,527 @@
+"""The single embedded store: LanceDB holds chunk metadata, text (BM25), and vectors (ANN).
+
+This replaces the former SQLite store + separate FAISS index. One LanceDB database at
+``store_dir`` with two tables:
+
+* ``files``  — one row per indexed file (``path``, ``content_hash``, ``mtime``, ``size``,
+  ``language``): drives incremental change detection.
+* ``chunks`` — one row per chunk (``id``, ``path``, ``symbol``, ``kind``, ``language``,
+  ``start_line``, ``end_line``, ``text``, ``vector``): both BM25 (over ``text``) and vector
+  ANN (over ``vector``) live here, so there is no FAISS↔SQLite coordination to maintain.
+
+``chunks.path`` is denormalized so a file's chunks are deleted with a single
+``delete("path = …")`` (LanceDB has no foreign keys). The integer ``chunks.id`` is the
+fusion/hydrate key (it replaces the FAISS id). Writes are buffered and flushed in batches
+since LanceDB is columnar and many tiny appends create severe fragment/version bloat. Reads
+query committed data only; the writer owns the buffer (guarded by a lock), so a background
+index stays safe alongside live queries (partial results until ``optimize`` runs).
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import math
+import re
+import threading
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Sequence, Tuple
+
+import numpy as np
+
+from coderag.retrieval.fusion import reciprocal_rank_fusion
+from coderag.types import Chunk, IndexStats, SearchHit
+
+logger = logging.getLogger(__name__)
+
+_CHUNKS = "chunks"
+_FILES = "files"
+_META_FILE = "meta.json"
+_SCHEMA_VERSION = 1
+_FLUSH_ROWS = 8192
+# LanceDB needs enough rows to train a vector ANN index; below this, brute-force is exact
+# and fast, so we skip indexing (also keeps tiny test corpora on the exact path).
+_ANN_MIN_ROWS = 256
+_HYDRATE_COLS = [
+    "id",
+    "path",
+    "symbol",
+    "kind",
+    "language",
+    "start_line",
+    "end_line",
+    "text",
+]
+_FTS_TOKEN = re.compile(r"[A-Za-z0-9_]+")
+
+
+def _fts_query(query: str) -> str:
+    """Reduce an arbitrary query to space-separated tokens (defuses FTS operators)."""
+    return " ".join(_FTS_TOKEN.findall(query))
+
+
+class LanceStore:
+    """LanceDB-backed chunk + file store with vector ANN and BM25 search."""
+
+    def __init__(self, store_dir: Path, dim: int) -> None:
+        import lancedb
+
+        self.dim = dim
+        self._dir = Path(store_dir)
+        self._dir.mkdir(parents=True, exist_ok=True)
+        self._db = lancedb.connect(str(self._dir))
+        self._lock = threading.RLock()
+        self._chunks_buf: List[Dict[str, Any]] = []
+        self._files_buf: List[Dict[str, Any]] = []
+        self._next_id = 0
+        self._ann_built = False
+        # Symbol-index cache (callee graph expansion), invalidated when ``_gen`` changes.
+        self._gen = 0
+        self._symbol_index: Optional[Dict[str, List[int]]] = None
+        self._symbol_index_gen = -1
+        if _CHUNKS in self._db.table_names():
+            self._next_id = self._max_id() + 1
+
+    # --- schema ---
+
+    def _chunks_schema(self) -> Any:
+        import pyarrow as pa
+
+        return pa.schema(
+            [
+                ("id", pa.int64()),
+                ("path", pa.string()),
+                ("symbol", pa.string()),
+                ("kind", pa.string()),
+                ("language", pa.string()),
+                ("start_line", pa.int32()),
+                ("end_line", pa.int32()),
+                ("text", pa.string()),
+                ("vector", pa.list_(pa.float32(), self.dim)),
+            ]
+        )
+
+    def _files_schema(self) -> Any:
+        import pyarrow as pa
+
+        return pa.schema(
+            [
+                ("path", pa.string()),
+                ("content_hash", pa.string()),
+                ("mtime", pa.float64()),
+                ("size", pa.int64()),
+                ("language", pa.string()),
+                ("indexed_at", pa.float64()),
+            ]
+        )
+
+    def _chunks_tbl(self) -> Any:
+        if _CHUNKS not in self._db.table_names():
+            return self._db.create_table(_CHUNKS, schema=self._chunks_schema())
+        return self._db.open_table(_CHUNKS)
+
+    def _files_tbl(self) -> Any:
+        if _FILES not in self._db.table_names():
+            return self._db.create_table(_FILES, schema=self._files_schema())
+        return self._db.open_table(_FILES)
+
+    def _max_id(self) -> int:
+        tbl = self._db.open_table(_CHUNKS)
+        n = tbl.count_rows()
+        if n == 0:
+            return -1
+        rows = tbl.search().select(["id"]).limit(n).to_list()
+        return max((int(r["id"]) for r in rows), default=-1)
+
+    # --- provenance / lifecycle ---
+
+    def bootstrap(self, embed_dim: int, embed_model: str) -> bool:
+        """Record the embedding model/dim; clear the store if they changed.
+
+        Returns True when a rebuild is required (model/dim changed) — the caller just
+        re-indexes into the now-empty tables (there is no separate index to rebuild).
+        """
+        meta_path = self._dir / _META_FILE
+        prev: Dict[str, Any] = {}
+        if meta_path.exists():
+            try:
+                prev = json.loads(meta_path.read_text(encoding="utf-8"))
+            except (OSError, json.JSONDecodeError):  # pragma: no cover - corrupt meta
+                prev = {}
+        changed = bool(prev) and (
+            int(prev.get("embed_dim", -1)) != embed_dim
+            or prev.get("embed_model") != embed_model
+        )
+        if changed:
+            logger.warning(
+                "Embedding model changed (%s/%s -> %s/%s); clearing index.",
+                prev.get("embed_model"),
+                prev.get("embed_dim"),
+                embed_model,
+                embed_dim,
+            )
+            with self._lock:
+                for name in (_CHUNKS, _FILES):
+                    if name in self._db.table_names():
+                        self._db.drop_table(name)
+                self._chunks_buf.clear()
+                self._files_buf.clear()
+                self._next_id = 0
+                self._ann_built = False
+                self._gen += 1
+        meta_path.write_text(
+            json.dumps(
+                {
+                    "embed_model": embed_model,
+                    "embed_dim": embed_dim,
+                    "schema_version": _SCHEMA_VERSION,
+                }
+            ),
+            encoding="utf-8",
+        )
+        return changed
+
+    def close(self) -> None:
+        with self._lock:
+            self._chunks_buf.clear()
+            self._files_buf.clear()
+
+    def clear(self) -> None:
+        """Drop all data (used by a full rebuild). Keeps the recorded provenance meta."""
+        with self._lock:
+            for name in (_CHUNKS, _FILES):
+                if name in self._db.table_names():
+                    self._db.drop_table(name)
+            self._chunks_buf.clear()
+            self._files_buf.clear()
+            self._next_id = 0
+            self._ann_built = False
+            self._gen += 1
+
+    # --- buffered writes ---
+
+    def _flush(self) -> None:
+        if self._chunks_buf:
+            self._chunks_tbl().add(self._chunks_buf)
+            self._chunks_buf = []
+        if self._files_buf:
+            self._files_tbl().add(self._files_buf)
+            self._files_buf = []
+
+    def flush(self) -> None:
+        with self._lock:
+            self._flush()
+
+    def _delete_path_rows(self, rel: str) -> int:
+        """Delete a file's chunk + file rows from the committed tables. Returns chunks gone."""
+        names = self._db.table_names()
+        removed = 0
+        pred = f"path = '{rel.replace(chr(39), chr(39) * 2)}'"  # escape ' as ''
+        if _CHUNKS in names:
+            ctbl = self._db.open_table(_CHUNKS)
+            removed = len(
+                ctbl.search().where(pred).select(["id"]).limit(10**9).to_list()
+            )
+            if removed:
+                ctbl.delete(pred)
+        if _FILES in names:
+            self._db.open_table(_FILES).delete(pred)
+        return removed
+
+    def write_file(
+        self,
+        rel: str,
+        language: str,
+        content_hash: str,
+        mtime: float,
+        size: int,
+        chunks: Sequence[Chunk],
+        vectors: Optional[np.ndarray],
+        *,
+        replace: bool,
+    ) -> Tuple[int, int]:
+        """Index one file: (replace its old rows, if any) then buffer its new rows.
+
+        Returns ``(chunks_added, chunks_removed)``. New files take the fully-batched fast
+        path (no flush); replacing a changed file flushes + deletes its old rows first, so
+        the delete-before-add invariant holds on the single writer thread.
+        """
+        import time
+
+        with self._lock:
+            removed = 0
+            if replace:
+                self._flush()
+                removed = self._delete_path_rows(rel)
+            added = 0
+            if chunks and vectors is not None:
+                mat = np.ascontiguousarray(vectors, dtype="float32")
+                norms = np.linalg.norm(mat, axis=1, keepdims=True)
+                mat = mat / np.where(norms == 0.0, 1.0, norms)
+                for chunk, vec in zip(chunks, mat, strict=False):
+                    self._chunks_buf.append(
+                        {
+                            "id": self._next_id,
+                            "path": rel,
+                            "symbol": chunk.symbol or "",
+                            "kind": chunk.kind,
+                            "language": chunk.language,
+                            "start_line": int(chunk.start_line),
+                            "end_line": int(chunk.end_line),
+                            "text": chunk.text,
+                            "vector": vec.tolist(),
+                        }
+                    )
+                    self._next_id += 1
+                    added += 1
+            self._files_buf.append(
+                {
+                    "path": rel,
+                    "content_hash": content_hash,
+                    "mtime": float(mtime),
+                    "size": int(size),
+                    "language": language,
+                    "indexed_at": time.time(),
+                }
+            )
+            self._gen += 1
+            if len(self._chunks_buf) >= _FLUSH_ROWS:
+                self._flush()
+            return added, removed
+
+    def delete_file(self, rel: str) -> int:
+        with self._lock:
+            self._flush()
+            removed = self._delete_path_rows(rel)
+            if removed:
+                self._gen += 1
+            return removed
+
+    def optimize(self) -> None:
+        """Flush, compact, (re)build the BM25 index, and build the vector ANN index at scale."""
+        with self._lock:
+            self._flush()
+            if _CHUNKS not in self._db.table_names():
+                return
+            tbl = self._db.open_table(_CHUNKS)
+            try:
+                tbl.optimize()
+                tbl.cleanup_old_versions()
+            except Exception:  # pragma: no cover - compaction is best-effort
+                logger.exception("LanceDB optimize failed (continuing).")
+            try:
+                tbl.create_fts_index("text", replace=True)
+            except Exception:  # pragma: no cover
+                logger.exception("LanceDB FTS index build failed (continuing).")
+            n = tbl.count_rows()
+            if n >= _ANN_MIN_ROWS:
+                try:
+                    nlist = max(1, min(int(4 * math.sqrt(n)), n // 39))
+                    tbl.create_index(
+                        metric="cosine",
+                        vector_column_name="vector",
+                        num_partitions=nlist,
+                        replace=True,
+                    )
+                    self._ann_built = True
+                except Exception:  # pragma: no cover - falls back to brute-force search
+                    logger.exception("LanceDB vector index build failed (brute-force).")
+
+    @property
+    def index_kind(self) -> str:
+        return "lancedb-ann" if self._ann_built else "lancedb"
+
+    # --- file metadata / change detection ---
+
+    def get_file_meta(self, rel: str) -> Optional[Dict[str, Any]]:
+        self.flush()
+        if _FILES not in self._db.table_names():
+            return None
+        pred = f"path = '{rel.replace(chr(39), chr(39) * 2)}'"
+        rows = self._db.open_table(_FILES).search().where(pred).limit(1).to_list()
+        return rows[0] if rows else None
+
+    def all_file_metas(self) -> Dict[str, Dict[str, Any]]:
+        """Every file's change-detection metadata, in one scan (indexer preload)."""
+        self.flush()
+        if _FILES not in self._db.table_names():
+            return {}
+        tbl = self._db.open_table(_FILES)
+        rows = (
+            tbl.search()
+            .select(["path", "content_hash", "mtime", "size"])
+            .limit(max(1, tbl.count_rows()))
+            .to_list()
+        )
+        return {r["path"]: r for r in rows}
+
+    def all_file_paths(self) -> List[str]:
+        return list(self.all_file_metas().keys())
+
+    # --- retrieval ---
+
+    def vector_search(self, qvec: np.ndarray, k: int) -> List[Tuple[int, float]]:
+        if _CHUNKS not in self._db.table_names():
+            return []
+        q = np.asarray(qvec, dtype="float32").reshape(-1)
+        norm = np.linalg.norm(q)
+        if norm:
+            q = q / norm
+        tbl = self._db.open_table(_CHUNKS)
+        if tbl.count_rows() == 0:
+            return []
+        rows = tbl.search(q.tolist()).metric("cosine").select(["id"]).limit(k).to_list()
+        return [(int(r["id"]), 1.0 - float(r["_distance"])) for r in rows]
+
+    def lexical_search(self, query: str, k: int) -> List[Tuple[int, float]]:
+        if _CHUNKS not in self._db.table_names():
+            return []
+        match = _fts_query(query)
+        if not match:
+            return []
+        try:
+            rows = (
+                self._db.open_table(_CHUNKS)
+                .search(match, query_type="fts")
+                .select(["id"])
+                .limit(k)
+                .to_list()
+            )
+        except Exception:  # pragma: no cover - FTS index not built yet / query rejected
+            return []
+        return [(int(r["id"]), float(r["_score"])) for r in rows]
+
+    def chunk_ids_for_path(self, rel: str) -> List[int]:
+        """The chunk ids belonging to one file (for inspection/tests)."""
+        self.flush()
+        if _CHUNKS not in self._db.table_names():
+            return []
+        pred = f"path = '{rel.replace(chr(39), chr(39) * 2)}'"
+        tbl = self._db.open_table(_CHUNKS)
+        rows = (
+            tbl.search()
+            .where(pred)
+            .select(["id"])
+            .limit(max(1, tbl.count_rows()))
+            .to_list()
+        )
+        return [int(r["id"]) for r in rows]
+
+    def hydrate(self, ids: Sequence[int]) -> Dict[int, Dict[str, Any]]:
+        if not ids or _CHUNKS not in self._db.table_names():
+            return {}
+        csv = ",".join(str(int(i)) for i in ids)
+        rows = (
+            self._db.open_table(_CHUNKS)
+            .search()
+            .where(f"id IN ({csv})")
+            .select(_HYDRATE_COLS)
+            .limit(len(ids))
+            .to_list()
+        )
+        return {int(r["id"]): r for r in rows}
+
+    def symbol_index(self) -> Dict[str, List[int]]:
+        """Map each symbol's bare name -> chunk ids defining it (cached, gen-invalidated)."""
+        with self._lock:
+            if self._symbol_index is not None and self._symbol_index_gen == self._gen:
+                return self._symbol_index
+            gen = self._gen
+            self._flush()
+        index: Dict[str, List[int]] = {}
+        if _CHUNKS in self._db.table_names():
+            tbl = self._db.open_table(_CHUNKS)
+            rows = (
+                tbl.search()
+                .where("symbol != ''")
+                .select(["id", "symbol"])
+                .limit(max(1, tbl.count_rows()))
+                .to_list()
+            )
+            for r in rows:
+                bare = str(r["symbol"]).rsplit(".", 1)[-1].strip()
+                if len(bare) >= 3:
+                    index.setdefault(bare, []).append(int(r["id"]))
+        with self._lock:
+            self._symbol_index = index
+            self._symbol_index_gen = gen
+        return index
+
+    # --- stats / UI ---
+
+    def total_chunks(self) -> int:
+        self.flush()
+        if _CHUNKS not in self._db.table_names():
+            return 0
+        return int(self._db.open_table(_CHUNKS).count_rows())
+
+    def stats(self) -> IndexStats:
+        self.flush()
+        names = self._db.table_names()
+        files = int(self._db.open_table(_FILES).count_rows()) if _FILES in names else 0
+        chunks = (
+            int(self._db.open_table(_CHUNKS).count_rows()) if _CHUNKS in names else 0
+        )
+        return IndexStats(total_files=files, total_chunks=chunks)
+
+    def _distinct(self, column: str) -> List[str]:
+        self.flush()
+        if _CHUNKS not in self._db.table_names():
+            return []
+        tbl = self._db.open_table(_CHUNKS)
+        rows = tbl.search().select([column]).limit(max(1, tbl.count_rows())).to_list()
+        return sorted({r[column] for r in rows if r.get(column)})
+
+    def distinct_languages(self) -> List[str]:
+        return self._distinct("language")
+
+    def distinct_kinds(self) -> List[str]:
+        return self._distinct("kind")
+
+    # --- convenience hybrid search (used by the bake-off scripts; engine uses HybridSearcher) ---
+
+    def search(
+        self,
+        query: str,
+        provider: Any,
+        top_k: int = 8,
+        *,
+        fetch_k: int = 50,
+        dense_weight: float = 1.0,
+        lexical_weight: float = 1.0,
+        rrf_k: int = 60,
+    ) -> List[SearchHit]:
+        if not query.strip():
+            return []
+        fetch = max(fetch_k, top_k)
+        dense = self.vector_search(provider.embed_query(query), fetch)
+        lexical = self.lexical_search(query, fetch)
+        similarity = {cid: max(0.0, min(1.0, s)) for cid, s in dense}
+        fused = reciprocal_rank_fusion(
+            [[cid for cid, _ in dense], [cid for cid, _ in lexical]],
+            k=rrf_k,
+            weights=[dense_weight, lexical_weight],
+        )[:top_k]
+        if not fused:
+            return []
+        rows = self.hydrate([cid for cid, _ in fused])
+        hits: List[SearchHit] = []
+        for cid, score in fused:
+            r = rows.get(cid)
+            if r is None:
+                continue
+            hits.append(
+                SearchHit(
+                    chunk_id=cid,
+                    path=r["path"],
+                    symbol=r["symbol"] or None,
+                    kind=r["kind"],
+                    language=r["language"],
+                    start_line=int(r["start_line"]),
+                    end_line=int(r["end_line"]),
+                    text=r["text"],
+                    score=float(score),
+                    similarity=float(similarity.get(cid, 0.0)),
+                )
+            )
+        return hits
diff --git a/coderag/store/schema.py b/coderag/store/schema.py
deleted file mode 100644
index 6581c28..0000000
--- a/coderag/store/schema.py
+++ /dev/null
@@ -1,69 +0,0 @@
-"""SQLite schema for the CodeRAG store.
-
-Design notes:
-- ``chunks.id`` IS the FAISS id. It is ``AUTOINCREMENT`` so ids are *never reused*, which
-  is what keeps a stale FAISS cache from resurrecting deleted content under a recycled id.
-- ``chunks_fts`` is an external-content FTS5 table (no duplicated text) kept in sync by
-  triggers, giving us BM25 lexical search for free alongside dense vectors.
-- ``files.content_hash`` drives incremental indexing; ``meta`` records the embedding
-  model/dim so a provider switch can trigger a rebuild instead of crashing.
-"""
-
-from __future__ import annotations
-
-SCHEMA_VERSION = 1
-
-DDL = """
-CREATE TABLE IF NOT EXISTS files (
-    id           INTEGER PRIMARY KEY AUTOINCREMENT,
-    path         TEXT NOT NULL UNIQUE,
-    language     TEXT NOT NULL,
-    content_hash TEXT NOT NULL,
-    mtime        REAL,
-    indexed_at   REAL NOT NULL
-);
-CREATE INDEX IF NOT EXISTS idx_files_path ON files(path);
-
-CREATE TABLE IF NOT EXISTS chunks (
-    id             INTEGER PRIMARY KEY AUTOINCREMENT,
-    file_id        INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
-    symbol         TEXT,
-    kind           TEXT NOT NULL DEFAULT 'window',
-    start_line     INTEGER NOT NULL,
-    end_line       INTEGER NOT NULL,
-    language       TEXT NOT NULL,
-    text           TEXT NOT NULL,
-    vector         BLOB NOT NULL,
-    embed_model    TEXT NOT NULL,
-    created_at     REAL NOT NULL
-);
-CREATE INDEX IF NOT EXISTS idx_chunks_file ON chunks(file_id);
-
-CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(
-    text,
-    symbol,
-    content='chunks',
-    content_rowid='id',
-    tokenize='unicode61 remove_diacritics 2'
-);
-
-CREATE TRIGGER IF NOT EXISTS chunks_ai AFTER INSERT ON chunks BEGIN
-    INSERT INTO chunks_fts(rowid, text, symbol) VALUES (new.id, new.text, new.symbol);
-END;
-
-CREATE TRIGGER IF NOT EXISTS chunks_ad AFTER DELETE ON chunks BEGIN
-    INSERT INTO chunks_fts(chunks_fts, rowid, text, symbol)
-        VALUES('delete', old.id, old.text, old.symbol);
-END;
-
-CREATE TRIGGER IF NOT EXISTS chunks_au AFTER UPDATE ON chunks BEGIN
-    INSERT INTO chunks_fts(chunks_fts, rowid, text, symbol)
-        VALUES('delete', old.id, old.text, old.symbol);
-    INSERT INTO chunks_fts(rowid, text, symbol) VALUES (new.id, new.text, new.symbol);
-END;
-
-CREATE TABLE IF NOT EXISTS meta (
-    key   TEXT PRIMARY KEY,
-    value TEXT
-);
-"""
diff --git a/coderag/store/sqlite_store.py b/coderag/store/sqlite_store.py
deleted file mode 100644
index 1bd603f..0000000
--- a/coderag/store/sqlite_store.py
+++ /dev/null
@@ -1,321 +0,0 @@
-"""SQLite-backed source of truth for files, chunks, vectors, and lexical search."""
-
-from __future__ import annotations
-
-import logging
-import re
-import sqlite3
-import threading
-import time
-from pathlib import Path
-from typing import Dict, Iterator, List, Optional, Sequence, Tuple
-
-import numpy as np
-
-from coderag.store.schema import DDL, SCHEMA_VERSION
-from coderag.types import Chunk, IndexStats
-
-logger = logging.getLogger(__name__)
-
-# Strip FTS5 operators so a raw code query (e.g. ``foo::bar*``) can't raise a syntax error.
-_FTS_TOKEN = re.compile(r"[A-Za-z0-9_]+")
-
-
-def _sanitize_fts(query: str) -> str:
-    """Turn an arbitrary query into a safe FTS5 MATCH expression (token OR token)."""
-    tokens = _FTS_TOKEN.findall(query)
-    if not tokens:
-        return ""
-    # Quote each token (defuses operators) and OR them for recall on identifiers.
-    return " OR ".join(f'"{t}"' for t in tokens)
-
-
-class SQLiteStore:
-    """Thread-safe store over a single shared connection.
-
-    Point reads and writes serialize on one reentrant lock: a single ``sqlite3``
-    connection is not safe for concurrent cross-thread use even under WAL, and the
-    watcher reindexes on a background thread while surfaces may read. WAL is still
-    enabled so separate *processes* don't block each other. (``iter_vectors`` is the
-    one unlocked bulk reader; it is only used during single-threaded rebuilds.)
-    """
-
-    def __init__(self, db_path: Path) -> None:
-        self.db_path = Path(db_path)
-        self.db_path.parent.mkdir(parents=True, exist_ok=True)
-        self._lock = threading.RLock()
-        # Bumped on every chunk write so caches derived from the chunk table (the symbol
-        # index used by callee expansion) can invalidate without a table scan per query.
-        self._gen = 0
-        self._symbol_index: Optional[Dict[str, List[int]]] = None
-        self._symbol_index_gen = -1
-        self._conn = sqlite3.connect(
-            str(self.db_path), check_same_thread=False, isolation_level=None
-        )
-        self._conn.row_factory = sqlite3.Row
-        self._conn.execute("PRAGMA journal_mode=WAL")
-        self._conn.execute("PRAGMA foreign_keys=ON")
-        self._conn.execute("PRAGMA synchronous=NORMAL")
-
-    # --- lifecycle ---
-
-    def bootstrap(self, embed_dim: int, embed_model: str) -> bool:
-        """Create schema and reconcile provenance.
-
-        Returns True if a full rebuild is required because the embedding model/dimension
-        changed since the store was last written (in which case existing chunks/files are
-        cleared so a reindex repopulates cleanly).
-        """
-        with self._lock:
-            self._conn.executescript(DDL)
-            self._set_meta("schema_version", str(SCHEMA_VERSION))
-            prev_dim = self._get_meta("embed_dim")
-            prev_model = self._get_meta("embed_model")
-            rebuild = False
-            if prev_dim is not None and (
-                int(prev_dim) != embed_dim or prev_model != embed_model
-            ):
-                logger.warning(
-                    "Embedding model changed (%s/%s -> %s/%s); clearing index for "
-                    "rebuild.",
-                    prev_model,
-                    prev_dim,
-                    embed_model,
-                    embed_dim,
-                )
-                self._conn.execute("DELETE FROM chunks")
-                self._conn.execute("DELETE FROM files")
-                self._gen += 1
-                rebuild = True
-            self._set_meta("embed_dim", str(embed_dim))
-            self._set_meta("embed_model", embed_model)
-            return rebuild
-
-    def close(self) -> None:
-        with self._lock:
-            self._conn.close()
-
-    # --- meta ---
-
-    def _get_meta(self, key: str) -> Optional[str]:
-        with self._lock:
-            row = self._conn.execute(
-                "SELECT value FROM meta WHERE key = ?", (key,)
-            ).fetchone()
-        return row["value"] if row else None
-
-    def _set_meta(self, key: str, value: str) -> None:
-        self._conn.execute(
-            "INSERT INTO meta(key, value) VALUES(?, ?) "
-            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
-            (key, value),
-        )
-
-    # --- file records ---
-
-    def get_file(self, path: str) -> Optional[sqlite3.Row]:
-        with self._lock:
-            return self._conn.execute(
-                "SELECT * FROM files WHERE path = ?", (path,)
-            ).fetchone()
-
-    def all_file_paths(self) -> List[str]:
-        with self._lock:
-            rows = self._conn.execute("SELECT path FROM files").fetchall()
-        return [r["path"] for r in rows]
-
-    def distinct_languages(self) -> List[str]:
-        """Languages present in the index, sorted — used to populate UI filters."""
-        with self._lock:
-            rows = self._conn.execute(
-                "SELECT DISTINCT language FROM chunks ORDER BY language"
-            ).fetchall()
-        return [r["language"] for r in rows]
-
-    def distinct_kinds(self) -> List[str]:
-        """Chunk kinds present in the index, sorted — used to populate UI filters."""
-        with self._lock:
-            rows = self._conn.execute(
-                "SELECT DISTINCT kind FROM chunks ORDER BY kind"
-            ).fetchall()
-        return [r["kind"] for r in rows]
-
-    def upsert_file(
-        self, path: str, language: str, content_hash: str, mtime: float
-    ) -> int:
-        with self._lock:
-            now = time.time()
-            self._conn.execute(
-                "INSERT INTO files(path, language, content_hash, mtime, indexed_at) "
-                "VALUES(?, ?, ?, ?, ?) "
-                "ON CONFLICT(path) DO UPDATE SET "
-                "  language=excluded.language, content_hash=excluded.content_hash, "
-                "  mtime=excluded.mtime, indexed_at=excluded.indexed_at",
-                (path, language, content_hash, mtime, now),
-            )
-            row = self._conn.execute(
-                "SELECT id FROM files WHERE path = ?", (path,)
-            ).fetchone()
-            return int(row["id"])
-
-    # --- chunk records ---
-
-    def chunk_ids_for_file(self, file_id: int) -> List[int]:
-        with self._lock:
-            rows = self._conn.execute(
-                "SELECT id FROM chunks WHERE file_id = ?", (file_id,)
-            ).fetchall()
-        return [int(r["id"]) for r in rows]
-
-    def delete_file(self, path: str) -> List[int]:
-        """Delete a file and its chunks. Returns the removed chunk ids (FAISS ids)."""
-        with self._lock:
-            row = self._conn.execute(
-                "SELECT id FROM files WHERE path = ?", (path,)
-            ).fetchone()
-            if row is None:
-                return []
-            file_id = int(row["id"])
-            ids = self.chunk_ids_for_file(file_id)
-            self._conn.execute("DELETE FROM chunks WHERE file_id = ?", (file_id,))
-            self._conn.execute("DELETE FROM files WHERE id = ?", (file_id,))
-            if ids:
-                self._gen += 1
-            return ids
-
-    def delete_chunks_for_file(self, file_id: int) -> List[int]:
-        with self._lock:
-            ids = self.chunk_ids_for_file(file_id)
-            self._conn.execute("DELETE FROM chunks WHERE file_id = ?", (file_id,))
-            if ids:
-                self._gen += 1
-            return ids
-
-    def add_chunks(
-        self,
-        file_id: int,
-        chunks: Sequence[Chunk],
-        vectors: np.ndarray,
-        embed_model: str,
-    ) -> List[int]:
-        """Insert chunks with their vectors. Returns the assigned chunk ids in order."""
-        if len(chunks) != len(vectors):
-            raise ValueError("chunks and vectors length mismatch")
-        ids: List[int] = []
-        now = time.time()
-        with self._lock:
-            for chunk, vec in zip(chunks, vectors, strict=False):
-                blob = np.asarray(vec, dtype="float32").tobytes()
-                cur = self._conn.execute(
-                    "INSERT INTO chunks(file_id, symbol, kind, start_line, end_line, "
-                    "language, text, vector, embed_model, created_at) "
-                    "VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
-                    (
-                        file_id,
-                        chunk.symbol,
-                        chunk.kind,
-                        chunk.start_line,
-                        chunk.end_line,
-                        chunk.language,
-                        chunk.text,
-                        blob,
-                        embed_model,
-                        now,
-                    ),
-                )
-                ids.append(int(cur.lastrowid or 0))
-            if ids:
-                self._gen += 1
-        return ids
-
-    # --- retrieval support ---
-
-    def fts_search(self, query: str, limit: int) -> List[Tuple[int, float]]:
-        """Lexical search via FTS5 BM25. Returns ``(chunk_id, bm25)`` best-first."""
-        match = _sanitize_fts(query)
-        if not match:
-            return []
-        try:
-            with self._lock:
-                rows = self._conn.execute(
-                    "SELECT rowid, bm25(chunks_fts) AS score FROM chunks_fts "
-                    "WHERE chunks_fts MATCH ? ORDER BY score LIMIT ?",
-                    (match, limit),
-                ).fetchall()
-        except sqlite3.OperationalError as exc:  # pragma: no cover - defensive
-            logger.warning("FTS query failed (%s); degrading to dense-only.", exc)
-            return []
-        return [(int(r["rowid"]), float(r["score"])) for r in rows]
-
-    def symbol_index(self) -> Dict[str, List[int]]:
-        """Map each symbol's bare name (last dotted component) -> chunk ids defining it.
-
-        Used by callee expansion (``retrieval.graph``) to resolve a called identifier to
-        its definition. Cached and invalidated on any chunk write, so it costs one table
-        scan after a change and is O(1) per query otherwise. Names shorter than three
-        characters are dropped (too common/ambiguous to be useful graph edges).
-        """
-        with self._lock:
-            if self._symbol_index is not None and self._symbol_index_gen == self._gen:
-                return self._symbol_index
-            rows = self._conn.execute(
-                "SELECT id, symbol FROM chunks WHERE symbol IS NOT NULL"
-            ).fetchall()
-            gen = self._gen
-        index: Dict[str, List[int]] = {}
-        for r in rows:
-            bare = r["symbol"].rsplit(".", 1)[-1].strip()
-            if len(bare) >= 3:
-                index.setdefault(bare, []).append(int(r["id"]))
-        with self._lock:
-            self._symbol_index = index
-            self._symbol_index_gen = gen
-        return index
-
-    def hydrate(self, chunk_ids: Sequence[int]) -> Dict[int, sqlite3.Row]:
-        """Fetch chunk + file rows for the given ids in one query."""
-        if not chunk_ids:
-            return {}
-        placeholders = ",".join("?" for _ in chunk_ids)
-        with self._lock:
-            rows = self._conn.execute(
-                "SELECT c.id, c.symbol, c.kind, c.start_line, c.end_line, c.language, "  # nosec B608 — IN-list is positional "?" placeholders; ids bound as params
-                "       c.text, f.path AS path "
-                "FROM chunks c JOIN files f ON f.id = c.file_id "
-                f"WHERE c.id IN ({placeholders})",
-                tuple(chunk_ids),
-            ).fetchall()
-        return {int(r["id"]): r for r in rows}
-
-    def iter_vectors(
-        self, batch: int = 1000
-    ) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
-        """Yield ``(ids, vectors)`` batches for rebuilding the FAISS index."""
-        cur = self._conn.execute("SELECT id, vector FROM chunks ORDER BY id")
-        while True:
-            rows = cur.fetchmany(batch)
-            if not rows:
-                break
-            ids = np.array([int(r["id"]) for r in rows], dtype="int64")
-            vecs = np.vstack(
-                [np.frombuffer(r["vector"], dtype="float32") for r in rows]
-            )
-            yield ids, vecs
-
-    # --- stats ---
-
-    def stats(self) -> IndexStats:
-        with self._lock:
-            files = self._conn.execute("SELECT COUNT(*) AS n FROM files").fetchone()[
-                "n"
-            ]
-            chunks = self._conn.execute("SELECT COUNT(*) AS n FROM chunks").fetchone()[
-                "n"
-            ]
-        return IndexStats(total_files=int(files), total_chunks=int(chunks))
-
-    def total_chunks(self) -> int:
-        with self._lock:
-            return int(
-                self._conn.execute("SELECT COUNT(*) AS n FROM chunks").fetchone()["n"]
-            )
diff --git a/coderag/store/vector_index.py b/coderag/store/vector_index.py
deleted file mode 100644
index 5ae86a3..0000000
--- a/coderag/store/vector_index.py
+++ /dev/null
@@ -1,220 +0,0 @@
-"""FAISS vector index — a rebuildable cache over the vectors stored in SQLite.
-
-Two backends behind one interface, selected by corpus size:
-- **flat** (``IndexIDMap2(IndexFlatIP)``): exact cosine, ideal for small/medium repos.
-- **ivf** (``IndexIVFFlat``): approximate, stays fast at 100k+ vectors.
-
-Both support ``add_with_ids`` and ``remove_ids``, so incremental indexing (delete a file's
-old chunks, add the new ones) works identically regardless of backend. Because every vector
-also lives in SQLite, the on-disk ``.faiss`` file is disposable and can be rebuilt at any
-time (``rebuild_from_store``).
-"""
-
-from __future__ import annotations
-
-import logging
-import math
-import threading
-from pathlib import Path
-from typing import TYPE_CHECKING, Iterable, Tuple
-
-import faiss
-import numpy as np
-
-from coderag.config import Config
-
-if TYPE_CHECKING:
-    from coderag.store.sqlite_store import SQLiteStore
-
-logger = logging.getLogger(__name__)
-
-
-def _normalized(vectors: np.ndarray) -> np.ndarray:
-    """Return an L2-normalized float32 copy (cosine similarity via inner product)."""
-    mat = np.ascontiguousarray(vectors, dtype="float32")
-    if mat.size:
-        mat = mat.copy()
-        faiss.normalize_L2(mat)
-    return mat
-
-
-def _derive_nlist(n: int, configured: int) -> int:
-    if configured > 0:
-        return max(1, min(configured, n))
-    return max(1, min(int(4 * math.sqrt(n)), max(1, n // 39)))
-
-
-class FaissVectorIndex:
-    def __init__(self, index: faiss.Index, kind: str, config: Config, dim: int) -> None:
-        self._index = index
-        self.kind = kind
-        self.config = config
-        self.dim = dim
-        # A FAISS index is not safe for a write (add/remove/rebuild) concurrent with a
-        # read (search). The MCP server is the first surface to run the watcher (which
-        # writes) alongside live agent queries (which read), so serialize index access on
-        # a reentrant lock. Reads are fast, so contention is negligible.
-        self._lock = threading.RLock()
-
-    # --- construction / persistence ---
-
-    @classmethod
-    def _empty_flat(cls, dim: int) -> faiss.Index:
-        return faiss.IndexIDMap2(faiss.IndexFlatIP(dim))
-
-    @classmethod
-    def open(cls, config: Config, dim: int) -> "FaissVectorIndex":
-        path = config.faiss_path
-        meta_path = Path(str(path) + ".kind")
-        if path.exists() and meta_path.exists():
-            try:
-                index = faiss.read_index(str(path))
-                kind = meta_path.read_text().strip() or "flat"
-                if kind == "ivf":
-                    # read_index returns a base Index; reach the IVF sub-index to set
-                    # the search-time nprobe (the attribute lives on IndexIVF).
-                    faiss.extract_index_ivf(index).nprobe = config.ivf_nprobe
-                return cls(index, kind, config, dim)
-            except Exception as exc:  # pragma: no cover - corrupt cache
-                logger.warning("Failed to load FAISS index (%s); starting empty.", exc)
-        return cls(cls._empty_flat(dim), "flat", config, dim)
-
-    def save(self) -> None:
-        path = self.config.faiss_path
-        path.parent.mkdir(parents=True, exist_ok=True)
-        with self._lock:
-            faiss.write_index(self._index, str(path))
-            Path(str(path) + ".kind").write_text(self.kind)
-
-    # --- properties ---
-
-    @property
-    def ntotal(self) -> int:
-        with self._lock:
-            return int(self._index.ntotal)
-
-    # --- mutations ---
-
-    def add(self, ids: np.ndarray, vectors: np.ndarray) -> None:
-        if len(ids) == 0:
-            return
-        vecs = _normalized(vectors)
-        id_arr = np.ascontiguousarray(ids, dtype="int64")
-        with self._lock:
-            self._index.add_with_ids(vecs, id_arr)
-
-    def remove(self, ids: Iterable[int]) -> int:
-        ids = list(ids)
-        if not ids:
-            return 0
-        selector = faiss.IDSelectorBatch(np.asarray(ids, dtype="int64"))
-        with self._lock:
-            return int(self._index.remove_ids(selector))
-
-    def search(self, query: np.ndarray, k: int) -> Tuple[np.ndarray, np.ndarray]:
-        """Return ``(ids, scores)`` for the top-k, with FAISS ``-1`` padding stripped."""
-        with self._lock:
-            if self.ntotal == 0:
-                return np.empty(0, dtype="int64"), np.empty(0, dtype="float32")
-            q = _normalized(np.asarray(query, dtype="float32").reshape(1, -1))
-            k = min(k, self.ntotal)
-            scores, ids = self._index.search(q, k)
-        ids_row, scores_row = ids[0], scores[0]
-        mask = ids_row != -1
-        return ids_row[mask].astype("int64"), scores_row[mask].astype("float32")
-
-    # --- rebuild / consistency ---
-
-    def _choose_kind(self, n: int) -> str:
-        if self.config.index_type == "flat":
-            return "flat"
-        if self.config.index_type == "ivf":
-            return "ivf" if n > 0 else "flat"
-        # auto
-        return "ivf" if n > self.config.ivf_threshold else "flat"
-
-    def _build_ivf(self, ids: np.ndarray, vecs: np.ndarray) -> faiss.Index:
-        nlist = _derive_nlist(len(ids), self.config.ivf_nlist)
-        quantizer = faiss.IndexFlatIP(self.dim)
-        index = faiss.IndexIVFFlat(
-            quantizer, self.dim, nlist, faiss.METRIC_INNER_PRODUCT
-        )
-        index.train(vecs)
-        index.add_with_ids(vecs, ids)
-        # nprobe must not exceed nlist (FAISS clamps it, but keep it meaningful).
-        index.nprobe = max(1, min(self.config.ivf_nprobe, nlist))
-        logger.info("Built IVF index: %d vectors, nlist=%d", len(ids), nlist)
-        return index
-
-    def rebuild_from_store(self, store: "SQLiteStore") -> None:
-        """Discard the current index and rebuild it from the SQLite vectors.
-
-        Holds the index lock for the whole swap so a concurrent search never observes a
-        half-built index. This is rare (model change, or the one-time flat->ivf upgrade),
-        so briefly stalling reads is an acceptable price for correctness.
-        """
-        with self._lock:
-            n = store.total_chunks()
-            kind = self._choose_kind(n)
-            if n == 0:
-                self._index = self._empty_flat(self.dim)
-                self.kind = "flat"
-                self.save()
-                return
-
-            if kind == "ivf":
-                # IVF needs all training vectors up front.
-                all_ids, all_vecs = [], []
-                for ids, vecs in store.iter_vectors():
-                    all_ids.append(ids)
-                    all_vecs.append(_normalized(vecs))
-                ids = np.concatenate(all_ids)
-                vecs = np.vstack(all_vecs)
-                try:
-                    self._index = self._build_ivf(ids, vecs)
-                    self.kind = "ivf"
-                except Exception as exc:
-                    # Degenerate corpora (too few or many duplicate vectors) can make IVF
-                    # training fail; fall back to exact flat rather than aborting indexing.
-                    logger.warning(
-                        "IVF training failed (%s); falling back to flat index.", exc
-                    )
-                    index = self._empty_flat(self.dim)
-                    index.add_with_ids(vecs, np.ascontiguousarray(ids))
-                    self._index = index
-                    self.kind = "flat"
-            else:
-                index = self._empty_flat(self.dim)
-                for ids, vecs in store.iter_vectors():
-                    index.add_with_ids(_normalized(vecs), np.ascontiguousarray(ids))
-                self._index = index
-                self.kind = "flat"
-                logger.info("Built flat index: %d vectors", n)
-            self.save()
-
-    def ensure_consistent(self, store: "SQLiteStore") -> None:
-        """Rebuild from SQLite if the cached index disagrees with the store.
-
-        Triggers on a vector-count mismatch *or* a dimension mismatch — the latter
-        guards against loading an index built with a different embedding model.
-        """
-        if self._index.d != self.dim or self.ntotal != store.total_chunks():
-            logger.info(
-                "FAISS cache out of sync (dim %d vs %d, %d vs %d chunks); rebuilding.",
-                self._index.d,
-                self.dim,
-                self.ntotal,
-                store.total_chunks(),
-            )
-            self.rebuild_from_store(store)
-
-    def maybe_upgrade(self, store: "SQLiteStore") -> bool:
-        """Switch flat->ivf when an auto index grows past the threshold. Returns True
-        if a rebuild happened."""
-        if self.config.index_type != "auto" or self.kind == "ivf":
-            return False
-        if store.total_chunks() > self.config.ivf_threshold:
-            logger.info("Corpus exceeded IVF threshold; upgrading flat -> ivf.")
-            self.rebuild_from_store(store)
-            return True
-        return False
diff --git a/coderag/surfaces/cli.py b/coderag/surfaces/cli.py
index fc565ff..75116bf 100644
--- a/coderag/surfaces/cli.py
+++ b/coderag/surfaces/cli.py
@@ -11,6 +11,7 @@
 import os
 import sys
 import textwrap
+import time
 from pathlib import Path
 from typing import List, Optional
 
@@ -29,6 +30,8 @@ def _build_config(args: argparse.Namespace) -> Config:
         overrides["provider"] = args.provider
     if getattr(args, "model", None):
         overrides["model"] = args.model
+    if getattr(args, "use_gitignore", None) is not None:
+        overrides["use_gitignore"] = args.use_gitignore
     return Config.from_env(**overrides)
 
 
@@ -37,6 +40,16 @@ def _build_config(args: argparse.Namespace) -> Config:
 
 def cmd_index(args: argparse.Namespace) -> int:
     cr = CodeRAG(_build_config(args))
+    if not args.quiet:
+        # The provider/model loads on first index access; on a fresh install that is a
+        # one-off model download. Say so on stderr, so the wait isn't a mystery.
+        print(
+            f"Preparing to index {cr.config.watched_dir} "
+            "(first run downloads the embedding model)…",
+            file=sys.stderr,
+            flush=True,
+        )
+    started = time.monotonic()
     stats = cr.indexer.index(
         Path(args.path).expanduser() if args.path else None,
         full=args.full,
@@ -44,7 +57,7 @@ def cmd_index(args: argparse.Namespace) -> int:
     )
     print(
         f"Indexed {stats.files_indexed} file(s), skipped {stats.files_skipped}, "
-        f"removed {stats.files_removed}. "
+        f"removed {stats.files_removed} in {time.monotonic() - started:.1f}s. "
         f"Total: {stats.total_files} files / {stats.total_chunks} chunks."
     )
     return 0
@@ -235,7 +248,9 @@ def cmd_install(args: argparse.Namespace) -> int:
     from coderag import install as inst
 
     default_watched = (
-        Path(args.watched_dir).expanduser() if args.watched_dir else Path.cwd()
+        Path(args.watched_dir).expanduser()
+        if args.watched_dir
+        else inst.default_workspace()
     )
     explicit_watched = Path(args.watched_dir).expanduser() if args.watched_dir else None
     interactive = sys.stdin.isatty()
@@ -311,6 +326,21 @@ def cmd_install(args: argparse.Namespace) -> int:
         print("\nNext steps:")
         for s in sorted(steps):
             print(f"  - {s}")
+
+    wd = next((p.watched_dir for p in plans if p.watched_dir is not None), Path.cwd())
+    print("\nHow indexing works:")
+    print(
+        "  CodeRAG indexes your workspace the first time the agent starts its server. It\n"
+        "  runs in the background, so search works right away and fills in as it goes —\n"
+        "  seconds for a repo. Large trees (a whole home/system) are supported too; the\n"
+        "  first pass just takes longer. It skips version-control, build, and dependency\n"
+        "  directories automatically (see `coderag index --help` for throughput options)."
+    )
+    print("\nHandy commands:")
+    print(f"  coderag status --watched-dir {wd}   # totals + where the index lives")
+    print(
+        f"  coderag index  --watched-dir {wd}   # build/refresh it now, with progress"
+    )
     return 0 if all(r.action != "error" for r in final) else 1
 
 
@@ -346,6 +376,19 @@ def _add_common(p: argparse.ArgumentParser) -> None:
         "any OpenAI-compatible/local server via OPENAI_BASE_URL) | fake.",
     )
     p.add_argument("--model", help="Embedding model name.")
+    p.add_argument(
+        "--gitignore",
+        dest="use_gitignore",
+        action="store_true",
+        default=None,
+        help="Honor .gitignore files while indexing/searching (default).",
+    )
+    p.add_argument(
+        "--no-gitignore",
+        dest="use_gitignore",
+        action="store_false",
+        help="Do not honor .gitignore files.",
+    )
 
 
 def build_parser() -> argparse.ArgumentParser:
diff --git a/coderag/surfaces/mcp_server.py b/coderag/surfaces/mcp_server.py
index 3231527..44ded35 100644
--- a/coderag/surfaces/mcp_server.py
+++ b/coderag/surfaces/mcp_server.py
@@ -18,7 +18,9 @@
 
 import json
 import logging
+import sys
 import threading
+import time
 from typing import TYPE_CHECKING, List, Literal, Optional
 
 if TYPE_CHECKING:
@@ -29,6 +31,19 @@
 
 logger = logging.getLogger(__name__)
 
+
+def _notify(msg: str) -> None:
+    """Surface a server lifecycle line on stderr.
+
+    The stdio transport owns stdout (it is the MCP wire protocol), so anything meant for a
+    human watching the server has to go to stderr — which is where agents capture server
+    logs. Without this, a first run is silent through the model download and the initial
+    index, so the agent looks hung. Also logged for structured-logging setups.
+    """
+    print(f"[coderag] {msg}", file=sys.stderr, flush=True)
+    logger.info(msg)
+
+
 _INSTRUCTIONS = (
     "CodeRAG indexes this workspace for fast search. Two complementary search tools, both "
     "preferable to grep/glob/find/read loops:\n"
@@ -356,6 +371,8 @@ def run_mcp(
     state = _State()
     mcp = build_mcp(cr, state=state)
 
+    _notify(f"starting — workspace: {cr.config.watched_dir}")
+    _notify("loading the embedding model (first run downloads it; may take a minute)…")
     _warm_up(cr)
 
     if auto_index:
@@ -364,10 +381,21 @@ def run_mcp(
         state.indexing = True
 
         def _initial_index() -> None:
+            _notify(
+                "building the initial index in the background — search works now and "
+                "returns more as it finishes (call index_status to check progress)"
+            )
+            started = time.monotonic()
             try:
-                cr.index()
+                stats = cr.index()
             except Exception:  # pragma: no cover - defensive
                 logger.exception("Initial MCP index failed.")
+                _notify("initial index FAILED — results may be incomplete (see logs)")
+            else:
+                _notify(
+                    f"initial index ready: {stats.total_files} files / "
+                    f"{stats.total_chunks} chunks in {time.monotonic() - started:.0f}s"
+                )
             finally:
                 state.indexing = False
 
@@ -384,6 +412,7 @@ def _initial_index() -> None:
             daemon=True,
         ).start()
 
+    _notify("ready for requests")
     try:
         mcp.run(transport=transport)
     finally:
diff --git a/coderag/surfaces/webui.py b/coderag/surfaces/webui.py
index 3e45d05..fca0641 100644
--- a/coderag/surfaces/webui.py
+++ b/coderag/surfaces/webui.py
@@ -153,15 +153,15 @@ def _llm_status(config: "Config") -> Tuple[bool, str]:
 def _searcher_for(cr: "CodeRAG", dense: float, lexical: float) -> "HybridSearcher":
     """The facade's searcher, or an ad-hoc one when weights are tuned live.
 
-    Reuses the already-loaded provider/store/vectors so changing weights never reloads
-    the index — only the cheap fusion weighting differs.
+    Reuses the already-loaded provider/store so changing weights never reloads the index —
+    only the cheap fusion weighting differs.
     """
     if dense == cr.config.dense_weight and lexical == cr.config.lexical_weight:
         return cr.searcher
     from coderag.retrieval.search import HybridSearcher
 
     cfg = cr.config.with_overrides(dense_weight=dense, lexical_weight=lexical)
-    return HybridSearcher(cfg, cr.provider, cr.store, cr.vectors)
+    return HybridSearcher(cfg, cr.provider, cr.store)
 
 
 def _apply_filters(
diff --git a/deploy/README.md b/deploy/README.md
index 702d812..0bf78d5 100644
--- a/deploy/README.md
+++ b/deploy/README.md
@@ -27,8 +27,8 @@ workspace, scheduled re-indexing, and sensible security defaults.
 
 ## How it's designed (read this first)
 
-CodeRAG keeps its index in **SQLite** (the source of truth) plus a **FAISS** cache, and
-the engine is a **single writer** — the FAISS file is written non-atomically, so two
+CodeRAG keeps its index in a single embedded **LanceDB** store, and
+the engine is a **single writer** — the store is written non-atomically, so two
 processes writing one index would corrupt it. The chart is built around that fact:
 
 - **One replica, `Recreate` strategy, `ReadWriteOnce` PVC.** Never scale the writer
@@ -495,6 +495,6 @@ helm template coderag deploy/helm/coderag -f deploy/helm/coderag/ci/full-values.
 - **Single writer by design** — do not raise `replicas`. For higher search throughput,
   put a cache/load balancer in front of the read endpoints; the index itself stays
   single-writer.
-- **`ReadWriteOnce`** ties the index to one node at a time; that's expected for SQLite.
+- **`ReadWriteOnce`** ties the index to one node at a time; that's expected for the embedded store.
 - The **UI**, when enabled, maintains a *separate* index from the server. For a single
   shared index, run the server and point browsers/tools at its REST API.
diff --git a/deploy/helm/coderag/Chart.yaml b/deploy/helm/coderag/Chart.yaml
index 5a668bb..f4c9481 100644
--- a/deploy/helm/coderag/Chart.yaml
+++ b/deploy/helm/coderag/Chart.yaml
@@ -22,7 +22,7 @@ keywords:
   - rag
   - embeddings
   - semantic-search
-  - faiss
+  - lancedb
 maintainers:
   - name: Neverdecel
     url: https://github.com/Neverdecel
diff --git a/deploy/helm/coderag/templates/configmap.yaml b/deploy/helm/coderag/templates/configmap.yaml
index 34c2e25..7644156 100644
--- a/deploy/helm/coderag/templates/configmap.yaml
+++ b/deploy/helm/coderag/templates/configmap.yaml
@@ -7,8 +7,6 @@ metadata:
 data:
   CODERAG_PROVIDER: {{ .Values.config.provider | quote }}
   CODERAG_MODEL: {{ .Values.config.model | quote }}
-  CODERAG_INDEX_TYPE: {{ .Values.config.indexType | quote }}
-  CODERAG_IVF_THRESHOLD: {{ .Values.config.ivfThreshold | quote }}
   CODERAG_TOP_K: {{ .Values.config.topK | quote }}
   CODERAG_LLM_PROVIDER: {{ .Values.config.llmProvider | quote }}
   CODERAG_CHAT_MODEL: {{ .Values.config.chatModel | quote }}
diff --git a/deploy/helm/coderag/templates/server-deployment.yaml b/deploy/helm/coderag/templates/server-deployment.yaml
index 0a61bc3..bb38aa2 100644
--- a/deploy/helm/coderag/templates/server-deployment.yaml
+++ b/deploy/helm/coderag/templates/server-deployment.yaml
@@ -7,7 +7,7 @@ metadata:
     {{- include "coderag.labels" . | nindent 4 }}
     app.kubernetes.io/component: server
 spec:
-  # CodeRAG is a single SQLite/FAISS writer — keep this at 1. The FAISS index is
+  # CodeRAG is a single-writer store — keep this at 1. The LanceDB index is
   # written non-atomically, so two replicas on one volume would corrupt it.
   replicas: 1
   strategy:
diff --git a/deploy/helm/coderag/values.schema.json b/deploy/helm/coderag/values.schema.json
index f57bb0d..dc90a6b 100644
--- a/deploy/helm/coderag/values.schema.json
+++ b/deploy/helm/coderag/values.schema.json
@@ -86,8 +86,6 @@
       "properties": {
         "provider": { "type": "string", "enum": ["fastembed", "openai", "fake"] },
         "model": { "type": "string" },
-        "indexType": { "type": "string", "enum": ["auto", "flat", "ivf"] },
-        "ivfThreshold": { "type": "integer", "minimum": 0 },
         "topK": { "type": "integer", "minimum": 1 },
         "llmProvider": { "type": "string", "enum": ["openai", "anthropic"] },
         "chatModel": { "type": "string" },
diff --git a/deploy/helm/coderag/values.yaml b/deploy/helm/coderag/values.yaml
index 906a020..1783422 100644
--- a/deploy/helm/coderag/values.yaml
+++ b/deploy/helm/coderag/values.yaml
@@ -68,7 +68,7 @@ workspace:
   # -- PVC name to mount when source=existingClaim.
   existingClaim: ""
 
-# --- Persistent index (SQLite source-of-truth + FAISS cache + downloaded model) ---
+# --- Persistent index (LanceDB store + downloaded model) ---
 # CodeRAG is a single-writer engine, so each writer (the server, or the UI when
 # enabled) gets its own ReadWriteOnce volume. Do not point two writers at one claim.
 persistence:
@@ -109,8 +109,6 @@ config:
   # fastembed (local, no key) | openai | fake
   provider: fastembed
   model: BAAI/bge-small-en-v1.5
-  indexType: auto
-  ivfThreshold: 50000
   topK: 8
   # LLM answer backend (only used by the optional `--answer` / UI answer feature).
   llmProvider: openai
diff --git a/docs/configuration.md b/docs/configuration.md
index 846e7d8..1762959 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -53,7 +53,7 @@ export OPENAI_BASE_URL=http://localhost:11434/v1   # Ollama's OpenAI-compatible
 export CODERAG_CHAT_MODEL=llama3.1                  # the model name your server serves
 
 # 3. Search with a locally-generated answer:
-coderag search "how is the FAISS index persisted" --answer
+coderag search "how is the vector index persisted" --answer
 ```
 
 Other local servers expose the same OpenAI-compatible API — only the base URL and model
@@ -105,7 +105,7 @@ coderag index --provider openai
 ```
 
 > Changing the embedding model (its dimension) triggers a one-time index rebuild — that's
-> expected and safe (SQLite is the source of truth; the FAISS index is a rebuildable cache).
+> expected and safe (the LanceDB store is rebuildable — re-indexing recreates it from source).
 
 ## Local embedding models
 
@@ -158,7 +158,7 @@ optional.
 | Variable | Default | Meaning |
 | --- | --- | --- |
 | `CODERAG_WATCHED_DIR` | cwd | Codebase to index/search. |
-| `CODERAG_STORE_DIR` | `./.coderag` | Where the SQLite DB + FAISS index live. |
+| `CODERAG_STORE_DIR` | `./.coderag` | Where the LanceDB store lives. |
 | `CODERAG_INDEX_ALL_TEXT` | `false` | Index any UTF-8 text file (docs/config/extensionless), not just code. Binary files are always skipped. |
 
 ### Retrieval & quality
@@ -185,8 +185,6 @@ optional.
 
 | Variable | Default | Meaning |
 | --- | --- | --- |
-| `CODERAG_INDEX_TYPE` | `auto` | `auto` (Flat → IVF past the threshold) · `flat` (exact) · `ivf` (approximate). |
-| `CODERAG_IVF_THRESHOLD` | `50000` | Vectors before `auto` switches Flat → IVF. |
 | `CODERAG_WORKERS` | `4` | Worker threads for chunking + embedding (`1` = serial; a big lever for remote/OpenAI embeddings). |
 
 ### HTTP API server (`coderag serve`)
diff --git a/docs/research/lancedb-spike.md b/docs/research/lancedb-spike.md
new file mode 100644
index 0000000..3f8d9fc
--- /dev/null
+++ b/docs/research/lancedb-spike.md
@@ -0,0 +1,105 @@
+# LanceDB spike: can a dedicated embedded vector DB replace SQLite + FAISS?
+
+**Status:** ✅ **adopted** — CodeRAG now uses a single embedded LanceDB store
+(`coderag/store/lance_store.py`); the SQLite store + FAISS index were removed. This document
+is kept as the record of the investigation that led to that decision. The bake-off scripts
+referenced below (`scripts/bench_lance.py`, `scripts/eval_lance.py`) were removed once the
+migration landed; `scripts/bench_store.py` and `coderag eval` benchmark the live store.
+
+**Date:** 2026-06.
+
+## Question
+
+CodeRAG today stores everything in **SQLite** (chunk metadata + text + BM25 via FTS5 + the
+vectors as BLOBs) and keeps a separate **FAISS** index for ANN. Could a single dedicated,
+embedded, open-source vector DB be faster/better and simpler — replacing *both*?
+
+Constraint (fixed): the store must stay **embedded / zero-process** (CodeRAG ships via pipx
+and runs as an MCP stdio server; a separate DB server is out).
+
+## Candidates (mid-2026)
+
+| Engine | Embedded | License | Scale | Vec + BM25 in one store | Verdict |
+|---|---|---|---|---|---|
+| **LanceDB** 0.33 | ✅ wheels, no server | Apache-2.0 | millions+ | ✅ ANN + BM25 + hybrid | **spiked** |
+| sqlite-vec | ✅ (ext) | Apache/MIT | ~100K (brute-force) | vec only; keeps SQLite | ✗ scale / not a replacement |
+| Qdrant | ⚠ local ≤20K / server | Apache-2.0 | millions (server) | ✅ (server) | ✗ not embedded |
+| DuckDB VSS | ✅ | MIT | HNSW persistence *experimental* | + FTS ext | ✗ durability |
+| Chroma | ✅ | Apache-2.0 | — | built on SQLite + hnswlib | ✗ not a replacement |
+
+LanceDB is the only embedded, OSS engine that scales to millions and unifies vector ANN +
+BM25 + metadata/filtering in one store.
+
+## Method
+
+`coderag/store/lance_store.py` implements a self-contained LanceDB store (buffered/batched
+writes, cosine vector search, Tantivy BM25, and CodeRAG's own RRF fusion so retrieval is
+comparable to `HybridSearcher`, not to LanceDB's built-in hybrid). Two offline harnesses
+drive the comparison with the **same chunker and embedding provider**, isolating the
+storage/retrieval layer:
+
+- `scripts/bench_lance.py` — index throughput, on-disk size, query latency.
+- `scripts/eval_lance.py` — retrieval quality via the existing eval harness.
+
+## Results — throughput (synthetic corpus, `fake` provider, so it measures the *store*, not the model)
+
+| corpus | backend | index time | on-disk | query (hybrid) |
+|---|---|---|---|---|
+| 6k files / 48k chunks | sqlite+faiss | 9.6 s | 19 MB | 8.7 ms |
+| 6k files / 48k chunks | **lancedb** | **3.6 s** | **7.7 MB** | 19 ms |
+| 20k files / 160k chunks | sqlite+faiss | 34.7 s | 55 MB | 24.7 ms |
+| 20k files / 160k chunks | **lancedb** | **11.3 s** | **25.8 MB** | 30 ms |
+
+- **Index throughput: LanceDB ~3× faster, and the lead grows with scale** — its columnar
+  bulk write + single BM25 index build beats SQLite's per-file transactions plus per-chunk
+  FAISS adds.
+- **Disk: ~2× smaller.**
+- **Query latency: comparable** (LanceDB is ~1.2–2.2× slower at these sizes). NOTE: measured
+  with a tiny 16-dim fake embedding and **no ANN index** built on either side at <50k, so this
+  is *not* representative of the million-vector / 384-dim regime (see open questions).
+- **Critical fairness lesson:** per-file `add()` to LanceDB is pathological (120 s and
+  1.7 GB at 6k files — fragment/version bloat). Writes **must** be batched; with buffering
+  the numbers above hold. A production integration must batch writes + call `optimize()`.
+
+## Results — quality
+
+Blocked in this environment: the embedding model download (`huggingface.co`) is not in the
+network allowlist, so a real-embedding eval could not be run here. The pipeline is wired and
+runs end-to-end (`eval_lance.py`); with random `fake` vectors the two backends are
+neck-and-neck (BM25-dominated), a *hint* that LanceDB's Tantivy BM25 is in the same ballpark
+as FTS5 — but **this must be confirmed with real embeddings** before any adoption:
+
+```
+pip install 'coderag[lance]'
+# on a host where the embedding model is reachable (or pre-cached):
+python scripts/eval_lance.py --repo . \
+    --dataset coderag/eval/datasets/coderag_self.jsonl --level file
+python scripts/eval_lance.py --repo /path/to/bigger/repo \
+    --dataset <symbol-dataset>.jsonl --level symbol
+```
+
+Adoption gate: LanceDB must be **≥ parity** on recall@k / nDCG / MRR at both file and
+symbol level.
+
+## Open questions (before committing to a migration)
+
+1. **Quality parity** with real embeddings (the gate above) — Tantivy BM25 tokenization
+   differs from FTS5; verify identifier/code queries don't regress.
+2. **Million-vector query latency** with proper ANN indexes (FAISS IVF vs LanceDB
+   IVF-PQ/HNSW) and real 384-dim vectors — the regime the throughput bench could not probe.
+3. **Incremental freshness:** OSS LanceDB index maintenance is manual (`optimize()`); the
+   watcher path needs a sensible cadence (unindexed rows are flat-scanned until optimized).
+4. **Dependency weight / wheels:** `lancedb` + `pyarrow` across Python 3.11–3.13 and
+   Linux/macOS-arm/Windows (no macOS-Intel wheel) vs. dropping `faiss-cpu`.
+
+## Recommendation
+
+The spike **strengthens** the case: LanceDB indexes markedly faster, uses less disk, keeps
+latency competitive, and collapses two subsystems (SQLite store + FAISS + the hand-rolled
+consistency/IVF code) into one. It is the right embedded candidate. But adoption should
+remain **gated on the real-embedding quality eval and a million-scale latency check** — both
+runnable with the harnesses here once the model is available. Because the index is a
+rebuildable cache of the source tree, the migration stays low-risk and reversible.
+
+Next step if green: extract a shared `Store` interface and implement `LanceStore` against it
+(replacing `vector_index.py` + the FTS5 schema), then drop `faiss-cpu`.
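
Review note on the spike's Method section: it says retrieval uses CodeRAG's own RRF fusion of the dense and BM25 rankings rather than LanceDB's built-in hybrid search. A minimal sketch of weighted reciprocal rank fusion is given here to make that step concrete. The function name `rrf_fuse`, the tie-breaking rule, and the constant `k=60` (a commonly used value for RRF) are assumptions for illustration and are not taken from `HybridSearcher`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(
    dense: Sequence[int],
    lexical: Sequence[int],
    dense_weight: float = 1.0,
    lexical_weight: float = 1.0,
    k: int = 60,
) -> List[int]:
    """Fuse two ranked lists of chunk ids with weighted reciprocal rank fusion.

    Each list contributes weight / (k + rank) per id, with rank starting at 1.
    """
    scores: Dict[int, float] = defaultdict(float)
    for weight, ranking in ((dense_weight, dense), (lexical_weight, lexical)):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by id to stay deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

Because the fusion only needs ranks, not raw scores, changing the two weights is cheap and does not require touching the store, which matches the live weight tuning described for the web UI.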
diff --git a/example.env b/example.env
index 2f0a4ee..f83a8f2 100644
--- a/example.env
+++ b/example.env
@@ -14,15 +14,9 @@ CODERAG_MODEL=BAAI/bge-small-en-v1.5
 # --- Locations ---
 # The codebase to index/search (defaults to the current directory).
 CODERAG_WATCHED_DIR=/path/to/your/codebase
-# Where the index + database are stored (defaults to ./.coderag).
+# Where the LanceDB store is kept (defaults to ./.coderag).
 # CODERAG_STORE_DIR=./.coderag
 
-# --- Vector index (scale) ---
-# auto | flat | ivf. "auto" uses exact Flat search and switches to approximate
-# IVF automatically once the corpus grows past CODERAG_IVF_THRESHOLD vectors.
-# CODERAG_INDEX_TYPE=auto
-# CODERAG_IVF_THRESHOLD=50000
-
 # --- Retrieval ---
 # CODERAG_TOP_K=8
 # Structure-aware 1-hop call-graph expansion (opt-in): enrich results with the definitions
diff --git a/pyproject.toml b/pyproject.toml
index 93f6201..611c782 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -27,12 +27,15 @@ keywords = [
     "llm",
 ]
 dependencies = [
-    "faiss-cpu>=1.14.3,<1.15",
+    "lancedb>=0.33,<1",
+    "pylance>=0.10",
+    "pyarrow>=16,<25",
     "numpy>=2.4.6,<3",
     "python-dotenv>=1.2.2,<2",
     "tenacity>=9.1.4,<10",
     "watchdog>=6.0.0,<7",
     "fastembed>=0.8.0,<1",
+    "pathspec>=0.12,<2",
     "tree-sitter>=0.25.2,<0.26",
     "tree-sitter-python>=0.25.0,<0.26",
     "tree-sitter-javascript>=0.25.0,<0.26",
@@ -69,6 +72,12 @@ mcp = [
 openai = [
     "openai>=2.41.1,<3",
 ]
+# GPU embedding for the local fastembed backend on an NVIDIA box. Installs the CUDA
+# onnxruntime so `CODERAG_EMBED_DEVICE=auto` (or `cuda`) runs embeddings on the GPU —
+# typically 10-50x faster indexing. Install with: pip install 'coderag[gpu]'.
+gpu = [
+    "onnxruntime-gpu>=1.17,<2",
+]
 anthropic = [
     "anthropic>=0.109.2,<1",
 ]
diff --git a/scripts/bench_store.py b/scripts/bench_store.py
new file mode 100644
index 0000000..4aa9e30
--- /dev/null
+++ b/scripts/bench_store.py
@@ -0,0 +1,134 @@
+#!/usr/bin/env python
+"""Throughput + memory benchmark for the indexing store, fully offline.
+
+Generates a synthetic source tree (with junk dirs and a ``.gitignore`` to exercise
+filtering) and indexes it with the ``fake`` embedding provider, so the numbers reflect the
+store/pipeline path (LanceDB) rather than model speed or the network. Reports first-index
+wall-time, files/sec, peak RSS, and incremental re-index time.
+
+Usage:
+    python scripts/bench_store.py --files 5000
+    python scripts/bench_store.py --files 125000 --reindex-frac 0.01   # the real target
+"""
+
+from __future__ import annotations
+
+import argparse
+import resource
+import shutil
+import tempfile
+import time
+from pathlib import Path
+
+from coderag.api import CodeRAG
+from coderag.config import Config
+
+_FUNC = "def func_{i}(x):\n    '''do {i}'''\n    return x + {i}\n\n"
+_CLS = "class Thing{i}:\n    def m(self, v):\n        return v * {i}\n\n"
+
+
+def _peak_rss_mb() -> float:
+    # ru_maxrss is KiB on Linux, bytes on macOS — assume Linux (CI/servers).
+    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
+
+
+def make_tree(root: Path, n_files: int) -> None:
+    """Write ``n_files`` small Python files across a few packages, plus ignorable junk."""
+    root.mkdir(parents=True, exist_ok=True)
+    (root / ".gitignore").write_text("build/\n*.log\n", encoding="utf-8")
+    # Junk that filtering must skip (should NOT be indexed).
+    for d in ("node_modules", ".cache", "site-packages", "build"):
+        p = root / d / "junk.py"
+        p.parent.mkdir(parents=True, exist_ok=True)
+        p.write_text("def junk():\n    return 0\n", encoding="utf-8")
+    (root / "noisy.log").write_text("log line\n" * 50, encoding="utf-8")
+    # Real source: spread across packages, a few functions/classes each (multi-chunk).
+    per_pkg = 500
+    for i in range(n_files):
+        pkg = root / f"pkg_{i // per_pkg:04d}"
+        pkg.mkdir(parents=True, exist_ok=True)
+        body = "".join(_FUNC.format(i=j) for j in range(i % 5 + 2))
+        body += "".join(_CLS.format(i=j) for j in range(i % 3 + 1))
+        (pkg / f"mod_{i:06d}.py").write_text(body, encoding="utf-8")
+
+
+def _cr(repo: Path, store: Path, workers: int, batch: int) -> CodeRAG:
+    return CodeRAG(
+        Config(
+            provider="fake",
+            watched_dir=repo,
+            store_dir=store,
+            index_workers=workers,
+            embed_batch_size=batch,
+        )
+    )
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser(description=__doc__)
+    ap.add_argument("--files", type=int, default=2000, help="Synthetic source files.")
+    ap.add_argument("--workers", type=int, default=4)
+    ap.add_argument("--batch", type=int, default=64)
+    ap.add_argument(
+        "--reindex-frac",
+        type=float,
+        default=0.01,
+        help="Fraction of files to touch for the incremental re-index pass.",
+    )
+    ap.add_argument("--keep", action="store_true", help="Keep the temp tree.")
+    args = ap.parse_args()
+
+    work = Path(tempfile.mkdtemp(prefix="coderag-bench-"))
+    repo, store = work / "repo", work / "store"
+    try:
+        t0 = time.monotonic()
+        make_tree(repo, args.files)
+        gen_s = time.monotonic() - t0
+        print(f"Generated {args.files} files in {gen_s:.1f}s at {repo}")
+
+        cr = _cr(repo, store, args.workers, args.batch)
+        t0 = time.monotonic()
+        stats = cr.index()
+        full_s = time.monotonic() - t0
+
+        # Junk dirs / .gitignore'd files are pruned during the walk (never counted), so
+        # confirm none leaked into the index rather than reading it off files_skipped.
+        leaked = [
+            p
+            for p in cr.store.all_file_paths()
+            if any(
+                seg in p
+                for seg in ("node_modules", ".cache", "site-packages", "build/")
+            )
+        ]
+        print("\n--- first full index ---")
+        print(f"  files indexed : {stats.files_indexed}")
+        print(f"  junk leaked   : {len(leaked)} (expect 0 — filtering works)")
+        print(f"  chunks        : {stats.total_chunks}")
+        print(f"  wall time     : {full_s:.1f}s")
+        print(f"  throughput    : {stats.files_indexed / full_s:.0f} files/s")
+        print(f"  peak RSS      : {_peak_rss_mb():.0f} MiB")
+        size_mb = sum(p.stat().st_size for p in store.rglob("*") if p.is_file()) / 1e6
+        print(f"  index on disk : {size_mb:.1f} MB")
+
+        # Incremental: touch a fraction and re-index (exercises the stat fast-path).
+        touched = max(1, int(args.files * args.reindex_frac))
+        for i in range(touched):
+            mod = repo / f"pkg_{i // 500:04d}" / f"mod_{i:06d}.py"
+            mod.write_text(mod.read_text() + "\n# touch\n", encoding="utf-8")
+        t0 = time.monotonic()
+        rstats = cr.index()
+        reindex_s = time.monotonic() - t0
+        print(f"\n--- incremental re-index ({touched} changed) ---")
+        print(f"  files indexed : {rstats.files_indexed}")
+        print(f"  files skipped : {rstats.files_skipped}")
+        print(f"  wall time     : {reindex_s:.2f}s")
+        cr.close()
+        return 0
+    finally:
+        if not args.keep:
+            shutil.rmtree(work, ignore_errors=True)
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
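
Review note on `_peak_rss_mb` in the benchmark above: it divides `ru_maxrss` by 1024 and, as its comment says, assumes Linux, so a run on macOS would report a value 1024 times too large. A possible platform-aware variant is sketched here. The name `peak_rss_mib` and its `platform` parameter are hypothetical, and the `resource` module is only available on Unix-like systems.

```python
import resource
import sys


def peak_rss_mib(platform: str = sys.platform) -> float:
    """Peak resident set size of this process in MiB.

    ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    """
    raw = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024.0 * 1024.0 if platform == "darwin" else 1024.0
    return raw / divisor
```

The `platform` argument exists only so the unit handling can be exercised without running on both operating systems.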
diff --git a/tests/conftest.py b/tests/conftest.py
index addc903..34997c6 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -21,7 +21,6 @@ def config(tmp_path: Path) -> Config:
         provider="fake",
         watched_dir=tmp_path / "repo",
         store_dir=tmp_path / "store",
-        ivf_threshold=20,  # tiny so IVF-path tests don't need huge corpora
     )
 
 
diff --git a/tests/test_config_and_providers.py b/tests/test_config_and_providers.py
index f60a0fb..a06cd2c 100644
--- a/tests/test_config_and_providers.py
+++ b/tests/test_config_and_providers.py
@@ -11,8 +11,7 @@
 def test_config_defaults_and_derived_paths(tmp_path):
     cfg = Config(store_dir=tmp_path / ".coderag")
     assert cfg.provider == "fastembed"
-    assert cfg.db_path == tmp_path / ".coderag" / "coderag.db"
-    assert cfg.faiss_path == tmp_path / ".coderag" / "index.faiss"
+    assert cfg.store_dir == tmp_path / ".coderag"
 
 
 def test_config_is_immutable_and_copies():
@@ -37,6 +36,30 @@ def test_from_env_ignores_bad_ints(monkeypatch):
     assert cfg.top_k == 8  # falls back to default
 
 
+def test_env_ignore_globs_append_to_defaults(monkeypatch):
+    from coderag.config import DEFAULT_IGNORE_GLOBS
+
+    monkeypatch.setenv("CODERAG_IGNORE_GLOBS", "secret/*, *.bin")
+    cfg = Config.from_env()
+    assert set(DEFAULT_IGNORE_GLOBS) <= set(cfg.ignore_globs)  # defaults kept
+    assert "secret/*" in cfg.ignore_globs and "*.bin" in cfg.ignore_globs
+
+
+def test_default_ignores_cover_dependency_and_cache_dirs():
+    from coderag.config import DEFAULT_IGNORE_GLOBS
+
+    for junk in ("site-packages/*", ".cache/*", "node_modules/*", "target/*"):
+        assert junk in DEFAULT_IGNORE_GLOBS
+
+
+def test_env_embed_device_and_threads(monkeypatch):
+    monkeypatch.setenv("CODERAG_EMBED_DEVICE", "cuda")
+    monkeypatch.setenv("CODERAG_EMBED_THREADS", "8")
+    cfg = Config.from_env()
+    assert cfg.embed_device == "cuda"
+    assert cfg.embed_threads == 8
+
+
 def test_secrets_are_kept_out_of_repr():
     cfg = Config(
         openai_api_key="sk-openai-secret",
diff --git a/tests/test_fastembed_provider.py b/tests/test_fastembed_provider.py
new file mode 100644
index 0000000..2db6c13
--- /dev/null
+++ b/tests/test_fastembed_provider.py
@@ -0,0 +1,44 @@
+"""Device/provider selection for the fastembed backend (no model load, no network)."""
+
+from __future__ import annotations
+
+from coderag.embeddings.fastembed_provider import FastEmbedProvider
+
+
+def _provider(device: str) -> FastEmbedProvider:
+    return FastEmbedProvider("BAAI/bge-small-en-v1.5", device=device)
+
+
+def test_cpu_device_forces_cpu_provider():
+    assert _provider("cpu")._providers() == ["CPUExecutionProvider"]
+
+
+def test_cuda_device_lists_cuda_then_cpu_fallback():
+    # CPU is listed second so onnxruntime degrades gracefully if CUDA init fails.
+    assert _provider("cuda")._providers() == [
+        "CUDAExecutionProvider",
+        "CPUExecutionProvider",
+    ]
+
+
+def test_auto_uses_cpu_when_no_gpu(monkeypatch):
+    import onnxruntime as ort
+
+    monkeypatch.setattr(
+        ort, "get_available_providers", lambda: ["CPUExecutionProvider"]
+    )
+    assert _provider("auto")._providers() is None  # library CPU default
+
+
+def test_auto_uses_gpu_when_available(monkeypatch):
+    import onnxruntime as ort
+
+    monkeypatch.setattr(
+        ort,
+        "get_available_providers",
+        lambda: ["CUDAExecutionProvider", "CPUExecutionProvider"],
+    )
+    assert _provider("auto")._providers() == [
+        "CUDAExecutionProvider",
+        "CPUExecutionProvider",
+    ]
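
Review note on the four tests above: together they pin down a small decision table for the device setting. A sketch of a pure function that satisfies the same table is shown here. `select_providers` is a hypothetical stand-in, not the body of `FastEmbedProvider._providers`, which is not part of this diff; in real use the `available` argument would come from `onnxruntime.get_available_providers()`.

```python
from typing import List, Optional, Sequence

CUDA = "CUDAExecutionProvider"
CPU = "CPUExecutionProvider"


def select_providers(device: str, available: Sequence[str]) -> Optional[List[str]]:
    """Map a device setting to an onnxruntime provider list.

    Returns None to mean "let the library use its CPU default".
    """
    if device == "cpu":
        return [CPU]
    if device == "cuda":
        # CPU is listed second as a fallback if CUDA fails to initialize.
        return [CUDA, CPU]
    if device == "auto":
        return [CUDA, CPU] if CUDA in available else None
    raise ValueError(f"unknown device: {device!r}")
```

Passing the available providers in as an argument keeps the function testable without importing onnxruntime or monkeypatching it.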
diff --git a/tests/test_ignore.py b/tests/test_ignore.py
new file mode 100644
index 0000000..76dbdcb
--- /dev/null
+++ b/tests/test_ignore.py
@@ -0,0 +1,68 @@
+"""Tests for the shared ignore-aware walker (:func:`coderag._ignore.walk_files`)."""
+
+from __future__ import annotations
+
+from coderag._ignore import walk_files
+from coderag.config import DEFAULT_IGNORE_GLOBS
+from tests.conftest import write
+
+
+def _rels(root, **kw) -> set[str]:
+    return {rel for _, rel in walk_files(root, DEFAULT_IGNORE_GLOBS, **kw)}
+
+
+def test_gitignore_negation_and_dir_only(tmp_path):
+    write(tmp_path / ".gitignore", "*.log\nout/\n!keep.log\n")
+    write(tmp_path / "a.py", "x\n")
+    write(tmp_path / "debug.log", "x\n")
+    write(tmp_path / "keep.log", "x\n")
+    write(tmp_path / "out" / "x.py", "x\n")  # "out" is not a built-in default ignore
+    rels = _rels(tmp_path)
+    assert "a.py" in rels
+    assert "keep.log" in rels  # re-included by negation
+    assert "debug.log" not in rels  # ignored by *.log
+    assert not any(r.startswith("out/") for r in rels)  # dir pruned by "out/"
+
+
+def test_nested_gitignore_scopes_to_subtree(tmp_path):
+    write(tmp_path / "a.txt", "x\n")
+    write(tmp_path / "sub" / ".gitignore", "*.txt\n")
+    write(tmp_path / "sub" / "b.txt", "x\n")
+    write(tmp_path / "sub" / "c.py", "x\n")
+    rels = _rels(tmp_path)
+    assert "a.txt" in rels  # root .txt unaffected by sub/.gitignore
+    assert "sub/c.py" in rels
+    assert "sub/b.txt" not in rels  # ignored by the nested rule
+
+
+def test_gitignore_can_be_disabled(tmp_path):
+    write(tmp_path / ".gitignore", "*.log\n")
+    write(tmp_path / "debug.log", "x\n")
+    assert "debug.log" not in _rels(tmp_path, use_gitignore=True)
+    assert "debug.log" in _rels(tmp_path, use_gitignore=False)
+
+
+def test_indexer_and_fs_search_agree_on_gitignore(tmp_path):
+    # The shared-walker invariant: semantic index and exact search see the same files.
+    from coderag.api import CodeRAG
+    from coderag.config import Config
+
+    repo = tmp_path / "repo"
+    write(repo / ".gitignore", "ignored/\n*.log\n")
+    write(repo / "keep.py", "def k():\n    return 1\n")
+    write(repo / "ignored" / "x.py", "def x():\n    return 1\n")
+    write(repo / "note.log", "hi\n")
+    cr = CodeRAG(
+        Config(provider="fake", watched_dir=repo, store_dir=tmp_path / "store")
+    )
+    cr.index()
+    indexed = set(cr.store.all_file_paths())
+    assert "keep.py" in indexed
+    assert not any(p.startswith("ignored/") for p in indexed)
+
+    res = cr.search_files("*", target="files", use_ripgrep=False)
+    found = {row["path"] for row in res["results"]}
+    assert "keep.py" in found
+    assert not any(p.startswith("ignored/") for p in found)
+    assert "note.log" not in found  # .log ignored, so exact search skips it too
+    cr.close()
diff --git a/tests/test_indexer.py b/tests/test_indexer.py
index 91b78c4..1d53b3a 100644
--- a/tests/test_indexer.py
+++ b/tests/test_indexer.py
@@ -2,6 +2,8 @@
 
 from __future__ import annotations
 
+from pathlib import Path
+
 from coderag.api import CodeRAG
 from tests.conftest import write
 
@@ -18,7 +20,7 @@ def test_index_creates_chunks(config):
     stats = cr.index()
     assert stats.files_indexed == 2
     assert stats.total_chunks >= 2
-    assert cr.vectors.ntotal == stats.total_chunks
+    assert cr.store.total_chunks() == stats.total_chunks
 
 
 def test_unchanged_files_are_skipped(config):
@@ -36,19 +38,14 @@ def test_editing_a_file_does_not_duplicate(config):
     write(path, "def alpha():\n    return 1\n")
     cr.index()
     chunks_before = cr.store.total_chunks()
-    vectors_before = cr.vectors.ntotal
-    assert chunks_before == vectors_before
+    assert chunks_before >= 1
 
     # Edit and reindex.
     write(path, "def alpha():\n    return 100\n\ndef gamma():\n    return 3\n")
     stats = cr.index()
     assert stats.chunks_removed >= 1  # old chunks were deleted first
-    # Store and FAISS stay in lock-step (no stale/duplicate vectors).
-    assert cr.store.total_chunks() == cr.vectors.ntotal
-    # The new content is searchable; the stale content is gone.
-    rows = cr.store.hydrate(
-        cr.store.chunk_ids_for_file(cr.store.get_file("a.py")["id"])
-    )
+    # The new content is searchable; the stale content is gone (no duplicates).
+    rows = cr.store.hydrate(cr.store.chunk_ids_for_path("a.py"))
     joined = "\n".join(r["text"] for r in rows.values())
     assert "return 100" in joined
     assert "return 1\n" not in joined or "return 100" in joined
@@ -61,13 +58,13 @@ def test_deleted_file_is_pruned(config):
     write(a, "def alpha():\n    return 1\n")
     write(b, "def beta():\n    return 2\n")
     cr.index()
-    assert cr.store.total_chunks() == cr.vectors.ntotal
+    chunks_with_b = cr.store.total_chunks()
 
     b.unlink()
     stats = cr.index()
     assert stats.files_removed == 1
     assert "b.py" not in cr.store.all_file_paths()
-    assert cr.store.total_chunks() == cr.vectors.ntotal
+    assert cr.store.total_chunks() < chunks_with_b  # b's chunks are gone
 
 
 def test_ignored_dirs_are_skipped(config):
@@ -82,6 +79,18 @@ def test_ignored_dirs_are_skipped(config):
     assert not any(".git" in p for p in paths)
 
 
+def test_dependency_and_cache_dirs_are_skipped(config):
+    cr = _cr(config)
+    write(config.watched_dir / "src" / "a.py", "def alpha():\n    return 1\n")
+    write(config.watched_dir / "site-packages" / "dep.py", "def dep():\n    return 1\n")
+    write(config.watched_dir / ".cache" / "c.py", "def c():\n    return 1\n")
+    cr.index()
+    paths = cr.store.all_file_paths()
+    assert "src/a.py" in paths
+    assert not any("site-packages" in p for p in paths)
+    assert not any(".cache" in p for p in paths)
+
+
 def test_full_rebuild_resets(config):
     cr = _cr(config)
     write(config.watched_dir / "a.py", "def alpha():\n    return 1\n")
@@ -89,7 +98,7 @@ def test_full_rebuild_resets(config):
     n1 = cr.store.total_chunks()
     stats = cr.index(full=True)
     assert stats.total_chunks == n1  # same content, rebuilt cleanly
-    assert cr.store.total_chunks() == cr.vectors.ntotal
+    assert cr.store.total_chunks() == n1
 
 
 def test_get_file_line_numbers_use_chunk_convention(config):
@@ -106,6 +115,45 @@ def test_get_file_line_numbers_use_chunk_convention(config):
     assert cr.get_file("f.txt", 1, 2) == "\n".join(expected[:2])
 
 
+def test_stat_skip_avoids_reread_of_unchanged_files(config, monkeypatch):
+    # On a re-index, an untouched file must be skipped via the cheap (size, mtime) check
+    # WITHOUT reading its bytes — the dominant cost saver for a large tree.
+    cr = _cr(config)
+    write(config.watched_dir / "a.py", "def alpha():\n    return 1\n")
+    cr.index()
+
+    orig_read = Path.read_bytes
+
+    def boom(self: Path) -> bytes:
+        if self.name == "a.py":
+            raise AssertionError("re-read an unchanged file instead of stat-skipping")
+        return orig_read(self)
+
+    monkeypatch.setattr(Path, "read_bytes", boom)
+    stats = cr.index()
+    assert stats.files_indexed == 0
+    assert stats.files_skipped == 1
+
+
+def test_index_progress_is_reported(config, capsys):
+    # progress=True narrates the run on stderr so a long index isn't a silent wait.
+    cr = _cr(config)
+    write(config.watched_dir / "a.py", "def alpha():\n    return 1\n")
+    cr.indexer.index(progress=True)
+    err = capsys.readouterr().err
+    assert "Scanning" in err  # discovery phase is announced
+    assert "✓ Indexed" in err  # final summary line
+
+
+def test_index_progress_is_silent_when_off(config, capsys):
+    # progress=False (the default, and what the MCP background index uses) stays quiet.
+    cr = _cr(config)
+    write(config.watched_dir / "a.py", "def alpha():\n    return 1\n")
+    cr.indexer.index(progress=False)
+    err = capsys.readouterr().err
+    assert "Scanning" not in err and "✓ Indexed" not in err
+
+
 def test_index_survives_reopen(config, tmp_path):
     cr = _cr(config)
     write(config.watched_dir / "a.py", "def alpha():\n    return 1\n")
@@ -114,5 +162,4 @@ def test_index_survives_reopen(config, tmp_path):
     cr.close()
 
     cr2 = CodeRAG(config)
-    assert cr2.store.total_chunks() == n
-    assert cr2.vectors.ntotal == n  # FAISS cache reloaded, consistent
+    assert cr2.store.total_chunks() == n  # persisted across reopen
diff --git a/tests/test_install.py b/tests/test_install.py
index de75daf..6b2570d 100644
--- a/tests/test_install.py
+++ b/tests/test_install.py
@@ -123,6 +123,43 @@ def test_wizard_collects_choices(home, monkeypatch):
     assert plans[0].tools == inst.DEFAULT_TOOLS
 
 
+# --- workspace-scope guidance (large trees are supported) -----------------------------
+
+
+def test_default_workspace_prefers_git_root(home):
+    repo = Path.cwd()
+    (repo / ".git").mkdir()
+    deep = repo / "pkg" / "deep"
+    deep.mkdir(parents=True)
+    # Run from a subdirectory: the natural scope is the whole repo, not the subdir.
+    assert inst.default_workspace(deep) == repo.resolve()
+
+
+def test_default_workspace_falls_back_to_start(home):
+    start = Path.cwd() / "loose"
+    start.mkdir()
+    assert inst.default_workspace(start) == start.resolve()
+
+
+def test_is_broad_root_flags_home_and_system(home):
+    assert inst._is_broad_root(Path("/"))  # filesystem root
+    assert inst._is_broad_root(Path("/usr"))
+    assert inst._is_broad_root(Path.home())  # the user's whole home
+    assert not inst._is_broad_root(Path.cwd())  # a normal project dir
+
+
+def test_wizard_describes_large_tree_support(home, monkeypatch, capsys):
+    # Choosing "/" is a legitimate large-tree choice: the wizard sets expectations
+    # (background, takes longer) and flags the /proc footgun, without discouraging it.
+    answers = iter(["1", "/", ""])  # claude, watched=/, (no tools prompt for claude)
+    monkeypatch.setattr("builtins.input", lambda *_: next(answers))
+    inst.run_wizard([], Path.cwd())
+    out = capsys.readouterr().out
+    assert "large tree" in out
+    assert "/proc" in out  # the one genuine footgun for "/"
+    assert "almost always what you want" not in out  # no longer discourages
+
+
 # --- launcher resolution (the venv-activation footgun) --------------------------------
 #
 # An agent launches the server from its own shell, so the command written into its config
diff --git a/tests/test_lance_store.py b/tests/test_lance_store.py
new file mode 100644
index 0000000..1acc45e
--- /dev/null
+++ b/tests/test_lance_store.py
@@ -0,0 +1,143 @@
+"""Tests for the LanceDB store — the single backend (metadata + BM25 + vectors)."""
+
+from __future__ import annotations
+
+from coderag.embeddings.fake_provider import FakeEmbeddingProvider
+from coderag.store.lance_store import LanceStore
+from coderag.types import Chunk
+
+
+def _chunk(text: str, sym: str = "f", kind: str = "function", start: int = 1) -> Chunk:
+    return Chunk(
+        text=text,
+        start_line=start,
+        end_line=start + 1,
+        language="python",
+        symbol=sym,
+        kind=kind,
+    )
+
+
+def _store(tmp_path):
+    prov = FakeEmbeddingProvider()
+    return LanceStore(tmp_path / "store", prov.dim), prov
+
+
+def _add(st, prov, rel, chunks, *, replace=False, chash="h", mtime=1.0, size=10):
+    vecs = prov.embed_documents([c.text for c in chunks])
+    return st.write_file(
+        rel, "python", chash, mtime, size, chunks, vecs, replace=replace
+    )
+
+
+def test_write_stats_lexical_and_hybrid(tmp_path):
+    st, prov = _store(tmp_path)
+    _add(
+        st,
+        prov,
+        "auth.py",
+        [_chunk("def authenticate(token): retry backoff", "authenticate")],
+    )
+    _add(st, prov, "math.py", [_chunk("def add(a, b): return a + b", "add")])
+    st.optimize()
+
+    s = st.stats()
+    assert s.total_files == 2 and s.total_chunks == 2
+    assert st.total_chunks() == 2
+
+    lex = st.lexical_search("authenticate", 5)
+    assert lex
+    top = lex[0][0]
+    assert st.hydrate([top])[top]["path"] == "auth.py"
+
+    hits = st.search("authenticate token", prov, top_k=5)
+    assert hits and {h.path for h in hits} <= {"auth.py", "math.py"}
+    assert all(h.start_line >= 1 and h.text for h in hits)
+
+
+def test_change_detection_metadata(tmp_path):
+    st, prov = _store(tmp_path)
+    _add(st, prov, "a.py", [_chunk("x")], chash="h1", mtime=12.5, size=4096)
+    meta = st.get_file_meta("a.py")
+    assert meta is not None
+    assert meta["content_hash"] == "h1"
+    assert meta["size"] == 4096 and abs(meta["mtime"] - 12.5) < 1e-9
+    assert set(st.all_file_metas()) == {"a.py"}
+    assert st.all_file_paths() == ["a.py"]
+    assert st.get_file_meta("missing.py") is None
+
+
+def test_replace_does_not_duplicate(tmp_path):
+    st, prov = _store(tmp_path)
+    _add(st, prov, "a.py", [_chunk("def alpha(): return 1", "alpha")])
+    assert st.total_chunks() == 1
+    added, removed = _add(
+        st,
+        prov,
+        "a.py",
+        [
+            _chunk("def alpha(): return 100", "alpha"),
+            _chunk("def gamma(): return 3", "gamma"),
+        ],
+        replace=True,
+        chash="h2",
+    )
+    assert added == 2 and removed == 1
+    assert st.total_chunks() == 2
+    rows = st.hydrate(st.chunk_ids_for_path("a.py"))
+    joined = "\n".join(r["text"] for r in rows.values())
+    assert "return 100" in joined
+    assert "return 1\n" not in joined
+
+
+def test_delete_file(tmp_path):
+    st, prov = _store(tmp_path)
+    _add(st, prov, "a.py", [_chunk("a")])
+    _add(st, prov, "b.py", [_chunk("b")])
+    assert st.delete_file("a.py") == 1
+    assert st.all_file_paths() == ["b.py"]
+    assert st.total_chunks() == 1
+
+
+def test_bootstrap_clears_on_model_change(tmp_path):
+    st, prov = _store(tmp_path)
+    assert st.bootstrap(prov.dim, "fake-16") is False
+    _add(st, prov, "a.py", [_chunk("a")])
+    st.optimize()
+    assert st.total_chunks() == 1
+    assert st.bootstrap(prov.dim, "fake-16") is False  # unchanged
+    assert st.total_chunks() == 1
+    assert st.bootstrap(prov.dim, "other-model") is True  # model changed -> cleared
+    assert st.total_chunks() == 0
+
+
+def test_symbol_index_caches_and_invalidates(tmp_path):
+    st, prov = _store(tmp_path)
+    _add(st, prov, "a.py", [_chunk("def compute_tax(): pass", "compute_tax")])
+    idx = st.symbol_index()
+    assert "compute_tax" in idx
+    assert st.symbol_index() is idx  # cached while nothing changed
+    _add(st, prov, "b.py", [_chunk("def brand_new(): pass", "brand_new")])
+    idx2 = st.symbol_index()
+    assert idx2 is not idx and "brand_new" in idx2
+
+
+def test_distinct_and_fts_sanitization(tmp_path):
+    st, prov = _store(tmp_path)
+    _add(st, prov, "a.py", [_chunk("def parse_config(): return 1", "parse_config")])
+    st.optimize()
+    assert st.distinct_languages() == ["python"]
+    assert "function" in st.distinct_kinds()
+    assert st.lexical_search("parse_config", 5)  # plain token
+    assert st.lexical_search("parse_config::*", 5)  # operators sanitized, no raise
+    assert st.lexical_search("", 5) == []  # empty query
+
+
+def test_clear_empties_store(tmp_path):
+    st, prov = _store(tmp_path)
+    _add(st, prov, "a.py", [_chunk("a")])
+    st.optimize()
+    assert st.total_chunks() == 1
+    st.clear()
+    assert st.total_chunks() == 0
+    assert st.all_file_paths() == []
diff --git a/tests/test_mcp.py b/tests/test_mcp.py
index 208728a..912d9ff 100644
--- a/tests/test_mcp.py
+++ b/tests/test_mcp.py
@@ -194,7 +194,7 @@ def test_index_status_reports_totals_and_flag(tmp_path):
     cr, mcp, state, _ = _make(tmp_path, DEMO)
     r = _call(mcp, "index_status", {})
     assert r["total_files"] == 2
-    assert r["total_chunks"] == cr.vectors.ntotal
+    assert r["total_chunks"] == cr.store.total_chunks()
     assert r["indexing"] == "ready"
 
     state.indexing = True
@@ -207,7 +207,7 @@ def test_reindex_picks_up_new_file_and_guards_concurrency(tmp_path):
     write(repo / "extra.py", "def extra():\n    return 1\n")
     r = _call(mcp, "reindex", {})
     assert r["total_files"] == 3
-    assert cr.store.total_chunks() == cr.vectors.ntotal
+    assert cr.store.total_chunks() > 0  # the store holds chunks after the reindex
 
     state.indexing = True  # a run already in progress -> guarded
     assert "error" in _call(mcp, "reindex", {})
@@ -220,6 +220,16 @@ def test_warm_up_is_safe(tmp_path):
     cr.close()
 
 
+def test_notify_keeps_stdout_clean(capsys):
+    # Lifecycle messages must go to stderr only — stdout is the stdio MCP wire protocol.
+    from coderag.surfaces.mcp_server import _notify
+
+    _notify("indexing started")
+    captured = capsys.readouterr()
+    assert "indexing started" in captured.err
+    assert captured.out == ""
+
+
 # --- all-text (general file-directory) indexing ---
 
 
@@ -273,7 +283,6 @@ def build(workers, sub):
         out = (
             stats.total_chunks,
             cr.store.total_chunks(),
-            cr.vectors.ntotal,
             sorted(cr.store.all_file_paths()),
         )
         cr.close()
@@ -281,9 +290,9 @@ def build(workers, sub):
 
     serial = build(1, "store_serial")
     parallel = build(4, "store_parallel")
+    assert serial[0] == parallel[0]  # stats agree
     assert serial[1] == parallel[1] > 0  # identical chunk count
-    assert serial[1] == serial[2] and parallel[1] == parallel[2]  # store == FAISS
-    assert serial[3] == parallel[3]  # identical file set
+    assert serial[2] == parallel[2]  # identical file set
 
 
 def test_search_is_safe_during_concurrent_indexing(tmp_path):
@@ -308,7 +317,7 @@ def hammer_search():
     t = threading.Thread(target=hammer_search)
     t.start()
     try:
-        # Re-index (FAISS add/remove) while searches (FAISS reads) run concurrently.
+        # Re-index (store writes) while searches (store reads) run concurrently.
         for _ in range(3):
             for i in range(25, 45):
                 write(repo / f"f{i}.py", "def g():\n    return 'more tokens here'\n")
@@ -321,7 +330,5 @@ def hammer_search():
         t.join(timeout=5)
 
     assert not errors, errors
-    assert (
-        cr.store.total_chunks() == cr.vectors.ntotal
-    )  # invariant holds after the race
+    assert cr.store.total_chunks() == 25  # the 20 churned files were pruned
     cr.close()
diff --git a/tests/test_rerank.py b/tests/test_rerank.py
index a2da22b..1cb5880 100644
--- a/tests/test_rerank.py
+++ b/tests/test_rerank.py
@@ -56,7 +56,7 @@ def test_get_reranker_built_when_enabled(config):
 def test_reranker_reorders_and_sets_score(config):
     cr = _indexed(config)
     searcher = HybridSearcher(
-        cr.config, cr.provider, cr.store, cr.vectors, reranker=KeywordReranker()
+        cr.config, cr.provider, cr.store, reranker=KeywordReranker()
     )
     hits = searcher.search("validate session token", top_k=2)
     assert hits
@@ -69,7 +69,7 @@ def test_reranker_reorders_and_sets_score(config):
 def test_rerank_trims_to_top_k(config):
     cr = _indexed(config)
     searcher = HybridSearcher(
-        cr.config, cr.provider, cr.store, cr.vectors, reranker=KeywordReranker()
+        cr.config, cr.provider, cr.store, reranker=KeywordReranker()
     )
     assert len(searcher.search("token", top_k=1)) == 1
 
@@ -77,7 +77,7 @@ def test_rerank_trims_to_top_k(config):
 def test_reranker_empty_query(config):
     cr = _indexed(config)
     searcher = HybridSearcher(
-        cr.config, cr.provider, cr.store, cr.vectors, reranker=KeywordReranker()
+        cr.config, cr.provider, cr.store, reranker=KeywordReranker()
     )
     assert searcher.search("   ", top_k=3) == []
 
diff --git a/tests/test_store.py b/tests/test_store.py
deleted file mode 100644
index 8014a2e..0000000
--- a/tests/test_store.py
+++ /dev/null
@@ -1,161 +0,0 @@
-"""P1 tests: SQLite store + pluggable FAISS vector index."""
-
-from __future__ import annotations
-
-import numpy as np
-
-from coderag.config import Config
-from coderag.store.sqlite_store import SQLiteStore
-from coderag.store.vector_index import FaissVectorIndex
-from coderag.types import Chunk
-
-
-def _store(tmp_path) -> SQLiteStore:
-    store = SQLiteStore(tmp_path / "coderag.db")
-    store.bootstrap(embed_dim=16, embed_model="fake-16")
-    return store
-
-
-def _chunk(text: str, start: int = 1) -> Chunk:
-    return Chunk(
-        text=text,
-        start_line=start,
-        end_line=start + 2,
-        language="python",
-        symbol="f",
-        kind="function",
-    )
-
-
-def test_add_and_hydrate_chunks(tmp_path):
-    store = _store(tmp_path)
-    fid = store.upsert_file("a.py", "python", "hash1", 1.0)
-    vecs = np.ones((2, 16), dtype="float32")
-    ids = store.add_chunks(
-        fid, [_chunk("def f(): pass"), _chunk("x = 1", 5)], vecs, "fake-16"
-    )
-    assert len(ids) == 2
-    rows = store.hydrate(ids)
-    assert rows[ids[0]]["path"] == "a.py"
-    assert rows[ids[0]]["text"] == "def f(): pass"
-
-
-def test_autoincrement_ids_never_reused(tmp_path):
-    store = _store(tmp_path)
-    fid = store.upsert_file("a.py", "python", "h", 1.0)
-    vecs = np.ones((1, 16), dtype="float32")
-    first = store.add_chunks(fid, [_chunk("a")], vecs, "fake-16")
-    store.delete_chunks_for_file(fid)
-    second = store.add_chunks(fid, [_chunk("b")], vecs, "fake-16")
-    assert second[0] > first[0]  # id advanced, not recycled
-
-
-def test_fts_search_finds_token_and_survives_operators(tmp_path):
-    store = _store(tmp_path)
-    fid = store.upsert_file("a.py", "python", "h", 1.0)
-    vecs = np.ones((1, 16), dtype="float32")
-    store.add_chunks(fid, [_chunk("def parse_config(): return 1")], vecs, "fake-16")
-    hits = store.fts_search("parse_config", limit=5)
-    assert len(hits) == 1
-    # Operators in the query must not raise.
-    assert store.fts_search("parse_config::*", limit=5)
-    assert store.fts_search("", limit=5) == []
-
-
-def test_iter_vectors_round_trips(tmp_path):
-    store = _store(tmp_path)
-    fid = store.upsert_file("a.py", "python", "h", 1.0)
-    vecs = np.random.default_rng(0).standard_normal((3, 16)).astype("float32")
-    ids = store.add_chunks(
-        fid, [_chunk("a"), _chunk("b"), _chunk("c")], vecs, "fake-16"
-    )
-    got_ids, got_vecs = next(store.iter_vectors())
-    assert list(got_ids) == ids
-    np.testing.assert_allclose(got_vecs, vecs)
-
-
-def test_model_change_triggers_rebuild_flag(tmp_path):
-    store = SQLiteStore(tmp_path / "coderag.db")
-    assert store.bootstrap(16, "fake-16") is False
-    store.upsert_file("a.py", "python", "h", 1.0)
-    # Re-bootstrap with a different dim/model: should clear and request rebuild.
-    assert store.bootstrap(384, "bge-small") is True
-    assert store.all_file_paths() == []
-
-
-def _vec_index(tmp_path, **cfg) -> tuple:
-    config = Config(store_dir=tmp_path, **cfg)
-    store = _store(tmp_path)
-    idx = FaissVectorIndex.open(config, dim=16)
-    return config, store, idx
-
-
-def test_vector_add_search_remove(tmp_path):
-    _, _, idx = _vec_index(tmp_path)
-    rng = np.random.default_rng(1)
-    vecs = rng.standard_normal((5, 16)).astype("float32")
-    ids = np.array([10, 20, 30, 40, 50], dtype="int64")
-    idx.add(ids, vecs)
-    assert idx.ntotal == 5
-    got_ids, scores = idx.search(vecs[2], k=3)
-    assert got_ids[0] == 30  # closest to itself
-    assert scores[0] > 0.99
-    removed = idx.remove([30])
-    assert removed == 1
-    got_ids, _ = idx.search(vecs[2], k=3)
-    assert 30 not in got_ids
-
-
-def test_rebuild_from_store_and_consistency(tmp_path):
-    config, store, idx = _vec_index(tmp_path)
-    fid = store.upsert_file("a.py", "python", "h", 1.0)
-    vecs = np.random.default_rng(2).standard_normal((4, 16)).astype("float32")
-    store.add_chunks(fid, [_chunk(str(i)) for i in range(4)], vecs, "fake-16")
-    # Index is empty but store has 4 chunks -> ensure_consistent rebuilds.
-    idx.ensure_consistent(store)
-    assert idx.ntotal == 4
-    assert idx.kind == "flat"
-
-
-def test_auto_upgrade_flat_to_ivf(tmp_path):
-    # ivf_threshold tiny so a small corpus crosses it.
-    config, store, idx = _vec_index(tmp_path, ivf_threshold=10)
-    fid = store.upsert_file("a.py", "python", "h", 1.0)
-    n = 30
-    vecs = np.random.default_rng(3).standard_normal((n, 16)).astype("float32")
-    ids = store.add_chunks(fid, [_chunk(str(i)) for i in range(n)], vecs, "fake-16")
-    idx.add(np.array(ids, dtype="int64"), vecs)
-    assert idx.kind == "flat"
-    upgraded = idx.maybe_upgrade(store)
-    assert upgraded is True
-    assert idx.kind == "ivf"
-    assert idx.ntotal == n
-    # IVF still returns the self-match.
-    got_ids, _ = idx.search(vecs[0], k=1)
-    assert got_ids[0] == ids[0]
-
-
-def test_index_persists_across_open(tmp_path):
-    config, store, idx = _vec_index(tmp_path)
-    vecs = np.random.default_rng(4).standard_normal((3, 16)).astype("float32")
-    idx.add(np.array([1, 2, 3], dtype="int64"), vecs)
-    idx.save()
-    reopened = FaissVectorIndex.open(config, dim=16)
-    assert reopened.ntotal == 3
-    assert reopened.kind == "flat"
-
-
-def test_rebuild_ivf_handles_degenerate_corpus(tmp_path):
-    # Forcing IVF over many identical vectors must not raise and must stay searchable
-    # (degenerate training falls back to flat rather than aborting indexing).
-    config, store, idx = _vec_index(tmp_path, index_type="ivf", ivf_threshold=1)
-    fid = store.upsert_file("a.py", "python", "h", 1.0)
-    n = 40
-    vecs = np.ones((n, 16), dtype="float32")  # all identical -> degenerate clustering
-    ids = store.add_chunks(fid, [_chunk(str(i)) for i in range(n)], vecs, "fake-16")
-    idx.rebuild_from_store(store)  # must not raise
-    assert idx.ntotal == n
-    assert idx.kind in ("ivf", "flat")
-    got_ids, _ = idx.search(vecs[0], k=1)
-    assert len(got_ids) == 1
-    assert got_ids[0] in ids
diff --git a/tests/test_surfaces.py b/tests/test_surfaces.py
index 3b8d52b..f0afe9c 100644
--- a/tests/test_surfaces.py
+++ b/tests/test_surfaces.py
@@ -148,12 +148,11 @@ def test_watcher_apply_handles_edit_and_delete(repo_with_code):
     write(new, "def extra():\n    return 1\n")
     _apply(cr, str(new))
     assert cr.store.total_chunks() > n0
-    assert cr.store.total_chunks() == cr.vectors.ntotal
 
     new.unlink()
     _apply(cr, str(new))
     assert "extra.py" not in cr.store.all_file_paths()
-    assert cr.store.total_chunks() == cr.vectors.ntotal
+    assert cr.store.total_chunks() == n0  # back to the pre-edit count
 
 
 def test_watcher_handler_collects_only_code_paths():