Simplify the DataProvider contract for graph search#1067
Draft
hildebrandmw wants to merge 6 commits into
Draft
Conversation
Contributor
|
get_element is basically unused in garnet, except for implementing VEMB (return an vector) and other debugging-style functions. This sounds like a very promising direction, and I am very supportive. |
DataProvider contract for graph search
This was referenced May 15, 2026
hildebrandmw
added a commit
that referenced
this pull request
May 16, 2026
Remove `flat_search` from `DiskANNIndex` and the `IdIterator` trait from `diskann`. Since the only caller was from `diskann-disk`, add a new `flat_search` inherent method to `DiskIndexSearcher`. The flat search method is not compatible with the experimental direction in #1067 and with #983 on the horizon, this is safe to move for now.
hildebrandmw
added a commit
that referenced
this pull request
May 21, 2026
Paged search has been causing all kinds of issues for our code base and is actively getting in the way of simplifications in #1067 due to interactions with the `PagedSearchState`. The TLDR of the issue is that `PagedSearchState` requires types to be `'static` and introduces the need to "pause" and "resume" search state in a way that is complex to describe in trait bounds. Since our code is already async, we can lean into that and use the usual Rust machinery to embed non-`'static` paged searcher inside an otherwise `'static` future. The recommended way to now interact with paged search is via [channels](https://docs.rs/tokio/latest/tokio/sync/mpsc/fn.channel.html). [Rendered RFC](https://github.com/microsoft/DiskANN/blob/b63d009893f362e07ace518314f7c85733cba212/rfcs/01078-paged-search.md) ## API Migration Guide | Old pattern | New pattern | |---|---| | `index.start_paged_search(s, ctx, q, l).await` | `index.paged_search(s, ctx, q, l).await` | | `index.next_search_results(ctx, &mut state, k, &mut buf).await` | `search.next_page(k).await` | | `SearchState<Id, (S, C)>` | `PagedSearch<'a, DP, S, T>` | | `PagedSearchState<DP, S, C>` | `PagedSearch<'a, DP, S, T>` | | Check return count for exhaustion | Check `page.is_empty()` | If existing code embedded the `SearchState` in some `'static` container, that is no longer viable because of the borrow. Instead, channels can be used for this communication: ```rust // Types are illustrative — adapt names to your crate. type PageResult = ANNResult<Vec<Neighbor<ExternalId>>>; /// Spawn a paged search session. The index is held by Arc so the task is 'static. /// /// Returns a request channel and a result channel. The caller sends the desired /// page size (`k`) and awaits the corresponding result on the other end. fn spawn_paged_session( index: Arc<DiskANNIndex<DP>>, context: Arc<DP::Context>, query: T, l: usize, ) -> (mpsc::Sender<usize>, mpsc::Receiver<PageResult>) { let (req_tx, mut req_rx) = mpsc::channel::<usize>(1); let (res_tx, res_rx) = mpsc::channel::<PageResult>(1); tokio::spawn(async move { // Borrow from the Arc — these references are scoped to the task. let mut search = index.paged_search(strategy, &*context, query, l).await.unwrap(); while let Some(k) = req_rx.recv().await { let page = search.next_page(k).await; if res_tx.send(page).await.is_err() { break; // caller dropped the result receiver } } // Request channel closed -> caller dropped sender -> clean shutdown. }); (req_tx, res_rx) } ``` If code was already explicitly using a `.await` loop with `SearchState`, then minimal changes should be needed. ## For Users of Paged Search via `wrapped_async` Users of paged search via `wrapped_async::DiskANNIndex` that know their inner futures will never suspend can use the new `wrapped_async::DiskANNIndex::paged_search_no_await`. This will use the new API transparently via `wrapped_async::noawait::PagedSearch` and efficiently run paged searches with minimal synchronization overhead. This should **only** be used if the implementation of `Accessor`, `BuildQueryComputer`, `SearchExt`, `DataProvider`, and `ExpandBeam` are known to never yield and always complete with `Poll::Ready`. --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
hildebrandmw
added a commit
that referenced
this pull request
May 21, 2026
In preparation for #1067, switch the implementation of `DynIndex::search_element` from using the `Accessor` trait to an inherent method on the underlying provider.
hildebrandmw
added a commit
that referenced
this pull request
May 21, 2026
Refactor `DiskIndexSearcher::flat_search` to use the bulk `pq_distances` method instead of one-by-one computation. This does two things: 1. The bulk method is presumably a little more efficient. 2. This moves the implementation away from `Accessor`/`BuildQueryComputer` to help with #1067. The small tricky bit is deriving the maximum batch size from the size of scratch. To that end, I added a private accessor for `PQScratch` to return this bound. --------- Co-authored-by: Mark Hildebrand <mhildebrand@microsoft.com>
arkrishn94
added a commit
that referenced
this pull request
May 22, 2026
This PR introduces a trait interface and a light index to support brute-force search for providers that can be used as/are a flat-index. There is an associated RFC that walks through the interface and associated implementation in `diskann` as a new `flat` module. Rendered RFC [link](https://github.com/microsoft/DiskANN/blob/u/adkrishnan/flat-index/rfcs/00983-flat-search.md). ## Motivation The repo has no first-class surface for brute-force search. This PR adds a small trait hierarchy that gives flat search the same provider-agnostic shape that graph search has, so any backend (in-memory, quantized, disk, remote) can plug in once and reuse a shared algorithm. ## Traits (`flat/strategy.rs`) **`DistancesUnordered<C>`** — the single trait a backend must implement. Fuses iteration and scoring into one method: the implementation drives a full scan, scoring each element with a precomputed query computer `C`, and invokes a callback with `(id, distance)` pairs. Key associated types: - `ElementRef<'a>` -- the reference shape `C` scores against. - `Id` -- the id type yielded to the callback (decoupled from `HasId` so visitors can yield any id shape). - `C : for<'a> PreprocessedDistanceFunction<Self::ElementRef<'a>, f32>` -- the precomputer query computer. ```rust pub trait DistancesUnordered<C>: Send + Sync where C: for<'a> PreprocessedDistanceFunction<Self::ElementRef<'a>, f32>, { type ElementRef<'a>; type Id; type Error: ToRanked + Debug + Send + Sync + 'static; fn distances_unordered<F>( &mut self, computer: &C, f: F, ) -> impl SendFuture<Result<(), Self::Error>> where F: Send + FnMut(Self::Id, f32); } ``` **`SearchStrategy<P, T>`** — factory that creates a `DistancesUnordered` visitor from a provider + context, and builds the per-query computer. Mirrors the graph-side strategy pattern. Two fallible methods: - `create_visitor` — borrows provider + context, returns a `Visitor` - `build_query_computer` — preprocesses the query `T` into a `QueryComputer` ```rust pub trait SearchStrategy<P, T>: Send + Sync where P: DataProvider, { type ElementRef<'a>; type Id; type QueryComputer: for<'a> PreprocessedDistanceFunction<Self::ElementRef<'a>, f32>; type QueryComputerError: StandardError; type Visitor<'a>: for<'b> DistancesUnordered< Self::QueryComputer, ElementRef<'b> = Self::ElementRef<'b>, Id = Self::Id, >; type Error: StandardError; fn create_visitor<'a>(&'a self, provider: &'a P, context: &'a P::Context) -> Result<Self::Visitor<'a>, Self::Error>; fn build_query_computer(&self, query: T) -> Result<Self::QueryComputer, Self::QueryComputerError>; } ``` ## Index (`flat/index.rs`) **`FlatIndex<P>`** — thin `'static` wrapper around a `DataProvider`. Currently we have implemented the naive kNN search algorithm for the flat index. `knn_search` asks the strategy for a visitor, builds the query computer, drives `distances_unordered` through a priority queue, and writes results via `SearchPostProcess`. ## Test infrastructure (`flat/test/`) A self-contained test provider with dimension-validated `Strategy`, transient-error injection, and a `KnnOracleRun` harness that compares `knn_search` results against a brute-force reference with baseline caching for regression detection. ## Future work - **Post-processing** — `knn_search` currently uses the graph-side `SearchPostProcess` trait to write results into the output buffer. #1067 will introduce a flat-specific post-processing step that decouples flat search from the graph module's output machinery. - **Vector ids** - The vector id over which `DistancesUnordered` acts is an associated type, instead of the `InternalId` of the provider. This is due to overly restrictive trait bounds on `VectorId` trait. We plan on relaxing this allowing for more generic id types. --------- Co-authored-by: Alex Razumov (from Dev Box) <alrazu@microsoft.com>
56122a5 to
8f7d80a
Compare
added 6 commits
May 27, 2026 15:15
Life the lifetime into SearchStrategy and InsertStrategy, removing it from the `SearchAccessor` GAT. This also requires changing the signature of `DiskANNIndex::insert` to accept the `InsertStrategy` by reference to avoid lifetime issues.
The reload tests in `index_storage.rs` currently rely on Accessor to function. With the pending removal of this trait, this PR instead implements comparison using inherent methods of the underlying structs instead.
Remove the blanket implementation of `Fill<workingset::Map<_>>` for accessors, instead requiring users to implement `Fill` themselves. To make the process easier, a new `Map::fill` method is added with a much simpler implementation than the old async method and documentation included to point users toward this method. Includes three more related changes: * Fill implementation for `diskann-bftree` that reuse allocations upon access to deleted IDs. This is an improvement over the provided implementation which would first copy, and then allocate (and copy again). Instead, we avoid that first copy. * Simplify the `Fill` implementation for `diskann-garnet` now that we no longer have a blanket implementation to dodge. * Remove the `Flaky` tests from `diskann-providers`. Since we're moving to a model with coarser grained traits, the need to test the fine-grained error handling of the various `diskann` provided implementations is significantly reduced.
This is an awkward stepping-stone in preparation for simplifying our traits. It removes the `BuildQueryComputer` trait, instead requiring Accessors to handle distance computations themselves. The contract for Accessor is updated to: * `get_distance(&mut self, id) -> Result<f32, Error>` * `distances_unordered(...). In response to this, accessors used for pruning need to be distinct from the accessors used for search, which is where most of the code churn comes from. There are a few upsides here: * The test provider becomes a little more efficient as it no longer needs to copy out data into a temporary buffer. * Inline beta-filtering can now be done without cloning out the RoaringTreeMap, removing a performance footgun. Overall, though, this makes the code a bit worse, but is in preparation for future cleanup.
This drops `Accessor`, `SearchExt`, and `ExpandBeam` in favor of a new `SearchAccessor` trait. This trait encompases what `SearchExt` and `ExpandBeam` used to do with the main function being `expand_beam` pretty much unaltered as before. High level changes per crate: * `diskann-providers`: Consolidate to the new `SearchAccessor` which is almost entirely just shuffling around existing implementations. The tests for `betafilter` have been rewritten to test the observable behavior of `expand_beam` directly. * `diskann-disk`: Consolidate into `SearchAccessor` and strip out now unused functions and structs. * `diskann-bftree`: More consolidation. * `diskann-label-filter`: Simplify the implementation now that we only need to focus on `expand_beam`.
d4f04f4 to
d832ede
Compare
Merged
arkrishn94
added a commit
that referenced
this pull request
May 28, 2026
# DiskANN v0.53.0 Release Notes ## Breaking Changes An AI generated, human reviewed list of changes is summarized below. ### Paged search overhauled — channel-based API ([#1078](#1078)) `PagedSearchState` and its `'static`-bound pause/resume model have been replaced with an async, channel-based interface. The recommended way to drive paged search is now via a `tokio::sync::mpsc` channel, with the searcher embedded in an otherwise-`'static` future. See the [rendered RFC](https://github.com/microsoft/DiskANN/blob/main/rfcs/01078-paged-search.md) for the new shape. Callers wired against `PagedSearchState` must migrate to the channel API. Users of paged search via `wrapped_async::DiskANNIndex` that know their inner futures will never suspend can use the new `wrapped_async::DiskANNIndex::paged_search_no_await`; this will efficiently run paged searches with minimal synchronization overhead. ### `DiskANNIndex::flat_search` removed ([#1076](#1076)) `DiskANNIndex::flat_search` and the `IdIterator` trait have been removed from the `diskann` crate. Equivalent functionality lives on the new inherent method `DiskIndexSearcher::flat_search` in `diskann-disk`. This unblocks the experimental directions in #1067 and #983. ```rust // Before diskann_index.flat_search(query, ...)?; // After disk_index_searcher.flat_search(query, ...).await?; ``` ### `DiskIndexSearcher::flat_search` now batched ([#1097](#1097)) The new `DiskIndexSearcher::flat_search` uses the bulk `pq_distances` path instead of one-vector-at-a-time `Accessor::build_query_computer` + `evaluate_similarity`. Downstream behavior is equivalent but tighter resource bounds apply. ### `centroid` removed from PQ interfaces ([#1010](#1010)) The dataset-centroid argument has been removed from `FixedChunkPQTable` construction, `populate`, and most other PQ APIs. The shift only ever worked for L2 distance and was silently ignored for inner-product / cosine, so passing it was a footgun. When an L2 shift is required, fold it into the PQ pivots instead (the library now does this internally). ```rust // Before let table = FixedChunkPQTable::new(.., centroid, ..); // After — drop the centroid argument let table = FixedChunkPQTable::new(.., ..); ``` ### Flat search interface ([#983](#983)) A new `flat` module in `diskann` adds a provider-agnostic brute-force search surface, mirroring the shape of graph search. Backends implement a single trait, `DistancesUnordered<C>` (in `flat/strategy.rs`), which fuses iteration and distance computation, allowing any backend (in-memory, quantized, disk, remote) to plug into a shared algorithm. See the [rendered RFC](https://github.com/microsoft/DiskANN/blob/main/rfcs/00983-flat-search.md). This is additive but is the new canonical surface — direct ad-hoc flat-search call sites should migrate. ### `bf_tree` extracted into `diskann-bftree` crate ([#1020](#1020)) The bf_tree provider has been moved out of `diskann-providers` (previously at `diskann-providers/src/model/graph/provider/async_/bf_tree/`) into a new standalone `diskann-bftree` crate. Along with the move: - Switched from PQ to spherical quantization. - Dropped dependencies on `DeletionCheck`, `AsDeletionCheck`, and `RemoveDeletedIdsAndCopy`. - Simplified generics. Consumers must update their `Cargo.toml` to depend on `diskann-bftree` and update import paths. ### `direct_distance_impl` and `inner_product_raw` re-exposed ([#1081](#1081)) `direct_distance_impl` (free function) and `FixedChunkPQTable::inner_product_raw` are `pub` again after being privatized in #1044. Restored to unblock a downstream user. Not breaking in the typical direction — this restores previously available API surface. ### MinMax `recompress` takes a grid-scale parameter ([#1109](#1109)) The MinMax `recompress` API now accepts a grid-scale parameter. ## New Features - SIMD-optimized L2-squared norm ([#1107](#1107)) - Significantly faster bitmap computation ([#1099](#1099)) - Large speedup on the bitmap construction path used by filtered search. - LLVM IR bloat regression check in CI ([#1083](#1083)) - CI now flags regressions in generated LLVM IR size, helping catch unintended monomorphization blow-ups. - Recall computation fix for under-k groundtruth ([#1069](#1069)) ## Merged PRs * Revise README for DiskANN3 by @harsha-simhadri in #1046 * [CI] Try to fix publishing step by @hildebrandmw in #1057 * [benchmark] Remove `DispatchRule` by @hildebrandmw in #1064 * [benchmark] Automatic Input Registration by @hildebrandmw in #1066 * Remove centroid from most PQ interfaces by @hildebrandmw in #1010 * [diskann/disk] Remove `flat_search` from `DiskANNIndex` by @hildebrandmw in #1076 * macos build and miri check to nightly by @harsha-simhadri in #1058 * [API] Make some methods public again by @hildebrandmw in #1081 * [benchmark] Simply `Inputs` more by @hildebrandmw in #1077 * Turn on stack protection for the diskann-garnet NuGet build by @jackmoffitt in #1082 * Fix options for diskann-garnet nuget pipeline by @jackmoffitt in #1091 * [CI] add LLVM IR bloat regression check by @arazumov in #1083 * Bump openssl from 0.10.79 to 0.10.80 by @dependabot[bot] in #1093 * [Disk CI benchmarks] Use 1ES.Pool=diskann-github by @arazumov in #869 * Fix recall computation for fewer than k groundtruth results by @magdalendobson in #1069 * bf_tree migration away from diskann-providers by @JordanMaples in #1020 * [RFC/diskann] Overhaul paged search by @hildebrandmw in #1078 * Remove unsafe code from compute_vec_l2sq by @arazumov in #1094 * Remove direct accessor call in `diskann-garnet` by @hildebrandmw in #1098 * Refactor `DiskIndexSearcher::flat_search` to use batching by @hildebrandmw in #1097 * [flat index] Flat Search Interface by @arkrishn94 in #983 * migrating multi-hop tests from diskann-providers to diskann by @JordanMaples in #928 * Significantly speed up bitmap computation by @magdalendobson in #1099 * `compute_vecs_l2sq`: Replace scalar L2 Squared norm with SIMD-optimized FastL2NormSquared by @arazumov in #1107 * [minmax] Add grid scaling to recompress API by @arkrishn94 in #1109 **Full Changelog**: v0.52.0...v0.53.0
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Executive Summary: What happens if we start simplifying our trait hierarchy and rely on
ExpandBeamto do the heavy lifting?This is controversial and I have no intention (yet) of turning this into a full PR, but it demonstrates the feasibility of something I've come to realize recently. The whole
Accessor->BuildQueryComputer->ExpandBeamoriginated from earlier patterns in the library of building up indexing algorithms from tiny pieces, enabling a user to implement a small set of methods and get the whole indexing algorithm for free.This breaks down, though, when we consider performance: the more information we provide a backend database about the operation we want to perform, the better the database can optimize. This lead to
on_elements_unordered(bulk element retrieval, allowing for example prefetching),distances_unordered(bulk retrieval with a concrete distance computations), and finallyExpandBeam. Of these, the latter has proved by far the most useful, being used bydiskann-diskanddiskann-garnetto specialize the algorithm to their preferred data access pattern.I've been exploring how to make filtered search more of a first-class citizen to allow us to batch together predicate queries and potentially fuse predicate evaluation with data access. I was halfway through writing a parallel Accessor/Computer/ExpandBeam/Strategy hierarchy for predicate aware search (and struggling with expressing the more complicated type relationships required), when I realized that all we really needed were a few slightly different flavors of
ExpandBeam. Instead of building a complex nest of interrelated traits to express these subtly differentExpandBeammethods, why not just require users to implement the necessaryExpandBeam? The required semantics are not that complicated and it gives the implementor full control of the data access.This PR is an experiment in removing all but the
ElementRefGAT of theAccessortrait and instead makingExpandBeamthe most relevant method. There's more simplification that can be done: fusingSearchExtwithExpandBeam, getting rid of the peskyAsNeighborbound fromExpandBeam, simplifyingBuildQueryComputerto maybe not even need thePreprocessedDistanceFunctionbound. However, this is already a large reduction in code and has a couple extra benefits:ExpandBeamimplementations for inmem are actually more efficient than the previous provided implementations because we can more aggressively evaluate the visited predicate (and merge the neighbor buffer in the prefetchid_bufferto save an allocation on eachExpandBeamcall. This is not a significant increase in code because we simply move the oldon_elements_unordered.diskanntesting the provided implementations that are now no longer needed.ExpandBeamwithSearchExt, then all the relevant bits for search are grouped in a single place instead of spread out for easier discovery.diskann, with potentially more simplification if we continue down this path.ExpandBeamis perhaps a little more prone to error. This is a distinct negative, but if everything is collected into a single place, it's easier to audit.Now for some tradeoffs:
flat_searchfunction. I kind of view this as a bonus: the existing flat search relies on an ID iterator (not easy for most backends) and uses an inefficient loop. The call indiskann-diskwould likely be much more efficient if it was just done directly on the PQ dataset.diskann-label-filter. Coarsening the granularity of data access methods means the wrapping done by the label provider is less effective. Again, though, this is probably a feature: since real implementations are moving towardsExpandBeamfor performance, the label provider may not be a good approach anyways. We certainly had this experience in Remove the Caching Provider #1052.Accessor's get-element. This can be grouped into the largerdiskann-providerscleanup.Fillfordiskann::graph::workingset::Mapis no longer applicable. Users have to write their own relatively easyFillmethod. I don't think this is as big of a problem as it sounds because I don't think any serious implementation uses the provided implementation: it's too slow.Why is this relevant now? Aside from the filtered search work, #983 is working towards adding flat search and introduces an even finer granularity of traits in the hierarchy, when my gut tells me we should be moving the other direction towards slightly coarser traits. Also, if we have an opportunity to simplify our codebase, I think we should take it.
Why is this possible now? #859 moved one of the last big
get_elementrelated operations behind a bulk operation. Surprisingly, that removed all calls to the rawget_elementfrom insert or inplace delete.