[RFC/diskann] Overhaul paged search#1078

Merged: hildebrandmw merged 12 commits into main from mhildebr/paged on May 21, 2026
@hildebrandmw hildebrandmw commented May 15, 2026

Paged search has been causing all kinds of issues for our code base and is actively getting in the way of simplifications in #1067 due to interactions with the PagedSearchState. The TLDR of the issue is that PagedSearchState requires types to be 'static and introduces the need to "pause" and "resume" search state in a way that is complex to describe in trait bounds.

Since our code is already async, we can lean into that and use the usual Rust machinery to embed a non-'static paged searcher inside an otherwise 'static future. The recommended way to interact with paged search across a 'static boundary is now via channels.

Rendered RFC

API Migration Guide

| Old pattern | New pattern |
| --- | --- |
| `index.start_paged_search(s, ctx, q, l).await` | `index.paged_search(s, ctx, q, l).await` |
| `index.next_search_results(ctx, &mut state, k, &mut buf).await` | `search.next_page(k).await` |
| `SearchState<Id, (S, C)>` | `PagedSearch<'a, DP, S, T>` |
| `PagedSearchState<DP, S, C>` | `PagedSearch<'a, DP, S, T>` |
| Check return count for exhaustion | Check `page.is_empty()` |

If existing code embedded the SearchState in some 'static container, that is no longer viable because of the borrow. Instead, channels can be used for this communication:

// Types are illustrative — adapt names to your crate.

type PageResult = ANNResult<Vec<Neighbor<ExternalId>>>;

/// Spawn a paged search session. The index is held by Arc so the task is 'static.
///
/// Returns a request channel and a result channel. The caller sends the desired
/// page size (`k`) and awaits the corresponding result on the other end.
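// Generic parameters (DP, S, T) and their bounds are elided for brevity.
fn spawn_paged_session(
    index: Arc<DiskANNIndex<DP>>,
    context: Arc<DP::Context>,
    strategy: S, // moved into the spawned task below
    query: T,
    l: usize,
) -> (mpsc::Sender<usize>, mpsc::Receiver<PageResult>) {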
    let (req_tx, mut req_rx) = mpsc::channel::<usize>(1);
    let (res_tx, res_rx) = mpsc::channel::<PageResult>(1);

    tokio::spawn(async move {
        // Borrow from the Arc — these references are scoped to the task.
        let mut search = index.paged_search(strategy, &*context, query, l).await.unwrap();

        while let Some(k) = req_rx.recv().await {
            let page = search.next_page(k).await;
            if res_tx.send(page).await.is_err() {
                break; // caller dropped the result receiver
            }
        }
        // Request channel closed -> caller dropped sender -> clean shutdown.
    });

    (req_tx, res_rx)
}
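
The same ownership shape can be tried without the `diskann` crate or an async runtime. The sketch below is a std-only analogue, not the library's API: `ToyIndex`, `ToyPager`, `spawn_session`, and `collect_all` are hypothetical names, and an OS thread with `std::sync::mpsc` stands in for a tokio task and `tokio::sync::mpsc`. It shows a worker that owns the `Arc`, keeps the borrowing pager on its own stack, and serves page requests until the caller hangs up.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for an index: just a list of ids.
struct ToyIndex {
    ids: Vec<u32>,
}

/// Hypothetical lifetime-bound pager, playing the role of `PagedSearch<'a, ..>`.
struct ToyPager<'a> {
    index: &'a ToyIndex,
    cursor: usize,
}

impl ToyIndex {
    fn paged_search(&self) -> ToyPager<'_> {
        ToyPager { index: self, cursor: 0 }
    }
}

impl ToyPager<'_> {
    /// Return up to `k` further results; an empty page signals exhaustion.
    fn next_page(&mut self, k: usize) -> Vec<u32> {
        let end = (self.cursor + k).min(self.index.ids.len());
        let page = self.index.ids[self.cursor..end].to_vec();
        self.cursor = end;
        page
    }
}

/// Spawn a worker that owns the `Arc`; the non-'static pager never leaves it.
fn spawn_session(index: Arc<ToyIndex>) -> (Sender<usize>, Receiver<Vec<u32>>) {
    let (req_tx, req_rx) = channel::<usize>();
    let (res_tx, res_rx) = channel::<Vec<u32>>();
    thread::spawn(move || {
        // Borrows from the Arc owned by this closure.
        let mut pager = index.paged_search();
        while let Ok(k) = req_rx.recv() {
            if res_tx.send(pager.next_page(k)).is_err() {
                break; // caller dropped the result receiver
            }
        }
        // Request channel closed: caller dropped the sender, so shut down.
    });
    (req_tx, res_rx)
}

/// Caller side: request pages of size `k` until an empty page comes back.
fn collect_all(index: Arc<ToyIndex>, k: usize) -> Vec<Vec<u32>> {
    let (req, res) = spawn_session(index);
    let mut pages = Vec::new();
    loop {
        req.send(k).expect("worker should be alive");
        let page = res.recv().expect("worker should be alive");
        if page.is_empty() {
            break;
        }
        pages.push(page);
    }
    pages
}
```

Only the page size and the owned result vectors cross the channel, so the caller's side stays free of the pager's lifetime.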

If code was already explicitly using a .await loop with SearchState, then minimal changes should be needed.

For Users of Paged Search via wrapped_async

Users of paged search via wrapped_async::DiskANNIndex who know their inner futures will never suspend can use the new wrapped_async::DiskANNIndex::paged_search_no_await. It uses the new API transparently via wrapped_async::noawait::PagedSearch and runs paged searches with minimal synchronization overhead.

Use this only if the implementations of Accessor, BuildQueryComputer, SearchExt, DataProvider, and ExpandBeam are known to never yield and always complete with Poll::Ready.
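
To make the "never yields" requirement concrete, here is a minimal std-only sketch of driving a future with a single poll. It is an illustration of the general idea, not the implementation of `paged_search_no_await`; `NoopWake` and `poll_once` are hypothetical names. A future that completes with `Poll::Ready` on its first poll yields its value; one that tries to suspend is reported as `None`.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing: nobody will ever re-poll the future.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll `fut` exactly once.
/// Returns `Some(output)` if it completed immediately, `None` if it suspended.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

A wrapper built on this assumption has no way to make progress once any inner future returns `Poll::Pending`, which is why the restriction above covers every trait involved in the search.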

codecov-commenter commented May 15, 2026

Codecov Report

❌ Patch coverage is 81.69935% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.57%. Comparing base (5443ca0) to head (8598804).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-providers/src/index/wrapped_async.rs | 75.42% | 43 Missing ⚠️ |
| diskann/src/graph/search/paged.rs | 86.36% | 12 Missing ⚠️ |
| diskann-providers/src/index/diskann_async.rs | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1078      +/-   ##
==========================================
+ Coverage   89.46%   90.57%   +1.10%     
==========================================
  Files         473      474       +1     
  Lines       89653    89740      +87     
==========================================
+ Hits        80212    81278    +1066     
+ Misses       9441     8462     -979     
| Flag | Coverage Δ |
| --- | --- |
| miri | 90.57% <81.69%> (+1.10%) ⬆️ |
| unittests | 90.53% <81.69%> (+1.42%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann/src/graph/index.rs | 96.15% <100.00%> (+0.84%) ⬆️ |
| diskann/src/graph/search/scratch.rs | 98.21% <ø> (ø) |
| diskann/src/graph/test/cases/paged_search.rs | 95.09% <100.00%> (-0.85%) ⬇️ |
| diskann/src/provider.rs | 95.14% <ø> (ø) |
| diskann-providers/src/index/diskann_async.rs | 95.98% <80.00%> (-0.02%) ⬇️ |
| diskann/src/graph/search/paged.rs | 86.36% <86.36%> (ø) |
| diskann-providers/src/index/wrapped_async.rs | 59.87% <75.42%> (+13.99%) ⬆️ |

... and 45 files with indirect coverage changes

@hildebrandmw hildebrandmw changed the title from "[diskann] Overhaul paged search" to "[RFC/diskann] Overhaul paged search" May 18, 2026
@hildebrandmw hildebrandmw marked this pull request as ready for review May 18, 2026 19:35
@hildebrandmw hildebrandmw requested review from a team and Copilot May 18, 2026 19:35
Copilot AI left a comment

Pull request overview

This PR overhauls DiskANN’s paged (iterative) search API to remove the SearchState<..., ExtraState: 'static> pattern and instead return a lifetime-bound PagedSearch<'a, ...> handle, enabling non-'static query computers/strategies and reducing trait-bound complexity. It also updates downstream wrappers/tests and adds an RFC documenting a channel-based pattern for crossing tokio::spawn/FFI boundaries.

Changes:

  • Remove the 'static bound from BuildQueryComputer::QueryComputer.
  • Replace the start_paged_search/next_search_results API with DiskANNIndex::paged_search{_with_init_ids} returning a PagedSearch handle with next_page.
  • Update diskann-providers sync wrapper + test cases, and add an RFC describing the new model and migration guidance.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| rfcs/01078-paged-search.md | RFC describing the motivation, new API shape, and a channel-based spawned-task usage pattern. |
| diskann/src/provider.rs | Drops 'static from BuildQueryComputer::QueryComputer to allow borrowed query computers. |
| diskann/src/graph/test/cases/paged_search.rs | Updates paged-search tests to use PagedSearch::next_page. |
| diskann/src/graph/search/scratch.rs | Gates SearchScratch::search_l() behind #[cfg(test)]. |
| diskann/src/graph/search/paged.rs | Introduces the new PagedSearch handle implementation and paging logic. |
| diskann/src/graph/search/mod.rs | Wires the new paged module and re-exports PagedSearch. |
| diskann/src/graph/index.rs | Removes old SearchState/paged-search API and adds paged_search{_with_init_ids} constructors returning PagedSearch. |
| diskann-providers/src/index/wrapped_async.rs | Updates synchronous wrapper to return a blocking PagedSearch wrapper around the async handle. |
| diskann-providers/src/index/diskann_async.rs | Updates async provider tests/helpers to use PagedSearch::next_page. |
Comments suppressed due to low confidence (2)

diskann-providers/src/index/wrapped_async.rs:356

  • These synchronous wrapper methods still require S: SearchStrategy<DP, T> + 'static, but the underlying DiskANNIndex::paged_search no longer needs 'static. Keeping this bound unnecessarily restricts callers from using non-'static strategies (the main goal of this RFC). Consider dropping the + 'static bound here as well.
    pub fn paged_search<'a, S, T>(
        &'a self,
        strategy: S,
        context: &'a DP::Context,
        query: T,
        l_value: usize,
    ) -> ANNResult<PagedSearch<'a, DP, S, T>>
    where
        S: SearchStrategy<DP, T> + 'static,
        T: Copy + Send + 'a,
    {

diskann/src/graph/index.rs:2211

  • computed_result is initialized with vec![Neighbor::default(); l_value] and next_result_index is set to l_value to represent an empty cache. Since PagedSearch::next_page now returns an owned Vec, you can avoid the O(l_value) initialization cost by using Vec::with_capacity(l_value) (or Vec::new()) and starting next_result_index at 0.
            ANNResult::Ok(PagedSearch {
                index: self,
                context,
                scratch,
                computed_result: vec![Neighbor::default(); l_value],
                next_result_index: l_value,
                search_param_l: l_value,
                strategy,
                computer,
                _query: std::marker::PhantomData,
            })
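
The suggestion above can be illustrated in isolation. The following is a sketch under stated assumptions, not the code in `index.rs`: `Neighbor` here is a hypothetical two-field struct, and `eager_cache`, `lazy_cache`, and `remaining` are made-up helpers. Both constructors represent an empty result cache, but the second reserves space without writing `l_value` default elements.

```rust
/// Hypothetical neighbor record standing in for the crate's `Neighbor` type.
#[derive(Clone, Debug, Default, PartialEq)]
struct Neighbor {
    id: u32,
    distance: f32,
}

/// Current shape: fill the cache with defaults and point the cursor past the end.
fn eager_cache(l_value: usize) -> (Vec<Neighbor>, usize) {
    (vec![Neighbor::default(); l_value], l_value)
}

/// Suggested shape: reserve capacity only and start the cursor at zero.
fn lazy_cache(l_value: usize) -> (Vec<Neighbor>, usize) {
    (Vec::with_capacity(l_value), 0)
}

/// Number of cached results not yet handed out.
fn remaining(cache: &[Neighbor], next_result_index: usize) -> usize {
    cache.len() - next_result_index
}
```

With either representation the first call to the paging routine sees zero remaining results and has to run a search round before returning anything.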

Comment thread diskann-providers/src/index/wrapped_async.rs Outdated
Comment thread diskann/src/graph/test/cases/paged_search.rs Outdated
Comment thread diskann/src/graph/search/paged.rs Outdated
Comment thread diskann/src/graph/index.rs
Comment thread diskann/src/graph/search/paged.rs Outdated
Comment thread diskann/src/graph/search/paged.rs
@hildebrandmw hildebrandmw enabled auto-merge (squash) May 21, 2026 18:05
@hildebrandmw hildebrandmw merged commit c667a3c into main May 21, 2026
23 of 24 checks passed
@hildebrandmw hildebrandmw deleted the mhildebr/paged branch May 21, 2026 18:08
arkrishn94 added a commit that referenced this pull request May 28, 2026
# DiskANN v0.53.0 Release Notes

## Breaking Changes

An AI-generated, human-reviewed summary of the changes follows.

### Paged search overhauled: `PagedSearch` replaces `PagedSearchState`
([#1078](#1078))

`PagedSearchState` and its `'static`-bound pause/resume model have been
replaced with a lifetime-bound `PagedSearch<'a, ...>` handle, returned by
`DiskANNIndex::paged_search` and advanced with `next_page`. Code that
previously stored the search state in a `'static` container should
instead drive the handle from a spawned task and talk to it over a
`tokio::sync::mpsc` channel, so the searcher is embedded in an
otherwise-`'static` future. See the [rendered
RFC](https://github.com/microsoft/DiskANN/blob/main/rfcs/01078-paged-search.md)
for the new shape. Callers wired against `PagedSearchState` must migrate
to `PagedSearch`.

Users of paged search via `wrapped_async::DiskANNIndex` that know their
inner futures will never suspend can use the new
`wrapped_async::DiskANNIndex::paged_search_no_await`; this will
efficiently run paged searches with minimal synchronization overhead.

### `DiskANNIndex::flat_search` removed
([#1076](#1076))

`DiskANNIndex::flat_search` and the `IdIterator` trait have been removed
from the `diskann` crate. Equivalent functionality lives on the new
inherent method `DiskIndexSearcher::flat_search` in `diskann-disk`. This
unblocks the experimental directions in #1067 and #983.

```rust
// Before
diskann_index.flat_search(query, ...)?;

// After
disk_index_searcher.flat_search(query, ...).await?;
```

### `DiskIndexSearcher::flat_search` now batched
([#1097](#1097))

The new `DiskIndexSearcher::flat_search` uses the bulk `pq_distances`
path instead of one-vector-at-a-time `Accessor::build_query_computer` +
`evaluate_similarity`. Downstream behavior is equivalent but tighter
resource bounds apply.

### `centroid` removed from PQ interfaces
([#1010](#1010))

The dataset-centroid argument has been removed from `FixedChunkPQTable`
construction, `populate`, and most other PQ APIs. The shift only ever
worked for L2 distance and was silently ignored for inner-product /
cosine, so passing it was a footgun. When an L2 shift is required, fold
it into the PQ pivots instead (the library now does this internally).

```rust
// Before
let table = FixedChunkPQTable::new(.., centroid, ..);

// After — drop the centroid argument
let table = FixedChunkPQTable::new(.., ..);
```
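
Folding an L2 shift into the pivots relies on the identity `|(x - c) - p|^2 = |x - (p + c)|^2`: centering the data by `c` and comparing against pivot `p` gives the same squared distance as comparing the raw data against the shifted pivot `p + c`. The sketch below checks that identity numerically. It is a generic illustration rather than the library's internal code, and `l2_sq`, `centered_distance`, and `fold_into_pivot` are hypothetical helper names.

```rust
/// Squared L2 distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Old style: subtract the centroid from the data, then compare to the pivot.
fn centered_distance(x: &[f32], centroid: &[f32], pivot: &[f32]) -> f32 {
    let centered: Vec<f32> = x.iter().zip(centroid).map(|(v, c)| v - c).collect();
    l2_sq(&centered, pivot)
}

/// New style: shift the pivot once, then compare raw data against it.
fn fold_into_pivot(pivot: &[f32], centroid: &[f32]) -> Vec<f32> {
    pivot.iter().zip(centroid).map(|(p, c)| p + c).collect()
}
```

For example, with `x = [3, 5]`, `c = [1, 1]`, and `p = [0.5, 2]`, both routes give a squared distance of 6.25.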

### Flat search interface
([#983](#983))

A new `flat` module in `diskann` adds a provider-agnostic brute-force
search surface, mirroring the shape of graph search. Backends implement
a single trait, `DistancesUnordered<C>` (in `flat/strategy.rs`), which
fuses iteration and distance computation, allowing any backend
(in-memory, quantized, disk, remote) to plug into a shared algorithm.
See the [rendered
RFC](https://github.com/microsoft/DiskANN/blob/main/rfcs/00983-flat-search.md).
This is additive but is the new canonical surface — direct ad-hoc
flat-search call sites should migrate.
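
The notes do not show the trait's signature, so the following is only a guess at what "fusing iteration and distance computation" can look like. `ToyDistancesUnordered`, `InMemory`, and `flat_top_k` are hypothetical names and do not reproduce the real `DistancesUnordered<C>` trait. The backend pushes `(id, distance)` pairs to a visitor in whatever order suits it, and a shared routine keeps the `k` closest.

```rust
/// Hypothetical trait: the backend visits every (id, distance) pair, in any order.
trait ToyDistancesUnordered {
    fn distances_unordered(&self, query: &[f32], visit: &mut dyn FnMut(u32, f32));
}

/// Simplest possible backend: full-precision vectors held in memory.
struct InMemory {
    vectors: Vec<Vec<f32>>,
}

impl ToyDistancesUnordered for InMemory {
    fn distances_unordered(&self, query: &[f32], visit: &mut dyn FnMut(u32, f32)) {
        for (id, v) in self.vectors.iter().enumerate() {
            // Squared L2 distance, computed while iterating.
            let d: f32 = v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
            visit(id as u32, d);
        }
    }
}

/// Shared brute-force algorithm: works with any backend implementing the trait.
fn flat_top_k<B: ToyDistancesUnordered>(backend: &B, query: &[f32], k: usize) -> Vec<(u32, f32)> {
    let mut all = Vec::new();
    backend.distances_unordered(query, &mut |id, d| all.push((id, d)));
    all.sort_by(|a, b| a.1.total_cmp(&b.1)); // closest first
    all.truncate(k);
    all
}
```

Because the backend controls iteration, a quantized or disk-backed implementation could batch its reads without the shared routine changing.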

### `bf_tree` extracted into `diskann-bftree` crate
([#1020](#1020))

The bf_tree provider has been moved out of `diskann-providers`
(previously at
`diskann-providers/src/model/graph/provider/async_/bf_tree/`) into a new
standalone `diskann-bftree` crate. Along with the move:

- Switched from PQ to spherical quantization.
- Dropped dependencies on `DeletionCheck`, `AsDeletionCheck`, and
`RemoveDeletedIdsAndCopy`.
- Simplified generics.

Consumers must update their `Cargo.toml` to depend on `diskann-bftree`
and update import paths.

### `direct_distance_impl` and `inner_product_raw` re-exposed
([#1081](#1081))

`direct_distance_impl` (free function) and
`FixedChunkPQTable::inner_product_raw` are `pub` again after being
privatized in #1044. Restored to unblock a downstream user. Not breaking
in the typical direction — this restores previously available API
surface.

### MinMax `recompress` takes a grid-scale parameter
([#1109](#1109))

The MinMax `recompress` API now accepts a grid-scale parameter. 

## New Features

- SIMD-optimized L2-squared norm ([#1107](#1107))
- Significantly faster bitmap computation ([#1099](#1099))
  - Large speedup on the bitmap construction path used by filtered search.
- LLVM IR bloat regression check in CI ([#1083](#1083))
  - CI now flags regressions in generated LLVM IR size, helping catch
    unintended monomorphization blow-ups.
- Recall computation fix for under-k groundtruth ([#1069](#1069))

## Merged PRs

* Revise README for DiskANN3 by @harsha-simhadri in
#1046
* [CI] Try to fix publishing step by @hildebrandmw in
#1057
* [benchmark] Remove `DispatchRule` by @hildebrandmw in
#1064
* [benchmark] Automatic Input Registration by @hildebrandmw in
#1066
* Remove centroid from most PQ interfaces by @hildebrandmw in
#1010
* [diskann/disk] Remove `flat_search` from `DiskANNIndex` by
@hildebrandmw in #1076
* macos build and miri check to nightly by @harsha-simhadri in
#1058
* [API] Make some methods public again by @hildebrandmw in
#1081
* [benchmark] Simply `Inputs` more by @hildebrandmw in
#1077
* Turn on stack protection for the diskann-garnet NuGet build by
@jackmoffitt in #1082
* Fix options for diskann-garnet nuget pipeline by @jackmoffitt in
#1091
* [CI] add LLVM IR bloat regression check by @arazumov in
#1083
* Bump openssl from 0.10.79 to 0.10.80 by @dependabot[bot] in
#1093
* [Disk CI benchmarks] Use 1ES.Pool=diskann-github by @arazumov in
#869
* Fix recall computation for fewer than k groundtruth results by
@magdalendobson in #1069
* bf_tree migration away from diskann-providers by @JordanMaples in
#1020
* [RFC/diskann] Overhaul paged search by @hildebrandmw in
#1078
* Remove unsafe code from compute_vec_l2sq by @arazumov in
#1094
* Remove direct accessor call in `diskann-garnet` by @hildebrandmw in
#1098
* Refactor `DiskIndexSearcher::flat_search` to use batching by
@hildebrandmw in #1097
* [flat index] Flat Search Interface by @arkrishn94 in
#983
* migrating multi-hop tests from diskann-providers to diskann by
@JordanMaples in #928
* Significantly speed up bitmap computation by @magdalendobson in
#1099
* `compute_vecs_l2sq`: Replace scalar L2 Squared norm with
SIMD-optimized FastL2NormSquared by @arazumov in
#1107
* [minmax] Add grid scaling to recompress API by @arkrishn94 in
#1109

**Full Changelog**:
v0.52.0...v0.53.0