Skip to content

fix(storage): verify ClientPut receiver against over-query window#142

Open
jacderida wants to merge 1 commit into
WithAutonomi:rc-2026.6.2from
jacderida:fix/receiver-membership-hybrid
Open

fix(storage): verify ClientPut receiver against over-query window#142
jacderida wants to merge 1 commit into
WithAutonomi:rc-2026.6.2from
jacderida:fix/receiver-membership-hybrid

Conversation

@jacderida

Copy link
Copy Markdown
Collaborator

Problem

After #140 (XOR-only verification) and #141 (issuer-closeness over-query window + network fallback), upload success improved dramatically — large files 100%, small files ~92–98%. The residual few-percent failures are now dominated by the receiver storage-responsibility check (AntProtocol::validate_store_membership):

ClientPut receiver <peer> is not among this node's local 9 closest peers (close group plus storage margin)

They surface most visibly as Failed to store public DataMap — the public DataMap is a single critical chunk per upload, so one receiver-check rejection of it fails the whole upload regardless of file size.

Cause

This is the same divergence #141 fixed for the issuer check, on the receiver side. The uploader queries 2 * CLOSE_GROUP_SIZE peers and PUTs each chunk to the CLOSE_GROUP_SIZE closest successful responders (ant-client get_store_quotes). When closer peers are slow or NAT-stuck, the storer it legitimately PUT to sits at XOR positions up to 2 * close_group_size. But the receiver check verified only the bare close_group_size + storage margin (9) of the node's local routing table, with exact self-membership — so it rejected honest PUTs.

#141 only touched the issuer check (PaymentVerifier); the receiver check still had the strict local-only / width-9 / exact-membership logic.

Fix (symmetric with #141)

  • Widen to 2 * close_group_size, matching the uploader's over-query window. This does not amplify replica count — the uploader still PUTs to only its selected storers — it just lets a legitimately-selected storer at position 10..14 accept.
  • Keep XOR-only ordering (find_closest_nodes_local_with_self reranks by reachability and would demote XOR-close relay-only / NAT'd storers).
  • Hybrid source: try the local routing-table view first; only on a local miss fall back to an authoritative find_closest_nodes_network lookup (the same view the uploader used to choose the storers), wrapped in the shared CLOSENESS_LOOKUP_TIMEOUT. Reject only if absent from both.

CLOSENESS_LOOKUP_TIMEOUT is promoted to pub(crate) so the receiver check reuses the merkle/issuer timeout.

Note

Same cost caveat as #141: the fallback can issue a per-chunk network lookup when the local view misses; the local fast-path keeps that off the hot path for storers we already know. One storage-specific consideration: a node may now accept a chunk it is the ~10th–14th-closest to, which the replication/pruning close-group logic could later treat as borderline; replica count is unaffected since the uploader controls the PUT targets.

Builds on #140 and #141.

🤖 Generated with Claude Code

Mirror the paid-quote issuer fix (WithAutonomi#141) on the receiver
storage-responsibility check. After WithAutonomi#140/WithAutonomi#141 removed the reachability
re-rank and widened the issuer check, uploads recovered substantially, but a
residual few percent still fail on:

  ClientPut receiver <peer> is not among this node's local 9 closest peers
  (close group plus storage margin)

most visibly as "Failed to store public DataMap" — the DataMap is a single
critical chunk, so one receiver-check rejection fails the whole upload
regardless of file size.

Cause is the same divergence the issuer check had: the uploader queries
2 * CLOSE_GROUP_SIZE peers and PUTs each chunk to the CLOSE_GROUP_SIZE
closest *successful responders* (ant-client get_store_quotes), so when closer
peers are slow or NAT-stuck the storer it legitimately PUT to sits at XOR
positions up to 2 * close_group_size. The receiver check verified only the
bare close_group_size + storage margin (9) of the node's *local* routing
table with exact self-membership, so it rejected honest PUTs.

Bring it in line with the issuer check:

- Widen to 2 * close_group_size, matching the uploader's over-query window.
  This does not amplify replica count — the uploader still PUTs to only its
  selected storers — it just lets a legitimately-selected storer at position
  10..14 accept.
- Keep the XOR-only lookup (find_closest_nodes_local_with_self reranks by
  reachability and would demote XOR-close relay-only / NAT'd storers).
- Hybrid source: try the local routing-table view first, and only on a local
  miss fall back to an authoritative find_closest_nodes_network lookup (the
  same view the uploader used to choose the storers), wrapped in the shared
  CLOSENESS_LOOKUP_TIMEOUT. Reject only if absent from both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant