Skip to content

fix(payment): use XOR-only local lookup for close-group verification#140

Merged
jacderida merged 1 commit into
WithAutonomi:rc-2026.6.2from
jacderida:fix/xor-only-closeness-verification
Jun 13, 2026
Merged

fix(payment): use XOR-only local lookup for close-group verification#140
jacderida merged 1 commit into
WithAutonomi:rc-2026.6.2from
jacderida:fix/xor-only-closeness-verification

Conversation

@jacderida

Copy link
Copy Markdown
Collaborator

Problem

On a testnet with NAT-simulated nodes, ~100% of uploads fail. Each failed upload reports N-1/N chunks stored, 1 failed, dominated by:

ClientPut receiver <peer> is not among this node's local 9 closest peers (close group plus storage margin)
Paid quote issuer <peer> is not among this node's local 7 closest peers for <chunk>

Because one un-storable chunk fails the whole file, the failure rate scales multiplicatively with file size.

Cause

Both local-admission verification checks call find_closest_nodes_local_with_self:

  • AntProtocol::validate_store_membership (src/storage/handler.rs) — is the receiver responsible for this chunk?
  • PaymentVerifier::validate_paid_quote_issuer_close_group (src/payment/verifier.rs) — is the paid quote's issuer in the close group?

find_closest_nodes_local_with_self ranks the local routing table by reachability (directly-reachable peers first, XOR distance only as a tiebreaker). That ordering demotes an XOR-close relay-only / NAT'd peer out of the compared window. The client, however, selects its quoted close group by pure XOR distance (network lookup). So on a network with NAT'd nodes the two views diverge and the node rejects honest payments — including the receiver wrongly deciding it is not responsible for a chunk it is XOR-close to.

Fix

Closeness verification must mirror the uploader's pure XOR-distance peer selection. Switch both checks to the XOR-only sibling find_closest_nodes_local_by_distance_with_self (which exists for exactly this purpose). No dependency change.

  • Receiver check: XOR-only, keeps its storage-admission width.
  • Issuer check: XOR-only, verifies against the configured close group.

Note

This supersedes and reverts the earlier issuer-check width-widening (#139, close_group_sizestorage_admission_width). That change targeted the wrong mechanism: widening a reachability-reranked window cannot recover an XOR-close peer that the re-rank demoted, and the receiver check (unchanged, already at the wider width) was failing just as hard — which is what pinpointed the re-rank, not the width, as the cause.

🤖 Generated with Claude Code

Both local-admission verification checks — the ClientPut receiver
storage-responsibility check and the paid-quote issuer close-group check —
called `find_closest_nodes_local_with_self`, which ranks the local routing
table by reachability (preferring directly-reachable peers, XOR distance
only as a tiebreaker). That ordering demotes an XOR-close relay-only / NAT'd
peer out of the compared window, so on a network with NAT'd nodes the
verifying node's close-group view diverges from the client's pure
XOR-distance quote selection and honest payments are rejected:

  ClientPut receiver <peer> is not among this node's local 9 closest peers
  Paid quote issuer <peer> is not among this node's local 7 closest peers

One un-storable chunk fails the whole upload, so the failure rate scales
multiplicatively with file size — on a 30%-NAT testnet uploads fail ~100%.

Closeness *verification* must mirror the uploader's pure XOR-distance peer
selection, so switch both checks to the XOR-only sibling
`find_closest_nodes_local_by_distance_with_self` (added for exactly this
purpose). The receiver check keeps its storage-admission width; the issuer
check verifies against the configured close group.

This supersedes the earlier width-widening of the issuer check
(close_group_size + STORAGE_ADMISSION_MARGIN), which targeted the wrong
mechanism — widening a reachability-reranked window cannot recover a demoted
XOR-close peer — and reverts that change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dirvine

dirvine commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Hermes review

Verdict: LGTM / no code blockers found. The change is well-scoped and matches the staging failure mode described.

What I checked:

  • Diff is only two files:
    • src/storage/handler.rs
    • src/payment/verifier.rs
  • ClientPut receiver admission now uses find_closest_nodes_local_by_distance_with_self(..., storage_admission_width(...))
  • paid quote issuer verification now uses find_closest_nodes_local_by_distance_with_self(..., close_group_size)
  • This keeps the important split intact:
    • receiver/storage admission: close group + storage margin
    • paid issuer authority: strict configured close group
  • Confirmed the saorsa-core method is explicitly documented as XOR-only, self-inclusive, and intended for closeness verification rather than reachability-ranked storage selection.

Local checks run:

cargo test --no-run -q
cargo test issuer_close_group --lib --no-fail-fast
cargo test test_put_rejects_out_of_range_receiver_before_payment_cache --lib --no-fail-fast
cargo test --lib --no-fail-fast
cargo fmt --all -- --check
cargo clippy --lib -- -D warnings

Results:

  • cargo test --lib: 540 passed
  • targeted issuer/receiver tests: passed
  • fmt: passed
  • clippy: passed

CI status when checked:

  • Build: ubuntu/mac/windows passed
  • Clippy, format, docs, security audit, no-logging tests: passed
  • OS test matrix still had ubuntu/mac/windows test jobs pending, so I’d wait for those before merge.

Only minor note: there is no new ant-node-level regression test directly modelling “XOR-close relay-only peer demoted by reachability ranking”. The saorsa-core dependency does have tests/documentation for the XOR-only comparator, and this PR is mostly wiring to that intended API, so I don’t see that as blocking.

@jacderida jacderida merged commit 2cf8ad2 into WithAutonomi:rc-2026.6.2 Jun 13, 2026
11 checks passed
jacderida added a commit that referenced this pull request Jun 13, 2026
Includes PR #140 (XOR-only local lookup for close-group verification).
jacderida added a commit that referenced this pull request Jun 13, 2026
…window

The single-node (legacy) median payment path rejects honest uploads on a
network with NAT-stuck or slow peers, while the merkle batch path does not.
On a 30%-NAT testnet this leaves small (single-node-paid) uploads failing a
few percent per chunk — multiplicatively per file — with:

  Paid quote issuer <peer> is not among this node's local 7 closest peers

The uploader selects single-node quotes by querying 2 * CLOSE_GROUP_SIZE
peers and keeping the CLOSE_GROUP_SIZE closest *successful responders*
(ant-client get_store_quotes). When closer peers are slow or NAT-stuck the
honestly-paid issuer therefore sits anywhere in the top 2 * close_group_size
by XOR distance. The verifier checked only the bare close_group_size of the
node's *local* routing table with exact membership, so it rejected those
honest payments — the same divergence the merkle path already tolerates via
a 2 * CANDIDATES_PER_POOL window, an authoritative network lookup, and a
majority threshold.

Bring the single-node issuer check in line:

- Widen to 2 * close_group_size, mirroring the uploader's over-query window.
- Keep the XOR-only lookup (find_closest_nodes_local_with_self reranks by
  reachability and would demote XOR-close relay-only / NAT'd peers).
- Hybrid source: try the cheap local routing-table view first, and only on a
  local miss fall back to an authoritative find_closest_nodes_network lookup
  (the same view the uploader used to choose the quotes), wrapped in the
  existing CLOSENESS_LOOKUP_TIMEOUT. Reject only if the issuer is in neither.

This builds on #140 (which removed the reachability re-rank from these
verification checks); that fix landed the bulk of the recovery, this closes
the residual single-node-path gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants