
fix(payment): verify single-node issuer closeness against over-query window #141

Merged
jacderida merged 2 commits into WithAutonomi:rc-2026.6.2 from jacderida:fix/single-node-issuer-closeness-hybrid
Jun 13, 2026

Conversation

@jacderida

Collaborator

Problem

After #140 removed the reachability re-rank from the closeness-verification checks, most uploads recovered. On a 30%-NAT testnet, however, uploads paid via the single-node (legacy) median path still fail at a rate of a few percent per chunk, which compounds multiplicatively across the chunks of a file, while uploads paid via the merkle batch path succeed cleanly.

Observed on a staging testnet: a 1000 MB upload (merkle-dominated) had zero closeness rejections, while 20 MB uploads (single-node) failed ~60% of the time, entirely on:

Paid quote issuer <peer> is not among this node's local 7 closest peers for <chunk>
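
To make the "few percent per chunk, multiplicatively per file" relationship concrete, here is a minimal sketch. It assumes chunk rejections are independent, which the PR does not state, and the function name is hypothetical. The numbers in the usage note are illustrative, not measurements from the testnet.

```rust
/// Probability that at least one chunk of a file is rejected, given a
/// per-chunk rejection probability `p` and `chunks` chunks in the file.
/// Assumption (not from the PR): rejections are independent across chunks.
pub fn file_failure_probability(p: f64, chunks: u32) -> f64 {
    // The file succeeds only if every chunk succeeds: (1 - p)^chunks.
    1.0 - (1.0 - p).powi(chunks as i32)
}
```

For example, `file_failure_probability(0.03, 5)` is roughly 0.14: a 3% per-chunk rejection rate already fails about one in seven five-chunk files under this model.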

Cause

The two payment paths verify issuer/candidate closeness very differently:

| | Single-node path | Merkle batch path |
|---|---|---|
| Lookup source | node's local routing table | authoritative find_closest_nodes_network |
| Window width | close_group_size (7) | 2 * CANDIDATES_PER_POOL (32) |
| Match rule | exact membership of one issuer | 9-of-16 majority |
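
The difference between the two match rules can be sketched as follows. This is an illustration under assumptions, not the verifier's code: `PeerId` is a stand-in type, the function names are hypothetical, `CANDIDATES_PER_POOL = 16` is inferred from the table (2 * 16 = 32), and "9-of-16" is read here as "at least 9 of a pool's 16 candidates appear in the window".

```rust
/// Stand-in for a real peer identifier (hypothetical).
type PeerId = u64;

/// Inferred from the table above: 2 * CANDIDATES_PER_POOL = 32.
const CANDIDATES_PER_POOL: usize = 16;
/// Smallest strict majority of one pool: 9 of 16.
const MAJORITY: usize = CANDIDATES_PER_POOL / 2 + 1;

/// Single-node rule: the one paid issuer must itself be in the window.
fn exact_membership(issuer: PeerId, window: &[PeerId]) -> bool {
    window.contains(&issuer)
}

/// Merkle-style rule: enough of the pool's candidates must be in the window.
fn majority_membership(candidates: &[PeerId], window: &[PeerId]) -> bool {
    let hits = candidates.iter().filter(|c| window.contains(*c)).count();
    hits >= MAJORITY
}
```

The majority rule tolerates up to seven candidates that the verifier cannot see, whereas exact membership has no slack: a single issuer just outside the window is enough to reject the payment.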

The uploader selects single-node quotes by querying 2 * CLOSE_GROUP_SIZE peers and keeping the CLOSE_GROUP_SIZE closest successful responders (ant-client get_store_quotes). When closer peers are slow or NAT-stuck, the honestly-paid issuer legitimately lands at positions 8–14 by XOR distance. Verifying against only the node's local top-7 with exact membership rejects those honest payments — the same divergence the merkle path was already hardened against (its code comment: such peers appear at "positions 17–32 … when the closer peers are slow or NAT-stuck. The storer must look at the same window or it will reject honest pools with no security benefit").
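
The selection described above can be sketched in a few lines. This is a simplified model, not the ant-client implementation: peers are plain `u64` addresses, `responds` is a hypothetical predicate standing in for "returned a quote in time", and both function names are made up for illustration.

```rust
/// Stand-in for a real peer address (hypothetical).
type PeerId = u64;

const CLOSE_GROUP_SIZE: usize = 7;

/// The `n` peers closest to `target` by XOR distance, closest first.
fn closest_by_xor(peers: &[PeerId], target: u64, n: usize) -> Vec<PeerId> {
    let mut sorted = peers.to_vec();
    sorted.sort_by_key(|p| *p ^ target);
    sorted.truncate(n);
    sorted
}

/// Uploader side: over-query 2 * CLOSE_GROUP_SIZE peers, then keep the
/// CLOSE_GROUP_SIZE closest peers that actually responded.
fn select_quote_issuers(
    peers: &[PeerId],
    target: u64,
    responds: impl Fn(PeerId) -> bool,
) -> Vec<PeerId> {
    closest_by_xor(peers, target, 2 * CLOSE_GROUP_SIZE)
        .into_iter()
        .filter(|p| responds(*p))
        .take(CLOSE_GROUP_SIZE)
        .collect()
}
```

With peers `1..=20`, target `0` (so XOR distance equals the address), and only even-numbered peers responding, the selected issuers are `2, 4, 6, 8, 10, 12, 14`. Four of them sit outside the strict top-7, yet every one is inside the top-14 over-query window.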

Fix

Bring the single-node issuer check in line with the merkle path:

  • Widen to 2 * close_group_size, mirroring the uploader's over-query window.
  • Keep XOR-only ordering (find_closest_nodes_local_with_self reranks by reachability and would demote the XOR-close relay-only / NAT'd peers the uploader legitimately quoted; this is the fix from #140, "fix(payment): use XOR-only local lookup for close-group verification").
  • Hybrid source: check the cheap local routing-table view first; only on a local miss fall back to an authoritative find_closest_nodes_network lookup (the same view the uploader used to pick the quotes), wrapped in the existing CLOSENESS_LOOKUP_TIMEOUT. Reject only if the issuer is in neither view.
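
The control flow of the three bullets above can be sketched as follows. This is a synchronous model under assumptions, not the code in src/payment/verifier.rs: the two lookups are passed in as closures, the network closure returns `None` to stand in for a lookup that hit CLOSENESS_LOOKUP_TIMEOUT or failed, and the `IssuerCheck` enum and function name are hypothetical. The PR text does not say how a timed-out fallback is reported, so it is kept as a separate outcome here.

```rust
/// Stand-in for a real peer identifier (hypothetical).
type PeerId = u64;

#[derive(Debug, PartialEq, Eq)]
enum IssuerCheck {
    AcceptedLocal,
    AcceptedNetwork,
    /// Issuer is in neither the local nor the network view.
    Rejected,
    /// Fallback lookup timed out or failed (handling is an assumption).
    LookupFailed,
}

fn verify_issuer_closeness(
    issuer: PeerId,
    close_group_size: usize,
    local_closest: impl FnOnce(usize) -> Vec<PeerId>,
    network_closest: impl FnOnce(usize) -> Option<Vec<PeerId>>,
) -> IssuerCheck {
    // Mirror the uploader's over-query window.
    let width = 2 * close_group_size;

    // Cheap path: XOR-ordered local routing-table view.
    if local_closest(width).contains(&issuer) {
        return IssuerCheck::AcceptedLocal;
    }

    // Local miss only: consult the authoritative network view.
    match network_closest(width) {
        Some(peers) if peers.contains(&issuer) => IssuerCheck::AcceptedNetwork,
        Some(_) => IssuerCheck::Rejected,
        None => IssuerCheck::LookupFailed,
    }
}
```

Because the network closure is only invoked after a local miss, issuers the node already knows never trigger a network lookup, which is the property the cost note below relies on.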

Note on cost

The fallback can issue a per-chunk network lookup on the single-node path when the local view misses. The local fast-path keeps that off the hot path for issuers we already know. The merkle path amortizes its network lookups with a single-flight + pass-cache keyed by pool hash; this change does not add caching (each chunk is a distinct address, so it is less reusable), but that is an option if lookup load proves high in practice.

Builds on #140.

🤖 Generated with Claude Code

…window

The single-node (legacy) median payment path rejects honest uploads on a
network with NAT-stuck or slow peers, while the merkle batch path does not.
On a 30%-NAT testnet this leaves small (single-node-paid) uploads failing a
few percent per chunk — multiplicatively per file — with:

  Paid quote issuer <peer> is not among this node's local 7 closest peers

The uploader selects single-node quotes by querying 2 * CLOSE_GROUP_SIZE
peers and keeping the CLOSE_GROUP_SIZE closest *successful responders*
(ant-client get_store_quotes). When closer peers are slow or NAT-stuck the
honestly-paid issuer therefore sits anywhere in the top 2 * close_group_size
by XOR distance. The verifier checked only the bare close_group_size of the
node's *local* routing table with exact membership, so it rejected those
honest payments — the same divergence the merkle path already tolerates via
a 2 * CANDIDATES_PER_POOL window, an authoritative network lookup, and a
majority threshold.

Bring the single-node issuer check in line:

- Widen to 2 * close_group_size, mirroring the uploader's over-query window.
- Keep the XOR-only lookup (find_closest_nodes_local_with_self reranks by
  reachability and would demote XOR-close relay-only / NAT'd peers).
- Hybrid source: try the cheap local routing-table view first, and only on a
  local miss fall back to an authoritative find_closest_nodes_network lookup
  (the same view the uploader used to choose the quotes), wrapped in the
  existing CLOSENESS_LOOKUP_TIMEOUT. Reject only if the issuer is in neither.

This builds on WithAutonomi#140 (which removed the reachability re-rank from these
verification checks); that fix landed the bulk of the recovery, this closes
the residual single-node-path gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dirvine

dirvine commented Jun 13, 2026

Collaborator

Hermes review

Thanks — I reviewed the diff and ran focused local checks.

Verdict: changes requested before merge, primarily because formatting is currently red.

Blocking

  • cargo fmt --all -- --check fails on src/payment/verifier.rs around the new match tokio::time::timeout(...) block. This matches the failing GitHub Format Check job. Running cargo fmt should fix it.

Logic review

The hotfix direction looks broadly sound:

  • the single-node paid issuer check remains after content-address and peer/pubkey binding;
  • receiver/storage admission is still checked before payment verification in storage/handler.rs;
  • the check keeps XOR-only ordering, avoiding the reachability re-rank issue fixed in #140 ("fix(payment): use XOR-only local lookup for close-group verification");
  • the network fallback mirrors the Merkle-path authoritative-view rationale, while keeping a local fast path.

Documentation / invariant clarity

One thing I would tighten before merge: nearby comments still describe this as checking the issuer against the configured close group, but this PR deliberately widens single-node issuer locality to 2 * close_group_size.

Suggested places to update:

  • src/payment/verifier.rs: PaymentVerifierConfig.close_group_size comment
  • src/payment/verifier.rs: VerificationContext comment around the paid-issuer locality invariant

It would be clearer to state that the legacy/single-node issuer check uses the uploader over-query window, not strict close-group width. Given this is an economic/security boundary, a small regression test or explicit comment explaining why issuer width can be wider than local storage-admission width would also help future reviewers avoid accidentally re-tightening or over-widening the wrong side of the invariant.
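
A regression test along these lines might look like the sketch below. It is an assumption-laden illustration, not code from the repository: the helper names are hypothetical, ranks are 1-based XOR positions, and the storage-admission width is taken to be the plain close-group size as the review implies.

```rust
const CLOSE_GROUP_SIZE: usize = 7;

/// Width for local storage admission (assumed: strict close group).
fn storage_admission_width(close_group_size: usize) -> usize {
    close_group_size
}

/// Width for the legacy/single-node paid-issuer check: the uploader's
/// over-query window, deliberately wider than storage admission.
fn issuer_locality_width(close_group_size: usize) -> usize {
    2 * close_group_size
}

/// Whether a peer at 1-based XOR `rank` falls inside a window of `width`.
fn within_window(rank: usize, width: usize) -> bool {
    (1..=width).contains(&rank)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn issuer_window_is_wider_than_admission_but_bounded() {
        let issuer = issuer_locality_width(CLOSE_GROUP_SIZE);
        let admission = storage_admission_width(CLOSE_GROUP_SIZE);
        // Honest issuer pushed to rank 10 by slow or NAT-stuck closer peers.
        assert!(within_window(10, issuer));
        assert!(!within_window(10, admission));
        // The widening stops at the over-query window.
        assert!(within_window(14, issuer));
        assert!(!within_window(15, issuer));
    }
}
```

Pinning both widths in one test makes an accidental re-tightening of the issuer check, or an accidental widening of storage admission, fail loudly.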

Local checks run

  • cargo fmt --all -- --check: failed on formatting
  • cargo test test_legacy_paid_median_issuer_close_group_rejection --lib --no-fail-fast: passed
  • cargo test payment::verifier::tests --lib --no-fail-fast: passed, 74 tests

CI at review time: format failing; docs/clippy/security audit and several build/test jobs passing; some OS matrix jobs still pending.

Pure formatting; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jacderida merged commit d797cb8 into WithAutonomi:rc-2026.6.2 Jun 13, 2026
11 checks passed
jacderida added a commit that referenced this pull request Jun 13, 2026
Includes PR #141 (verify single-node issuer closeness against over-query window).