fix(payment): verify single-node issuer closeness against over-query window #141
…window

The single-node (legacy) median payment path rejects honest uploads on a network with NAT-stuck or slow peers, while the merkle batch path does not. On a 30%-NAT testnet this leaves small (single-node-paid) uploads failing a few percent per chunk (multiplicatively per file) with:

    Paid quote issuer <peer> is not among this node's local 7 closest peers

The uploader selects single-node quotes by querying 2 * CLOSE_GROUP_SIZE peers and keeping the CLOSE_GROUP_SIZE closest *successful responders* (ant-client get_store_quotes). When closer peers are slow or NAT-stuck, the honestly-paid issuer can therefore sit anywhere in the top 2 * close_group_size by XOR distance. The verifier checked only the bare close_group_size of the node's *local* routing table with exact membership, so it rejected those honest payments. This is the same divergence the merkle path already tolerates via a 2 * CANDIDATES_PER_POOL window, an authoritative network lookup, and a majority threshold.

Bring the single-node issuer check in line:

- Widen to 2 * close_group_size, mirroring the uploader's over-query window.
- Keep the XOR-only lookup (find_closest_nodes_local_with_self reranks by reachability and would demote XOR-close relay-only / NAT'd peers).
- Hybrid source: try the cheap local routing-table view first, and only on a local miss fall back to an authoritative find_closest_nodes_network lookup (the same view the uploader used to choose the quotes), wrapped in the existing CLOSENESS_LOOKUP_TIMEOUT. Reject only if the issuer is in neither.

This builds on WithAutonomi#140 (which removed the reachability re-rank from these verification checks); that fix landed the bulk of the recovery, and this closes the residual single-node-path gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hermes review

Thanks. I reviewed the diff and ran focused local checks. Verdict: changes requested before merge, primarily because formatting is currently red.

Blocking
Logic review

The hotfix direction looks broadly sound:
Documentation / invariant clarity

One thing I would tighten before merge: nearby comments still describe this as checking the issuer against the configured close group, but this PR deliberately widens single-node issuer locality to

Suggested places to update:
It would be clearer to state that the legacy/single-node issuer check uses the uploader over-query window, not strict close-group width. Given this is an economic/security boundary, a small regression test or explicit comment explaining why issuer width can be wider than local storage-admission width would also help future reviewers avoid accidentally re-tightening or over-widening the wrong side of the invariant.

Local checks run
CI at review time: format failing; docs/clippy/security audit and several build/test jobs passing; some OS matrix jobs still pending.
Pure formatting; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Includes PR #141 (verify single-node issuer closeness against over-query window).
Problem
After #140 removed the reachability re-rank from the closeness-verification checks, most uploads recovered — but on a 30%-NAT testnet, uploads paid via the single-node (legacy) median path still fail a few percent per chunk (multiplicatively per file), while uploads paid via the merkle batch path succeed cleanly.
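To make the "multiplicatively per file" effect concrete, here is a minimal sketch. It assumes each chunk is rejected independently with the same probability, which is a simplification for illustration rather than a measured model; the function name and the sample values in the usage note are hypothetical.

```rust
/// Probability that at least one chunk of a file is rejected, assuming
/// (for illustration only) that each of `chunks` chunks is rejected
/// independently with probability `per_chunk`.
pub fn file_failure_probability(per_chunk: f64, chunks: u32) -> f64 {
    assert!(
        (0.0..=1.0).contains(&per_chunk),
        "per-chunk probability must lie in [0, 1]"
    );
    // The file succeeds only if every chunk succeeds: (1 - p)^n.
    1.0 - (1.0 - per_chunk).powf(f64::from(chunks))
}
```

Under that assumption, a per-chunk rejection rate of only a few percent compounds quickly: with a hypothetical 3% rate, a 30-chunk file would fail far more often than a 5-chunk one.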
Observed on a staging testnet: a 1000 MB upload (merkle-dominated) had zero closeness rejections, while 20 MB uploads (single-node) failed ~60% of the time, entirely on:

    Paid quote issuer <peer> is not among this node's local 7 closest peers
Cause
The two payment paths verify issuer/candidate closeness very differently:
| | Single-node (legacy) path | Merkle batch path |
|---|---|---|
| Source of closest peers | node's local routing table | find_closest_nodes_network |
| Window | close_group_size (7) | 2 * CANDIDATES_PER_POOL (32) |
| Acceptance rule | exact membership | majority threshold |

The uploader selects single-node quotes by querying 2 * CLOSE_GROUP_SIZE peers and keeping the CLOSE_GROUP_SIZE closest successful responders (ant-client get_store_quotes). When closer peers are slow or NAT-stuck, the honestly-paid issuer can legitimately land at positions 8–14 by XOR distance. Verifying against only the node's local top-7 with exact membership rejects those honest payments. This is the same divergence the merkle path was already hardened against (its code comment: such peers appear at "positions 17–32 … when the closer peers are slow or NAT-stuck. The storer must look at the same window or it will reject honest pools with no security benefit").

Fix
Bring the single-node issuer check in line with the merkle path:
- Widen to 2 * close_group_size, mirroring the uploader's over-query window.
- Keep the XOR-only lookup: find_closest_nodes_local_with_self reranks by reachability and would demote the XOR-close relay-only / NAT'd peers the uploader legitimately quoted (the #140 fix, "fix(payment): use XOR-only local lookup for close-group verification").
- Hybrid source: try the cheap local routing-table view first, and only on a local miss fall back to an authoritative find_closest_nodes_network lookup (the same view the uploader used to pick the quotes), wrapped in the existing CLOSENESS_LOOKUP_TIMEOUT. Reject only if the issuer is in neither view.

Note on cost
The fallback can issue a per-chunk network lookup on the single-node path when the local view misses. The local fast-path keeps that off the hot path for issuers we already know. The merkle path amortizes its network lookups with a single-flight + pass-cache keyed by pool hash; this change does not add caching (each chunk is a distinct address, so it is less reusable), but that is an option if lookup load proves high in practice.
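The hybrid check described under Fix, including the local fast-path this note relies on, could be sketched roughly as follows. This is not the node's actual code: `PeerId`, `closest_by_xor`, and `issuer_within_window` are hypothetical names, the network lookup is modelled as a synchronous closure instead of the real async find_closest_nodes_network call under CLOSENESS_LOOKUP_TIMEOUT, and treating a timed-out lookup as a rejection is an assumption.

```rust
/// Hypothetical 32-byte identifier used for both peers and chunk addresses.
pub type PeerId = [u8; 32];

/// Close-group width quoted in the PR description.
pub const CLOSE_GROUP_SIZE: usize = 7;

/// XOR distance between two ids. Byte arrays compare lexicographically,
/// which matches comparing the distances as big-endian integers.
fn xor_distance(a: &PeerId, b: &PeerId) -> [u8; 32] {
    std::array::from_fn(|i| a[i] ^ b[i])
}

/// The `width` peers closest to `target`, ranked by XOR distance only
/// (no reachability re-rank).
pub fn closest_by_xor(peers: &[PeerId], target: &PeerId, width: usize) -> Vec<PeerId> {
    let mut sorted = peers.to_vec();
    sorted.sort_by_key(|p| xor_distance(p, target));
    sorted.truncate(width);
    sorted
}

/// Hybrid issuer check: accept on a local hit, otherwise consult the
/// network view. `network_lookup` is a stand-in that returns `None` on
/// timeout or error; it is called only when the local view misses.
pub fn issuer_within_window<F>(
    issuer: &PeerId,
    target: &PeerId,
    local_peers: &[PeerId],
    network_lookup: F,
) -> bool
where
    F: FnOnce(&PeerId, usize) -> Option<Vec<PeerId>>,
{
    // Mirror the uploader's over-query window instead of the bare close group.
    let width = 2 * CLOSE_GROUP_SIZE;

    // Cheap path: the local routing-table view.
    if closest_by_xor(local_peers, target, width).contains(issuer) {
        return true;
    }

    // Local miss: fall back to the network view.
    match network_lookup(target, width) {
        Some(view) => closest_by_xor(&view, target, width).contains(issuer),
        // Assumption in this sketch: an unanswered lookup counts as a rejection.
        None => false,
    }
}
```

Because the closure is only invoked after a local miss, issuers the node already knows never trigger a network lookup, which is the property the cost note depends on.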
Builds on #140.
🤖 Generated with Claude Code