fix(payment): use XOR-only local lookup for close-group verification #140
Merged
jacderida merged 1 commit on Jun 13, 2026
Both local-admission verification checks (the ClientPut receiver storage-responsibility check and the paid-quote issuer close-group check) called `find_closest_nodes_local_with_self`, which ranks the local routing table by reachability (preferring directly-reachable peers, XOR distance only as a tiebreaker). That ordering demotes an XOR-close relay-only / NAT'd peer out of the compared window, so on a network with NAT'd nodes the verifying node's close-group view diverges from the client's pure XOR-distance quote selection and honest payments are rejected:

    ClientPut receiver <peer> is not among this node's local 9 closest peers
    Paid quote issuer <peer> is not among this node's local 7 closest peers

One un-storable chunk fails the whole upload, so the failure rate scales multiplicatively with file size: on a 30%-NAT testnet uploads fail ~100%.

Closeness *verification* must mirror the uploader's pure XOR-distance peer selection, so switch both checks to the XOR-only sibling `find_closest_nodes_local_by_distance_with_self` (added for exactly this purpose). The receiver check keeps its storage-admission width; the issuer check verifies against the configured close group.

This supersedes the earlier width-widening of the issuer check (close_group_size + STORAGE_ADMISSION_MARGIN), which targeted the wrong mechanism (widening a reachability-reranked window cannot recover a demoted XOR-close peer), and reverts that change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
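The divergence described in the commit message can be reproduced in a toy model. The sketch below is not the saorsa-core implementation: the `Peer` type, the one-byte IDs, the sample table, and both function names are hypothetical stand-ins, and only the two ranking rules (reachability first with XOR as tiebreaker, versus XOR only) are taken from the description above.

```rust
/// Toy peer: a one-byte ID stands in for the real node address (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Peer {
    id: u8,
    directly_reachable: bool,
}

/// Pure XOR-distance ranking: the `k` peers closest to `target`.
fn closest_by_xor(peers: &[Peer], target: u8, k: usize) -> Vec<Peer> {
    let mut ranked = peers.to_vec();
    ranked.sort_by_key(|p| p.id ^ target);
    ranked.truncate(k);
    ranked
}

/// Reachability-first ranking: directly reachable peers sort ahead of
/// relay-only / NAT'd peers; XOR distance only breaks ties within each class.
fn closest_reachability_first(peers: &[Peer], target: u8, k: usize) -> Vec<Peer> {
    let mut ranked = peers.to_vec();
    // `false < true`, so negating the flag puts reachable peers first.
    ranked.sort_by_key(|p| (!p.directly_reachable, p.id ^ target));
    ranked.truncate(k);
    ranked
}

/// Hypothetical routing table: IDs 1..=12, where the peer XOR-closest to
/// target 0 (ID 1) is the only NAT'd one.
fn sample_table() -> Vec<Peer> {
    (1u8..=12)
        .map(|id| Peer { id, directly_reachable: id != 1 })
        .collect()
}
```

With target `0`, `closest_by_xor(&sample_table(), 0, 7)` contains the NAT'd peer with ID 1, while `closest_reachability_first` excludes it at a width of 7 and still excludes it at a width of 9. In this model the widened window only recovers the peer once it is larger than the number of directly reachable peers, which is why widening alone does not address the re-rank.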
Collaborator
Hermes review
Verdict: LGTM / no code blockers found. The change is well-scoped and matches the staging failure mode described.
What I checked:
Local checks run: Results:
CI status when checked:
Only minor note: there is no new ant-node-level regression test directly modelling “XOR-close relay-only peer demoted by reachability ranking”. The saorsa-core dependency does have tests/documentation for the XOR-only comparator, and this PR is mostly wiring to that intended API, so I don’t see that as blocking.
jacderida added a commit that referenced this pull request on Jun 13, 2026
Includes PR #140 (XOR-only local lookup for close-group verification).
jacderida added a commit that referenced this pull request on Jun 13, 2026
…window

The single-node (legacy) median payment path rejects honest uploads on a network with NAT-stuck or slow peers, while the merkle batch path does not. On a 30%-NAT testnet this leaves small (single-node-paid) uploads failing a few percent per chunk (multiplicatively per file) with:

    Paid quote issuer <peer> is not among this node's local 7 closest peers

The uploader selects single-node quotes by querying 2 * CLOSE_GROUP_SIZE peers and keeping the CLOSE_GROUP_SIZE closest *successful responders* (ant-client get_store_quotes). When closer peers are slow or NAT-stuck, the honestly-paid issuer can therefore sit anywhere in the top 2 * close_group_size by XOR distance. The verifier checked only the bare close_group_size of the node's *local* routing table with exact membership, so it rejected those honest payments. This is the same divergence the merkle path already tolerates via a 2 * CANDIDATES_PER_POOL window, an authoritative network lookup, and a majority threshold.

Bring the single-node issuer check in line:

 - Widen to 2 * close_group_size, mirroring the uploader's over-query window.
 - Keep the XOR-only lookup (find_closest_nodes_local_with_self reranks by reachability and would demote XOR-close relay-only / NAT'd peers).
 - Hybrid source: try the cheap local routing-table view first, and only on a local miss fall back to an authoritative find_closest_nodes_network lookup (the same view the uploader used to choose the quotes), wrapped in the existing CLOSENESS_LOOKUP_TIMEOUT. Reject only if the issuer is in neither.

This builds on #140 (which removed the reachability re-rank from these verification checks); that fix landed the bulk of the recovery, and this closes the residual single-node-path gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
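The hybrid check described in this commit message can be sketched as follows. This is a simplified, synchronous illustration and not the code in `src/payment/verifier.rs`: `PeerId` as a single byte, `issuer_in_close_group`, and the closure-based `network_lookup` are all hypothetical. The real lookup is wrapped in CLOSENESS_LOOKUP_TIMEOUT; here a failed or timed-out lookup is modelled as `None` and treated as a miss, which is an assumption about behaviour the commit message does not spell out.

```rust
/// Toy one-byte peer ID standing in for the real node address (assumption).
type PeerId = u8;

/// The `k` IDs in `view` closest to `target` by pure XOR distance.
fn closest_by_xor(view: &[PeerId], target: PeerId, k: usize) -> Vec<PeerId> {
    let mut ranked = view.to_vec();
    ranked.sort_by_key(|id| *id ^ target);
    ranked.truncate(k);
    ranked
}

/// Hybrid issuer check: accept if the issuer is within the
/// `2 * close_group_size` XOR-closest peers of the local view, otherwise
/// consult the network view. Reject only if it is in neither.
fn issuer_in_close_group<F>(
    issuer: PeerId,
    target: PeerId,
    close_group_size: usize,
    local_view: &[PeerId],
    network_lookup: F,
) -> bool
where
    // Returns `None` when the lookup fails or times out (modelling assumption).
    F: FnOnce() -> Option<Vec<PeerId>>,
{
    // Mirror the uploader's over-query window.
    let window = 2 * close_group_size;

    // Cheap path: the local routing-table view, ranked by XOR distance only.
    if closest_by_xor(local_view, target, window).contains(&issuer) {
        return true;
    }

    // Local miss: fall back to the authoritative network view.
    match network_lookup() {
        Some(network_view) => closest_by_xor(&network_view, target, window).contains(&issuer),
        None => false,
    }
}
```

Passing the network lookup as a closure keeps it lazy: it only runs on a local miss, which matches the intent of trying the cheap local view first.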
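Problem
On a testnet with NAT-simulated nodes, ~100% of uploads fail. Each failed upload reports N-1/N chunks stored, 1 failed, dominated by:

    ClientPut receiver <peer> is not among this node's local 9 closest peers
    Paid quote issuer <peer> is not among this node's local 7 closest peers

Because one un-storable chunk fails the whole file, the failure rate scales multiplicatively with file size.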
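The effect of file size can be made concrete with a short calculation. The sketch below assumes each chunk is rejected independently with the same probability, which is a simplification not stated in the PR; the function name and the example numbers are hypothetical.

```rust
/// Probability that an upload of `chunks` chunks fails, assuming each chunk
/// is rejected independently with probability `per_chunk_failure`.
/// The upload succeeds only if every chunk is stored: (1 - p)^n.
fn upload_failure_rate(per_chunk_failure: f64, chunks: u32) -> f64 {
    1.0 - (1.0 - per_chunk_failure).powi(chunks as i32)
}
```

Under this model a per-chunk rejection rate of 3% gives `upload_failure_rate(0.03, 100)` of roughly 0.95 for a 100-chunk file, so even a small per-chunk rate approaches total failure for larger uploads.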
Cause
Both local-admission verification checks call `find_closest_nodes_local_with_self`:
- `AntProtocol::validate_store_membership` (`src/storage/handler.rs`): is the receiver responsible for this chunk?
- `PaymentVerifier::validate_paid_quote_issuer_close_group` (`src/payment/verifier.rs`): is the paid quote's issuer in the close group?

`find_closest_nodes_local_with_self` ranks the local routing table by reachability (directly-reachable peers first, XOR distance only as a tiebreaker). That ordering demotes an XOR-close relay-only / NAT'd peer out of the compared window. The client, however, selects its quoted close group by pure XOR distance (network lookup). So on a network with NAT'd nodes the two views diverge and the node rejects honest payments, including the receiver wrongly deciding it is not responsible for a chunk it is XOR-close to.

Fix
Closeness verification must mirror the uploader's pure XOR-distance peer selection. Switch both checks to the XOR-only sibling `find_closest_nodes_local_by_distance_with_self` (which exists for exactly this purpose). No dependency change.

Note
This supersedes and reverts the earlier issuer-check width-widening (#139, `close_group_size` → `storage_admission_width`). That change targeted the wrong mechanism: widening a reachability-reranked window cannot recover an XOR-close peer that the re-rank demoted, and the receiver check (unchanged, already at the wider width) was failing just as hard, which is what pinpointed the re-rank, not the width, as the cause.

🤖 Generated with Claude Code