feat(add,import): resolve --data-set-metadata to existing dataset IDs #438
Merged
Conversation
Closes #435.

synapse-sdk's `metadataMatches` requires exact key/value equality, so passing `--data-set-metadata source=storacha-migration` against datasets that also carry `space-did`/`space-name` fails to match, and the SDK creates a new dataset.

Resolve `--data-set-metadata` locally before invoking the upload pipeline when the caller has not pinned `dataSetIds`/`providerIds`:

- Match: requested keys are a subset of an existing dataset's metadata.
- Outcome by match count vs `--copies` (default 2):
  - `== copies`: route to those dataset IDs, and drop metadata from the upload request to avoid SDK exact-match interference.
  - 0 matches: pass metadata through unchanged (the SDK creates a new dataset tagged with the requested metadata).
  - otherwise: error with the matched IDs and expected count, suggesting `--copies <n>` or `--data-set-ids`.

Verified against migration datasets 13260 + 13261 on calibration: `add ... --data-set-metadata source=storacha-migration` reuses both without creating new datasets. Default `add` and `import` flows (no `--data-set-metadata`) are unchanged.
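The subset-match rule described above can be sketched as follows. This is a minimal illustration, not the actual implementation; the type and function names (`DataSet`, `metadataIsSubset`) are hypothetical, and the real dataset objects returned by synapse-sdk carry more fields.

```typescript
type Metadata = Record<string, string>

// Hypothetical shape for illustration; real SDK datasets have more fields.
interface DataSet {
  id: number
  metadata: Metadata
}

// A dataset matches when every requested key is present with an equal value,
// i.e. the requested entries form a subset of the dataset's metadata.
function metadataIsSubset(requested: Metadata, dataSet: DataSet): boolean {
  return Object.entries(requested).every(
    ([key, value]) => key in dataSet.metadata && dataSet.metadata[key] === value
  )
}

// Example fixtures mirroring the migration datasets described in the PR.
const dataSets: DataSet[] = [
  { id: 13260, metadata: { source: 'storacha-migration', 'space-did': 'did:key:a' } },
  { id: 13261, metadata: { source: 'storacha-migration', 'space-did': 'did:key:b' } },
  { id: 99, metadata: { source: 'other' } },
]

const matched = dataSets.filter((ds) =>
  metadataIsSubset({ source: 'storacha-migration' }, ds)
)
console.log(matched.map((ds) => ds.id)) // → [ 13260, 13261 ]
```

Under this rule, a dataset carrying extra keys such as `space-did` still matches a request for `source=storacha-migration` alone, which is exactly the case the SDK's exact-equality `metadataMatches` rejects.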
Contributor
Pull request overview
Adds local subset-based resolution for `--data-set-metadata` so `add`/`import` can route uploads into existing datasets (instead of creating new ones) when the requested metadata keys are a subset of an existing dataset's metadata.
Changes:
- Introduces `resolveDataSetIdsByMetadata` to resolve dataset IDs via local subset matching against the caller's datasets.
- Updates the `filecoin-pin add` and `filecoin-pin import` flows to use resolved `dataSetIds` (and stop forwarding user metadata) when the match count equals `--copies`.
- Adds unit tests for the resolver and updates the `add`/`import` unit test mocks to support dataset listing.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/core/data-set/resolve-by-metadata.ts | New resolver to subset-match requested metadata to existing live datasets and return IDs / ambiguity. |
| src/core/data-set/index.ts | Exports the new resolver from the core data-set module. |
| src/add/add.ts | Adds pre-upload resolution step for --data-set-metadata to dataSetIds when not pinned by IDs. |
| src/import/import.ts | Same resolution behavior as add, applied to CAR import. |
| src/test/unit/resolve-by-metadata.test.ts | Adds unit tests for resolver outcomes (no-match/matched/ambiguous). |
| src/test/unit/add.test.ts | Updates Synapse mock to include getClientAddress and findDataSets for resolver path. |
| src/test/unit/import.test.ts | Updates Synapse mock to include getClientAddress and findDataSets for resolver path. |
- Resolver: require key presence on dataset metadata (not value-only comparison). The previous `(metadata?.[key] ?? '') === value` check falsely matched datasets missing the requested key when the requested value was the empty string.
- Upload layer (`src/core/upload/synapse.ts`): suppress filecoin-pin's default metadata injection when the caller passes `dataSetIds` or pre-resolved contexts. The SDK doesn't consult metadata on the dataset-id path today; suppressing it here keeps intent honest and prevents future SDK changes from silently mismatching.
- Tests: rewire the resolver tests so the mock applies the resolver's filter callback to raw fixtures, exercising the actual subset-match logic. Without this, the matching code never ran in tests.
- Tests: add integration coverage in `add.test.ts` and `import.test.ts` for the matched (drops metadata, sets `dataSetIds`) and ambiguous (throws) resolution paths.
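The key-presence fix in the first bullet can be demonstrated in isolation. This sketch (with illustrative names, not the actual code) shows why coalescing a missing key to `''` produces a false positive when the requested value is the empty string:

```typescript
// A dataset that never set the 'space-did' key.
const metadata: Record<string, string> = { source: 'storacha-migration' }

// Buggy: a missing key coalesces to '', so a requested empty-string value
// "matches" a dataset that never set the key at all.
const buggyMatch = (key: string, value: string): boolean =>
  (metadata?.[key] ?? '') === value

// Fixed: the key must actually be present on the dataset's metadata.
const fixedMatch = (key: string, value: string): boolean =>
  key in metadata && metadata[key] === value

console.log(buggyMatch('space-did', '')) // true  — false positive
console.log(fixedMatch('space-did', '')) // false — correctly rejected
```

Both variants behave identically for non-empty requested values; the divergence only appears for the empty string, which is why the bug is easy to miss without a dedicated test.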
rvagg reviewed May 4, 2026
rvagg approved these changes May 4, 2026
Member
rvagg
left a comment
I guess this'll do for a start; I just have an issue with the use of "ambiguous" here, both its inapplicability and its lossy nature. I'll leave that one with you to ponder though; near enough is good enough for now.
Closes #435.
What changed
`--data-set-metadata` previously had no effect on dataset selection because synapse-sdk's `metadataMatches` requires exact key/value equality. Migration datasets carry 4 keys (`source`, `space-did`, `space-name`, `withIPFSIndexing`); a request for `source=storacha-migration` alone failed to match, and the SDK created a new dataset.

Both `add` and `import` now resolve `--data-set-metadata` locally before invoking the upload pipeline. Outcome by match count vs `--copies` (default 2):

- `== copies` → route to those dataset IDs, drop metadata from the upload request.
- 0 matches → pass metadata through unchanged (current behavior creates a new dataset).

The resolver runs only when the caller has not pinned `--data-set-ids` or `--provider-ids`. Default flows (no `--data-set-metadata`) are unchanged.

How to verify
Live verification on calibration with migration datasets 13260 + 13261:
Negative path: