Tree node listing helpers + server-authoritative S3 tagging#403
Conversation
Pull request overview
This PR adds configurable file upload options to support S3-compatible storage systems that may not fully implement all S3 features, particularly object tagging. The implementation introduces a FilesConfig class for runtime configuration with three options: useS3Tagging (to disable S3 tagging headers for incompatible storage), maxMultipartRetries (configurable retry count), and fileUploadTimeoutMs (configurable timeout).
- Introduces `FilesConfig` class with static configuration pattern for file upload settings
- Refactors `DirectUploadClient` constructor to accept a configuration object instead of a plain number
- Implements conditional S3 tagging header inclusion based on configuration
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `test/unit/files/FilesConfig.test.ts` | New test suite covering `FilesConfig` initialization and configuration retrieval |
| `test/unit/files/DirectUploadClient.test.ts` | Updated tests to verify S3 tagging behavior and use the new config object pattern |
| `src/files/infra/clients/DirectUploadClient.ts` | Implements the `DirectUploadClientConfig` interface and conditional S3 tagging in single-part uploads |
| `src/files/index.ts` | Adds `FilesConfig` class with lazy initialization pattern for `DirectUploadClient` and `UploadFile` instances |
| `CHANGELOG.md` | Documents new features and breaking changes |
FYI, there's already a `dataverse.files.<id>.disable-tagging` setting - is that the same as the new one here?
If there is one already, then we do not need the new one. However, I know there is one in Java; I did not know there is one in JavaScript too. I made this PR a draft; it is part of a bigger idea: reusing the SPA parts in other UIs. I think it is something to discuss first at the tech hours. There are three other PRs coming to illustrate what I mean (one in the new SPA, one in dvwebloader and one in the JSF frontend). The dvwebloader that uses the SPA file upload can function without these PRs ever being merged, and the dvwebloader change is also optional (it is something we want to use at KU Leuven, but it is fine if it is not mainstream).
A tech hour makes sense for the bigger picture. re: the tagging - the store is configured on the server and the signed URLs either will/won't allow a tag header based on that (calls will fail if they don't match the server setting). Do we need an API so you can discover that setting? (We've recently been adding more settings to the storageDriver GET API). |
The `x-amz-tagging` header value is now taken from `destination.tagging`, which is populated by the server when tagging is enabled. This removes the client-side `FilesConfig`/`useS3Tagging` flag that duplicated the backend `DISABLE_S3_TAGGING` JVM setting and would drift silently.
- `FileUploadDestination`: add `tagging?: string`
- `fileUploadDestinationsTransformers`: pass `tagging` through both paths
- `DirectUploadClient`: use `destination.tagging` as the header value; remove the `useS3Tagging` field and config option
- `files/index.ts`: remove the `FilesConfig` class and lazy-init pattern; `uploadFile` is now a plain `UploadFile` instance
- `package.json`: scope prettier/eslint scripts to `./src` to avoid permission errors scanning `test/environment/docker-dev-volumes`
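A minimal sketch of the resulting single-part header construction. The interface and function names are illustrative stand-ins, not the SDK's exact API; the behavior follows what this PR describes: a server-supplied value is sent as-is, an absent field falls back to the `dv-state=temp` tag that earlier SDK versions hard-coded, and an empty string (server has tagging disabled) skips the header.

```typescript
// Illustrative shape of the upload destination; the real model lives
// in the SDK's FileUploadDestination.
interface FileUploadDestination {
  url: string
  tagging?: string // populated by the server when S3 tagging is enabled
}

function singlePartUploadHeaders(
  destination: FileUploadDestination,
  contentType: string
): Record<string, string> {
  const headers: Record<string, string> = { 'Content-Type': contentType }
  // Absent field (older server): legacy hard-coded tag.
  // Empty string (server disabled tagging): skip the header entirely.
  const tagging = destination.tagging ?? 'dv-state=temp'
  if (tagging !== '') {
    headers['x-amz-tagging'] = tagging
  }
  return headers
}

console.log(singlePartUploadHeaders({ url: 'https://bucket.example/key' }, 'text/csv'))
```

This keeps the client free of any tagging configuration of its own: the server's per-storage setting is the single source of truth.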
This reverts commit 5a9f204.
I reverted the template functional test isolation change from this PR to keep the upload PR scoped. The failing test appears to be shared-state flakiness; that commit can be reused in a separate PR dedicated to functional test isolation.
`npm run format` and `npm run lint:eslint` traverse the whole repo by default, which fails on systems where `test/environment/docker-dev-volumes` contains directories owned by container users (e.g. solr/data) and is not readable by the developer's UID. The pre-commit hook then aborts. Narrowing the globs to `./src` keeps the formatters and linters running on what we actually care about — application source — and lets the pre-commit hook succeed regardless of how container volumes are provisioned.
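The narrowed scripts amount to something like the following in `package.json` (the exact glob patterns here are assumptions; the point is restricting traversal to `./src`):

```json
{
  "scripts": {
    "format": "prettier --write \"./src/**/*.ts\"",
    "lint:eslint": "eslint ./src --ext .ts"
  }
}
```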
Adds DataverseApiAuthMechanism to the existing core/index.ts re-export alongside ApiConfig so consumers don't have to deep-import it from `@iqss/dataverse-client-javascript/dist/core/infra/repositories/ApiConfig`. This is the SDK side of a small two-line change agreed with the dataverse-frontend reusable-components track: once a prerelease ships this export, the standalone uploader can import the enum from the package's public surface. Until then, consumers can keep the deep import. Non-breaking additive change.
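A sketch of what the re-export buys consumers. The enum members shown below are illustrative assumptions; the real definition lives in `src/core/infra/repositories/ApiConfig`.

```typescript
// Stand-in for the enum that ApiConfig defines in the SDK.
enum DataverseApiAuthMechanism {
  API_KEY = 'API_KEY',
  SESSION_COOKIE = 'SESSION_COOKIE'
}

// In src/core/index.ts the change is a one-name widening of the
// existing barrel re-export, roughly:
//   export { ApiConfig, DataverseApiAuthMechanism } from './infra/repositories/ApiConfig'
// which lets consumers write the shallow import:
//   import { DataverseApiAuthMechanism } from '@iqss/dataverse-client-javascript'
// instead of deep-importing from dist/core/infra/repositories/ApiConfig.

console.log(DataverseApiAuthMechanism.SESSION_COOKIE)
```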
New use cases backing the paginated dataset version tree endpoint:
GET /api/datasets/{id}/versions/{versionId}/tree
- listDatasetTreeNode: single-page lookup. Accepts path, limit, cursor, include (all/folders/files), order (NameAZ/NameZA), includeDeaccessioned, originals.
- iterateDatasetTreeNode: async generator that walks the cursor chain so callers can consume one folder's children without driving pagination by hand.

Wire format mirrors the backend response 1:1 (folder items carry optional `counts`, file items add id/size/contentType/access/checksum/downloadUrl). Order/include parsing falls back to defaults on unknown values for forward-compat.
Includes Jest unit tests for the use cases and the transformer.
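The cursor walk that `iterateDatasetTreeNode` performs can be sketched with stand-in types. Everything below (the item shape, `fetchPage`, the fake page contents) is illustrative, not the SDK's real repository API; the point is the pattern: one page in flight at a time, each item yielded before the next page is requested.

```typescript
interface TreeItem { name: string; type: 'folder' | 'file' }
interface TreePage { items: TreeItem[]; nextCursor?: string }

// Fake repository: two pages chained by an opaque server-issued cursor.
const pages: Record<string, TreePage> = {
  start: { items: [{ name: 'data', type: 'folder' }], nextCursor: 'c1' },
  c1: { items: [{ name: 'readme.md', type: 'file' }] }
}
async function fetchPage(cursor: string): Promise<TreePage> {
  return pages[cursor]
}

// Walk the cursor chain lazily; callers never touch pagination.
async function* iterateTree(start: string): AsyncGenerator<TreeItem> {
  let cursor: string | undefined = start
  while (cursor !== undefined) {
    const page: TreePage = await fetchPage(cursor)
    yield* page.items
    cursor = page.nextCursor
  }
}

async function main() {
  const names: string[] = []
  for await (const item of iterateTree('start')) names.push(item.name)
  console.log(names.join(',')) // data,readme.md
}
main()
```

Because the generator only fetches the next page after the current page's items are consumed, a caller that breaks out of the `for await` loop early never pays for pages it did not need.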
PR #12182, merged on Dataverse develop, moved the per-collection storage-driver endpoint:
OLD: PUT /api/admin/dataverse/{alias}/storageDriver
NEW: PUT /api/dataverses/{alias}/storageDriver
The CI integration tests on PR #403 now run against a Dataverse
container that includes the move, so setStorageDriverViaApi was
hitting the old admin path and getting 404, which cascaded into
every dataset/file test that depends on the directUploadTestCollection
having LocalStack as its storage driver.
Fix: update setStorageDriverViaApi to use the new public endpoint.
The endpoint still requires X-Dataverse-Key for write operations
(superuser only), so authentication is unchanged.
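The path change can be sketched as a small URL builder. The function name and shape are illustrative; the real fix lives in the `setStorageDriverViaApi` test helper.

```typescript
// Old vs. new endpoint for setting a collection's storage driver.
// PR #12182 moved it from the admin path to the public
// /api/dataverses path; writes still require a superuser
// X-Dataverse-Key header.
function storageDriverUrl(baseUrl: string, alias: string, legacy = false): string {
  return legacy
    ? `${baseUrl}/api/admin/dataverse/${alias}/storageDriver` // pre-#12182, now 404
    : `${baseUrl}/api/dataverses/${alias}/storageDriver`
}

console.log(storageDriverUrl('http://localhost:8080', 'directUploadTestCollection'))
```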
- `docs/useCases.md`: add 'List a Folder of a Dataset Version (Tree View)' and 'Iterate a Folder of a Dataset Version (Tree View)' under Datasets read use cases, with example calls and notes on cursor / ETag / ordering. Adds matching TOC entries.
- `CHANGELOG.md` (Unreleased): add a one-line note about re-exporting `DataverseApiAuthMechanism` from the public surface so the standalone reusable-component bundles can import it without a deep path.
What this PR does / why we need it:
Two coordinated additions to the SDK that back the reusable React components shipping in IQSS/dataverse-frontend#898 and IQSS/dataverse#12382:

- Server-authoritative S3 tagging: the SDK drops the client-side `useS3Tagging` flag and reads the `tagging` field from the upload-destination response instead. The matching server PR adds that field and exposes it from the per-storage `dataverse.files.<id>.disable-tagging` setting. This unblocks deployments on S3-compatible storage that doesn't support the tagging API. The `DirectUploadClient` constructor signature now takes a `DirectUploadClientConfig` object instead of a positional `retries` arg.
- Tree-listing use cases: `listDatasetTreeNode` (single page) and `iterateDatasetTreeNode` (async generator that walks pagination via the opaque server-issued cursor). Wraps the new `/api/datasets/{id}/versions/{vid}/tree` endpoint added in the matching server PR.

Plus a small ergonomics fix: re-export `DataverseApiAuthMechanism` from the public surface so consumers don't have to import it from a deep submodule.

Which issue(s) this PR closes:

IQSS/dataverse#6691 (lazy tree view): closes nothing standalone in this repo; the closing happens via the consumer PRs.

Related Dataverse PRs:

- IQSS/dataverse#12382 (server-side tree endpoint + `tagging` response field)
- IQSS/dataverse-frontend#898 (lazy tree view + standalone uploader bundles)
- IQSS/dataverse#12188 (session-cookie API hardening): the consumers authenticate via session cookies; #12188 adds CSRF mitigations on top.

The frontend PR pins the SDK to a GitHub Packages prerelease (`2.2.0-pr403.3d6f638`) for testing. The recommended landing order is this PR → frontend → server, with this PR's stable 2.x release published before the frontend flips off the prerelease pin.

Special notes for your reviewer:
Reviewer's guide
The diff is ~770 LOC across 21 files. Two near-orthogonal features; review them as two separate sub-PRs.
Tagging-config story (~15 min):
- `src/files/infra/clients/DirectUploadClient.ts`: `Content-Type` and `x-amz-tagging` are now read from `destination.tagging` for single-part uploads (lines 72-74). Constructor takes `DirectUploadClientConfig` instead of a positional `retries`.
- `src/files/infra/repositories/transformers/fileUploadDestinationsTransformers.ts`: plumbs `tagging` through both single + multipart payloads.
- `src/files/domain/clients/DirectUploadClientConfig.ts`: new config interface.

Tree-listing use cases (~15 min):
- `src/datasets/domain/useCases/ListDatasetTreeNode.ts`: wraps a single tree-page request.
- `src/datasets/domain/useCases/IterateDatasetTreeNode.ts`: async generator (`async *`) that walks `nextCursor` lazily; only one page in flight at a time, yields each item before fetching the next page.
- `src/datasets/domain/models/FileTreeNode.ts`: discriminated union via `FileTreeNodeType` enum + type guards.
- `src/datasets/infra/repositories/transformers/fileTreeTransformers.ts`: wire-format → domain mapping. Includes a payload-envelope unwrap and a defensive fallback for unknown enum values from older / future server versions.

Tests:
- New unit suites (`IterateDatasetTreeNode.test.ts`, `ListDatasetTreeNode.test.ts`, `fileTreeTransformers.test.ts`): cursor walk, `ReadError` propagation, transformer fallback, payload-envelope unwrapping.
- Updated `DirectUploadClient.test.ts` for the tagging/timeout config story.

Known limitations & open ends
- Multipart uploads still don't apply `destination.tagging`: `uploadMultipartFile` (`DirectUploadClient.ts`) hard-codes `Content-Type` only; the transformer plumbs `tagging` through but the multipart code path doesn't read it. This is a pre-existing gap on `develop` (the previous code already didn't set tagging on multipart), so this PR doesn't introduce a regression. We chose not to expand scope in this PR; tracked as a follow-up. Worth a CHANGELOG note that lifecycle policies depending on `dv-state=temp` for multipart-uploaded objects continue to depend on bucket-level rules rather than per-object tags.
- If the upload-destination response carries no `tagging` field (older Dataverse without the matching #12382 server change), the SDK falls back to `x-amz-tagging: dv-state=temp`, the same tag every earlier SDK version hard-coded, so existing AWS-S3 deployments keep working without a server upgrade. Operators who need to opt out (storage that doesn't accept S3 tags) set `dataverse.files.<id>.disable-tagging=true` on the matching server release, which makes the server return an empty `tagging` value and the client skip the header.
- `DirectUploadClient` constructor signature change: was `(repo, retries=5)`, now `(repo, config = {})`. CHANGELOG documents this under "Changed". This is a public API break for direct consumers of `DirectUploadClient` (uncommon; typical SDK users go through the use-case layer); flagged for the team.
- `package.json` still reads `2.2.0` (same as develop). The consumer pin (`2.2.0-pr403.3d6f638`) is a snapshot; the stable target is `2.2.0` or `2.3.0` depending on how we treat the constructor break. Open for the maintainers' call.

Suggestions on how to test this:
npm install
npm run test:unit # full unit suite
npm run lint

Manual smoke against a real Dataverse with the matching server PR:
For the tagging story, the easiest manual check is to upload a file via the matching frontend PR against a Dataverse with `disable-tagging=false/true` on the storage and verify the presence/absence of `x-amz-tagging` in the browser's Network tab.

Is there a release notes or changelog update needed for this change?:
Yes: `CHANGELOG.md` covers the new use cases, the `DataverseApiAuthMechanism` re-export, the tagging behaviour change (the SDK still defaults to `x-amz-tagging: dv-state=temp` when the server omits the field, so older-server deployments are unaffected; servers that explicitly return an empty `tagging` value tell the client to skip the header), the `DirectUploadClientConfig` export, and the `DirectUploadClient` constructor signature change.

Additional documentation:
- `docs/useCases.md`: adds full sections for `listDatasetTreeNode` and `iterateDatasetTreeNode` with example code.
- `doc/sphinx-guides/source/api/native-api.rst`: documents the underlying endpoint, including the cursor opacity contract.

AI-assistance disclosure
Parts of this work, including the async-generator shape of `iterateDatasetTreeNode`, the transformer's defensive enum fallback, and the `useCases.md` documentation, were developed with the help of Claude (Anthropic) via Claude Code. The model was particularly useful for keeping the new code consistent with the SDK's existing use-case / repository / transformer layering and for spotting backwards-compatibility implications of the tagging-from-server change.

Reviewer attention is still required: AI-assisted code is still author-owned, and we have reviewed every diff that landed. We flag this so reviewers can apply whatever scrutiny they reserve for AI-touched changes.