
Tree node listing helpers + server-authoritative S3 tagging#403

Draft
ErykKul wants to merge 19 commits into develop from feature/configurable-uploads

Conversation


@ErykKul ErykKul commented Dec 5, 2025

What this PR does / why we need it:

Two coordinated additions to the SDK that back the reusable React components shipping in IQSS/dataverse-frontend#898 and IQSS/dataverse#12382:

  1. Server-driven file-upload tagging. Drops the client-side useS3Tagging flag and reads the tagging field from the upload-destination response instead. The matching server PR adds that field and exposes it from the per-storage dataverse.files.<id>.disable-tagging setting. This unblocks deployments on S3-compatible storage that doesn't support the tagging API. DirectUploadClient constructor signature now takes a DirectUploadClientConfig object instead of a positional retries arg.
  2. Tree-listing use cases. New listDatasetTreeNode (single page) and iterateDatasetTreeNode (async generator that walks pagination via the opaque server-issued cursor). Wraps the new /api/datasets/{id}/versions/{vid}/tree endpoint added in the matching server PR.

Plus a small ergonomics fix: re-export DataverseApiAuthMechanism from the public surface so consumers don't have to import it from a deep submodule.
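The constructor change in (1) can be sketched as follows. This is an illustrative shape, not the SDK's actual code — the config field names (`maxMultipartRetries`, `fileUploadTimeoutMs`) and defaults are assumptions for the example:

```typescript
// Hypothetical sketch of the config-object constructor pattern.
// Field names and defaults are illustrative, not the real SDK's.
interface DirectUploadClientConfig {
  maxMultipartRetries?: number
  fileUploadTimeoutMs?: number
}

class DirectUploadClientExample {
  readonly retries: number
  readonly timeoutMs: number

  // Before: (repo, retries = 5). After: (repo, config = {}), with
  // per-field defaults so callers set only what they need.
  constructor(config: DirectUploadClientConfig = {}) {
    this.retries = config.maxMultipartRetries ?? 5
    this.timeoutMs = config.fileUploadTimeoutMs ?? 60_000
  }
}
```

The advantage over a positional `retries` argument is that new options can be added later without another signature break.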

Which issue(s) this PR closes:

  • Implements the SDK side of IQSS/dataverse#6691 (lazy tree view) — this PR closes nothing standalone in this repo; the closing happens via the consumer PRs.

Related Dataverse PRs:

  • Depends on IQSS/dataverse#12382 (server-side tree endpoint + tagging response field)
  • Consumed by IQSS/dataverse-frontend#898 (lazy tree view + standalone uploader bundles)
  • Soft-coupled with IQSS/dataverse#12188 (session-cookie API hardening) — the consumers authenticate via session cookies; #12188 adds CSRF mitigations on top.

The frontend PR pins the SDK to a GitHub Packages prerelease (2.2.0-pr403.3d6f638) for testing. The recommended landing order is this PR → frontend → server, with this PR's stable 2.x release published before the frontend flips off the prerelease pin.

Special notes for your reviewer:

Reviewer's guide

The diff is ~770 LOC across 21 files. Two near-orthogonal features; review them as two separate sub-PRs.

Tagging-config story (~15 min):

  • src/files/infra/clients/DirectUploadClient.ts — Content-Type and x-amz-tagging are now read from destination.tagging for single-part uploads (lines 72–74). Constructor takes DirectUploadClientConfig instead of a positional retries.
  • src/files/infra/repositories/transformers/fileUploadDestinationsTransformers.ts — plumbs tagging through both single + multipart payloads.
  • src/files/domain/clients/DirectUploadClientConfig.ts — new config interface.
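The single-part header selection described above can be sketched as a standalone helper — hypothetical for illustration; in the real PR the logic lives inline in DirectUploadClient, and the default tag comes from the SDK's historical behaviour:

```typescript
// Sketch of the server-authoritative tagging decision:
// - server supplies a tagging value  -> send it as x-amz-tagging
// - server omits the field           -> fall back to the historical default
//                                       (older servers without the matching PR)
// - server sends an empty value      -> skip the header entirely
//                                       (disable-tagging=true on the storage)
const DEFAULT_TAG = 'dv-state=temp'

function buildSinglePartHeaders(
  contentType: string,
  tagging?: string
): Record<string, string> {
  const headers: Record<string, string> = { 'Content-Type': contentType }
  const tag = tagging === undefined ? DEFAULT_TAG : tagging
  if (tag !== '') {
    headers['x-amz-tagging'] = tag
  }
  return headers
}
```

The three-way distinction (present / omitted / empty) is what makes the change backwards-compatible with servers that predate the tagging field.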

Tree-listing use cases (~15 min):

  • src/datasets/domain/useCases/ListDatasetTreeNode.ts — wraps a single tree-page request.
  • src/datasets/domain/useCases/IterateDatasetTreeNode.ts — async generator (async *) that walks nextCursor lazily; only one page in flight at a time, yields each item before fetching the next page.
  • src/datasets/domain/models/FileTreeNode.ts — discriminated union via FileTreeNodeType enum + type guards.
  • src/datasets/infra/repositories/transformers/fileTreeTransformers.ts — wire-format → domain mapping. Includes a payload-envelope unwrap and a defensive fallback for unknown enum values from older / future server versions.
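The cursor walk in IterateDatasetTreeNode can be sketched as a generic async generator. The page shape here is hypothetical — the real use case goes through the repository layer — but the one-page-in-flight, yield-before-fetching-next behaviour is the pattern described above:

```typescript
// Hypothetical page shape; the real wire format carries more fields.
interface TreePage<T> {
  items: T[]
  nextCursor?: string
}

// Walks the opaque server-issued cursor chain lazily: fetches one page,
// yields each item, and only then requests the next page (if any).
async function* iteratePages<T>(
  fetchPage: (cursor?: string) => Promise<TreePage<T>>
): AsyncGenerator<T> {
  let cursor: string | undefined
  do {
    const page = await fetchPage(cursor) // only one page in flight at a time
    for (const item of page.items) {
      yield item // consumers see items before the next page is requested
    }
    cursor = page.nextCursor
  } while (cursor !== undefined)
}
```

Because the generator is lazy, a caller that breaks out of its `for await` loop after the first few items never pays for the remaining pages.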

Tests:

  • 3 new test files for the tree (IterateDatasetTreeNode.test.ts, ListDatasetTreeNode.test.ts, fileTreeTransformers.test.ts). Cursor walk, ReadError propagation, transformer fallback, payload-envelope unwrapping.
  • 3 new test cases in DirectUploadClient.test.ts for the tagging/timeout config story.

Known limitations & open ends

  • Multipart upload doesn't currently consume destination.tagging. uploadMultipartFile (DirectUploadClient.ts) hard-codes Content-Type only; the transformer plumbs tagging through, but the multipart code path doesn't read it. This is a pre-existing gap on develop — the previous code already didn't set tagging on multipart — so this PR doesn't introduce a regression. We chose not to expand scope here; tracked as a follow-up. Worth a CHANGELOG note: lifecycle policies keyed on dv-state=temp won't see per-object tags on multipart-uploaded objects, which continue to rely on bucket-level rules.
  • Backwards-compat against older Dataverse servers. When the server omits the tagging field (older Dataverse without the matching #12382 server change), the SDK falls back to x-amz-tagging: dv-state=temp — the same tag every earlier SDK version hard-coded — so existing AWS-S3 deployments keep working without a server upgrade. Operators who need to opt out (storage that doesn't accept S3 tags) set dataverse.files.<id>.disable-tagging=true on the matching server release, which makes the server return an empty tagging value and the client skip the header.
  • DirectUploadClient constructor signature change. Was (repo, retries=5), now (repo, config = {}). CHANGELOG documents under "Changed". This is a public API break for direct consumers of DirectUploadClient (uncommon — typical SDK users go through the use-case layer); flagged for the team.
  • Version bump. package.json still reads 2.2.0 (same as develop). The consumer pin (2.2.0-pr403.3d6f638) is a snapshot; the stable target is 2.2.0 or 2.3.0 depending on how we treat the constructor break. Open for the maintainers' call.

Suggestions on how to test this:

npm install
npm run test:unit       # full unit suite
npm run lint

Manual smoke against a real Dataverse with the matching server PR:

import { listDatasetTreeNode, iterateDatasetTreeNode, ApiConfig, DataverseApiAuthMechanism } from '@iqss/dataverse-client-javascript'

ApiConfig.init('http://localhost:8080/api/v1', DataverseApiAuthMechanism.SESSION_COOKIE)

// Single page
const page = await listDatasetTreeNode.execute({
  datasetId: 'doi:10.5072/FK2/AAAAAA',
  datasetVersionId: ':latest',
  limit: 50
})

// Async-iterating generator
for await (const node of iterateDatasetTreeNode.execute({
  datasetId: 'doi:10.5072/FK2/AAAAAA',
  datasetVersionId: ':latest'
})) {
  console.log(node.name, node.nodeType)
}
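The nodes yielded above form a discriminated union on nodeType. A minimal sketch of the type-guard pattern — the field sets here are hypothetical; the real FileTreeNode model carries more file metadata (size, contentType, checksum, and so on):

```typescript
// Hypothetical slimmed-down version of the FileTreeNode discriminated union.
enum FileTreeNodeType {
  FOLDER = 'folder',
  FILE = 'file'
}

interface FolderNode {
  nodeType: FileTreeNodeType.FOLDER
  name: string
}

interface FileNode {
  nodeType: FileTreeNodeType.FILE
  name: string
  size: number
}

type TreeNode = FolderNode | FileNode

// Type guard: narrows the union so callers can access file-only
// fields (like size) without casting.
function isFileNode(node: TreeNode): node is FileNode {
  return node.nodeType === FileTreeNodeType.FILE
}
```

Inside an `if (isFileNode(node))` branch the compiler knows `node.size` exists; in the `else` branch it knows the node is a folder.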

For the tagging story, the easiest manual check is to upload a file via the matching frontend PR against a Dataverse with disable-tagging=false/true on the storage and verify the presence/absence of x-amz-tagging in the browser's Network tab.

Is there a release notes or changelog update needed for this change?:

Yes — CHANGELOG.md covers the new use cases, the DataverseApiAuthMechanism re-export, the tagging behaviour change (the SDK still defaults to x-amz-tagging: dv-state=temp when the server omits the field, so older-server deployments are unaffected; servers that explicitly return an empty tagging value tell the client to skip the header), the DirectUploadClientConfig export, and the DirectUploadClient constructor signature change.

Additional documentation:

  • docs/useCases.md — adds full sections for listDatasetTreeNode and iterateDatasetTreeNode with example code.
  • The matching server PR's doc/sphinx-guides/source/api/native-api.rst documents the underlying endpoint, including the cursor opacity contract.

AI-assistance disclosure

Parts of this work — including the async-generator shape of iterateDatasetTreeNode, the transformer's defensive enum fallback, and the useCases.md documentation — were developed with the help of Claude (Anthropic) via Claude Code. The model was particularly useful for keeping the new code consistent with the SDK's existing use-case / repository / transformer layering and for spotting backwards-compatibility implications of the tagging-from-server change.

Reviewer attention is still required: AI-assisted code is still author-owned, and we've reviewed every diff that landed. Flagging this so reviewers can apply whatever scrutiny they reserve for AI-touched changes.

Copilot AI review requested due to automatic review settings December 5, 2025 14:35

Copilot AI left a comment


Pull request overview

This PR adds configurable file upload options to support S3-compatible storage systems that may not fully implement all S3 features, particularly object tagging. The implementation introduces a FilesConfig class for runtime configuration with three options: useS3Tagging (to disable S3 tagging headers for incompatible storage), maxMultipartRetries (configurable retry count), and fileUploadTimeoutMs (configurable timeout).

  • Introduces FilesConfig class with static configuration pattern for file upload settings
  • Refactors DirectUploadClient constructor to accept a configuration object instead of a plain number
  • Implements conditional S3 tagging header inclusion based on configuration

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Summary per file:

  • test/unit/files/FilesConfig.test.ts — New test suite covering FilesConfig initialization and configuration retrieval
  • test/unit/files/DirectUploadClient.test.ts — Updated tests to verify S3 tagging behavior and use new config object pattern
  • src/files/infra/clients/DirectUploadClient.ts — Implements DirectUploadClientConfig interface and conditional S3 tagging in single-part uploads
  • src/files/index.ts — Adds FilesConfig class with lazy initialization pattern for DirectUploadClient and UploadFile instances
  • CHANGELOG.md — Documents new features and breaking changes



Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


qqmyers commented Dec 5, 2025

FYI - there's already a dataverse.files.<id>.disable-tagging setting - is that the same as the new one here?


ErykKul commented Dec 5, 2025

FYI - there's already a dataverse.files.<id>.disable-tagging setting - is that the same as the new one here?

If one already exists, then we do not need the new one. However, I know there is one in Java; I did not know there was one in JavaScript too. I made this PR a draft; it is part of a bigger idea: reusing the SPA parts in other UIs. I think it is something for the tech hours to discuss first. Three other PRs are coming to illustrate what I mean (one in the new SPA, one in dvwebloader, and one in the JSF frontend). The dvwebloader that uses the SPA file upload can function without these PRs ever being merged, and the dvwebloader one is also optional (it is something we want to use at KU Leuven, but it is fine if it is not mainstream).


qqmyers commented Dec 5, 2025

A tech hour makes sense for the bigger picture.

re: the tagging - the store is configured on the server and the signed URLs either will/won't allow a tag header based on that (calls will fail if they don't match the server setting). Do we need an API so you can discover that setting? (We've recently been adding more settings to the storageDriver GET API).

ErykKul added 7 commits May 4, 2026 12:24
The x-amz-tagging header value is now taken from destination.tagging,
which is populated by the server when tagging is enabled. This removes
the client-side FilesConfig/useS3Tagging flag that duplicated the
backend DISABLE_S3_TAGGING JVM setting and would drift silently.

- FileUploadDestination: add tagging?: string
- fileUploadDestinationsTransformers: pass tagging through both paths
- DirectUploadClient: use destination.tagging as header value; remove
  useS3Tagging field and config option
- files/index.ts: remove FilesConfig class and lazy-init pattern;
  uploadFile is now a plain UploadFile instance
- package.json: scope prettier/eslint scripts to ./src to avoid
  permission errors scanning test/environment/docker-dev-volumes

ErykKul commented May 4, 2026

I reverted the template functional test isolation change from this PR to keep the upload PR scoped.

The failing test appears to be shared-state/flakiness around :root templates, not caused by the upload changes. A proposed fix existed in commit 5a9f204, which isolated SetTemplateAsDefault.test.ts by creating a temporary collection and cleaning it up.

That commit can be reused in a separate PR dedicated to functional test isolation.

@ErykKul ErykKul marked this pull request as ready for review May 4, 2026 18:35
@ErykKul ErykKul marked this pull request as draft May 5, 2026 08:20
ErykKul added 3 commits May 5, 2026 11:55
`npm run format` and `npm run lint:eslint` traverse the whole repo by
default, which fails on systems where `test/environment/docker-dev-volumes`
contains directories owned by container users (e.g. solr/data) and is
not readable by the developer's UID. The pre-commit hook then aborts.

Narrowing the globs to `./src` keeps the formatters and linters running
on what we actually care about — application source — and lets the
pre-commit hook succeed regardless of how container volumes are
provisioned.
Adds DataverseApiAuthMechanism to the existing core/index.ts re-export
alongside ApiConfig so consumers don't have to deep-import it from
`@iqss/dataverse-client-javascript/dist/core/infra/repositories/ApiConfig`.

This is the SDK side of a small two-line change agreed with the
dataverse-frontend reusable-components track: once a prerelease ships
this export, the standalone uploader can import the enum from the
package's public surface. Until then, consumers can keep the deep
import.

Non-breaking additive change.
New use cases backing the paginated dataset version tree endpoint:

  GET /api/datasets/{id}/versions/{versionId}/tree

- listDatasetTreeNode: single-page lookup. Accepts path, limit,
  cursor, include (all/folders/files), order (NameAZ/NameZA),
  includeDeaccessioned, originals.
- iterateDatasetTreeNode: async generator that walks the cursor
  chain so callers can consume one folder's children without
  driving pagination by hand.

Wire format mirrors the backend response 1:1 (folder items carry
optional `counts`, file items add id/size/contentType/access/
checksum/downloadUrl). Order/include parsing falls back to
defaults on unknown values for forward-compat.

Includes Jest unit tests for the use cases and the transformer.
@ErykKul ErykKul force-pushed the feature/configurable-uploads branch from 9771c3b to 397d662 on May 5, 2026 10:28
PR #12182 merged on dataverse develop and moved the per-collection
storage-driver endpoint:

  OLD: PUT /api/admin/dataverse/{alias}/storageDriver
  NEW: PUT /api/dataverses/{alias}/storageDriver

The CI integration tests on PR #403 now run against a Dataverse
container that includes the move, so setStorageDriverViaApi was
hitting the old admin path and getting 404, which cascaded into
every dataset/file test that depends on the directUploadTestCollection
having LocalStack as its storage driver.

Fix: update setStorageDriverViaApi to use the new public endpoint.
The endpoint still requires X-Dataverse-Key for write operations
(superuser only), so authentication is unchanged.
@ErykKul ErykKul changed the title from "feat: add configurable file upload options and related tests" to "Tree node listing helpers + server-authoritative S3 tagging" on May 5, 2026
- docs/useCases.md: add 'List a Folder of a Dataset Version (Tree View)'
  and 'Iterate a Folder of a Dataset Version (Tree View)' under Datasets
  read use cases, with example calls and notes on cursor / ETag /
  ordering. Adds matching TOC entries.
- CHANGELOG.md (Unreleased): add a one-line note about re-exporting
  DataverseApiAuthMechanism from the public surface so the standalone
  reusable-component bundles can import it without a deep path.