
Conversation

@frcroth frcroth commented Sep 10, 2025

Description

  • If configured, uploaded datasets are forwarded to S3 after unzipping
  • layersToLink now references the source dataset by datasetId (this introduces API version 11); see the sketch after this list
  • All newly uploaded datasets are virtual (meaning that the on-disk datasource-properties.json is removed and the database entry is the source of truth)
  • layersToLink no longer creates symlinks; it only creates entries in the DB, just like uploadToPaths
  • Raw uploaded data is backed up inside the .trash folder for debugging purposes; that folder is cleared regularly anyway.
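
A minimal sketch of the datasetId-based layer link, with illustrative field names (the actual definition is LinkedLayerIdentifier in UploadService.scala):

```scala
import com.scalableminds.util.objectid.ObjectId

// Hypothetical shape: a layer to link is addressed via the source dataset's id
// instead of organization + directory name. Field names are illustrative only.
case class LinkedLayerIdentifier(datasetId: ObjectId, layerName: String)
```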

Steps to test:

  • Configure S3 uploads using appropriate credentials in the application.conf
  • Upload a dataset
  • View that dataset
  • Also test that the local-filesystem case did not break
  • Test uploading a zipped and a non-zipped dataset
  • Test tif upload with yarn enable-jobs and a worker
  • Test upload from libs (ideally with layersToLink)

TODOs:

  • Handle uploading larger files with multipart upload
  • Fix uploaded files having "aws-chunked" encoding
  • Push updated application.conf without credentials
  • Test with more datasets
  • Do not upload files that are not referenced in layers (e.g. datasource-properties.json files)
  • Use UPath for everything
  • Is there still a case where uploaded datasets should not be virtual? → no!
  • Make all uploaded datasets virtual
  • Unify the layers-to-link mechanism
  • API versioning (layersToLink adapter)
  • Re-test with S3
  • Delete local data or move it to .trash?
  • Make sure both old and new convert_to_wkw variants are supported

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Added migration guide entry if applicable (edit the same file as for the changelog)
  • Removed dev-only changes
  • Considered common edge cases
  • Needs datastore update after deployment

coderabbitai bot commented Sep 10, 2025

📝 Walkthrough

Refactors upload and linking flows to be datasetId-centric, introduces LayerToLinkService, adds S3 upload config and AWS transfer-manager dependency, removes symlink-based linking and DatasetSymlinkService, adds Slack failure notification, bumps API to v12, and updates controllers, services, routes, models, and frontend.

Changes

Cohort / File(s) Summary of changes
WK remote controller
app/controllers/WKRemoteDataStoreController.scala
Switched reserve/report/delete handlers to use datasetId: ObjectId; replaced local layer validation with layerToLinkService.validateLayerToLink; changed constructor deps and JSON shapes; removed legacy imports and unused analytics/mailchimp/slack dependencies.
Dataset controllers & creation
app/controllers/DatasetController.scala, app/models/dataset/ComposeService.scala, app/models/dataset/Dataset.scala
Replaced createVirtualDataset usage with createAndSetUpDataset (folderId → ObjectId, added isVirtual), injected UsedStorageService, added DatasetDAO.makeVirtual, and added post-create tracking/storage refresh calls.
Dataset services & linking
app/models/dataset/DatasetService.scala, app/models/dataset/DatasetUploadToPathsService.scala, app/models/dataset/LayerToLinkService.scala
Added LayerToLinkService to validate/resolve/link layers by datasetId; delegated linking to it; removed symlink helpers; DatasetService signature changed and trackNewDataset added.
Upload service & models
webknossos-datastore/.../services/uploading/UploadService.scala, webknossos-datastore/.../services/uploading/*
Made upload flows datasetId-aware (reserve/finish/cancel), persist datasetId in Redis, introduced LinkedLayerIdentifier, ReportDatasetUploadParameters, UploadInformation, CancelUploadInformation; finishUpload now returns ObjectId; added S3 transfer-manager wiring and S3-related helpers; removed LegacyLinkedLayerIdentifier usage.
Datastore controllers & remote client
webknossos-datastore/.../controllers/DataSourceController.scala, webknossos-datastore/.../services/DSRemoteWebknossosClient.scala
Replaced DataSource terminology with datasetId across reserve/finish/cancel/delete flows; updated remote client signatures to use ObjectId and ReportDatasetUploadParameters; added Slack failure notification hook in finish flow.
Access & tokens
webknossos-datastore/.../services/AccessTokenService.scala
Adjusted UserAccessRequest factory surface (added administrateDataSources/readDataSources, removed deleteDataSource/writeDataSource) and removed assertUserAccess helper.
Datastore config & deps
conf/application.conf, webknossos-datastore/.../DataStoreConfig.scala, project/Dependencies.scala, unreleased_changes/8912.md
Added datastore.s3Upload config and DataStoreConfig mapping; centralized AWS SDK version (awsVersion) and added s3-transfer-manager dependency; docs updated for S3 upload config.
Datastore internals & helpers
webknossos-datastore/.../datavault/S3DataVault.scala, webknossos-datastore/.../services/DataSourceService.scala, webknossos-datastore/.../models/datasource/DataLayer.scala, webknossos-datastore/.../helpers/DirectoryConstants.scala, webknossos-datastore/.../models/datasource/DataSourceStatus.scala
Exposed hostBucketFromUri publicly; added resolvePathsInNewBasePath; added attachmentMapping param to StaticLayer.mapped; added unpackedDir = ".unpacked" constant; added DataSourceStatus.notImportedYet.
Removed service
webknossos-datastore/.../services/uploading/DatasetSymlinkService.scala
Removed DatasetSymlinkService and its public API (symlink-based linking deleted).
Legacy API & adapters
webknossos-datastore/.../controllers/DSLegacyApiController.scala, webknossos-datastore/conf/datastore.versioned.routes
Added DSLegacyApiController.reserveUploadV11() and legacy payload types (LegacyReserveUploadInformation, LegacyLinkedLayerIdentifier) to adapt old clients to new datasetId-based flow.
Routing & API versioning
conf/webknossos.latest.routes, conf/webknossos.versioned.routes, webknossos-tracingstore/conf/tracingstore.versioned.routes, util/src/main/scala/com/scalableminds/util/mvc/ApiVersioning.scala
reportDatasetUpload route now expects datasetId; added v12 routing entries and bumped CURRENT_API_VERSION to 12.
Frontend
frontend/javascripts/admin/dataset/dataset_upload_view.tsx
Propagates needsConversion into reserve upload payload.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • MichaelBuessemeyer

Poem

I nibbled code in twilight's bed,
Layers linked by ids instead of thread.
No symlinks carved — paths set with care,
Carrots to S3, Slack gives a flare.
v12 hops on — the rabbit cheers! 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Out of Scope Changes Check: ⚠️ Warning
Explanation: The PR also introduces broad refactors and new features not tied to the four linked issues, including analytics/Mailchimp/Slack side-effects in DatasetService, public API changes in AccessTokenService, new DataLayerAttachments mapping logic, ComposeService updates, and other controllers and DAOs unrelated to S3 uploads or layer-to-link behavior.
Resolution: Please isolate or remove functionality unrelated to S3 uploading or layer linking—such as analytics integrations, attachment mapping, and access token API changes—into separate PRs to keep scope focused on the linked issues.

✅ Passed checks (4 passed)

Title Check: ✅ Passed. The title succinctly describes the primary enhancement of forwarding uploaded datasets to S3, aligning with the main objective of enabling S3-compatible object storage support.
Linked Issues Check: ✅ Passed. The PR implements S3 upload support via new configuration blocks, AWS SDK wiring, and S3 transfer logic (#8415); adds a Slack failure notification path through DSSlackNotificationService and controller hooks (#8934); replaces legacy layer identifiers with datasetId-based LinkedLayerIdentifier types, adapts API versioning to v12, and provides backward compatibility (#8893); and removes filesystem symlink creation in favor of database-driven layer linking via LayerToLinkService (#8749).
Description Check: ✅ Passed. The description clearly outlines the key changes—S3 uploads, layer-to-link API updates, virtual dataset behavior, symlink removal, and backup logic—and provides relevant testing steps and issue references, demonstrating strong alignment with the actual changes.
Docstring Coverage: ✅ Passed. No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9efac9a and 8d51044.

📒 Files selected for processing (3)
  • app/models/dataset/Dataset.scala (2 hunks)
  • conf/application.conf (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • conf/application.conf
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
🧰 Additional context used
🧬 Code graph analysis (1)
app/models/dataset/Dataset.scala (2)
app/utils/sql/SecuredSQLDAO.scala (3)
  • readAccessQuery (23-36)
  • existingCollectionName (16-16)
  • assertUpdateAccess (36-48)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (1)
app/models/dataset/Dataset.scala (1)

446-458: Remove unnecessary caller verification. Renaming the parameter from name to directoryName is an internal change; all call sites use positional arguments and remain valid.

Likely an incorrect or invalid review comment.


@frcroth frcroth self-assigned this Sep 10, 2025
@frcroth frcroth changed the title Implement rudimentary upload of datasets to S3 Upload datasets to S3 Sep 10, 2025
@frcroth frcroth requested a review from fm3 September 12, 2025 10:05
@frcroth frcroth marked this pull request as ready for review September 12, 2025 10:05
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (7)
unreleased_changes/8912.md (1)

7-17: Specify language for fenced config block to satisfy markdownlint and improve readability.

Use hocon for syntax highlighting.

-```
+```hocon
 s3Upload {
     enabled = true
     # Use the prefix / name of a credential in the dataVaults section here to use it for uploads.
     credentialName = "s3://fsn1.your-objectstorage.com/webknossos-test/upload-tests/"
     endpoint = "https://fsn1.your-objectstorage.com"
     bucketName = "webknossos-test"
     objectKeyPrefix = "upload-tests"
   }

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)

70-71: Include S3Upload and DataVaults in children for consistency with nested config traversal/logging.

Prevents omissions in config dumping/validation that rely on children.


```diff
-    val children = List(WebKnossos, WatchFileSystem, Cache, AdHocMesh, Redis, AgglomerateSkeleton)
+    val children = List(WebKnossos, WatchFileSystem, Cache, AdHocMesh, Redis, AgglomerateSkeleton, DataVaults, S3Upload)
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)

525-540: Surface partial-failure details from TransferManager.

You check failedTransfers.isEmpty which is good. Consider logging individual failures (key and exception) to aid debugging large uploads.
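
A minimal sketch of such logging, assuming the CompletedDirectoryUpload returned by the AWS SDK v2 S3TransferManager used here (helper name and logger are illustrative):

```scala
import scala.jdk.CollectionConverters._
import org.slf4j.Logger
import software.amazon.awssdk.transfer.s3.model.CompletedDirectoryUpload

// Sketch: report each failed transfer (object key plus exception) instead of only
// checking completed.failedTransfers().isEmpty.
def logFailedTransfers(completed: CompletedDirectoryUpload, logger: Logger): Unit =
  completed.failedTransfers().asScala.foreach { failed =>
    val key = failed.request().putObjectRequest().key()
    logger.warn(s"S3 upload of object $key failed", failed.exception())
  }
```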


818-823: Exclude the S3 staging root from orphan cleanup.

cleanUpOrphanUploadsForOrga will treat the .cloud directory itself as an “orphan” and delete/recreate it, adding churn. Filter it out.

Apply:

-        uploadDirs <- PathUtils.listDirectories(orgaUploadingDir, silent = false).toFox
-        _ <- Fox.serialCombined(uploadDirs) { uploadDir =>
+        uploadDirs <- PathUtils.listDirectories(orgaUploadingDir, silent = false).toFox
+        uploadDirsFiltered = uploadDirs.filterNot(_.getFileName.toString == uploadToS3Dir)
+        _ <- Fox.serialCombined(uploadDirsFiltered) { uploadDir =>

398-422: S3 path update + cleanup sequence: add best-effort rollback.

If updateDataSource fails after a successful S3 upload, the local unpack dir is already deleted. Consider best-effort cleanup (e.g., delete uploaded S3 prefix) or tag the dataset with a “failed upload” signal to enable ops cleanup.
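
A minimal sketch of such a best-effort rollback, assuming the bucket and object-key prefix of the just-uploaded dataset are known (pagination via continuation tokens and error handling are omitted; this is not the PR's implementation):

```scala
import scala.jdk.CollectionConverters._
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.{Delete, DeleteObjectsRequest, ListObjectsV2Request, ObjectIdentifier}

// Sketch: list the keys under the dataset's prefix and delete them in one batch request.
def deletePrefixBestEffort(s3Client: S3AsyncClient, bucket: String, prefix: String): Unit = {
  val listed = s3Client
    .listObjectsV2(ListObjectsV2Request.builder().bucket(bucket).prefix(prefix).build())
    .join()
  val identifiers = listed.contents().asScala.map(o => ObjectIdentifier.builder().key(o.key()).build())
  if (identifiers.nonEmpty)
    s3Client
      .deleteObjects(
        DeleteObjectsRequest
          .builder()
          .bucket(bucket)
          .delete(Delete.builder().objects(identifiers.asJava).build())
          .build())
      .join()
}
```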


254-266: Path traversal check could be more robust.

startsWith("../") on a relativized path is brittle. Prefer normalizing and ensuring the target stays within uploadDir.

Apply:

-  private def isOutsideUploadDir(uploadDir: Path, filePath: String): Boolean =
-    uploadDir.relativize(uploadDir.resolve(filePath)).startsWith("../")
+  private def isOutsideUploadDir(uploadDir: Path, filePath: String): Boolean = {
+    val target = uploadDir.resolve(filePath).normalize()
+    !target.startsWith(uploadDir)
+  }

826-853: Orphan upload cleanup at startup: OK; consider periodic metrics.

You may want a counter for deleted orphans to surface drift.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abcdb07 and 5c64063.

📒 Files selected for processing (12)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • app/models/dataset/DatasetService.scala (2 hunks)
  • conf/application.conf (1 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • project/Dependencies.scala (2 hunks)
  • unreleased_changes/8912.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DirectoryConstants.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)
util/src/main/scala/com/scalableminds/util/tools/ConfigReader.scala (1)
  • get (23-26)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (3)
  • isEmpty (21-21)
  • DatasetLayerAttachments (13-22)
  • DatasetLayerAttachments (24-27)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
  • DataVaultService (21-30)
  • DataVaultService (32-69)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (411-511)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (2)
  • toString (38-38)
  • toMagLiteral (40-43)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (3)
  • coordinateTransformations (89-89)
  • defaultViewConfiguration (101-101)
  • attachments (107-107)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3)
  • PathUtils (14-14)
  • ensureDirectoryBox (124-132)
  • deleteDirectoryRecursively (186-193)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (10-79)
  • Datastore (17-71)
  • S3Upload (63-69)
  • DataVaults (60-62)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataStoreRedisStore.scala (1)
  • DataStoreRedisStore (8-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1)
  • RemoteSourceDescriptorService (20-177)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala (2)
  • insert (55-66)
  • remove (82-87)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)
  • prependAllPaths (369-418)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • updateDataSource (139-147)
  • getCredential (194-205)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
  • DataStoreConfig (10-79)
  • Datastore (17-71)
  • S3Upload (63-69)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • isKnownUpload (175-178)
  • reserveUpload (192-230)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • reserveDataSourceUpload (130-139)
🪛 markdownlint-cli2 (0.17.2)
unreleased_changes/8912.md

8-8: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (19)
conf/webknossos.latest.routes (1)

290-290: Whitespace-only change — OK.

No functional impact.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DirectoryConstants.scala (1)

7-7: Constant addition looks good.

Consistent with hidden-dot convention for internal dirs.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)

63-69: Config group addition — OK.

Fields align with application.conf.

app/models/dataset/DatasetService.scala (2)

101-102: Propagation of isVirtual — OK.

Correctly forwards to createDataset.


91-93: Signature change verified — call sites updated.

DatasetService.scala now requires isVirtual; the only invocation found (app/controllers/WKRemoteDataStoreController.scala) passes uploadInfo.isVirtual.getOrElse(false).

project/Dependencies.scala (1)

8-8: Pinning awsVersion and adding s3-transfer-manager — artifacts available.
Verified software.amazon.awssdk:s3 and software.amazon.awssdk:s3-transfer-manager v2.32.24 are present on Maven Central (HTTP 200).

app/controllers/WKRemoteDataStoreController.scala (1)

88-94: createPreliminaryDataset now accepts isVirtual; callsites verified

Signature in app/models/dataset/DatasetService.scala includes isVirtual: Boolean; the sole callsite found (app/controllers/WKRemoteDataStoreController.scala) passes uploadInfo.requireUniqueName.getOrElse(false) and uploadInfo.isVirtual.getOrElse(false), preserving the legacy default false — no other usages found.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)

415-507: Attachment path rewrites included in mapped(): LGTM.

Adding attachmentMapping and propagating it across all concrete copies is correct and unlocks URI rewrites for attachments alongside mags.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)

70-74: Constructor wiring for DataStoreConfig is fine.

No issues; enables controller-level config decisions.


147-149: Manual uploads: forcing isVirtual = false is appropriate.

Matches the “on-disk” manual workflow.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (9)

384-386: Back-compat: datasetId missing in Redis for older reservations.

getDatasetIdByUploadId(uploadId) will fail for uploads reserved before this PR. Add a fallback (e.g., resolve via WK by directoryName) to let those finish.

Would you like me to add a DSRemoteWebknossosClient.findDatasetId(...) fallback here?


454-462: Path allowlist check for explored datasources: good guard.

Ensures URIs after exploration are permissible.


183-186: Staging dir for S3 uploads: OK.

Using ${uploadingDir}/$uploadToS3Dir/$uploadId isolates S3 prep; no functional issues spotted.

Also applies to: 586-599


531-539: Content-encoding workaround comment.

The checksum setting workaround for “aws-chunked” is pragmatic. Please confirm with a small integration test that uploaded objects are readable without special headers.


269-279: Chunk handling and bounds checks: solid.

Includes size guard via totalFileSizeInBytes and idempotent chunk writes.

Also applies to: 315-336


409-417: Delete local staging after S3 push: good.

Saves disk and avoids double-storage.


621-641: Layer linking rename switch is clean.

Covers all concrete layer types.


760-789: ZIP vs. raw layout handling is careful.

Truncating common prefixes and excluding known layer prefixes avoids extra nesting.


855-861: Helper to parse JSON from Redis: OK.

Minimal and typed.

@fm3 fm3 left a comment

Thanks for diving into this! I had a first look and added some comments.

@normanrz maybe you would like to have a look as well, especially at some of the design decisions? Does this fit with our multi-bucket setup? Is it ok that we can’t show datastore-to-s3 upload progress to the user?

I did not yet do any testing.

Note that we’ll have to see what is merged first, this or #8844 – if the latter, we could use UPath stuff here as well.

Also, I guess the worker has to be adapted to write the output of the convert_to_wkw job to S3 as well. But maybe that can be deployed separately from this change?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (2)

162-173: Public helper should handle regional virtual-hosted endpoints (bucket.s3.us-xyz-1.amazonaws.com).

hostBucketFromUri fails for regional virtual-hosted style; making it public risks propagating this limitation. Extract bucket using the first occurrence of ".s3" and normalize path handling.

Apply:

-  def hostBucketFromUri(uri: URI): Option[String] = {
-    val host = uri.getHost
-    if (isShortStyle(uri)) { // assume host is omitted from uri, shortcut form s3://bucket/key
-      Some(host)
-    } else if (isVirtualHostedStyle(uri)) {
-      Some(host.substring(0, host.length - ".s3.amazonaws.com".length))
-    } else if (isPathStyle(uri)) {
-      Some(uri.getPath.substring(1).split("/")(0))
-    } else {
-      None
-    }
-  }
+  def hostBucketFromUri(uri: URI): Option[String] = {
+    val host = Option(uri.getHost).getOrElse("")
+    if (isShortStyle(uri)) { // s3://bucket/key
+      Some(host)
+    } else if (isVirtualHostedStyle(uri)) { // bucket.s3[.region].amazonaws.com
+      val idx = host.indexOf(".s3")
+      if (idx > 0) Some(host.substring(0, idx)) else None
+    } else if (isPathStyle(uri)) { // s3[.region].amazonaws.com/bucket/key OR non-AWS host
+      Some(uri.getPath.stripPrefix("/").split("/")(0))
+    } else None
+  }

175-186: Broaden endpoint detection for AWS styles; escape regex dots.

Covers regional and dualstack variants; current endsWith(".s3.amazonaws.com") is too narrow.

Apply:

-  private def isVirtualHostedStyle(uri: URI): Boolean =
-    uri.getHost.endsWith(".s3.amazonaws.com")
+  private def isVirtualHostedStyle(uri: URI): Boolean =
+    uri.getHost.matches(""".+\.s3([.-][\w\-]+)?\.amazonaws\.com""")
 
-  private def isPathStyle(uri: URI): Boolean =
-    uri.getHost.matches("s3(.[\\w\\-_]+)?.amazonaws.com") ||
-      (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
+  private def isPathStyle(uri: URI): Boolean =
+    uri.getHost.matches("""s3([.-][\w\-]+)?\.amazonaws\.com""") ||
+      (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)

186-188: Exclude hidden staging dir from orphan cleanup.

.cloud under uploading/ can be mistaken as an orphan and deleted by cleanUpOrphanUploadsForOrga. Skip dot-prefixed dirs in that sweep.

Proposed check inside cleanUpOrphanUploadsForOrga before trying to delete:

  • if uploadDir.getFileName.toString.startsWith("."), continue.
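
A minimal sketch of that guard (filtering instead of continue, since Scala has no continue statement):

```scala
import java.nio.file.Path

// Sketch: drop dot-prefixed directories (e.g. the hidden S3 staging dir) before the orphan sweep.
def withoutHiddenDirs(uploadDirs: List[Path]): List[Path] =
  uploadDirs.filterNot(_.getFileName.toString.startsWith("."))
```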

510-523: Bucket/endpoint derivation assumptions are brittle.

  • s3UploadBucketOpt depends on credentialName being an S3 URI containing a bucket segment.

  • s3UploadEndpoint forces HTTPS; some S3-compatible stores are HTTP-only.

  • Validate credentialName format at startup and fail fast with a clear message.

  • Consider reusing S3DataVault.getAmazonS3Client (with injected WSClient) for consistent endpoint/region handling, or make protocol configurable.


524-541: Region is hardcoded to us-east-1.

This can break signing for AWS regional endpoints. Either parse from host (like S3DataVault) or make it configurable.

Short-term: add a datastore.s3Upload.region with default us-east-1. Long-term: parse via AwsHostNameUtils.parseSigningRegion.


575-594: Edge case: depth limit may keep unused files.

maxDepth = 10 may miss deep trees, leaving unreferenced files to be uploaded.

  • Consider unlimited traversal or a higher bound, or compute neededPaths first and filter by prefix without full recursion.
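
A minimal sketch of an uncapped traversal (plain java.nio, not the project's PathUtils helper; neededPaths is assumed to be the set of path prefixes referenced by the datasource):

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._
import scala.util.Using

// Sketch: list all regular files under the unpacked dataset dir, without a depth cap,
// and keep only those outside the needed path prefixes.
def unreferencedFiles(dataDir: Path, neededPaths: Set[Path]): List[Path] =
  Using.resource(Files.walk(dataDir)) { stream =>
    stream.iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .filterNot(file => neededPaths.exists(prefix => file.startsWith(prefix)))
      .toList
  }
```
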
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc0509d and acc56ba.

📒 Files selected for processing (5)
  • conf/application.conf (1 hunks)
  • unreleased_changes/8912.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
  • conf/application.conf
🧰 Additional context used
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (10-77)
  • Datastore (17-69)
  • S3Upload (63-67)
  • DataVaults (60-62)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (45-149)
  • S3DataVault (151-244)
  • getCredential (140-140)
  • hostBucketFromUri (162-173)
  • create (152-160)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
  • UnfinishedUpload (7-16)
  • UnfinishedUpload (18-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-225)
  • updateDataSource (139-147)
  • reportDataSource (84-89)
  • getCredential (194-205)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
🪛 markdownlint-cli2 (0.17.2)
unreleased_changes/8912.md

8-8: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests

@frcroth frcroth removed their assignment Sep 17, 2025
@fm3 fm3 self-assigned this Sep 22, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

511-531: Consider making S3 client initialization more robust

The current S3 client initialization could be improved to handle credential failures more gracefully.

Following up on the previous comment about credentials, consider using Box for better error handling:

-  private lazy val s3ClientBox: Box[S3AsyncClient] = for {
-    accessKeyId <- Box(s3UploadCredentialsOpt.map(_._1))
-    secretAccessKey <- Box(s3UploadCredentialsOpt.map(_._2))
-  } yield
+  private lazy val s3ClientBox: Box[S3AsyncClient] = 
+    s3UploadCredentialsOpt match {
+      case Some((accessKeyId, secretAccessKey)) if accessKeyId.nonEmpty && secretAccessKey.nonEmpty =>
+        Full(
           S3AsyncClient
             .builder()
             .credentialsProvider(
               StaticCredentialsProvider.create(
                 AwsBasicCredentials.builder.accessKeyId(accessKeyId).secretAccessKey(secretAccessKey).build()
               ))
             .crossRegionAccessEnabled(true)
             .forcePathStyle(true)
             .endpointOverride(s3UploadEndpoint)
             .region(Region.US_EAST_1)
             // Disabling checksum calculation prevents files being stored with Content Encoding "aws-chunked".
             .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED)
             .build()
+        )
+      case _ =>
+        Failure(s"S3 upload enabled but no valid credentials found for ${dataStoreConfig.Datastore.S3Upload.credentialName}")
+    }

   private lazy val transferManagerBox: Box[S3TransferManager] = for {
     client <- s3ClientBox
   } yield S3TransferManager.builder().s3Client(client).build()

Also, consider making the S3 region configurable instead of hardcoding Region.US_EAST_1.


562-580: Consider optimizing file filtering logic

The getNonReferencedFiles method could be improved for clarity and efficiency.

Consider extracting the implicit path construction logic:

 private def getNonReferencedFiles(dataDir: Path, dataSource: DataSource): Fox[List[Path]] =
   for {
     usableDataSource <- dataSource.toUsable.toFox ?~> "Data source is not usable"
-    explicitPaths: Set[Path] = usableDataSource.dataLayers
+    implicitPaths: Set[Path] = usableDataSource.dataLayers
       .flatMap(layer =>
         layer.mags.map(mag =>
           mag.path match {
             case Some(_) => None
             case None    => Some(dataDir.resolve(List(layer.name, mag.mag.toMagLiteral(true)).mkString("/")))
         }))
       .flatten
       .toSet
-    neededPaths = usableDataSource.dataLayers
+    explicitPaths = usableDataSource.dataLayers
       .flatMap(layer => layer.allExplicitPaths)
       .map(dataDir.resolve)
-      .toSet ++ explicitPaths
+      .toSet
+    neededPaths = explicitPaths ++ implicitPaths
     allFiles <- PathUtils.listFilesRecursive(dataDir, silent = true, maxDepth = 10).toFox
     filesToDelete = allFiles.filterNot(file => neededPaths.exists(neededPath => file.startsWith(neededPath)))
   } yield filesToDelete
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between acc56ba and 36533df.

📒 Files selected for processing (10)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • app/models/dataset/DatasetService.scala (1 hunks)
  • conf/application.conf (1 hunks)
  • project/Dependencies.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • conf/application.conf
  • app/controllers/WKRemoteDataStoreController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
  • app/models/dataset/DatasetService.scala
  • project/Dependencies.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
🧬 Code graph analysis (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • isKnownUpload (164-167)
  • reserveUpload (181-219)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • reserveDataSourceUpload (129-138)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (3)
  • coordinateTransformations (85-85)
  • attachments (103-103)
  • defaultViewConfiguration (97-97)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/PathSchemes.scala (1)
  • PathSchemes (3-9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (102-131)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (4)
  • PathUtils (14-14)
  • ensureDirectoryBox (124-132)
  • deleteDirectoryRecursively (186-193)
  • listFilesRecursive (104-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
  • DataVaults (61-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-155)
  • S3DataVault (157-250)
  • getCredential (143-143)
  • hostBucketFromUri (168-179)
  • create (158-166)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (56-222)
  • updateDataSource (138-144)
  • reportDataSource (83-88)
  • getCredential (191-202)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
🔇 Additional comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)

102-129: LGTM! Well-structured attachment mapping support

The addition of the attachmentMapping parameter with a default identity function is clean and maintains backward compatibility. The implementation correctly applies the mapping to both StaticColorLayer and StaticSegmentationLayer branches.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)

345-414: Persist and use isVirtual from reservation to ensure consistency

The finishUpload method currently determines S3 vs. local storage using dataStoreConfig.Datastore.S3Upload.enabled at Line 353. This could lead to inconsistent behavior if the configuration changes between reservation and finish.


390-395: Replace with Instant.logSince for consistency

Use the utility method for consistent timing logs.


387-406: Add cleanup on S3 upload failure

If the S3 upload fails between lines 393-405, local temporary data remains and the pre-created dataset may linger without proper cleanup.


489-510: Fail fast if S3 credentials are missing

The lazy initialization of s3UploadCredentialsOpt at line 489 uses getOrElse(("", "")) pattern which silently creates a client with empty credentials, leading to confusing runtime failures.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

99-105: Good: isVirtual set from config at reservation time

Setting shouldBeVirtual once and enforcing it via request.body.copy(isVirtual = Some(shouldBeVirtual)) is the right approach. The controller correctly captures the configuration state at reservation time.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)

254-302: URI construction with null scheme creates invalid relative paths

The replaceUri function creates URIs with null schemes when prepending paths, which produces relative path strings that may not be properly handled by S3 clients and could cause path resolution issues.

Apply this refactor to properly handle URI construction and path normalization:

-  // Prepend newBasePath to all (relative) paths in mags and attachments of the data source.
-  def prependAllPaths(dataSource: DataSource, newBasePath: String): Fox[DataSource] = {
-    val replaceUri = (uri: URI) => {
-      val isRelativeFilePath = (uri.getScheme == null || uri.getScheme.isEmpty || uri.getScheme == PathSchemes.schemeFile) && !uri.isAbsolute
-      uri.getPath match {
-        case pathStr if isRelativeFilePath =>
-          new URI(uri.getScheme,
-                  uri.getUserInfo,
-                  uri.getHost,
-                  uri.getPort,
-                  newBasePath + pathStr,
-                  uri.getQuery,
-                  uri.getFragment)
-        case _ => uri
-      }
-    }
+  // Prepend newBasePath to all relative paths in mags and attachments of the data source.
+  def prependAllPaths(dataSource: DataSource, newBasePath: String): Fox[DataSource] = {
+    def ensureTrailingSlash(s: String): String = 
+      if (s.endsWith("/")) s else s + "/"
+    
+    def stripLeadingSlash(s: String): String = 
+      if (s.startsWith("/")) s.drop(1) else s
+    
+    val baseUri = new URI(ensureTrailingSlash(newBasePath))
+    
+    def prependToPath(pathStr: String): String = 
+      baseUri.resolve(stripLeadingSlash(pathStr)).toString
+    
+    def replaceUri(uri: URI): URI = {
+      val scheme = Option(uri.getScheme).getOrElse("")
+      // Only rewrite relative, non-opaque URIs (no scheme or file scheme)
+      if (scheme.isEmpty || scheme == PathSchemes.schemeFile) {
+        new URI(prependToPath(uri.getPath))
+      } else {
+        uri
+      }
+    }

    dataSource.toUsable match {
      case Some(usableDataSource) =>
        val updatedDataLayers = usableDataSource.dataLayers.map {
          case layerWithMagLocators: StaticLayer =>
            layerWithMagLocators.mapped(
              magMapping = mag =>
                mag.path match {
-                  case Some(pathStr) => mag.copy(path = Some(replaceUri(new URI(pathStr)).toString))
+                  case Some(pathStr) => mag.copy(path = Some(prependToPath(pathStr)))
                  // If the mag does not have a path, it is an implicit path, we need to make it explicit.
                  case _ =>
                    mag.copy(
                      path = Some(
-                        new URI(newBasePath)
-                          .resolve(List(layerWithMagLocators.name, mag.mag.toMagLiteral(true)).mkString("/"))
-                          .toString))
+                        baseUri.resolve(List(layerWithMagLocators.name, mag.mag.toMagLiteral(true)).mkString("/")).toString))
              },
              attachmentMapping = attachment =>
                DataLayerAttachments(
                  attachment.meshes.map(a => a.copy(path = replaceUri(a.path))),
                  attachment.agglomerates.map(a => a.copy(path = replaceUri(a.path))),
                  attachment.segmentIndex.map(a => a.copy(path = replaceUri(a.path))),
                  attachment.connectomes.map(a => a.copy(path = replaceUri(a.path))),
                  attachment.cumsum.map(a => a.copy(path = replaceUri(a.path)))
              )
            )
          case layer => layer
        }
        Fox.successful(usableDataSource.copy(dataLayers = updatedDataLayers))
      case None =>
        Fox.failure("Cannot replace paths of unusable datasource")
    }
  }

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
app/models/dataset/ComposeService.scala (1)

62-70: Reintroduce the StaticLayer guard to prevent MatchError

mapped is implemented via a this match { case StaticColorLayer | StaticSegmentationLayer => ... }, so calling it on a non-static DataLayer (e.g., EditableMappingLayer) now blows up with a MatchError. The earlier version short-circuited those cases; this change regresses into a runtime crash. Let’s keep the defensive pattern match.

-      applyCoordinateTransformations = (cOpt: Option[List[CoordinateTransformation]]) =>
-        cOpt match {
-          case Some(c) => Some(c ++ composeLayer.transformations.toList)
-          case None    => Some(composeLayer.transformations.toList)
-      }
-      editedLayer = layer.mapped(name = composeLayer.newName,
-                                 coordinateTransformations =
-                                   applyCoordinateTransformations(layer.coordinateTransformations))
-    } yield editedLayer
+      applyCoordinateTransformations = (cOpt: Option[List[CoordinateTransformation]]) =>
+        cOpt match {
+          case Some(c) => Some(c ++ composeLayer.transformations.toList)
+          case None    => Some(composeLayer.transformations.toList)
+        }
+      editedLayer <- layer match {
+        case staticLayer: StaticLayer =>
+          Fox.successful(
+            staticLayer.mapped(
+              name = composeLayer.newName,
+              coordinateTransformations =
+                applyCoordinateTransformations(staticLayer.coordinateTransformations)
+            )
+          )
+        case _ =>
+          Fox.failure("Only static layers can be composed")
+      }
+    } yield editedLayer
app/models/dataset/DatasetUploadToPathsService.scala (1)

228-233: Bug: reserveAttachmentUploadToPath regenerates directoryName and ignores uploadToPathsInfixOpt
File: app/models/dataset/DatasetUploadToPathsService.scala Lines ~211–218

  • It calls generateDirectoryName on dataset.directoryName instead of reusing the stored name, and omits uploadToPathsInfixOpt, so attachments land under a different path than layers.
    Apply this diff to mirror addPathsToDatasource:
-      newDirectoryName = datasetService.generateDirectoryName(dataset.directoryName, dataset._id)
-      datasetPath = uploadToPathsPrefix / dataset._organization / newDirectoryName
+      orgaDir = uploadToPathsPrefix / dataset._organization
+      datasetParent = uploadToPathsInfixOpt.map(infix => orgaDir / infix).getOrElse(orgaDir)
+      datasetPath = datasetParent / dataset.directoryName
       attachmentPath = generateAttachmentPath(parameters.attachmentName,
                                               parameters.attachmentDataformat,
                                               parameters.attachmentType,
                                               datasetPath / parameters.layerName)
🧹 Nitpick comments (2)
app/models/dataset/DatasetUploadToPathsService.scala (2)

139-142: Normalize requested path prefix before whitelist check.

If clients pass a logically identical but differently-normalized UPath (e.g., relative vs absolute), this equality test may reject it.

Apply this diff:

-        case Some(requested) =>
-          if (configuredPrefixes.contains(requested)) Full(requested)
+        case Some(requested) =>
+          val requestedAbs = requested.toAbsolute
+          if (configuredPrefixes.contains(requestedAbs)) Full(requestedAbs)
           else Failure("Requested path prefix is not in list of configured ones.")

146-156: DRY: centralize dataset parent path construction (optional).

Both addPathsToDatasource and reserveAttachmentUploadToPath compute the same orgaDir/infix logic. Extract a helper to reduce drift.

Add this helper:

private def datasetParentFor(prefix: UPath, organizationId: String): UPath = {
  val orgaDir = prefix / organizationId
  uploadToPathsInfixOpt.map(infix => orgaDir / infix).getOrElse(orgaDir)
}

Then use:

  • In addPathsToDatasource: replace orgaDir/datasetParent with datasetParentFor(uploadToPathsPrefix, organizationId)
  • In reserveAttachmentUploadToPath: replace with datasetParentFor(uploadToPathsPrefix, dataset._organization)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f9e16546404424413b4b020b1aca1af14b3eb02 and a509ceba4f160452bb55a00bb5117b1a8c36eb8d.

📒 Files selected for processing (4)
  • app/models/dataset/ComposeService.scala (2 hunks)
  • app/models/dataset/DatasetUploadToPathsService.scala (3 hunks)
  • conf/application.conf (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • conf/application.conf
🧰 Additional context used
🧬 Code graph analysis (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2)
  • webknossos (70-72)
  • validateAccessFromTokenContext (87-94)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • fromFuture (42-53)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • getDatasetId (206-216)
app/models/dataset/ComposeService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1)
  • coordinateTransformations (85-85)
app/models/dataset/DatasetUploadToPathsService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • LinkedLayerIdentifier (80-80)
  • LinkedLayerIdentifier (82-84)
app/models/dataset/LayerToLinkService.scala (1)
  • addLayersToLinkToDataSource (32-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

133-143: Legacy manual uploads must set isVirtual = false

Leaving isVirtual as None makes manual uploads inherit the new default (true), so data that lives on disk is marked virtual and never served correctly. Explicitly pass Some(false) here.

```diff
-              Some(request.body.requireUniqueName),
-              None
+              Some(request.body.requireUniqueName),
+              Some(false)
```

app/models/dataset/DatasetUploadToPathsService.scala (4)

23-23: Good: import LinkedLayerIdentifier from uploading module.

This aligns with the new public surface in the datastore uploading package.


26-28: Good: switched to public request types.

Using ReserveDatasetUploadToPaths(ForPreliminary)Request from controllers is consistent with API versioning plans.


41-41: Good: delegate layer linking via injected LayerToLinkService.

Cleaner separation of concerns; avoids duplicating resolve/link logic here.


67-69: Good: delegate layersToLink handling to LayerToLinkService.

This ensures duplicate-name checks and resolution are centralized.

Please confirm i18n keys used inside LayerToLinkService ("dataset.layerToLink.failed", "dataset.duplicateLayerNames") exist in messages files for all supported locales.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

330-341: Bug: use flatMap instead of map to avoid nested Fox and ensure cleanup runs

Current map returns Fox[Fox[Unit]] and likely skips size enforcement or fails type inference. Use flatMap.

-      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
-        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.map(actualFileSize =>
-          if (actualFileSize > maxFileSize.toLong) {
-            cleanUpDatasetExceedingSize(uploadDir, uploadId)
-            Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes")
-          } else Fox.successful(()))
-      }
+      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
+        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.flatMap { actualFileSize =>
+          if (actualFileSize > maxFileSize.toLong) {
+            cleanUpDatasetExceedingSize(uploadDir, uploadId)
+          } else Fox.successful(())
+        }
+      }
frontend/javascripts/admin/dataset/dataset_upload_view.tsx (1)

334-347: Add needsConversion to ReserveUploadInformation TS type and backend
The ReserveUploadInformation type in frontend/javascripts/admin/rest_api.ts (currently at lines 1136–1146) must include a needsConversion?: boolean field, and the corresponding backend DTO needs to accept and handle it optional-safely.
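
A minimal backend-side sketch, assuming the flag stays optional so older clients that omit it keep working (field list is illustrative; see ReserveUploadInformation in UploadService.scala for the real fields):

```scala
import play.api.libs.json.{Json, OFormat}

// Illustrative shape only; needsConversion is the relevant addition.
case class ReserveUploadInformation(
    uploadId: String,
    name: String,
    needsConversion: Option[Boolean] // newly propagated from the frontend payload, optional for back-compat
)

object ReserveUploadInformation {
  implicit val jsonFormat: OFormat[ReserveUploadInformation] = Json.format[ReserveUploadInformation]
}
```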

🧹 Nitpick comments (5)
app/models/dataset/DatasetService.scala (2)

88-99: Unfinished uploads query: add guardrails for scale and configurability.

  • Consider making the 14‑day lookback configurable (e.g., WkConf) and optionally add a hard LIMIT to avoid large result sets.
  • Ensure there is a supporting compound index matching the filters: (uploaderId, organizationId, isActive, status, created DESC).

100-121: createAndSetUpDataset: good access checks; avoid double‑write for folder if possible.

  • You create with root folder and then immediately update to folderIdWithFallback. To reduce writes and transient states, consider passing the intended folder into createDataset (e.g., new optional param initialFolderId: Option[ObjectId] = None defaulting to org root).
  • The updated DataSourceId with (directoryName, organizationId) looks correct for virtual datasets.
app/controllers/WKRemoteDataStoreController.scala (1)

76-90: isVirtual toggling vs. needsConversion—confirm intent

You set isVirtual = !needsConversion. That matches the note about convert_to_wkw requiring non‑virtual prelim datasets, but conflicts with “all newly uploaded datasets are virtual” as a general rule. Please confirm this exception is intended and documented in the changelog/API notes.

app/controllers/DatasetController.scala (1)

208-215: Marking explore-and-add datasets virtual is correct; consider analytics hook

Setting isVirtual = true here is aligned with “all newly uploaded datasets are virtual.” Optionally, also call trackNewDataset with viaAddRoute = true to keep analytics/Slack consistent across creation paths.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

76-86: Two DSRemoteWebknossosClient injections — consolidate

Both remoteWebknossosClient and dsRemoteWebknossosClient are DSRemoteWebknossosClient. Use a single instance to reduce confusion and wiring complexity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a509ceb and 32ead33.

📒 Files selected for processing (9)
  • app/controllers/DatasetController.scala (7 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (5 hunks)
  • app/models/dataset/ComposeService.scala (2 hunks)
  • app/models/dataset/Dataset.scala (1 hunks)
  • app/models/dataset/DatasetService.scala (4 hunks)
  • app/models/dataset/DatasetUploadToPathsService.scala (4 hunks)
  • frontend/javascripts/admin/dataset/dataset_upload_view.tsx (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (18 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/models/dataset/ComposeService.scala
🧰 Additional context used
🧬 Code graph analysis (7)
app/models/dataset/Dataset.scala (2)
app/utils/sql/SecuredSQLDAO.scala (1)
  • assertUpdateAccess (36-48)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
app/controllers/WKRemoteDataStoreController.scala (9)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3)
  • AuthorizedAccessContext (19-21)
  • DBAccessContext (31-37)
  • GlobalAccessContext (25-29)
util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2)
  • ObjectId (11-13)
  • ObjectId (15-66)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4)
  • DataSource (31-43)
  • UnusableDataSource (45-62)
  • UnusableDataSource (64-66)
  • isUsable (20-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1)
  • DataSourceStatus (3-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)
  • ReportDatasetUploadParameters (70-75)
  • ReportDatasetUploadParameters (76-79)
  • ReserveAdditionalInformation (64-64)
  • ReserveAdditionalInformation (65-68)
  • ReserveUploadInformation (46-59)
  • ReserveUploadInformation (60-62)
app/models/dataset/LayerToLinkService.scala (2)
  • validateLayerToLink (22-32)
  • addLayersToLinkToDataSource (32-42)
app/models/dataset/DatasetService.scala (2)
  • createAndSetUpDataset (100-123)
  • trackNewDataset (548-560)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
app/models/dataset/Dataset.scala (3)
  • findOne (203-210)
  • updateDataSource (633-655)
  • deleteDataset (708-736)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2)
  • webknossos (70-72)
  • validateAccessFromTokenContext (87-94)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)
  • LinkedLayerIdentifier (81-81)
  • LinkedLayerIdentifier (83-85)
  • ReserveUploadInformation (46-59)
  • ReserveUploadInformation (60-62)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • fromFuture (42-53)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • getDatasetId (206-216)
app/models/dataset/DatasetUploadToPathsService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • LinkedLayerIdentifier (81-81)
  • LinkedLayerIdentifier (83-85)
app/models/dataset/LayerToLinkService.scala (1)
  • addLayersToLinkToDataSource (32-42)
app/models/dataset/Dataset.scala (1)
  • makeVirtual (666-672)
app/models/dataset/DatasetService.scala (8)
app/models/user/MultiUser.scala (1)
  • MultiUserDAO (39-181)
app/mail/MailchimpClient.scala (2)
  • MailchimpClient (17-82)
  • tagUser (45-54)
app/mail/MailchimpTag.scala (1)
  • MailchimpTag (5-22)
app/models/analytics/AnalyticsService.scala (2)
  • AnalyticsService (22-97)
  • track (38-46)
app/models/analytics/AnalyticsEvent.scala (1)
  • UploadDatasetEvent (205-218)
app/telemetry/SlackNotificationService.scala (2)
  • SlackNotificationService (11-59)
  • info (30-36)
app/utils/WkConf.scala (1)
  • WkConf (14-296)
app/models/dataset/DataStore.scala (1)
  • findOneByName (118-125)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
  • DataVaults (61-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-155)
  • S3DataVault (157-250)
  • getCredential (143-143)
  • hostBucketFromUri (168-179)
  • create (158-166)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-216)
  • reportUpload (98-105)
  • getCredential (185-196)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
  • DataSourceService (25-303)
  • resolvePathsInNewBasePath (253-269)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
app/controllers/DatasetController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • LinkedLayerIdentifier (81-81)
  • LinkedLayerIdentifier (83-85)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (21-103)
  • refreshStorageReportForDataset (89-103)
app/models/dataset/DatasetService.scala (2)
  • createAndSetUpDataset (100-123)
  • trackNewDataset (548-560)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
🔇 Additional comments (15)
app/models/dataset/DatasetService.scala (3)

24-24: LGTM: imports match new telemetry/analytics dependencies.

No issues spotted with these additions.

Also applies to: 26-27, 32-33


47-56: Refined search to locate manual new DatasetService invocations
Please rerun and check for any instantiations that need updating:

#!/bin/bash
rg -nP 'new\s+DatasetService\s*\(' --type scala -C2

If this still yields no results, manually verify any Guice modules or test setups that instantiate DatasetService.


560-570: Gate Slack logging by config and recover failures

  • If conf.SlackNotifications.uri is empty/None, return Fox.successful(()) to skip the DB reads (see the sketch below).
  • Wrap the inner for in .orElse(Fox.successful(())) (or use a recover) so Slack client errors don’t bubble up.
  • Confirm that conf.SlackNotifications.uri yields an empty string or None when unconfigured.
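
A sketch of the combined gating and recovery, assuming conf.SlackNotifications.uri is empty when unconfigured and that slackNotificationService.info takes a title and a message (both assumptions):

// Sketch only: skip the DB reads entirely when Slack is unconfigured, and make the
// notification best-effort so Slack client errors never fail the calling request.
def notifyNewDatasetSketch(datasetName: String, uploaderName: String): Fox[Unit] =
  if (conf.SlackNotifications.uri.isEmpty) Fox.successful(())
  else
    (for {
      _ <- Fox.successful(
        slackNotificationService.info("New dataset", s"$uploaderName uploaded $datasetName"))
    } yield ()).orElse(Fox.successful(()))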
app/models/dataset/DatasetUploadToPathsService.scala (1)

67-69: Delegating layer linking to LayerToLinkService looks good

Centralizing linking and duplicate-name checks here is cleaner and reduces duplication. LGTM.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)

404-445: Respect configured endpoint object; avoid re-parsing credentialName

Use the normalized s3UploadEndpoint (which should preserve scheme/port) when building S3 base path.

-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
-            newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
+            endPointHost = s3UploadEndpoint.getAuthority // includes host[:port]
+            newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox

527-538: Preserve scheme and port for custom S3 endpoints

Hardcoding https and dropping the port breaks MinIO and non‑standard endpoints. Keep original URI.

-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)

342-386: Persist and honor isVirtual across finish; don’t rely on live config

Finish currently decides S3 vs local via dataStoreConfig.Datastore.S3Upload.enabled. Persist the chosen isVirtual at reserve and read it here for consistent behavior on retries/restarts.

@@
   private def redisKeyForUploadId(datasourceId: DataSourceId): String =
     s"upload___${Json.stringify(Json.toJson(datasourceId))}___datasourceId"
+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"
@@ def reserveUpload(...)
-      _ <- runningUploadMetadataStore.insert(
+      _ <- runningUploadMetadataStore.insert(
         redisKeyForDataSourceId(reserveUploadInfo.uploadId),
         Json.stringify(Json.toJson(newDataSourceId))
       )
+      _ <- runningUploadMetadataStore.insert(
+        redisKeyForIsVirtual(reserveUploadInfo.uploadId),
+        reserveUploadInfo.isVirtual
+          .orElse(reserveUploadInfo.needsConversion.map(nc => !nc)) // fallback: virtual if no conversion
+          .getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+          .toString
+      )
@@ def finishUpload(...)
-      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId)
+      isVirtualOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId)).map(_.map(_.toBoolean))
+      uploadToS3 = isVirtualOpt.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId, uploadToS3)
@@ def cleanUpUploadedDataset(...)
-    removeFromRedis(uploadId)
+    removeFromRedis(uploadId)
@@ def removeFromRedis(uploadId: String)
       _ <- runningUploadMetadataStore.remove(redisKeyForDataSourceId(uploadId))
+      _ <- runningUploadMetadataStore.remove(redisKeyForIsVirtual(uploadId))

And adapt moveUnpackedToTarget signature to accept uploadToS3 and use it instead of reading config.

app/controllers/WKRemoteDataStoreController.scala (1)

128-154: Require dataSource when needsConversion == false (persist and mark usable)

Without this, non‑conversion uploads can leave datasets unusable if dataSourceOpt is omitted. Enforce presence and persist.

       for {
         user <- bearerTokenService.userForToken(token)
         dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
+        _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+             "reportUpload.dataSourceMissing"
         _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
         _ <- datasetService.trackNewDataset(dataset,
                                             user,
                                             request.body.needsConversion,
                                             request.body.datasetSizeBytes,
                                             viaAddRoute = false)
         dataSourceWithLinkedLayersOpt <- Fox.runOptional(request.body.dataSourceOpt) {
           implicit val ctx: DBAccessContext = AuthorizedAccessContext(user)
           layerToLinkService.addLayersToLinkToDataSource(_, request.body.layersToLink)
         }
app/controllers/DatasetController.scala (4)

18-18: Consolidate LinkedLayerIdentifier source — good change

Switching to the shared type from uploading services avoids duplication and keeps JSON formats consistent. No action needed.


120-120: API shape change: folderId now ObjectId

DataSourceRegistrationInfo.folderId moved from Option[String] to Option[ObjectId]. Verify all callers (FE and any scripts) send a hex ObjectId and not a plain folder path/string. If older clients exist, provide a small adapter at the boundary.


230-237: createAndSetUpDataset usage looks correct

Passing folderId as ObjectId and setting isVirtual = true matches the new model.


141-141: Make storage‐report refresh best‐effort and retain 200 OK

Wrap the per‐dataset refresh so transient 4xx/5xx errors won’t fail the route after the dataset is marked usable:

-        _ <- usedStorageService.refreshStorageReportForDataset(dataset)
+        // best‐effort: don’t fail the route if storage‐report refresh fails
+        _ <- usedStorageService
+               .refreshStorageReportForDataset(dataset)
+               .recover { case _ => () }

Ensure the failure path of finishUploadToPaths still triggers the Slack notification required by #8934.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (3)

93-117: reserveUploadV11 mapping looks good; keep isVirtual unset

Forwarding requireUniqueName and leaving isVirtual/needsConversion as None is correct for non-manual flows. No changes needed.


119-124: Use datasetDirectoryName when resolving legacy links (prevents 404s)

getDatasetId expects the directory name. Fall back to dataSetName only if datasetDirectoryName is absent. This was flagged earlier and still applies.

Apply:

-      datasetId <- remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId,
-                                                       legacyLayerToLink.dataSetName)
+      datasetId <- remoteWebknossosClient.getDatasetId(
+        legacyLayerToLink.getOrganizationId,
+        legacyLayerToLink.datasetDirectoryName.getOrElse(legacyLayerToLink.dataSetName))

Also add a minimal guard so we fail fast if both organizationId and organizationName are missing:

-  private def adaptLayerToLink(legacyLayerToLink: LegacyLinkedLayerIdentifier): Fox[LinkedLayerIdentifier] =
-    for {
+  private def adaptLayerToLink(legacyLayerToLink: LegacyLinkedLayerIdentifier): Fox[LinkedLayerIdentifier] =
+    for {
+      _ <- Fox.fromBool(legacyLayerToLink.organizationId.isDefined || legacyLayerToLink.organizationName.isDefined) ?~>
+            "organization.missing"
       datasetId <- ...

140-146: Set isVirtual = Some(false) for manual uploads (legacy path)

Manual disk uploads should explicitly mark isVirtual = false to avoid relying on defaults.

Apply:

-              Some(request.body.requireUniqueName),
-              None,
+              Some(request.body.requireUniqueName),
+              Some(false),
               needsConversion = None

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

171-207: Persist and consume isVirtual across reserve/finish; also clean up on move failures

Config flips can misroute data (S3 vs local) and break resumability. Persist isVirtual on reserve, read it at finish, and pass it into moveUnpackedToTarget. Also wrap the move step with cleanup-on-failure, like unpack/post-process. Finally, remove the persisted flag on cleanup.

Apply:

@@
   private def redisKeyForDatasetId(uploadId: String): String =
     s"upload___${uploadId}___datasetId"
+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"
@@ def reserveUpload(reserveUploadInfo: ReserveUploadInformation,
       _ <- runningUploadMetadataStore.insert(
         redisKeyForDatasetId(reserveUploadInfo.uploadId),
         Json.stringify(Json.toJson(reserveUploadAdditionalInfo.newDatasetId))
       )
+      _ <- runningUploadMetadataStore.insert(
+        redisKeyForIsVirtual(reserveUploadInfo.uploadId),
+        reserveUploadInfo.isVirtual.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled).toString
+      )
@@ def finishUpload(uploadInformation: UploadInformation)
-      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId)
+      // Decide target storage based on persisted isVirtual (fallback to live config)
+      isVirtualOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId)).map(_.map(_.toBoolean))
+      uploadToS3 = isVirtualOpt.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+      moveResult <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId, uploadToS3).shiftBox
+      _ <- cleanUpOnFailure(moveResult, datasetId, dataSourceId, needsConversion, label = s"moving dataset to target storage")
+      dataSourceWithAbsolutePathsOpt <- moveResult.toFox
@@
   private def moveUnpackedToTarget(unpackedDir: Path,
                                    needsConversion: Boolean,
-                                   datasetId: ObjectId,
-                                   dataSourceId: DataSourceId): Fox[Option[UsableDataSource]] =
+                                   datasetId: ObjectId,
+                                   dataSourceId: DataSourceId,
+                                   uploadToS3: Boolean): Fox[Option[UsableDataSource]] =
@@
-        newBasePath <- if (dataStoreConfig.Datastore.S3Upload.enabled) {
+        newBasePath <- if (uploadToS3) {
@@
-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+            endPointHost = s3UploadEndpoint.getHost
             newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
           } yield newBasePath
         } else {
@@ def removeFromRedis(uploadId: String): Fox[Unit] =
       dataSourceId <- getDataSourceIdByUploadId(uploadId)
       _ <- runningUploadMetadataStore.remove(redisKeyForDataSourceId(uploadId))
       _ <- runningUploadMetadataStore.remove(redisKeyForDatasetId(uploadId))
+      _ <- runningUploadMetadataStore.remove(redisKeyForIsVirtual(uploadId))
       _ <- runningUploadMetadataStore.remove(redisKeyForLinkedLayerIdentifier(uploadId))

Also applies to: 349-392, 410-451, 820-840


337-347: Cleanup on size-exceed must actually run; current call is dropped

The call to cleanUpDatasetExceedingSize is not sequenced and won’t execute before returning a failure.

Apply:

-      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
-        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.map(actualFileSize =>
-          if (actualFileSize > maxFileSize.toLong) {
-            cleanUpDatasetExceedingSize(uploadDir, uploadId)
-            Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes")
-          } else Fox.successful(()))
-      }
+      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
+        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.flatMap { actualFileSize =>
+          if (actualFileSize > maxFileSize.toLong)
+            cleanUpDatasetExceedingSize(uploadDir, uploadId)
+              .flatMap(_ => Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes"))
+          else Fox.successful(())
+        }
+      }
app/controllers/WKRemoteDataStoreController.scala (1)

210-227: Await DB delete and used-storage refresh

Current code builds a Future chain and discards it; side-effects may never run.

Apply:

-                _ = datasetDAO
-                  .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-                  .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+                _ <- datasetDAO
+                  .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+                  .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32ead33 and b48966d.

📒 Files selected for processing (3)
  • app/controllers/DatasetController.scala (7 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (5 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (19 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
app/controllers/WKRemoteDataStoreController.scala (8)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3)
  • AuthorizedAccessContext (19-21)
  • DBAccessContext (31-37)
  • GlobalAccessContext (25-29)
util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2)
  • ObjectId (11-13)
  • ObjectId (15-66)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4)
  • DataSource (31-43)
  • UnusableDataSource (45-62)
  • UnusableDataSource (64-66)
  • isUsable (20-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1)
  • DataSourceStatus (3-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • ReportDatasetUploadParameters (70-75)
  • ReportDatasetUploadParameters (76-79)
app/models/dataset/LayerToLinkService.scala (2)
  • validateLayerToLink (22-32)
  • addLayersToLinkToDataSource (32-42)
app/models/dataset/DatasetService.scala (2)
  • createAndSetUpDataset (100-123)
  • trackNewDataset (548-560)
app/models/dataset/Dataset.scala (3)
  • findOne (203-210)
  • updateDataSource (633-655)
  • deleteDataset (708-736)
app/controllers/DatasetController.scala (2)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (21-103)
  • refreshStorageReportForDataset (89-103)
app/models/dataset/DatasetService.scala (2)
  • createAndSetUpDataset (100-123)
  • trackNewDataset (548-560)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (4)
  • PathUtils (14-14)
  • deleteDirectoryRecursively (186-193)
  • ensureDirectoryBox (124-132)
  • listFilesRecursive (104-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (2)
  • DataStoreConfig (11-78)
  • S3Upload (64-68)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-155)
  • S3DataVault (157-250)
  • getCredential (143-143)
  • hostBucketFromUri (168-179)
  • create (158-166)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
  • UnfinishedUpload (7-16)
  • UnfinishedUpload (18-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-216)
  • reportUpload (98-105)
  • getCredential (185-196)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (28-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

431-439: Use normalized endpoint host for S3 base path

Don’t re-parse the config string here; use s3UploadEndpoint to ensure scheme/host/port are consistent.

Apply:

-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+            endPointHost = s3UploadEndpoint.getHost

533-544: Preserve scheme/port when building S3 endpoint

Hardcoding https and dropping port breaks MinIO/custom endpoints.

Apply:

-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
app/controllers/DatasetController.scala (1)

230-243: trackNewDataset not awaited and viaAddRoute should be true for Add route

Current code doesn’t await side-effects (Slack/analytics) and mislabels the source.

Apply:

-        )
-        _ = datasetService.trackNewDataset(dataset,
-                                           user,
-                                           needsConversion = false,
-                                           datasetSizeBytes = 0,
-                                           viaAddRoute = false)
+        )
+        _ <- datasetService.trackNewDataset(dataset,
+                                            user,
+                                            needsConversion = false,
+                                            datasetSizeBytes = 0,
+                                            viaAddRoute = true)
app/controllers/WKRemoteDataStoreController.scala (1)

125-154: Require dataSource when needsConversion == false; also await tracking

Without enforcing dataSource for non-conversion uploads, datasets remain unusable. Also, trackNewDataset must be awaited.

Apply:

   def reportDatasetUpload(name: String,
                           key: String,
                           token: String,
                           datasetId: ObjectId): Action[ReportDatasetUploadParameters] =
     Action.async(validateJson[ReportDatasetUploadParameters]) { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           user <- bearerTokenService.userForToken(token)
           dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
-          _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
-          _ = datasetService.trackNewDataset(dataset,
-                                             user,
-                                             request.body.needsConversion,
-                                             request.body.datasetSizeBytes,
-                                             viaAddRoute = false)
+          _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+               "reportUpload.dataSourceMissing"
+          _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
+          _ <- datasetService.trackNewDataset(dataset,
+                                              user,
+                                              request.body.needsConversion,
+                                              request.body.datasetSizeBytes,
+                                              viaAddRoute = false)
           dataSourceWithLinkedLayersOpt <- Fox.runOptional(request.body.dataSourceOpt) {
             implicit val ctx: DBAccessContext = AuthorizedAccessContext(user)
             layerToLinkService.addLayersToLinkToDataSource(_, request.body.layersToLink)
           }
           _ <- Fox.runOptional(dataSourceWithLinkedLayersOpt) { dataSource =>
             logger.info(s"Updating dataset $datasetId in database after upload reported from datastore $name.")
             datasetDAO.updateDataSource(datasetId,
                                         dataset._dataStore,
                                         dataSource.hashCode(),
                                         dataSource,
                                         isUsable = true)(GlobalAccessContext)
           }
         } yield Ok
       }
     }

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b48966d and 4bfb9a2.

📒 Files selected for processing (3)
  • app/controllers/WKRemoteDataStoreController.scala (6 hunks)
  • app/models/dataset/Dataset.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
app/models/dataset/Dataset.scala (2)
app/utils/sql/SecuredSQLDAO.scala (2)
  • existingCollectionName (16-16)
  • assertUpdateAccess (36-48)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2)
  • webknossos (70-72)
  • validateAccessFromTokenContext (87-94)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)
  • LinkedLayerIdentifier (81-81)
  • LinkedLayerIdentifier (83-85)
  • ReserveUploadInformation (46-59)
  • ReserveUploadInformation (60-62)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
  • fromFuture (42-53)
  • successful (53-56)
util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (1)
  • fromStringSync (22-26)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • getDatasetId (206-216)
app/controllers/WKRemoteDataStoreController.scala (11)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3)
  • AuthorizedAccessContext (19-21)
  • DBAccessContext (31-37)
  • GlobalAccessContext (25-29)
util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2)
  • ObjectId (11-13)
  • ObjectId (15-66)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4)
  • DataSource (31-43)
  • UnusableDataSource (45-62)
  • UnusableDataSource (64-66)
  • isUsable (20-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1)
  • DataSourceStatus (3-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • ReportDatasetUploadParameters (70-75)
  • ReportDatasetUploadParameters (76-79)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (21-103)
  • refreshStorageReportForDataset (89-103)
app/models/dataset/LayerToLinkService.scala (2)
  • validateLayerToLink (22-32)
  • addLayersToLinkToDataSource (32-42)
app/models/dataset/DatasetService.scala (2)
  • createAndSetUpDataset (100-123)
  • trackNewDataset (548-560)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
app/security/WebknossosBearerTokenAuthenticatorService.scala (1)
  • userForToken (76-84)
app/models/dataset/Dataset.scala (3)
  • findOne (203-210)
  • updateDataSource (634-656)
  • deleteDataset (709-737)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (2)
app/controllers/WKRemoteDataStoreController.scala (1)

140-151: Re-introduce the dataSource guard for non-conversion uploads.

If needsConversion is false and dataSourceOpt is missing, we leave the dataset backed by the placeholder UnusableDataSource, so it remains unusable despite reporting success. The previous guard prevented this, but it’s gone now. Please add it back before processing optional linkage.

         for {
           user <- bearerTokenService.userForToken(token)
           dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
+          _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+               "reportUpload.dataSourceMissing"
           _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

127-130: Prefer datasetDirectoryName when available.

remoteWebknossosClient.getDatasetId expects the directory name. When clients supply datasetDirectoryName, we must use it, otherwise legacy uploads break as soon as friendly names differ from directory names.

-          remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId, legacyLayerToLink.dataSetName)
+          remoteWebknossosClient.getDatasetId(
+            legacyLayerToLink.getOrganizationId,
+            legacyLayerToLink.datasetDirectoryName.getOrElse(legacyLayerToLink.dataSetName))

Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment


Awesome, thanks for implementing this. Including @frcroth :)

Here is my code feedback / questions. Testing is TODO

@fm3 fm3 requested a review from MichaelBuessemeyer October 2, 2025 12:41
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (7)
app/models/dataset/Dataset.scala (1)

667-672: Verify that worker flows use GlobalAccessContext when calling makeVirtual.

This method asserts update access, which could block convert_to_wkw worker paths if they don't use GlobalAccessContext. Confirm that all call sites (especially worker flows) either have update access or use GlobalAccessContext.
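
A minimal sketch of the worker-facing call site, assuming makeVirtual takes the usual implicit DBAccessContext (the surrounding method name is hypothetical):

// Sketch only: when the conversion worker reports success there is no user context,
// so the update-access check in makeVirtual has to be satisfied via GlobalAccessContext.
def markConvertedDatasetVirtualSketch(datasetId: ObjectId): Fox[Unit] =
  datasetDAO.makeVirtual(datasetId)(GlobalAccessContext)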

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

188-198: Verify that Slack notification is triggered only on failures.

The log(Some(slackNotificationService.noticeFailedFinishUpload)) wrapper may trigger notifications even on successful 200 OK responses. Confirm that the notification is only sent on failures.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)

352-393: Verify cleanup on moveUnpackedToTarget failure and config consistency.

Two concerns:

  1. If moveUnpackedToTarget fails, the local unpackToDir may not be cleaned up. Consider wrapping the move result in cleanUpOnFailure similar to unpack/post-processing steps.
  2. The method uses dataStoreConfig.Datastore.S3Upload.enabled directly instead of reading a persisted isVirtual flag from Redis. If the config changes between reserve and finish, the behavior may be inconsistent. Consider persisting isVirtual at reserve time and reading it here.

410-451: Verify cleanup on S3 upload failure.

If the S3 upload (line 432) or path rewrite (line 446) fails, the local unpackedDir may not be cleaned up. Consider wrapping the S3 branch in a recovery handler that deletes the local directory and calls remoteWebknossosClient.deleteDataset(datasetId) on failure.
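
One way to wire that recovery, mirroring the shiftBox pattern already used for the unpack step; uploadDirectoryToS3 is a placeholder for the actual transfer call, and the other names are taken from this review:

// Sketch only: inspect the S3 upload result, clean up local data and the reserved dataset
// entry on failure, then re-raise the original error. Full/Box/Fox come from the project's
// util library; the exact signatures here are assumptions.
private def uploadToS3WithCleanupSketch(unpackedDir: Path,
                                        datasetId: ObjectId,
                                        bucket: String,
                                        objectKey: String)(implicit tc: TokenContext): Fox[Unit] =
  for {
    uploadResultBox <- uploadDirectoryToS3(unpackedDir, bucket, objectKey).shiftBox
    _ <- uploadResultBox match {
      case Full(_) => Fox.successful(())
      case _ =>
        for {
          _ <- remoteWebknossosClient.deleteDataset(datasetId)
          _ = PathUtils.deleteDirectoryRecursively(unpackedDir)
          _ <- uploadResultBox.toFox // propagates the original failure
        } yield ()
    }
  } yield ()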


546-562: Fail fast on missing S3 credentials.

The current code extracts accessKeyId and secretAccessKey as Box[Option[String]], which can succeed with None values and create an invalid S3 client. Validate s3UploadCredentialsOpt before building the client and fail with a clear error if missing.


533-544: Preserve scheme and port in s3UploadEndpoint.

The endpoint URI construction hardcodes "https" and port -1, which will break non-HTTPS or non-default-port S3 endpoints (e.g., MinIO). Preserve the original scheme and port from credentialName.


436-437: Use s3UploadEndpoint for consistent host extraction.

Line 436 extracts the host directly from the credentialName URI, but s3UploadEndpoint (lines 533-544) normalizes the URI. For consistency, use s3UploadEndpoint.getHost instead of re-parsing credentialName.

Apply this diff:

-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+            endPointHost = s3UploadEndpoint.getHost
🧹 Nitpick comments (1)
unreleased_changes/8912.md (1)

8-15: Add language identifier to the code block.

The code block should specify a language for proper syntax highlighting and to satisfy markdown linting rules (MD040).

Based on static analysis hints.

Apply this diff:

-```
+```hocon
 s3Upload {
   enabled = false
   # Use the name of a credential in the dataVaults section here to use it for uploads.
   credentialName = "s3://example/uri/prefix"
   objectKeyPrefix = "webknossos-uploads"
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4bfb9a225cf601fccc3c5f89bc57de92217dcb12 and 2aebc3497affc6de63adcb7921f52bb5f7970706.

📒 Files selected for processing (6)
  • app/models/dataset/ComposeService.scala (2 hunks)
  • app/models/dataset/Dataset.scala (2 hunks)
  • unreleased_changes/8912.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (7 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (19 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, Fox.fromBool() expressions should use the <- binding operator instead of the = assignment operator to properly propagate error conditions. Using = will cause validation failures to be silently ignored.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
🧬 Code graph analysis (4)
app/models/dataset/Dataset.scala (3)
app/models/project/Project.scala (1)
  • findOneByNameAndOrganization (115-125)
app/utils/sql/SecuredSQLDAO.scala (3)
  • readAccessQuery (23-36)
  • existingCollectionName (16-16)
  • assertUpdateAccess (36-48)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (6)
  • webknossos (70-72)
  • validateAccessFromTokenContext (87-94)
  • UserAccessRequest (30-30)
  • UserAccessRequest (31-72)
  • writeDataset (52-55)
  • deleteDataset (49-52)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/DSSlackNotificationService.scala (2)
  • DSSlackNotificationService (8-31)
  • noticeFailedFinishUpload (26-31)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • reserveDataSourceUpload (123-132)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)
  • reserveUpload (171-212)
  • getDatasetIdByUploadId (168-171)
  • extractDatasetUploadId (157-157)
  • finishUpload (352-393)
  • cancelUpload (321-335)
  • isKnownUpload (154-157)
util/src/main/scala/com/scalableminds/util/requestlogging/RequestLogging.scala (1)
  • log (55-62)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/SlackClient.scala (1)
  • warn (18-21)
app/models/dataset/ComposeService.scala (3)
app/models/dataset/DatasetService.scala (1)
  • createAndSetUpDataset (100-123)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1)
  • coordinateTransformations (85-85)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (7)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-14)
  • deleteDirectoryRecursively (186-193)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
  • DataVaults (61-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-182)
  • S3DataVault (184-277)
  • getCredential (170-170)
  • hostBucketFromUri (195-206)
  • create (185-193)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
  • UnfinishedUpload (7-16)
  • UnfinishedUpload (18-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-216)
  • reportUpload (98-105)
  • getCredential (185-196)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
🪛 markdownlint-cli2 (0.18.1)
unreleased_changes/8912.md

8-8: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
🔇 Additional comments (19)
app/models/dataset/ComposeService.scala (2)

50-55: LGTM! Virtual dataset creation aligns with PR objectives.

The refactor from createVirtualDataset to createAndSetUpDataset with isVirtual = true correctly enforces the virtual dataset model. The targetFolderId type change to ObjectId is consistent with the method signature and improves type safety.


71-73: LGTM! Simplified layer editing using polymorphic mapped method.

The refactor eliminates explicit pattern matching by leveraging the StaticLayer.mapped method, which internally handles both StaticColorLayer and StaticSegmentationLayer types. This reduces code duplication and improves maintainability while preserving the coordinate transformation logic.

app/models/dataset/Dataset.scala (1)

446-458: LGTM: Backwards-compatible lookup by directoryName.

The method now looks up datasets by directoryName instead of name, which aligns with the shift to datasetId-centric operations. The comment correctly explains that this is for backwards compatibility with legacy links.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (5)

36-36: LGTM: Slack notification service injected.

The DSSlackNotificationService is properly injected for use in failure logging.

Also applies to: 67-67


100-103: LGTM: Reserve flow updated to use for-comprehension.

The reserve flow now awaits reserveUploadAdditionalInfo before calling uploadService.reserveUpload, which is more readable and correct.


153-155: LGTM: Chunk upload flow updated to datasetId.

The chunk upload and test chunk flows now correctly use getDatasetIdByUploadId and UserAccessRequest.writeDataset(datasetId).

Also applies to: 176-177


202-214: LGTM: Cancel upload flow updated to datasetId.

The cancel upload flow now correctly uses getDatasetIdByUploadId and UserAccessRequest.deleteDataset(datasetId).


405-410: LGTM: Delete on disk flow updated to datasetId.

The delete on disk flow now correctly uses dsRemoteWebknossosClient.deleteDataset(datasetId) and the log message references datasetId.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1)

195-195: LGTM: hostBucketFromUri made public for S3 upload support.

The visibility change allows UploadService and other modules to extract bucket names from S3 URIs during the upload workflow.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (10)

46-101: LGTM: Data models updated for datasetId-centric flow.

The new and updated data models correctly reflect the shift to datasetId-based operations and S3 upload support. JSON formats are properly defined.


103-108: LGTM: DataStoreConfig dependency added.

The DataStoreConfig dependency is correctly injected to support S3 upload configuration.


143-144: LGTM: Redis key helper added for datasetId.

The redisKeyForDatasetId helper is consistent with existing Redis key patterns.


159-164: LGTM: Upload directory helpers updated.

The upload directory helpers now support the backup feature by creating backup directories in the .trash folder.


168-169: LGTM: DatasetId retrieval method added.

The getDatasetIdByUploadId method is consistent with the existing getDataSourceIdByUploadId pattern.


171-210: LGTM: reserveUpload updated for datasetId and layer linking.

The method now correctly validates that needsConversion and layersToLink are not both set, and stores the datasetId in Redis.


263-319: LGTM: handleUploadChunk updated to datasetId.

The chunk upload handler now correctly uses getDatasetIdByUploadId and includes datasetId in log messages.


321-335: LGTM: cancelUpload updated to datasetId.

The cancel upload handler now correctly uses getDatasetIdByUploadId and includes datasetId in log messages.


602-628: LGTM: Cleanup logic includes dataset deletion.

The cleanup logic correctly deletes the local directory and calls remoteWebknossosClient.deleteDataset(datasetId) on failure.


802-804: LGTM: Raw uploaded data backed up for debugging.

The backup feature correctly copies the raw uploaded data to a .trash directory before processing.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

171-210: Persist isVirtual to Redis for consistent finish behavior.

While isVirtual is present in ReserveUploadInformation, it's not being persisted to Redis. This means finishUpload cannot read the reserved value and must rely on live config (dataStoreConfig.Datastore.S3Upload.enabled), risking inconsistency if the config changes between reserve and finish operations.

Add persistence and retrieval:

+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"
+
   def reserveUpload(reserveUploadInfo: ReserveUploadInformation,
                     reserveUploadAdditionalInfo: ReserveAdditionalInformation): Fox[Unit] =
     for {
       _ <- dataSourceService.assertDataDirWritable(reserveUploadInfo.organization)
       _ <- Fox.fromBool(
         !reserveUploadInfo.needsConversion.getOrElse(false) || !reserveUploadInfo.layersToLink
           .exists(_.nonEmpty)) ?~> "Cannot use linked layers if the dataset needs conversion"
@@
       _ <- runningUploadMetadataStore.insert(
         redisKeyForDatasetId(reserveUploadInfo.uploadId),
         Json.stringify(Json.toJson(reserveUploadAdditionalInfo.newDatasetId))
       )
+      _ <- runningUploadMetadataStore.insert(
+        redisKeyForIsVirtual(reserveUploadInfo.uploadId),
+        reserveUploadInfo.isVirtual.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled).toString
+      )

Then in finishUpload, read and pass it to moveUnpackedToTarget:

   def finishUpload(uploadInformation: UploadInformation)(implicit tc: TokenContext): Fox[ObjectId] = {
     val uploadId = uploadInformation.uploadId
 
     for {
       dataSourceId <- getDataSourceIdByUploadId(uploadId)
       datasetId <- getDatasetIdByUploadId(uploadId)
       linkedLayerIdentifiers <- getObjectFromRedis[LinkedLayerIdentifiers](redisKeyForLinkedLayerIdentifier(uploadId))
+      isVirtualOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId)).map(_.map(_.toBoolean))
+      uploadToS3 = isVirtualOpt.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
@@
-      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId)
+      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, uploadToS3, datasetId, dataSourceId)

Also update removeFromRedis (around line 814) to clean up the new key, and update moveUnpackedToTarget signature to accept the uploadToS3 parameter instead of reading config directly.

Based on learnings

♻️ Duplicate comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

529-540: Hardcoded HTTPS scheme breaks non-HTTPS S3 endpoints.

Line 532 forces "https" and line 535 drops the port (-1), which will break MinIO or any custom S3-compatible endpoint using HTTP or non-standard ports.

Preserve the original scheme and port:

   private lazy val s3UploadEndpoint: URI = {
     val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
     new URI(
-      "https",
+      credentialUri.getScheme,
       null,
       credentialUri.getHost,
-      -1,
+      credentialUri.getPort,
       null,
       null,
       null
     )
   }

Alternatively, if the credential name already contains the full endpoint URI, just use it directly:

-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

542-558: Improve error message for missing S3 credentials.

If s3UploadCredentialsOpt is None, the for-comprehension fails with an empty Box, which doesn't provide a clear error message.

   private lazy val getS3TransferManager: Box[S3TransferManager] = for {
-    accessKeyId <- Box(s3UploadCredentialsOpt.map(_._1))
-    secretAccessKey <- Box(s3UploadCredentialsOpt.map(_._2))
+    creds <- Box(s3UploadCredentialsOpt) ?~ s"S3 upload enabled but no credentials found for '${dataStoreConfig.Datastore.S3Upload.credentialName}'"
+    (accessKeyId, secretAccessKey) = creds
     client <- tryo(
       S3AsyncClient
         .builder()
         .credentialsProvider(StaticCredentialsProvider.create(
           AwsBasicCredentials.builder.accessKeyId(accessKeyId).secretAccessKey(secretAccessKey).build()
         ))
         .crossRegionAccessEnabled(true)
         .forcePathStyle(true)
         .endpointOverride(s3UploadEndpoint)
         .region(Region.US_EAST_1)
         // Disabling checksum calculation prevents files being stored with Content Encoding "aws-chunked".
         .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED)
         .build())
   } yield S3TransferManager.builder().s3Client(client).build()

408-447: Use parsed s3UploadEndpoint for host lookup

  • Replace
    - endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
    + endPointHost = s3UploadEndpoint.getHost
  • Keep the synchronized block: unpackToDirFor returns a shared directory per organization, so synchronization is required unless you refactor to use unique per-upload paths.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2aebc34 and 8dab124.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (18 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (10)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (4)
  • PathUtils (14-14)
  • ensureDirectory (118-122)
  • deleteDirectoryRecursively (186-193)
  • ensureDirectoryBox (124-132)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
  • DataStoreConfig (11-78)
  • S3Upload (64-68)
  • DataVaults (61-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-182)
  • S3DataVault (184-277)
  • getCredential (170-170)
  • hostBucketFromUri (195-206)
  • create (185-193)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/explore/ExploreLocalLayerService.scala (1)
  • ExploreLocalLayerService (21-152)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
  • UnfinishedUpload (7-16)
  • UnfinishedUpload (18-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (3)
  • DSRemoteWebknossosClient (57-216)
  • reportUpload (98-105)
  • getCredential (185-196)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (2)
  • now (48-48)
  • logSince (72-75)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (28-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)

46-101: LGTM! Data models align well with PR objectives.

The new and updated case classes properly support dataset-ID-centric operations and S3 upload configuration. The addition of isVirtual, needsConversion, and LinkedLayerIdentifier structures enables the transition away from symlink-based layer linking.
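As a rough sketch of the dataset-ID-centric shape described here (field names inferred from the PR description, not the exact definition):

case class LinkedLayerIdentifier(
    datasetId: ObjectId,              // the source dataset is now referenced by id instead of by directory path
    layerName: String,
    newLayerName: Option[String] = None
)

With an identifier like this, linking a layer becomes a database operation (resolving the datasetId and registering the layer reference) rather than creating a symlink on disk.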


393-406: LGTM! File cleanup logic correctly filters unreferenced files.

The logic correctly identifies files referenced by datasource layers (explicit paths + implicit mag paths) and safely deletes files not in that set. This aligns with the PR objective to avoid uploading unnecessary files.

The startsWith check at line 594 appropriately handles directory prefixes.

Also applies to: 576-596
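A simplified sketch of that cleanup idea (helper names and types assumed, not the actual implementation):

import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

def deleteUnreferencedFiles(uploadDir: Path, referencedPrefixes: Set[Path]): Unit = {
  val allFiles = Files.walk(uploadDir).iterator().asScala.filter(Files.isRegularFile(_)).toList
  allFiles.foreach { file =>
    // keep a file only if it lies under a path referenced by the datasource (explicit paths or implied mag paths)
    val isReferenced = referencedPrefixes.exists(prefix => file.startsWith(prefix))
    if (!isReferenced) Files.delete(file) // e.g. a stray datasource-properties.json that now only lives in the DB
  }
}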


798-800: Good practice: Raw upload backup for debugging.

Backing up raw uploaded data to .trash before processing enables debugging failed uploads. The regular cleanup by cronjob prevents unbounded growth.
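A minimal sketch of that backup step, assuming the upload and trash directories live on the same filesystem (names are illustrative):

import java.nio.file.{Files, Path, StandardCopyOption}

def backupRawUploadToTrash(uploadDir: Path, trashDir: Path, uploadId: String): Path = {
  Files.createDirectories(trashDir)
  val target = trashDir.resolve(s"$uploadId-${System.currentTimeMillis()}")
  // moving (renaming) the directory is cheap on the same filesystem and keeps the raw data inspectable
  Files.move(uploadDir, target, StandardCopyOption.ATOMIC_MOVE)
  target // the periodic .trash cleanup deletes it later, so this does not grow unboundedly
}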

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe9c527 and 9efac9a.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (20 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (5)
  • PathUtils (14-14)
  • deleteDirectoryRecursively (186-193)
  • ensureDirectoryBox (124-132)
  • listFilesRecursive (104-108)
  • listFiles (101-102)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
  • DataVaults (61-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-182)
  • S3DataVault (184-277)
  • getCredential (170-170)
  • hostBucketFromUri (195-206)
  • create (185-193)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/explore/ExploreLocalLayerService.scala (1)
  • ExploreLocalLayerService (21-152)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-216)
  • reportUpload (98-105)
  • getCredential (185-196)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (28-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests

Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment


Looks good to me, thanks for adding more logging.

Did not retest as I saw no critical changes

@fm3 fm3 enabled auto-merge (squash) October 8, 2025 07:44
@fm3 fm3 merged commit a35bbba into master Oct 8, 2025
4 of 5 checks passed
@fm3 fm3 deleted the s3-upload branch October 8, 2025 07:47