Upload datasets to S3 #8912
Conversation
📝 Walkthrough

Refactors upload and linking flows to be datasetId-centric, introduces LayerToLinkService, adds S3 upload config and AWS transfer-manager dependency, removes symlink-based linking and DatasetSymlinkService, adds Slack failure notification, bumps API to v12, and updates controllers, services, routes, models, and frontend.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 5
🧹 Nitpick comments (7)
unreleased_changes/8912.md (1)
`7-17`: **Specify language for fenced config block to satisfy markdownlint and improve readability.** Use `hocon` for syntax highlighting.
```hocon
s3Upload {
  enabled = true
  # Use the prefix / name of a credential in the dataVaults section here to use it for uploads.
  credentialName = "s3://fsn1.your-objectstorage.com/webknossos-test/upload-tests/"
  endpoint = "https://fsn1.your-objectstorage.com"
  bucketName = "webknossos-test"
  objectKeyPrefix = "upload-tests"
}
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)

`70-71`: **Include S3Upload and DataVaults in children for consistency with nested config traversal/logging.** Prevents omissions in config dumping/validation that rely on children.

```diff
-  val children = List(WebKnossos, WatchFileSystem, Cache, AdHocMesh, Redis, AgglomerateSkeleton)
+  val children = List(WebKnossos, WatchFileSystem, Cache, AdHocMesh, Redis, AgglomerateSkeleton, DataVaults, S3Upload)
```
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
`525-540`: **Surface partial-failure details from TransferManager.** You check `failedTransfers.isEmpty`, which is good. Consider logging individual failures (key and exception) to aid debugging large uploads.
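A minimal sketch of what that logging could look like, assuming the AWS SDK v2 transfer-manager accessors (`CompletedDirectoryUpload.failedTransfers`, `FailedFileUpload.request`/`exception`) and an in-scope `logger`; the helper name is illustrative:

```scala
import scala.jdk.CollectionConverters._
import software.amazon.awssdk.transfer.s3.model.CompletedDirectoryUpload

// Log each failed transfer (object key + exception) before treating the upload as failed,
// so partial failures of large directory uploads remain debuggable.
private def logFailedTransfers(completed: CompletedDirectoryUpload): Unit =
  completed.failedTransfers().asScala.foreach { failed =>
    logger.warn(s"S3 upload failed for key ${failed.request().putObjectRequest().key()}", failed.exception())
  }
```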
`818-823`: **Exclude the S3 staging root from orphan cleanup.** `cleanUpOrphanUploadsForOrga` will treat the `.cloud` directory itself as an “orphan” and delete/recreate it, adding churn. Filter it out. Apply:

```diff
-      uploadDirs <- PathUtils.listDirectories(orgaUploadingDir, silent = false).toFox
-      _ <- Fox.serialCombined(uploadDirs) { uploadDir =>
+      uploadDirs <- PathUtils.listDirectories(orgaUploadingDir, silent = false).toFox
+      uploadDirsFiltered = uploadDirs.filterNot(_.getFileName.toString == uploadToS3Dir)
+      _ <- Fox.serialCombined(uploadDirsFiltered) { uploadDir =>
```
`398-422`: **S3 path update + cleanup sequence: add best-effort rollback.** If `updateDataSource` fails after a successful S3 upload, the local unpack dir is already deleted. Consider best-effort cleanup (e.g., delete the uploaded S3 prefix) or tag the dataset with a “failed upload” signal to enable ops cleanup.
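One hedged way to make the prefix cleanup concrete, sketched with the synchronous SDK v2 client for brevity (the service itself uses `S3AsyncClient`; the helper name and its call site are assumptions, and errors are deliberately swallowed so the original failure stays the user-facing one):

```scala
import scala.jdk.CollectionConverters._
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.{Delete, DeleteObjectsRequest, ListObjectsV2Request, ObjectIdentifier}

// Best-effort rollback: if registering the data source fails after a successful S3 push,
// delete everything under the uploaded prefix so no orphaned objects remain.
def deleteS3Prefix(s3: S3Client, bucket: String, prefix: String): Unit =
  try {
    val keys = s3
      .listObjectsV2Paginator(ListObjectsV2Request.builder().bucket(bucket).prefix(prefix).build())
      .contents()
      .asScala
      .map(obj => ObjectIdentifier.builder().key(obj.key()).build())
      .toSeq
    keys.grouped(1000).foreach { batch => // DeleteObjects accepts at most 1000 keys per request
      s3.deleteObjects(
        DeleteObjectsRequest.builder().bucket(bucket).delete(Delete.builder().objects(batch.asJava).build()).build())
    }
  } catch {
    case e: Exception => println(s"Rollback of s3://$bucket/$prefix failed: ${e.getMessage}") // best effort only
  }
```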
`254-266`: **Path traversal check could be more robust.** `startsWith("../")` on a relativized path is brittle. Prefer normalizing and ensuring the target stays within `uploadDir`. Apply:

```diff
-  private def isOutsideUploadDir(uploadDir: Path, filePath: String): Boolean =
-    uploadDir.relativize(uploadDir.resolve(filePath)).startsWith("../")
+  private def isOutsideUploadDir(uploadDir: Path, filePath: String): Boolean = {
+    val target = uploadDir.resolve(filePath).normalize()
+    !target.startsWith(uploadDir)
+  }
```
`826-853`: **Orphan upload cleanup at startup: OK; consider periodic metrics.** You may want a counter for deleted orphans to surface drift.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- app/controllers/WKRemoteDataStoreController.scala (1 hunks)
- app/models/dataset/DatasetService.scala (2 hunks)
- conf/application.conf (1 hunks)
- conf/webknossos.latest.routes (1 hunks)
- project/Dependencies.scala (2 hunks)
- unreleased_changes/8912.md (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DirectoryConstants.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)
- util/src/main/scala/com/scalableminds/util/tools/ConfigReader.scala (1): `get` (23-26)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (3): `isEmpty` (21-21), `DatasetLayerAttachments` (13-22), `DatasetLayerAttachments` (24-27)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2): `DataVaultService` (21-30), `DataVaultService` (32-69)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): `mapped` (411-511)
- util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (2): `toString` (38-38), `toMagLiteral` (40-43)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (3): `coordinateTransformations` (89-89), `defaultViewConfiguration` (101-101), `attachments` (107-107)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3): `PathUtils` (14-14), `ensureDirectoryBox` (124-132), `deleteDirectoryRecursively` (186-193)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): `DataStoreConfig` (10-79), `Datastore` (17-71), `S3Upload` (63-69), `DataVaults` (60-62)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): `CredentialConfigReader` (8-64)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataStoreRedisStore.scala (1): `DataStoreRedisStore` (8-12)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1): `RemoteSourceDescriptorService` (20-177)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala (2): `insert` (55-66), `remove` (82-87)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1): `prependAllPaths` (369-418)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2): `updateDataSource` (139-147), `getCredential` (194-205)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3): `DataStoreConfig` (10-79), `Datastore` (17-71), `S3Upload` (63-69)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): `isKnownUpload` (175-178), `reserveUpload` (192-230)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): `reserveDataSourceUpload` (130-139)
🪛 markdownlint-cli2 (0.17.2)
unreleased_changes/8912.md
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (19)
conf/webknossos.latest.routes (1)

`290-290`: Whitespace-only change — OK. No functional impact.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DirectoryConstants.scala (1)

`7-7`: Constant addition looks good. Consistent with the hidden-dot convention for internal dirs.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)

`63-69`: Config group addition — OK. Fields align with application.conf.

app/models/dataset/DatasetService.scala (2)

`101-102`: Propagation of isVirtual — OK. Correctly forwards to createDataset.

`91-93`: Signature change verified — call sites updated. DatasetService.scala now requires isVirtual; the only invocation found (app/controllers/WKRemoteDataStoreController.scala) passes uploadInfo.isVirtual.getOrElse(false).

project/Dependencies.scala (1)

`8-8`: Pinning awsVersion and adding s3-transfer-manager — artifacts available. Verified software.amazon.awssdk:s3 and software.amazon.awssdk:s3-transfer-manager v2.32.24 are present on Maven Central (HTTP 200).

app/controllers/WKRemoteDataStoreController.scala (1)

`88-94`: createPreliminaryDataset now accepts isVirtual; call sites verified. The signature in app/models/dataset/DatasetService.scala includes isVirtual: Boolean; the sole call site found (app/controllers/WKRemoteDataStoreController.scala) passes uploadInfo.requireUniqueName.getOrElse(false) and uploadInfo.isVirtual.getOrElse(false), preserving the legacy default false — no other usages found.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)

`415-507`: Attachment path rewrites included in mapped(): LGTM. Adding `attachmentMapping` and propagating it across all concrete copies is correct and unlocks URI rewrites for attachments alongside mags.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)

`70-74`: Constructor wiring for DataStoreConfig is fine. No issues; enables controller-level config decisions.

`147-149`: Manual uploads: forcing isVirtual = false is appropriate. Matches the “on-disk” manual workflow.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (9)

`384-386`: Back-compat: datasetId missing in Redis for older reservations. `getDatasetIdByUploadId(uploadId)` will fail for uploads reserved before this PR. Add a fallback (e.g., resolve via WK by directoryName) to let those finish. Would you like me to add a `DSRemoteWebknossosClient.findDatasetId(...)` fallback here?

`454-462`: Path allowlist check for explored datasources: good guard. Ensures URIs after exploration are permissible.

`183-186`: Staging dir for S3 uploads: OK. Using `${uploadingDir}/$uploadToS3Dir/$uploadId` isolates S3 prep; no functional issues spotted. Also applies to: 586-599

`531-539`: Content-encoding workaround comment. The checksum setting workaround for “aws-chunked” is pragmatic. Please confirm with a small integration test that uploaded objects are readable without special headers.

`269-279`: Chunk handling and bounds checks: solid. Includes size guard via `totalFileSizeInBytes` and idempotent chunk writes. Also applies to: 315-336

`409-417`: Delete local staging after S3 push: good. Saves disk and avoids double-storage.

`621-641`: Layer linking rename switch is clean. Covers all concrete layer types.

`760-789`: ZIP vs. raw layout handling is careful. Truncating common prefixes and excluding known layer prefixes avoids extra nesting.

`855-861`: Helper to parse JSON from Redis: OK. Minimal and typed.
Thanks for diving into this! I had a first look and added some comments.
@normanrz maybe you would like to have a look as well, especially at some of the design decisions? Does this fit with our multi-bucket setup? Is it ok that we can’t show datastore-to-s3 upload progress to the user?
I did not yet do any testing.
Note that we’ll have to see what is merged first, this or #8844 – if the latter, we could use UPath stuff here as well.
Also, I guess the worker has to be adapted to write the output of the convert_to_wkw job to S3 as well. But maybe that can be deployed separately from this change?
Actionable comments posted: 3
🧹 Nitpick comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (2)
`162-173`: **Public helper should handle regional virtual-hosted endpoints (bucket.s3.us-xyz-1.amazonaws.com).** `hostBucketFromUri` fails for regional virtual-hosted style; making it public risks propagating this limitation. Extract the bucket using the first occurrence of ".s3" and normalize path handling. Apply:

```diff
-  def hostBucketFromUri(uri: URI): Option[String] = {
-    val host = uri.getHost
-    if (isShortStyle(uri)) { // assume host is omitted from uri, shortcut form s3://bucket/key
-      Some(host)
-    } else if (isVirtualHostedStyle(uri)) {
-      Some(host.substring(0, host.length - ".s3.amazonaws.com".length))
-    } else if (isPathStyle(uri)) {
-      Some(uri.getPath.substring(1).split("/")(0))
-    } else {
-      None
-    }
-  }
+  def hostBucketFromUri(uri: URI): Option[String] = {
+    val host = Option(uri.getHost).getOrElse("")
+    if (isShortStyle(uri)) { // s3://bucket/key
+      Some(host)
+    } else if (isVirtualHostedStyle(uri)) { // bucket.s3[.region].amazonaws.com
+      val idx = host.indexOf(".s3")
+      if (idx > 0) Some(host.substring(0, idx)) else None
+    } else if (isPathStyle(uri)) { // s3[.region].amazonaws.com/bucket/key OR non-AWS host
+      Some(uri.getPath.stripPrefix("/").split("/")(0))
+    } else None
+  }
```
`175-186`: **Broaden endpoint detection for AWS styles; escape regex dots.** Covers regional and dualstack variants; the current `endsWith(".s3.amazonaws.com")` is too narrow. Apply:

```diff
-  private def isVirtualHostedStyle(uri: URI): Boolean =
-    uri.getHost.endsWith(".s3.amazonaws.com")
+  private def isVirtualHostedStyle(uri: URI): Boolean =
+    uri.getHost.matches(""".+\.s3([.-][\w\-]+)?\.amazonaws\.com""")

-  private def isPathStyle(uri: URI): Boolean =
-    uri.getHost.matches("s3(.[\\w\\-_]+)?.amazonaws.com") ||
-      (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
+  private def isPathStyle(uri: URI): Boolean =
+    uri.getHost.matches("""s3([.-][\w\-]+)?\.amazonaws\.com""") ||
+      (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)
`186-188`: **Exclude hidden staging dir from orphan cleanup.** `.cloud` under `uploading/` can be mistaken as an orphan and deleted by `cleanUpOrphanUploadsForOrga`. Skip dot-prefixed dirs in that sweep. Proposed check inside `cleanUpOrphanUploadsForOrga` before trying to delete: if `uploadDir.getFileName.toString.startsWith(".")`, continue.
`510-523`: **Bucket/endpoint derivation assumptions are brittle.** `s3UploadBucketOpt` depends on `credentialName` being an S3 URI containing a bucket segment. `s3UploadEndpoint` forces HTTPS; some S3-compatible stores are HTTP-only. Validate the `credentialName` format at startup and fail fast with a clear message. Consider reusing `S3DataVault.getAmazonS3Client` (with injected WSClient) for consistent endpoint/region handling, or make the protocol configurable.
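A hedged sketch of such a startup validation (pure Scala, hypothetical helper name; it only assumes `credentialName` is expected to look like `s3://<endpoint-host>/<bucket>/<prefix>`):

```scala
import java.net.URI
import scala.util.Try

// Fail fast at startup if datastore.s3Upload.credentialName cannot yield an endpoint host and bucket.
def validateS3CredentialName(credentialName: String): Either[String, (String, String)] =
  Try(new URI(credentialName)).toOption match {
    case Some(uri) if Option(uri.getHost).exists(_.nonEmpty) =>
      Option(uri.getPath).getOrElse("").split("/").filter(_.nonEmpty).headOption match {
        case Some(bucket) => Right((uri.getHost, bucket))
        case None         => Left(s"s3Upload.credentialName '$credentialName' contains no bucket segment")
      }
    case _ => Left(s"s3Upload.credentialName '$credentialName' is not a URI with a host")
  }
```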
`524-541`: **Region is hardcoded to us-east-1.** This can break signing for AWS regional endpoints. Either parse it from the host (like S3DataVault) or make it configurable. Short-term: add a datastore.s3Upload.region with default `us-east-1`. Long-term: parse via `AwsHostNameUtils.parseSigningRegion`.
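A rough sketch of the host-parsing variant (regex-based rather than the SDK utility; the config fallback key is an assumption):

```scala
import software.amazon.awssdk.regions.Region

// Derive the signing region from an AWS-style S3 host, falling back to a configured
// default (e.g. a hypothetical datastore.s3Upload.region) for non-AWS endpoints.
private val awsRegionPattern = """.*s3[.-]([a-z0-9-]+)\.amazonaws\.com""".r

def signingRegionFor(host: String, configuredDefault: String = "us-east-1"): Region =
  host match {
    case awsRegionPattern(region) => Region.of(region)
    case _                        => Region.of(configuredDefault)
  }
```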
`575-594`: **Edge case: depth limit may keep unused files.** `maxDepth = 10` may miss deep trees, leaving unreferenced files to be uploaded. Consider unlimited traversal or a higher bound, or compute `neededPaths` first and filter by prefix without full recursion; a sketch follows below.
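A sketch of the "compute neededPaths first" variant, assuming plain `java.nio` traversal is acceptable here (names are illustrative, not the existing PathUtils API):

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Walk the tree without a depth cap and keep only files that are not under any referenced path.
def unreferencedFiles(dataDir: Path, neededPaths: Set[Path]): List[Path] = {
  val stream = Files.walk(dataDir) // lazily evaluated, unlimited depth
  try stream
    .iterator()
    .asScala
    .filter(p => Files.isRegularFile(p))
    .filterNot(file => neededPaths.exists(needed => file.startsWith(needed)))
    .toList
  finally stream.close()
}
```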
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- conf/application.conf (1 hunks)
- unreleased_changes/8912.md (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
- conf/application.conf
🧰 Additional context used
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): `DataStoreConfig` (10-77), `Datastore` (17-69), `S3Upload` (63-67), `DataVaults` (60-62)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5): `S3DataVault` (45-149), `S3DataVault` (151-244), `getCredential` (140-140), `hostBucketFromUri` (162-173), `create` (152-160)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2): `UnfinishedUpload` (7-16), `UnfinishedUpload` (18-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4): `DSRemoteWebknossosClient` (57-225), `updateDataSource` (139-147), `reportDataSource` (84-89), `getCredential` (194-205)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): `CredentialConfigReader` (8-64)
🪛 markdownlint-cli2 (0.17.2)
unreleased_changes/8912.md
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
Actionable comments posted: 0
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
`511-531`: **Consider making S3 client initialization more robust.** The current S3 client initialization could be improved to handle credential failures more gracefully. Following up on the previous comment about credentials, consider using `Box` for better error handling:

```diff
-  private lazy val s3ClientBox: Box[S3AsyncClient] = for {
-    accessKeyId <- Box(s3UploadCredentialsOpt.map(_._1))
-    secretAccessKey <- Box(s3UploadCredentialsOpt.map(_._2))
-  } yield
+  private lazy val s3ClientBox: Box[S3AsyncClient] =
+    s3UploadCredentialsOpt match {
+      case Some((accessKeyId, secretAccessKey)) if accessKeyId.nonEmpty && secretAccessKey.nonEmpty =>
+        Full(
           S3AsyncClient
             .builder()
             .credentialsProvider(
               StaticCredentialsProvider.create(
                 AwsBasicCredentials.builder.accessKeyId(accessKeyId).secretAccessKey(secretAccessKey).build()
               ))
             .crossRegionAccessEnabled(true)
             .forcePathStyle(true)
             .endpointOverride(s3UploadEndpoint)
             .region(Region.US_EAST_1)
             // Disabling checksum calculation prevents files being stored with Content Encoding "aws-chunked".
             .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED)
             .build()
+        )
+      case _ =>
+        Failure(s"S3 upload enabled but no valid credentials found for ${dataStoreConfig.Datastore.S3Upload.credentialName}")
+    }

   private lazy val transferManagerBox: Box[S3TransferManager] = for {
     client <- s3ClientBox
   } yield S3TransferManager.builder().s3Client(client).build()
```

Also, consider making the S3 region configurable instead of hardcoding `Region.US_EAST_1`.
`562-580`: **Consider optimizing file filtering logic.** The `getNonReferencedFiles` method could be improved for clarity and efficiency. Consider extracting the implicit path construction logic:

```diff
   private def getNonReferencedFiles(dataDir: Path, dataSource: DataSource): Fox[List[Path]] =
     for {
       usableDataSource <- dataSource.toUsable.toFox ?~> "Data source is not usable"
-      explicitPaths: Set[Path] = usableDataSource.dataLayers
+      implicitPaths: Set[Path] = usableDataSource.dataLayers
         .flatMap(layer =>
           layer.mags.map(mag =>
             mag.path match {
               case Some(_) => None
               case None    => Some(dataDir.resolve(List(layer.name, mag.mag.toMagLiteral(true)).mkString("/")))
             }))
         .flatten
         .toSet
-      neededPaths = usableDataSource.dataLayers
+      explicitPaths = usableDataSource.dataLayers
         .flatMap(layer => layer.allExplicitPaths)
         .map(dataDir.resolve)
-        .toSet ++ explicitPaths
+        .toSet
+      neededPaths = explicitPaths ++ implicitPaths
       allFiles <- PathUtils.listFilesRecursive(dataDir, silent = true, maxDepth = 10).toFox
       filesToDelete = allFiles.filterNot(file => neededPaths.exists(neededPath => file.startsWith(neededPath)))
     } yield filesToDelete
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- app/controllers/WKRemoteDataStoreController.scala (1 hunks)
- app/models/dataset/DatasetService.scala (1 hunks)
- conf/application.conf (1 hunks)
- project/Dependencies.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- conf/application.conf
- app/controllers/WKRemoteDataStoreController.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
- app/models/dataset/DatasetService.scala
- project/Dependencies.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
🧬 Code graph analysis (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3): `DataStoreConfig` (11-78), `Datastore` (18-70), `S3Upload` (64-68)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): `isKnownUpload` (164-167), `reserveUpload` (181-219)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): `reserveDataSourceUpload` (129-138)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (3): `coordinateTransformations` (85-85), `attachments` (103-103), `defaultViewConfiguration` (97-97)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/PathSchemes.scala (1): `PathSchemes` (3-9)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): `mapped` (102-131)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (4): `PathUtils` (14-14), `ensureDirectoryBox` (124-132), `deleteDirectoryRecursively` (186-193), `listFilesRecursive` (104-108)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): `DataStoreConfig` (11-78), `Datastore` (18-70), `S3Upload` (64-68), `DataVaults` (61-63)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5): `S3DataVault` (46-155), `S3DataVault` (157-250), `getCredential` (143-143), `hostBucketFromUri` (168-179), `create` (158-166)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4): `DSRemoteWebknossosClient` (56-222), `updateDataSource` (138-144), `reportDataSource` (83-88), `getCredential` (191-202)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): `CredentialConfigReader` (8-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
🔇 Additional comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
`102-129`: **LGTM! Well-structured attachment mapping support.** The addition of the `attachmentMapping` parameter with a default identity function is clean and maintains backward compatibility. The implementation correctly applies the mapping to both `StaticColorLayer` and `StaticSegmentationLayer` branches.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)

`345-414`: **Persist and use isVirtual from reservation to ensure consistency.** The `finishUpload` method currently determines S3 vs. local storage using `dataStoreConfig.Datastore.S3Upload.enabled` at Line 353. This could lead to inconsistent behavior if the configuration changes between reservation and finish.

`390-395`: **Replace with Instant.logSince for consistency.** Use the utility method for consistent timing logs.

`387-406`: **Add cleanup on S3 upload failure.** If the S3 upload fails between lines 393-405, local temporary data remains and the pre-created dataset may linger without proper cleanup.

`489-510`: **Fail fast if S3 credentials are missing.** The lazy initialization of `s3UploadCredentialsOpt` at line 489 uses the `getOrElse(("", ""))` pattern, which silently creates a client with empty credentials, leading to confusing runtime failures.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

`99-105`: **Good: isVirtual set from config at reservation time.** Setting `shouldBeVirtual` once and enforcing it via `request.body.copy(isVirtual = Some(shouldBeVirtual))` is the right approach. The controller correctly captures the configuration state at reservation time.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)

`254-302`: **URI construction with null scheme creates invalid relative paths.** The `replaceUri` function creates URIs with null schemes when prepending paths, which produces relative path strings that may not be properly handled by S3 clients and could cause path resolution issues. Apply this refactor to properly handle URI construction and path normalization:

```diff
-  // Prepend newBasePath to all (relative) paths in mags and attachments of the data source.
-  def prependAllPaths(dataSource: DataSource, newBasePath: String): Fox[DataSource] = {
-    val replaceUri = (uri: URI) => {
-      val isRelativeFilePath = (uri.getScheme == null || uri.getScheme.isEmpty || uri.getScheme == PathSchemes.schemeFile) && !uri.isAbsolute
-      uri.getPath match {
-        case pathStr if isRelativeFilePath =>
-          new URI(uri.getScheme,
-                  uri.getUserInfo,
-                  uri.getHost,
-                  uri.getPort,
-                  newBasePath + pathStr,
-                  uri.getQuery,
-                  uri.getFragment)
-        case _ => uri
-      }
-    }
+  // Prepend newBasePath to all relative paths in mags and attachments of the data source.
+  def prependAllPaths(dataSource: DataSource, newBasePath: String): Fox[DataSource] = {
+    def ensureTrailingSlash(s: String): String =
+      if (s.endsWith("/")) s else s + "/"
+
+    def stripLeadingSlash(s: String): String =
+      if (s.startsWith("/")) s.drop(1) else s
+
+    val baseUri = new URI(ensureTrailingSlash(newBasePath))
+
+    def prependToPath(pathStr: String): String =
+      baseUri.resolve(stripLeadingSlash(pathStr)).toString
+
+    def replaceUri(uri: URI): URI = {
+      val scheme = Option(uri.getScheme).getOrElse("")
+      // Only rewrite relative, non-opaque URIs (no scheme or file scheme)
+      if (scheme.isEmpty || scheme == PathSchemes.schemeFile) {
+        new URI(prependToPath(uri.getPath))
+      } else {
+        uri
+      }
+    }
     dataSource.toUsable match {
       case Some(usableDataSource) =>
         val updatedDataLayers = usableDataSource.dataLayers.map {
           case layerWithMagLocators: StaticLayer =>
             layerWithMagLocators.mapped(
               magMapping = mag =>
                 mag.path match {
-                  case Some(pathStr) => mag.copy(path = Some(replaceUri(new URI(pathStr)).toString))
+                  case Some(pathStr) => mag.copy(path = Some(prependToPath(pathStr)))
                   // If the mag does not have a path, it is an implicit path, we need to make it explicit.
                   case _ =>
                     mag.copy(
                       path = Some(
-                        new URI(newBasePath)
-                          .resolve(List(layerWithMagLocators.name, mag.mag.toMagLiteral(true)).mkString("/"))
-                          .toString))
+                        baseUri.resolve(List(layerWithMagLocators.name, mag.mag.toMagLiteral(true)).mkString("/")).toString))
                 },
               attachmentMapping = attachment =>
                 DataLayerAttachments(
                   attachment.meshes.map(a => a.copy(path = replaceUri(a.path))),
                   attachment.agglomerates.map(a => a.copy(path = replaceUri(a.path))),
                   attachment.segmentIndex.map(a => a.copy(path = replaceUri(a.path))),
                   attachment.connectomes.map(a => a.copy(path = replaceUri(a.path))),
                   attachment.cumsum.map(a => a.copy(path = replaceUri(a.path)))
                 )
             )
           case layer => layer
         }
         Fox.successful(usableDataSource.copy(dataLayers = updatedDataLayers))
       case None => Fox.failure("Cannot replace paths of unusable datasource")
     }
   }
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
app/models/dataset/ComposeService.scala (1)
`62-70`: **Reintroduce the StaticLayer guard to prevent MatchError.** `mapped` is implemented via a `this match { case StaticColorLayer | StaticSegmentationLayer => ... }`, so calling it on a non-static `DataLayer` (e.g., `EditableMappingLayer`) now blows up with a `MatchError`. The earlier version short-circuited those cases; this change regresses into a runtime crash. Let’s keep the defensive pattern match.

```diff
-      applyCoordinateTransformations = (cOpt: Option[List[CoordinateTransformation]]) =>
-        cOpt match {
-          case Some(c) => Some(c ++ composeLayer.transformations.toList)
-          case None    => Some(composeLayer.transformations.toList)
-        }
-      editedLayer = layer.mapped(name = composeLayer.newName,
-                                 coordinateTransformations =
-                                   applyCoordinateTransformations(layer.coordinateTransformations))
-    } yield editedLayer
+      applyCoordinateTransformations = (cOpt: Option[List[CoordinateTransformation]]) =>
+        cOpt match {
+          case Some(c) => Some(c ++ composeLayer.transformations.toList)
+          case None    => Some(composeLayer.transformations.toList)
+        }
+      editedLayer <- layer match {
+        case staticLayer: StaticLayer =>
+          Fox.successful(
+            staticLayer.mapped(
+              name = composeLayer.newName,
+              coordinateTransformations =
+                applyCoordinateTransformations(staticLayer.coordinateTransformations)
+            )
+          )
+        case _ =>
+          Fox.failure("Only static layers can be composed")
+      }
+    } yield editedLayer
```

app/models/dataset/DatasetUploadToPathsService.scala (1)
`228-233`: **Bug: reserveAttachmentUploadToPath regenerates directoryName and ignores uploadToPathsInfixOpt.** File: app/models/dataset/DatasetUploadToPathsService.scala, lines ~211–218. It calls `generateDirectoryName` on `dataset.directoryName` instead of reusing the stored name, and omits `uploadToPathsInfixOpt`, so attachments land under a different path than layers. Apply this diff to mirror `addPathsToDatasource`:

```diff
-      newDirectoryName = datasetService.generateDirectoryName(dataset.directoryName, dataset._id)
-      datasetPath = uploadToPathsPrefix / dataset._organization / newDirectoryName
+      orgaDir = uploadToPathsPrefix / dataset._organization
+      datasetParent = uploadToPathsInfixOpt.map(infix => orgaDir / infix).getOrElse(orgaDir)
+      datasetPath = datasetParent / dataset.directoryName
       attachmentPath = generateAttachmentPath(parameters.attachmentName,
                                               parameters.attachmentDataformat,
                                               parameters.attachmentType,
                                               datasetPath / parameters.layerName)
```
🧹 Nitpick comments (2)
app/models/dataset/DatasetUploadToPathsService.scala (2)
`139-142`: **Normalize the requested path prefix before the whitelist check.** If clients pass a logically identical but differently-normalized UPath (e.g., relative vs absolute), this equality test may reject it. Apply this diff:

```diff
-      case Some(requested) =>
-        if (configuredPrefixes.contains(requested)) Full(requested)
+      case Some(requested) =>
+        val requestedAbs = requested.toAbsolute
+        if (configuredPrefixes.contains(requestedAbs)) Full(requestedAbs)
         else Failure("Requested path prefix is not in list of configured ones.")
```
`146-156`: **DRY: centralize dataset parent path construction (optional).** Both addPathsToDatasource and reserveAttachmentUploadToPath compute the same orgaDir/infix logic. Extract a helper to reduce drift.

Add this helper:

```scala
private def datasetParentFor(prefix: UPath, organizationId: String): UPath = {
  val orgaDir = prefix / organizationId
  uploadToPathsInfixOpt.map(infix => orgaDir / infix).getOrElse(orgaDir)
}
```

Then use:
- In addPathsToDatasource: replace orgaDir/datasetParent with datasetParentFor(uploadToPathsPrefix, organizationId)
- In reserveAttachmentUploadToPath: replace with datasetParentFor(uploadToPathsPrefix, dataset._organization)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f9e16546404424413b4b020b1aca1af14b3eb02 and a509ceba4f160452bb55a00bb5117b1a8c36eb8d.

📒 Files selected for processing (4)
- app/models/dataset/ComposeService.scala (2 hunks)
- app/models/dataset/DatasetUploadToPathsService.scala (3 hunks)
- conf/application.conf (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)
- conf/application.conf

🧰 Additional context used

🧬 Code graph analysis (3)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2): `webknossos` (70-72), `validateAccessFromTokenContext` (87-94)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): `validateJson` (204-209)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1): `fromFuture` (42-53)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): `getDatasetId` (206-216)

app/models/dataset/ComposeService.scala (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): `mapped` (104-133)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1): `coordinateTransformations` (85-85)

app/models/dataset/DatasetUploadToPathsService.scala (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): `LinkedLayerIdentifier` (80-80), `LinkedLayerIdentifier` (82-84)
- app/models/dataset/LayerToLinkService.scala (1): `addLayersToLinkToDataSource` (32-42)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push

🔇 Additional comments (5)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

`133-143`: **Legacy manual uploads must set `isVirtual = false`.** Leaving `isVirtual` as `None` makes manual uploads inherit the new default (`true`), so data that lives on disk is marked virtual and never served correctly. Explicitly pass `Some(false)` here.

```diff
-        Some(request.body.requireUniqueName),
-        None
+        Some(request.body.requireUniqueName),
+        Some(false)
```
app/models/dataset/DatasetUploadToPathsService.scala (4)
`23-23`: Good: import LinkedLayerIdentifier from the uploading module. This aligns with the new public surface in the datastore uploading package.

`26-28`: Good: switched to public request types. Using ReserveDatasetUploadToPaths(ForPreliminary)Request from controllers is consistent with API versioning plans.

`41-41`: Good: delegate layer linking via the injected LayerToLinkService. Cleaner separation of concerns; avoids duplicating resolve/link logic here.

`67-69`: Good: delegate layersToLink handling to LayerToLinkService. This ensures duplicate-name checks and resolution are centralized.

Please confirm the i18n keys used inside LayerToLinkService ("dataset.layerToLink.failed", "dataset.duplicateLayerNames") exist in messages files for all supported locales.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
`330-341`: **Bug: use flatMap instead of map to avoid nested Fox and ensure cleanup runs.** The current map returns Fox[Fox[Unit]] and likely skips size enforcement or fails type inference. Use flatMap.

```diff
-      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
-        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.map(actualFileSize =>
-          if (actualFileSize > maxFileSize.toLong) {
-            cleanUpDatasetExceedingSize(uploadDir, uploadId)
-            Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes")
-          } else Fox.successful(()))
-      }
+      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
+        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.flatMap { actualFileSize =>
+          if (actualFileSize > maxFileSize.toLong) {
+            cleanUpDatasetExceedingSize(uploadDir, uploadId)
+          } else Fox.successful(())
+        }
+      }
```

frontend/javascripts/admin/dataset/dataset_upload_view.tsx (1)
`334-347`: **Add `needsConversion` to the `ReserveUploadInformation` TS type and backend.** The `ReserveUploadInformation` type in frontend/javascripts/admin/rest_api.ts (currently at lines 1136–1146) must include a `needsConversion?: boolean` field, and the corresponding backend DTO needs to accept and handle it optional-safely.
🧹 Nitpick comments (5)
app/models/dataset/DatasetService.scala (2)
`88-99`: **Unfinished uploads query: add guardrails for scale and configurability.**
- Consider making the 14-day lookback configurable (e.g., WkConf) and optionally add a hard `LIMIT` to avoid large result sets.
- Ensure there is a supporting compound index matching the filters: (uploaderId, organizationId, isActive, status, created DESC).
`100-121`: **createAndSetUpDataset: good access checks; avoid double-write for folder if possible.**
- You create with the root folder and then immediately update to `folderIdWithFallback`. To reduce writes and transient states, consider passing the intended folder into `createDataset` (e.g., a new optional param `initialFolderId: Option[ObjectId] = None` defaulting to org root).
- The updated DataSourceId with (directoryName, organizationId) looks correct for virtual datasets.
app/controllers/WKRemoteDataStoreController.scala (1)
`76-90`: **isVirtual toggling vs. needsConversion: confirm intent.** You set isVirtual = !needsConversion. That matches the note about convert_to_wkw requiring non-virtual prelim datasets, but conflicts with “all newly uploaded datasets are virtual” as a general rule. Please confirm this exception is intended and documented in the changelog/API notes.
app/controllers/DatasetController.scala (1)
`208-215`: **Marking explore-and-add datasets virtual is correct; consider an analytics hook.** Setting isVirtual = true here is aligned with “all newly uploaded datasets are virtual.” Optionally, also call trackNewDataset with viaAddRoute = true to keep analytics/Slack consistent across creation paths.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)
`76-86`: **Two DSRemoteWebknossosClient injections — consolidate.** Both remoteWebknossosClient and dsRemoteWebknossosClient are DSRemoteWebknossosClient. Use a single instance to reduce confusion and wiring complexity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- app/controllers/DatasetController.scala (7 hunks)
- app/controllers/WKRemoteDataStoreController.scala (5 hunks)
- app/models/dataset/ComposeService.scala (2 hunks)
- app/models/dataset/Dataset.scala (1 hunks)
- app/models/dataset/DatasetService.scala (4 hunks)
- app/models/dataset/DatasetUploadToPathsService.scala (4 hunks)
- frontend/javascripts/admin/dataset/dataset_upload_view.tsx (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (18 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/models/dataset/ComposeService.scala
🧰 Additional context used
🧬 Code graph analysis (7)
app/models/dataset/Dataset.scala (2)
- app/utils/sql/SecuredSQLDAO.scala (1): `assertUpdateAccess` (36-48)
- app/utils/sql/SqlInterpolation.scala (2): `q` (20-39), `asUpdate` (74-74)

app/controllers/WKRemoteDataStoreController.scala (9)
- util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3): `AuthorizedAccessContext` (19-21), `DBAccessContext` (31-37), `GlobalAccessContext` (25-29)
- util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2): `ObjectId` (11-13), `ObjectId` (15-66)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4): `DataSource` (31-43), `UnusableDataSource` (45-62), `UnusableDataSource` (64-66), `isUsable` (20-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1): `DataSourceStatus` (3-12)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6): `ReportDatasetUploadParameters` (70-75), `ReportDatasetUploadParameters` (76-79), `ReserveAdditionalInformation` (64-64), `ReserveAdditionalInformation` (65-68), `ReserveUploadInformation` (46-59), `ReserveUploadInformation` (60-62)
- app/models/dataset/LayerToLinkService.scala (2): `validateLayerToLink` (22-32), `addLayersToLinkToDataSource` (32-42)
- app/models/dataset/DatasetService.scala (2): `createAndSetUpDataset` (100-123), `trackNewDataset` (548-560)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): `validateJson` (204-209)
- app/models/dataset/Dataset.scala (3): `findOne` (203-210), `updateDataSource` (633-655), `deleteDataset` (708-736)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (5)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2): `webknossos` (70-72), `validateAccessFromTokenContext` (87-94)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4): `LinkedLayerIdentifier` (81-81), `LinkedLayerIdentifier` (83-85), `ReserveUploadInformation` (46-59), `ReserveUploadInformation` (60-62)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): `validateJson` (204-209)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1): `fromFuture` (42-53)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): `getDatasetId` (206-216)

app/models/dataset/DatasetUploadToPathsService.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): `LinkedLayerIdentifier` (81-81), `LinkedLayerIdentifier` (83-85)
- app/models/dataset/LayerToLinkService.scala (1): `addLayersToLinkToDataSource` (32-42)
- app/models/dataset/Dataset.scala (1): `makeVirtual` (666-672)

app/models/dataset/DatasetService.scala (8)
- app/models/user/MultiUser.scala (1): `MultiUserDAO` (39-181)
- app/mail/MailchimpClient.scala (2): `MailchimpClient` (17-82), `tagUser` (45-54)
- app/mail/MailchimpTag.scala (1): `MailchimpTag` (5-22)
- app/models/analytics/AnalyticsService.scala (2): `AnalyticsService` (22-97), `track` (38-46)
- app/models/analytics/AnalyticsEvent.scala (1): `UploadDatasetEvent` (205-218)
- app/telemetry/SlackNotificationService.scala (2): `SlackNotificationService` (11-59), `info` (30-36)
- app/utils/WkConf.scala (1): `WkConf` (14-296)
- app/models/dataset/DataStore.scala (1): `findOneByName` (118-125)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): `DataStoreConfig` (11-78), `Datastore` (18-70), `S3Upload` (64-68), `DataVaults` (61-63)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5): `S3DataVault` (46-155), `S3DataVault` (157-250), `getCredential` (143-143), `hostBucketFromUri` (168-179), `create` (158-166)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2): `UPath` (54-96), `fromLocalPath` (80-80)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4): `DSRemoteWebknossosClient` (57-216), `reportUpload` (98-105), `getCredential` (185-196), `deleteDataset` (138-143)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2): `DataSourceService` (25-303), `resolvePathsInNewBasePath` (253-269)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): `CredentialConfigReader` (8-64)

app/controllers/DatasetController.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): `LinkedLayerIdentifier` (81-81), `LinkedLayerIdentifier` (83-85)
- app/models/storage/UsedStorageService.scala (2): `UsedStorageService` (21-103), `refreshStorageReportForDataset` (89-103)
- app/models/dataset/DatasetService.scala (2): `createAndSetUpDataset` (100-123), `trackNewDataset` (548-560)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
🔇 Additional comments (15)
app/models/dataset/DatasetService.scala (3)
`24-24`: LGTM: imports match new telemetry/analytics dependencies. No issues spotted with these additions.

Also applies to: 26-27, 32-33

`47-56`: Refined search to locate manual `new DatasetService` invocations.

Please rerun and check for any instantiations that need updating:

```shell
#!/bin/bash
rg -nP 'new\s+DatasetService\s*\(' --type scala -C2
```

If this still yields no results, manually verify any Guice modules or test setups that instantiate `DatasetService`.
`560-570`: **Gate Slack logging by config and recover failures.** A sketch follows this list.
- If `conf.SlackNotifications.uri` is empty/None, return `Fox.successful(())` to skip DB reads.
- Wrap the inner `for` in `.orElse(Fox.successful(()))` (or use a `recover`) so Slack client errors don’t bubble up.
- Confirm that `conf.SlackNotifications.uri` yields an empty string or `None` when unconfigured.
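A rough sketch of that shape, assuming the DAOs, `slackNotificationService`, and `conf.SlackNotifications.uri` already used elsewhere in this service (names, types, and message wording are illustrative, not the PR's actual code):

```scala
// Skip all DB lookups when Slack is not configured, and never let a Slack/client
// error fail the surrounding upload flow.
private def notifySlackAboutUploadedDataset(dataset: Dataset, user: User): Fox[Unit] =
  if (conf.SlackNotifications.uri.isEmpty) Fox.successful(())
  else
    (for {
      organization <- organizationDAO.findOne(dataset._organization)
      multiUser <- multiUserDAO.findOne(user._multiUser)
      _ = slackNotificationService.info(
        "Dataset uploaded",
        s"${dataset.name} (${dataset._id}) by ${multiUser.email} in ${organization.name}")
    } yield ()).orElse(Fox.successful(())) // best effort: swallow Slack/DB failures
```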
app/models/dataset/DatasetUploadToPathsService.scala (1)

`67-69`: Delegating layer linking to LayerToLinkService looks good. Centralizing linking and duplicate-name checks here is cleaner and reduces duplication. LGTM.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
`404-445`: **Respect the configured endpoint object; avoid re-parsing credentialName.** Use the normalized s3UploadEndpoint (which should preserve scheme/port) when building the S3 base path.

```diff
-          endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
-          newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
+          val endPointHost = s3UploadEndpoint.getAuthority // includes host[:port]
+          newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
```
`527-538`: **Preserve scheme and port for custom S3 endpoints.** Hardcoding https and dropping the port breaks MinIO and non-standard endpoints. Keep the original URI.

```diff
-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
```
`342-386`: **Persist and honor isVirtual across finish; don’t rely on live config.** Finish currently decides S3 vs local via dataStoreConfig.Datastore.S3Upload.enabled. Persist the chosen isVirtual at reserve and read it here for consistent behavior on retries/restarts.

```diff
@@
   private def redisKeyForUploadId(datasourceId: DataSourceId): String =
     s"upload___${Json.stringify(Json.toJson(datasourceId))}___datasourceId"
+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"
@@ def reserveUpload(...)
-      _ <- runningUploadMetadataStore.insert(
+      _ <- runningUploadMetadataStore.insert(
             redisKeyForDataSourceId(reserveUploadInfo.uploadId),
             Json.stringify(Json.toJson(newDataSourceId))
           )
+      _ <- runningUploadMetadataStore.insert(
+            redisKeyForIsVirtual(reserveUploadInfo.uploadId),
+            reserveUploadInfo.isVirtual
+              .orElse(reserveUploadInfo.needsConversion.map(nc => !nc)) // fallback: virtual if no conversion
+              .getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+              .toString
+          )
@@ def finishUpload(...)
-      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId)
+      isVirtualOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId)).map(_.map(_.toBoolean))
+      uploadToS3 = isVirtualOpt.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId, uploadToS3)
@@ def cleanUpUploadedDataset(...)
-    removeFromRedis(uploadId)
+    removeFromRedis(uploadId)
@@ def removeFromRedis(uploadId: String)
       _ <- runningUploadMetadataStore.remove(redisKeyForDataSourceId(uploadId))
+      _ <- runningUploadMetadataStore.remove(redisKeyForIsVirtual(uploadId))
```

And adapt moveUnpackedToTarget's signature to accept uploadToS3 and use it instead of reading the config.
app/controllers/WKRemoteDataStoreController.scala (1)
`128-154`: **Require dataSource when needsConversion == false (persist and mark usable).** Without this, non-conversion uploads can leave datasets unusable if dataSourceOpt is omitted. Enforce presence and persist.

```diff
     for {
       user <- bearerTokenService.userForToken(token)
       dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
+      _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+        "reportUpload.dataSourceMissing"
       _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
       _ <- datasetService.trackNewDataset(dataset,
                                           user,
                                           request.body.needsConversion,
                                           request.body.datasetSizeBytes,
                                           viaAddRoute = false)
       dataSourceWithLinkedLayersOpt <- Fox.runOptional(request.body.dataSourceOpt) {
         implicit val ctx: DBAccessContext = AuthorizedAccessContext(user)
         layerToLinkService.addLayersToLinkToDataSource(_, request.body.layersToLink)
       }
```

app/controllers/DatasetController.scala (4)
`18-18`: **Consolidate LinkedLayerIdentifier source — good change.** Switching to the shared type from uploading services avoids duplication and keeps JSON formats consistent. No action needed.

`120-120`: **API shape change: folderId now ObjectId.** DataSourceRegistrationInfo.folderId moved from Option[String] to Option[ObjectId]. Verify all callers (FE and any scripts) send a hex ObjectId and not a plain folder path/string. If older clients exist, provide a small adapter at the boundary.

`230-237`: **createAndSetUpDataset usage looks correct.** Passing folderId as ObjectId and setting isVirtual = true matches the new model.

`141-141`: **Make storage-report refresh best-effort and retain 200 OK.** Wrap the per-dataset refresh so transient 4xx/5xx errors won’t fail the route after the dataset is marked usable:

```diff
-      _ <- usedStorageService.refreshStorageReportForDataset(dataset)
+      // best-effort: don’t fail the route if storage-report refresh fails
+      _ <- usedStorageService
+        .refreshStorageReportForDataset(dataset)
+        .recover { case _ => () }
```

Ensure the failure path of `finishUploadToPaths` still triggers the Slack notification required by #8934.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (3)
93-117
: reserveUploadV11 mapping looks good; keep isVirtual unsetForwarding requireUniqueName and leaving isVirtual/needsConversion as None is correct for non-manual flows. No changes needed.
119-124
: Use datasetDirectoryName when resolving legacy links (prevents 404s)
getDatasetId expects the directory name. Fall back to dataSetName only if datasetDirectoryName is absent. This was flagged earlier and still applies.
Apply:
-      datasetId <- remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId,
-                                                       legacyLayerToLink.dataSetName)
+      val directoryName =
+        legacyLayerToLink.datasetDirectoryName.getOrElse(legacyLayerToLink.dataSetName)
+      datasetId <- remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId, directoryName)

Also add a minimal guard so we fail fast if both organizationId and organizationName are missing:

-  private def adaptLayerToLink(legacyLayerToLink: LegacyLinkedLayerIdentifier): Fox[LinkedLayerIdentifier] =
-    for {
+  private def adaptLayerToLink(legacyLayerToLink: LegacyLinkedLayerIdentifier): Fox[LinkedLayerIdentifier] =
+    for {
+      _ <- Fox.fromBool(legacyLayerToLink.organizationId.isDefined || legacyLayerToLink.organizationName.isDefined)
+        ?~> "organization.missing"
       datasetId <- ...
140-146
: Set isVirtual = Some(false) for manual uploads (legacy path)
Manual disk uploads should explicitly mark isVirtual = false to avoid relying on defaults.
Apply:
-        Some(request.body.requireUniqueName),
-        None,
+        Some(request.body.requireUniqueName),
+        Some(false),
         needsConversion = None
...datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
171-207
: Persist and consume isVirtual across reserve/finish; also clean up on move failures
Config flips can misroute data (S3 vs local) and break resumability. Persist isVirtual on reserve, read it at finish, and pass it into moveUnpackedToTarget. Also wrap the move step with cleanup-on-failure, like unpack/post-process. Finally, remove the persisted flag on cleanup.
Apply:
@@ private def redisKeyForDatasetId(uploadId: String): String =
     s"upload___${uploadId}___datasetId"
+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"
@@ def reserveUpload(reserveUploadInfo: ReserveUploadInformation,
       _ <- runningUploadMetadataStore.insert(
         redisKeyForDatasetId(reserveUploadInfo.uploadId),
         Json.stringify(Json.toJson(reserveUploadAdditionalInfo.newDatasetId))
       )
+      _ <- runningUploadMetadataStore.insert(
+        redisKeyForIsVirtual(reserveUploadInfo.uploadId),
+        reserveUploadInfo.isVirtual.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled).toString
+      )
@@ def finishUpload(uploadInformation: UploadInformation)
-      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId)
+      // Decide target storage based on persisted isVirtual (fallback to live config)
+      isVirtualOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId)).map(_.map(_.toBoolean))
+      uploadToS3 = isVirtualOpt.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+      moveResult <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId, uploadToS3).shiftBox
+      _ <- cleanUpOnFailure(moveResult, datasetId, dataSourceId, needsConversion, label = s"moving dataset to target storage")
+      dataSourceWithAbsolutePathsOpt <- moveResult.toFox
@@ private def moveUnpackedToTarget(unpackedDir: Path,
                                    needsConversion: Boolean,
-                                   datasetId: ObjectId,
-                                   dataSourceId: DataSourceId): Fox[Option[UsableDataSource]] =
+                                   datasetId: ObjectId,
+                                   dataSourceId: DataSourceId,
+                                   uploadToS3: Boolean): Fox[Option[UsableDataSource]] =
@@
-          newBasePath <- if (dataStoreConfig.Datastore.S3Upload.enabled) {
+          newBasePath <- if (uploadToS3) {
@@
-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+            endPointHost = s3UploadEndpoint.getHost
             newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
           } yield newBasePath
         } else {
@@ def removeFromRedis(uploadId: String): Fox[Unit] =
       dataSourceId <- getDataSourceIdByUploadId(uploadId)
       _ <- runningUploadMetadataStore.remove(redisKeyForDataSourceId(uploadId))
       _ <- runningUploadMetadataStore.remove(redisKeyForDatasetId(uploadId))
+      _ <- runningUploadMetadataStore.remove(redisKeyForIsVirtual(uploadId))
       _ <- runningUploadMetadataStore.remove(redisKeyForLinkedLayerIdentifier(uploadId))

Also applies to: 349-392, 410-451, 820-840
337-347
: Cleanup on size-exceed must actually run; current call is dropped
The call to cleanUpDatasetExceedingSize is not sequenced and won't execute before returning a failure.
Apply:
-        _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
-          tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.map(actualFileSize =>
-            if (actualFileSize > maxFileSize.toLong) {
-              cleanUpDatasetExceedingSize(uploadDir, uploadId)
-              Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes")
-            } else Fox.successful(()))
-        }
+        _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
+          tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.flatMap { actualFileSize =>
+            if (actualFileSize > maxFileSize.toLong)
+              cleanUpDatasetExceedingSize(uploadDir, uploadId)
+                .flatMap(_ => Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes"))
+            else Fox.successful(())
+          }
+        }
app/controllers/WKRemoteDataStoreController.scala (1)
210-227
: Await DB delete and used-storage refresh
Current code builds a Future chain and discards it; side-effects may never run.
Apply:
-      _ = datasetDAO
-        .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-        .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+      _ <- datasetDAO
+        .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+        .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
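The last two findings share the same root cause: a monadic value that is only assigned with = inside a for-comprehension is never sequenced, so its failure or side effect is dropped. A minimal stand-alone illustration, using Either as a stand-in and assuming Fox sequences generators analogously:

// With `=` the check is only assigned, so a Left never short-circuits.
def validateAssigned(value: Int): Either[String, Int] =
  for {
    start <- Either.cond(true, value, "unreachable")
    check = Either.cond(start > 0, (), "must be positive") // never sequenced: a Left here is ignored
  } yield start

// With `<-` the check is sequenced, so a Left aborts the comprehension.
def validateBound(value: Int): Either[String, Int] =
  for {
    _ <- Either.cond(value > 0, (), "must be positive")
  } yield value

// validateAssigned(-1) == Right(-1)            (bug: the invalid value slips through)
// validateBound(-1)    == Left("must be positive")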
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
app/controllers/DatasetController.scala (7 hunks)
app/controllers/WKRemoteDataStoreController.scala (5 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (19 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
app/controllers/WKRemoteDataStoreController.scala (8)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3)
AuthorizedAccessContext
(19-21)DBAccessContext
(31-37)GlobalAccessContext
(25-29)util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2)
ObjectId
(11-13)ObjectId
(15-66)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4)
DataSource
(31-43)UnusableDataSource
(45-62)UnusableDataSource
(64-66)isUsable
(20-20)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1)
DataSourceStatus
(3-12)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
ReportDatasetUploadParameters
(70-75)ReportDatasetUploadParameters
(76-79)app/models/dataset/LayerToLinkService.scala (2)
validateLayerToLink
(22-32)addLayersToLinkToDataSource
(32-42)app/models/dataset/DatasetService.scala (2)
createAndSetUpDataset
(100-123)trackNewDataset
(548-560)app/models/dataset/Dataset.scala (3)
findOne
(203-210)updateDataSource
(633-655)deleteDataset
(708-736)
app/controllers/DatasetController.scala (2)
app/models/storage/UsedStorageService.scala (2)
UsedStorageService
(21-103)refreshStorageReportForDataset
(89-103)app/models/dataset/DatasetService.scala (2)
createAndSetUpDataset
(100-123)trackNewDataset
(548-560)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (4)
PathUtils
(14-14)deleteDirectoryRecursively
(186-193)ensureDirectoryBox
(124-132)listFilesRecursive
(104-108)webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (2)
DataStoreConfig
(11-78)S3Upload
(64-68)webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
S3DataVault
(46-155)S3DataVault
(157-250)getCredential
(143-143)hostBucketFromUri
(168-179)create
(158-166)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
UPath
(54-96)fromLocalPath
(80-80)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
UnfinishedUpload
(7-16)UnfinishedUpload
(18-20)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
DSRemoteWebknossosClient
(57-216)reportUpload
(98-105)getCredential
(185-196)deleteDataset
(138-143)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
CredentialConfigReader
(8-64)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
deleteOnDisk
(28-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
431-439
: Use normalized endpoint host for S3 base path
Don't re-parse the config string here; use s3UploadEndpoint to ensure scheme/host/port are consistent.
Apply:
-      endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+      endPointHost = s3UploadEndpoint.getHost
533-544
: Preserve scheme/port when building S3 endpoint
Hardcoding https and dropping port breaks MinIO/custom endpoints.
Apply:
-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
app/controllers/DatasetController.scala (1)
230-243
: trackNewDataset not awaited and viaAddRoute should be true for Add route
Current code doesn't await side-effects (Slack/analytics) and mislabels the source.
Apply:
-      )
-      _ = datasetService.trackNewDataset(dataset,
-                                         user,
-                                         needsConversion = false,
-                                         datasetSizeBytes = 0,
-                                         viaAddRoute = false)
+      )
+      _ <- datasetService.trackNewDataset(dataset,
+                                          user,
+                                          needsConversion = false,
+                                          datasetSizeBytes = 0,
+                                          viaAddRoute = true)
app/controllers/WKRemoteDataStoreController.scala (1)
125-154
: Require dataSource when needsConversion == false; also await tracking
Without enforcing dataSource for non-conversion uploads, datasets remain unusable. Also, trackNewDataset must be awaited.
Apply:
   def reportDatasetUpload(name: String,
                           key: String,
                           token: String,
                           datasetId: ObjectId): Action[ReportDatasetUploadParameters] =
     Action.async(validateJson[ReportDatasetUploadParameters]) { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           user <- bearerTokenService.userForToken(token)
           dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
-          _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
-          _ = datasetService.trackNewDataset(dataset, user, request.body.needsConversion, request.body.datasetSizeBytes, viaAddRoute = false)
+          _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+            "reportUpload.dataSourceMissing"
+          _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
+          _ <- datasetService.trackNewDataset(dataset, user, request.body.needsConversion, request.body.datasetSizeBytes, viaAddRoute = false)
           dataSourceWithLinkedLayersOpt <- Fox.runOptional(request.body.dataSourceOpt) {
             implicit val ctx: DBAccessContext = AuthorizedAccessContext(user)
             layerToLinkService.addLayersToLinkToDataSource(_, request.body.layersToLink)
           }
           _ <- Fox.runOptional(dataSourceWithLinkedLayersOpt) { dataSource =>
             logger.info(s"Updating dataset $datasetId in database after upload reported from datastore $name.")
             datasetDAO.updateDataSource(datasetId, dataset._dataStore, dataSource.hashCode(), dataSource, isUsable = true)(GlobalAccessContext)
           }
         } yield Ok
       }
     }
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
app/controllers/WKRemoteDataStoreController.scala (6 hunks)
app/models/dataset/Dataset.scala (2 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
app/models/dataset/Dataset.scala (2)
app/utils/sql/SecuredSQLDAO.scala (2)
existingCollectionName
(16-16)assertUpdateAccess
(36-48)app/utils/sql/SqlInterpolation.scala (2)
q
(20-39)asUpdate
(74-74)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2)
webknossos
(70-72)validateAccessFromTokenContext
(87-94)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)
LinkedLayerIdentifier
(81-81)LinkedLayerIdentifier
(83-85)ReserveUploadInformation
(46-59)ReserveUploadInformation
(60-62)util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
validateJson
(204-209)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
fromFuture
(42-53)successful
(53-56)util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (1)
fromStringSync
(22-26)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
getDatasetId
(206-216)
app/controllers/WKRemoteDataStoreController.scala (11)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3)
AuthorizedAccessContext
(19-21)DBAccessContext
(31-37)GlobalAccessContext
(25-29)util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2)
ObjectId
(11-13)ObjectId
(15-66)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4)
DataSource
(31-43)UnusableDataSource
(45-62)UnusableDataSource
(64-66)isUsable
(20-20)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1)
DataSourceStatus
(3-12)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
ReportDatasetUploadParameters
(70-75)ReportDatasetUploadParameters
(76-79)app/models/storage/UsedStorageService.scala (2)
UsedStorageService
(21-103)refreshStorageReportForDataset
(89-103)app/models/dataset/LayerToLinkService.scala (2)
validateLayerToLink
(22-32)addLayersToLinkToDataSource
(32-42)app/models/dataset/DatasetService.scala (2)
createAndSetUpDataset
(100-123)trackNewDataset
(548-560)util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
validateJson
(204-209)app/security/WebknossosBearerTokenAuthenticatorService.scala (1)
userForToken
(76-84)app/models/dataset/Dataset.scala (3)
findOne
(203-210)updateDataSource
(634-656)deleteDataset
(709-737)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
🔇 Additional comments (2)
app/controllers/WKRemoteDataStoreController.scala (1)
140-151
: Re-introduce the dataSource guard for non-conversion uploads.
If needsConversion is false and dataSourceOpt is missing, we leave the dataset backed by the placeholder UnusableDataSource, so it remains unusable despite reporting success. The previous guard prevented this, but it's gone now. Please add it back before processing optional linkage.

       for {
         user <- bearerTokenService.userForToken(token)
         dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
+        _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+          "reportUpload.dataSourceMissing"
         _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)
127-130
: Prefer datasetDirectoryName when available.
remoteWebknossosClient.getDatasetId expects the directory name. When clients supply datasetDirectoryName, we must use it, otherwise legacy uploads break as soon as friendly names differ from directory names.

-      remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId, legacyLayerToLink.dataSetName)
+      val directoryName =
+        legacyLayerToLink.datasetDirectoryName.getOrElse(legacyLayerToLink.dataSetName)
+      remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId, directoryName)
Awesome, thanks for implementing this. Including @frcroth :)
Here is my code feedback / questions. Testing is TODO
...datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
Outdated
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
...datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
Actionable comments posted: 0
♻️ Duplicate comments (7)
app/models/dataset/Dataset.scala (1)
667-672
: Verify that worker flows use GlobalAccessContext when calling makeVirtual.
This method asserts update access, which could block convert_to_wkw worker paths if they don't use GlobalAccessContext. Confirm that all call sites (especially worker flows) either have update access or use GlobalAccessContext.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
188-198
: Verify that Slack notification is triggered only on failures.
The log(Some(slackNotificationService.noticeFailedFinishUpload)) wrapper may trigger notifications even on successful 200 OK responses. Confirm that the notification is only sent on failures.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
352-393
: Verify cleanup on moveUnpackedToTarget failure and config consistency.
Two concerns:
- If moveUnpackedToTarget fails, the local unpackToDir may not be cleaned up. Consider wrapping the move result in cleanUpOnFailure similar to unpack/post-processing steps.
- The method uses dataStoreConfig.Datastore.S3Upload.enabled directly instead of reading a persisted isVirtual flag from Redis. If the config changes between reserve and finish, the behavior may be inconsistent. Consider persisting isVirtual at reserve time and reading it here.
410-451
: Verify cleanup on S3 upload failure.
If the S3 upload (line 432) or path rewrite (line 446) fails, the local unpackedDir may not be cleaned up. Consider wrapping the S3 branch in a recovery handler that deletes the local directory and calls remoteWebknossosClient.deleteDataset(datasetId) on failure.
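For orientation, a self-contained sketch of a directory upload that surfaces partial failures as a failed Future, which a caller could hook such cleanup onto (AWS SDK v2 transfer manager, simplified; this is not the actual UploadService code):

import java.nio.file.Path
import scala.concurrent.{ExecutionContext, Future}
import scala.jdk.FutureConverters._
import software.amazon.awssdk.transfer.s3.S3TransferManager
import software.amazon.awssdk.transfer.s3.model.UploadDirectoryRequest

// Upload a local directory; fail the resulting Future if any single file
// transfer failed, so the caller can delete the staging dir and report back.
def uploadDirectory(transferManager: S3TransferManager, localDir: Path, bucket: String, prefix: String)(
    implicit ec: ExecutionContext): Future[Unit] = {
  val request = UploadDirectoryRequest.builder().source(localDir).bucket(bucket).s3Prefix(prefix).build()
  transferManager.uploadDirectory(request).completionFuture().asScala.flatMap { completed =>
    if (completed.failedTransfers().isEmpty) Future.successful(())
    else Future.failed(new Exception(s"${completed.failedTransfers().size()} file transfers failed"))
  }
}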
546-562
: Fail fast on missing S3 credentials.
The current code extracts accessKeyId and secretAccessKey as Box[Option[String]], which can succeed with None values and create an invalid S3 client. Validate s3UploadCredentialsOpt before building the client and fail with a clear error if missing.
533-544
: Preserve scheme and port in s3UploadEndpoint.
The endpoint URI construction hardcodes "https" and port -1, which will break non-HTTPS or non-default-port S3 endpoints (e.g., MinIO). Preserve the original scheme and port from credentialName.
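A small stand-alone sketch of that behaviour with plain java.net.URI (illustrative only; endpointFrom is not an existing helper in the codebase):

import java.net.URI

// Rebuild an endpoint URI while keeping the original scheme and port instead
// of hardcoding "https" and -1; e.g. "http://minio.local:9000/bucket/prefix"
// yields "http://minio.local:9000".
def endpointFrom(credentialUriString: String): URI = {
  val credentialUri = new URI(credentialUriString)
  new URI(credentialUri.getScheme, null, credentialUri.getHost, credentialUri.getPort, null, null, null)
}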
436-437
: Use s3UploadEndpoint for consistent host extraction.
Line 436 extracts the host directly from the credentialName URI, but s3UploadEndpoint (lines 533-544) normalizes the URI. For consistency, use s3UploadEndpoint.getHost instead of re-parsing credentialName.
Apply this diff:
-      endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+      endPointHost = s3UploadEndpoint.getHost
🧹 Nitpick comments (1)
unreleased_changes/8912.md (1)
8-15
: Add language identifier to the code block.
The code block should specify a language for proper syntax highlighting and to satisfy markdown linting rules (MD040).
Based on static analysis hints.
Apply this diff:
-``` +```hocon s3Upload { enabled = false # Use the name of a credential in the dataVaults section here to use it for uploads. credentialName = "s3://example/uri/prefix" objectKeyPrefix = "webknossos-uploads" }</blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: CodeRabbit UI **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 4bfb9a225cf601fccc3c5f89bc57de92217dcb12 and 2aebc3497affc6de63adcb7921f52bb5f7970706. </details> <details> <summary>📒 Files selected for processing (6)</summary> * `app/models/dataset/ComposeService.scala` (2 hunks) * `app/models/dataset/Dataset.scala` (2 hunks) * `unreleased_changes/8912.md` (1 hunks) * `webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala` (7 hunks) * `webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala` (1 hunks) * `webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala` (19 hunks) </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🧠 Learnings (1)</summary> <details> <summary>📚 Learning: 2025-04-28T14:18:04.368Z</summary>
Learnt from: frcroth
PR: #8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad,Fox.fromBool()
expressions should use the<-
binding operator instead of the=
assignment operator to properly propagate error conditions. Using=
will cause validation failures to be silently ignored.**Applied to files:** - `webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala` </details> </details><details> <summary>🧬 Code graph analysis (4)</summary> <details> <summary>app/models/dataset/Dataset.scala (3)</summary><blockquote> <details> <summary>app/models/project/Project.scala (1)</summary> * `findOneByNameAndOrganization` (115-125) </details> <details> <summary>app/utils/sql/SecuredSQLDAO.scala (3)</summary> * `readAccessQuery` (23-36) * `existingCollectionName` (16-16) * `assertUpdateAccess` (36-48) </details> <details> <summary>app/utils/sql/SqlInterpolation.scala (2)</summary> * `q` (20-39) * `asUpdate` (74-74) </details> </blockquote></details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (6)</summary><blockquote> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (6)</summary> * `webknossos` (70-72) * `validateAccessFromTokenContext` (87-94) * `UserAccessRequest` (30-30) * `UserAccessRequest` (31-72) * `writeDataset` (52-55) * `deleteDataset` (49-52) </details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/DSSlackNotificationService.scala (2)</summary> * `DSSlackNotificationService` (8-31) * `noticeFailedFinishUpload` (26-31) </details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)</summary> * `reserveDataSourceUpload` (123-132) * `deleteDataset` (138-143) </details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)</summary> * `reserveUpload` (171-212) * `getDatasetIdByUploadId` (168-171) * `extractDatasetUploadId` (157-157) * `finishUpload` (352-393) * `cancelUpload` (321-335) * `isKnownUpload` (154-157) </details> <details> <summary>util/src/main/scala/com/scalableminds/util/requestlogging/RequestLogging.scala (1)</summary> * `log` (55-62) </details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/SlackClient.scala (1)</summary> * `warn` (18-21) </details> </blockquote></details> <details> <summary>app/models/dataset/ComposeService.scala (3)</summary><blockquote> <details> <summary>app/models/dataset/DatasetService.scala (1)</summary> * `createAndSetUpDataset` (100-123) </details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)</summary> * `mapped` (104-133) </details> <details> <summary>webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1)</summary> * `coordinateTransformations` (85-85) </details> </blockquote></details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (7)</summary><blockquote> <details> <summary>util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)</summary> * `PathUtils` (14-14) * `deleteDirectoryRecursively` (186-193) </details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)</summary> * `DataStoreConfig` (11-78) * `Datastore` (18-70) * `S3Upload` (64-68) * `DataVaults` (61-63) </details> <details> 
<summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)</summary> * `S3DataVault` (46-182) * `S3DataVault` (184-277) * `getCredential` (170-170) * `hostBucketFromUri` (195-206) * `create` (185-193) </details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)</summary> * `UPath` (54-96) * `fromLocalPath` (80-80) </details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)</summary> * `UnfinishedUpload` (7-16) * `UnfinishedUpload` (18-20) </details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)</summary> * `DSRemoteWebknossosClient` (57-216) * `reportUpload` (98-105) * `getCredential` (185-196) * `deleteDataset` (138-143) </details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)</summary> * `CredentialConfigReader` (8-64) </details> </blockquote></details> </details><details> <summary>🪛 markdownlint-cli2 (0.18.1)</summary> <details> <summary>unreleased_changes/8912.md</summary> 8-8: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> </details> <details> <summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)</summary> * GitHub Check: build-smoketest-push * GitHub Check: frontend-tests </details> <details> <summary>🔇 Additional comments (19)</summary><blockquote> <details> <summary>app/models/dataset/ComposeService.scala (2)</summary><blockquote> `50-55`: **LGTM! Virtual dataset creation aligns with PR objectives.** The refactor from `createVirtualDataset` to `createAndSetUpDataset` with `isVirtual = true` correctly enforces the virtual dataset model. The `targetFolderId` type change to `ObjectId` is consistent with the method signature and improves type safety. --- `71-73`: **LGTM! Simplified layer editing using polymorphic `mapped` method.** The refactor eliminates explicit pattern matching by leveraging the `StaticLayer.mapped` method, which internally handles both `StaticColorLayer` and `StaticSegmentationLayer` types. This reduces code duplication and improves maintainability while preserving the coordinate transformation logic. </blockquote></details> <details> <summary>app/models/dataset/Dataset.scala (1)</summary><blockquote> `446-458`: **LGTM: Backwards‑compatible lookup by directoryName.** The method now looks up datasets by `directoryName` instead of `name`, which aligns with the shift to datasetId-centric operations. The comment correctly explains that this is for backwards compatibility with legacy links. </blockquote></details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (5)</summary><blockquote> `36-36`: **LGTM: Slack notification service injected.** The DSSlackNotificationService is properly injected for use in failure logging. Also applies to: 67-67 --- `100-103`: **LGTM: Reserve flow updated to use for-comprehension.** The reserve flow now awaits `reserveUploadAdditionalInfo` before calling `uploadService.reserveUpload`, which is more readable and correct. 
--- `153-155`: **LGTM: Chunk upload flow updated to datasetId.** The chunk upload and test chunk flows now correctly use `getDatasetIdByUploadId` and `UserAccessRequest.writeDataset(datasetId)`. Also applies to: 176-177 --- `202-214`: **LGTM: Cancel upload flow updated to datasetId.** The cancel upload flow now correctly uses `getDatasetIdByUploadId` and `UserAccessRequest.deleteDataset(datasetId)`. --- `405-410`: **LGTM: Delete on disk flow updated to datasetId.** The delete on disk flow now correctly uses `dsRemoteWebknossosClient.deleteDataset(datasetId)` and the log message references `datasetId`. </blockquote></details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1)</summary><blockquote> `195-195`: **LGTM: hostBucketFromUri made public for S3 upload support.** The visibility change allows `UploadService` and other modules to extract bucket names from S3 URIs during the upload workflow. </blockquote></details> <details> <summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (10)</summary><blockquote> `46-101`: **LGTM: Data models updated for datasetId‑centric flow.** The new and updated data models correctly reflect the shift to datasetId-based operations and S3 upload support. JSON formats are properly defined. --- `103-108`: **LGTM: DataStoreConfig dependency added.** The `DataStoreConfig` dependency is correctly injected to support S3 upload configuration. --- `143-144`: **LGTM: Redis key helper added for datasetId.** The `redisKeyForDatasetId` helper is consistent with existing Redis key patterns. --- `159-164`: **LGTM: Upload directory helpers updated.** The upload directory helpers now support the backup feature by creating backup directories in the `.trash` folder. --- `168-169`: **LGTM: DatasetId retrieval method added.** The `getDatasetIdByUploadId` method is consistent with the existing `getDataSourceIdByUploadId` pattern. --- `171-210`: **LGTM: reserveUpload updated for datasetId and layer linking.** The method now correctly validates that `needsConversion` and `layersToLink` are not both set, and stores the `datasetId` in Redis. --- `263-319`: **LGTM: handleUploadChunk updated to datasetId.** The chunk upload handler now correctly uses `getDatasetIdByUploadId` and includes `datasetId` in log messages. --- `321-335`: **LGTM: cancelUpload updated to datasetId.** The cancel upload handler now correctly uses `getDatasetIdByUploadId` and includes `datasetId` in log messages. --- `602-628`: **LGTM: Cleanup logic includes dataset deletion.** The cleanup logic correctly deletes the local directory and calls `remoteWebknossosClient.deleteDataset(datasetId)` on failure. --- `802-804`: **LGTM: Raw uploaded data backed up for debugging.** The backup feature correctly copies the raw uploaded data to a `.trash` directory before processing. </blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
171-210
: PersistisVirtual
to Redis for consistent finish behavior.While
isVirtual
is present inReserveUploadInformation
, it's not being persisted to Redis. This meansfinishUpload
cannot read the reserved value and must rely on live config (dataStoreConfig.Datastore.S3Upload.enabled
), risking inconsistency if the config changes between reserve and finish operations.Add persistence and retrieval:
+ private def redisKeyForIsVirtual(uploadId: String): String = + s"upload___${uploadId}___isVirtual" + def reserveUpload(reserveUploadInfo: ReserveUploadInformation, reserveUploadAdditionalInfo: ReserveAdditionalInformation): Fox[Unit] = for { _ <- dataSourceService.assertDataDirWritable(reserveUploadInfo.organization) _ <- Fox.fromBool( !reserveUploadInfo.needsConversion.getOrElse(false) || !reserveUploadInfo.layersToLink .exists(_.nonEmpty)) ?~> "Cannot use linked layers if the dataset needs conversion" @@ _ <- runningUploadMetadataStore.insert( redisKeyForDatasetId(reserveUploadInfo.uploadId), Json.stringify(Json.toJson(reserveUploadAdditionalInfo.newDatasetId)) ) + _ <- runningUploadMetadataStore.insert( + redisKeyForIsVirtual(reserveUploadInfo.uploadId), + reserveUploadInfo.isVirtual.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled).toString + )Then in
finishUpload
, read and pass it tomoveUnpackedToTarget
:def finishUpload(uploadInformation: UploadInformation)(implicit tc: TokenContext): Fox[ObjectId] = { val uploadId = uploadInformation.uploadId for { dataSourceId <- getDataSourceIdByUploadId(uploadId) datasetId <- getDatasetIdByUploadId(uploadId) linkedLayerIdentifiers <- getObjectFromRedis[LinkedLayerIdentifiers](redisKeyForLinkedLayerIdentifier(uploadId)) + isVirtualOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId)).map(_.map(_.toBoolean)) + uploadToS3 = isVirtualOpt.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled) @@ - dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId) + dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, uploadToS3, datasetId, dataSourceId)Also update
removeFromRedis
(around line 814) to clean up the new key, and updatemoveUnpackedToTarget
signature to accept theuploadToS3
parameter instead of reading config directly.Based on learnings
♻️ Duplicate comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
529-540
: Hardcoded HTTPS scheme breaks non-HTTPS S3 endpoints.Line 532 forces
"https"
and line 535 drops the port (-1
), which will break MinIO or any custom S3-compatible endpoint using HTTP or non-standard ports.Preserve the original scheme and port:
private lazy val s3UploadEndpoint: URI = { val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName) new URI( - "https", + credentialUri.getScheme, null, credentialUri.getHost, - -1, + credentialUri.getPort, null, null, null ) }Alternatively, if the credential name already contains the full endpoint URI, just use it directly:
- private lazy val s3UploadEndpoint: URI = { - val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName) - new URI( - "https", - null, - credentialUri.getHost, - -1, - null, - null, - null - ) - } + private lazy val s3UploadEndpoint: URI = + new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
542-558
: Improve error message for missing S3 credentials.If
s3UploadCredentialsOpt
isNone
, the for-comprehension fails with an empty Box, which doesn't provide a clear error message.private lazy val getS3TransferManager: Box[S3TransferManager] = for { - accessKeyId <- Box(s3UploadCredentialsOpt.map(_._1)) - secretAccessKey <- Box(s3UploadCredentialsOpt.map(_._2)) + creds <- Box(s3UploadCredentialsOpt) ?~ s"S3 upload enabled but no credentials found for '${dataStoreConfig.Datastore.S3Upload.credentialName}'" + (accessKeyId, secretAccessKey) = creds client <- tryo( S3AsyncClient .builder() .credentialsProvider(StaticCredentialsProvider.create( AwsBasicCredentials.builder.accessKeyId(accessKeyId).secretAccessKey(secretAccessKey).build() )) .crossRegionAccessEnabled(true) .forcePathStyle(true) .endpointOverride(s3UploadEndpoint) .region(Region.US_EAST_1) // Disabling checksum calculation prevents files being stored with Content Encoding "aws-chunked". .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED) .build()) } yield S3TransferManager.builder().s3Client(client).build()
408-447
: Use parseds3UploadEndpoint
for host lookup
- Replace
- endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost + endPointHost = s3UploadEndpoint.getHost- Keep the
synchronized
block:unpackToDirFor
returns a shared directory per organization, so synchronization is required unless you refactor to use unique per-upload paths.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
(18 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (10)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (4)
PathUtils
(14-14)ensureDirectory
(118-122)deleteDirectoryRecursively
(186-193)ensureDirectoryBox
(124-132)webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
DataStoreConfig
(11-78)S3Upload
(64-68)DataVaults
(61-63)webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
S3DataVault
(46-182)S3DataVault
(184-277)getCredential
(170-170)hostBucketFromUri
(195-206)create
(185-193)webknossos-datastore/app/com/scalableminds/webknossos/datastore/explore/ExploreLocalLayerService.scala (1)
ExploreLocalLayerService
(21-152)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
UPath
(54-96)fromLocalPath
(80-80)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
UnfinishedUpload
(7-16)UnfinishedUpload
(18-20)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (3)
DSRemoteWebknossosClient
(57-216)reportUpload
(98-105)getCredential
(185-196)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
CredentialConfigReader
(8-64)util/src/main/scala/com/scalableminds/util/time/Instant.scala (2)
now
(48-48)logSince
(72-75)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
deleteOnDisk
(28-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
🔇 Additional comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
46-101
: LGTM! Data models align well with PR objectives.The new and updated case classes properly support dataset-ID-centric operations and S3 upload configuration. The addition of
isVirtual
,needsConversion
, andLinkedLayerIdentifier
structures enables the transition away from symlink-based layer linking.
393-406
: LGTM! File cleanup logic correctly filters unreferenced files.The logic correctly identifies files referenced by datasource layers (explicit paths + implicit mag paths) and safely deletes files not in that set. This aligns with the PR objective to avoid uploading unnecessary files.
The
startsWith
check at line 594 appropriately handles directory prefixes.Also applies to: 576-596
798-800
: Good practice: Raw upload backup for debugging.Backing up raw uploaded data to
.trash
before processing enables debugging failed uploads. The regular cleanup by cronjob prevents unbounded growth.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
(20 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (5)
PathUtils
(14-14)deleteDirectoryRecursively
(186-193)ensureDirectoryBox
(124-132)listFilesRecursive
(104-108)listFiles
(101-102)webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
DataStoreConfig
(11-78)Datastore
(18-70)S3Upload
(64-68)DataVaults
(61-63)webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
S3DataVault
(46-182)S3DataVault
(184-277)getCredential
(170-170)hostBucketFromUri
(195-206)create
(185-193)webknossos-datastore/app/com/scalableminds/webknossos/datastore/explore/ExploreLocalLayerService.scala (1)
ExploreLocalLayerService
(21-152)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
UPath
(54-96)fromLocalPath
(80-80)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
DSRemoteWebknossosClient
(57-216)reportUpload
(98-105)getCredential
(185-196)deleteDataset
(138-143)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
CredentialConfigReader
(8-64)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
deleteOnDisk
(28-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
Looks good to me, thanks for adding more logging.
Did not retest as I saw no critical changes
Description
Steps to test:
yarn enable-jobs
and a workerTODOs:
Issues:
fixes Support dataset upload to S3 #8415
fixes Slack notification for failed finishUpload calls #8934
fixes layersToLink should reference datasets by id #8893
fixes LayersToLink should not create symlinks but instead set paths in datasource #8749
Follow-up issue Slim down finishUpload route #8979
$PR_NUMBER.md
file inunreleased_changes
or use./tools/create-changelog-entry.py
)