-
Notifications
You must be signed in to change notification settings - Fork 14
WIP: add basic support for RFC5 transformations #243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
thewtex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jo-mueller beautiful, thank you!!
A few comments inline.
|
|
||
|
|
||
| @dataclass | ||
| class Dataset: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite a generic name -- what is this supposed to represent? Can we make it more specific?
| else: | ||
| dims = tuple(a["name"] if "name" in a else a for a in metadata["axes"]) | ||
| from .v04.zarr_metadata import supported_dims | ||
| dims = list(reversed(supported_dims)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want dims to be a tuple (immutable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This WIP PR introduces basic support for RFC5 (OME-NGFF 0.6) transformations by creating new v06 metadata classes that restructure coordinate system information. The key architectural change moves from a flat axes list to a nested coordinateSystems structure containing axes, with transformations organized as sequences mapping between coordinate systems. The PR modifies reading, writing, and metadata preparation logic to handle this new structure alongside existing v04/v05 formats.
Key changes:
- Introduces v06 metadata classes with
CoordinateSystemandTransformSequencetypes - Updates metadata parsing to conditionally handle
coordinateSystemsvsaxes - Modifies transformation handling to use sequence-based transformations with input/output coordinate systems
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| py/ngff_zarr/v06/zarr_metadata.py | New file defining v06 metadata dataclasses including CoordinateSystem, Scale, Translation, TransformSequence, Dataset, and Metadata for RFC5 support |
| py/ngff_zarr/to_ngff_zarr.py | Adds v06 metadata import and updates _pop_metadata_optionals to handle coordinateSystems structure; adds dimension_names extraction for v06 |
| py/ngff_zarr/to_multiscales.py | Changes imports to use v06 metadata classes and modifies dataset creation to use TransformSequence with input/output coordinate systems |
| py/ngff_zarr/from_ngff_zarr.py | Updates metadata reading logic to detect and parse coordinateSystems; adds v06-specific metadata construction and dataset handling |
| from .rich_dask_progress import NgffProgress, NgffProgressCallback | ||
| from .to_ngff_image import to_ngff_image | ||
| from .v04.zarr_metadata import Axis, Dataset, Metadata, Scale, Translation | ||
| from .v06.zarr_metadata import coordinateSystem, Dataset, Metadata, Scale, Translation, TransformSequence, Axis |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import statement imports only v06 metadata classes, breaking compatibility with v04/v05 workflows. The line now imports from .v06.zarr_metadata import coordinateSystem, Dataset, Metadata, Scale, Translation, TransformSequence, Axis, which shadows the v04 imports that were previously used. This will cause issues when to_multiscales() is called in v04/v05 contexts since the returned Multiscales object will always have v06 Metadata, but downstream code may expect v04/v05 metadata. Consider making this version-aware or importing with aliases to avoid shadowing.
| from .v06.zarr_metadata import coordinateSystem, Dataset, Metadata, Scale, Translation, TransformSequence, Axis | |
| from .v04 import zarr_metadata as zarr_metadata_v04 | |
| from .v05 import zarr_metadata as zarr_metadata_v05 | |
| from .v06 import zarr_metadata as zarr_metadata_v06 |
| if "axes" in metadata: | ||
| dims = [a["name"] if "name" in a else a for a in metadata["axes"]] | ||
| elif "coordinateSystems" in metadata: | ||
| dims = [a["name"] for a in metadata["coordinateSystems"][0]["axes"]] | ||
| else: | ||
| dims = tuple(a["name"] if "name" in a else a for a in metadata["axes"]) | ||
| from .v04.zarr_metadata import supported_dims | ||
| dims = list(reversed(supported_dims)) |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type inconsistency: dims is constructed as a list at lines 121, 123, and 126, but is likely expected to be a tuple elsewhere in the codebase. In the original code (before changes), line 125 used tuple(reversed(supported_dims)) and line 125 (now deleted) used tuple(...). Consider wrapping the list assignments with tuple() to maintain type consistency, e.g., dims = tuple([a["name"] if "name" in a else a for a in metadata["axes"]]).
|
|
||
| metadata_obj = Metadata( | ||
| datasets=datasets, | ||
| coordinateSystems=[metadata["coordinateSystems"][0]], |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential KeyError: accessing metadata["coordinateSystems"][0] without checking if the key exists or if the list is non-empty. While this code path is only executed when version.startswith("0.6"), and the dims extraction at line 122-123 suggests coordinateSystems exists, there's no guarantee that:
- The key still exists in metadata at this point (it could have been popped)
- The list is non-empty
Consider adding a check or using .get() with a default value to prevent runtime errors, e.g., coordinateSystems=metadata.get("coordinateSystems", [{}])[:1] if "coordinateSystems" in metadata else [].
| coordinateSystems=[metadata["coordinateSystems"][0]], | |
| coordinateSystems=[(metadata.get("coordinateSystems") or [{}])[0]], |
| input: Optional[Union[None, str, coordinateSystem]] = None | ||
| output: Optional[Union[None, str, coordinateSystem]] = None |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type annotation Optional[Union[None, str, coordinateSystem]] is redundant. Optional[X] already means Union[X, None], so including None explicitly in the Union creates a double-optional. This should be either Optional[Union[str, coordinateSystem]] or Union[None, str, coordinateSystem] (though the former is more idiomatic).
| axes = [Axis(name=axis, type=type_dict[axis]) for axis in metadata["axes"]] | ||
| elif "coordinateSystems" in metadata: | ||
| axes = [ | ||
| Axis(name=axis["name"], type="space") # Default to space if type not given |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing axis type information handling for v06 metadata. The code defaults all axes to type="space" when reading from coordinateSystems, but this loses important type information (time, channel, space distinctions). The v06 spec likely includes type information in the axes. Consider reading the type from axis.get("type", "space") if available, similar to how v04/v05 handles it.
| Axis(name=axis["name"], type="space") # Default to space if type not given | |
| Axis(name=axis["name"], type=axis.get("type", "space")) # Use type if present, else default to space |
| if "axes" in metadata_dict: | ||
| for ax in metadata_dict["axes"]: | ||
| if ax["unit"] is None: | ||
| ax.pop("unit") | ||
|
|
||
| # Handle RFC 4: Remove orientation if RFC 4 is not enabled | ||
| if not is_rfc4_enabled(enabled_rfcs) and "orientation" in ax: | ||
| ax.pop("orientation") | ||
| # Handle RFC 4: Remove orientation if RFC 4 is not enabled | ||
| if not is_rfc4_enabled(enabled_rfcs) and "orientation" in ax: | ||
| ax.pop("orientation") | ||
|
|
||
| if "coordinateSystems" in metadata_dict: | ||
| for cs in metadata_dict["coordinateSystems"]: | ||
| for ax in cs["axes"]: | ||
| if ax.get("unit") is None: | ||
| ax.pop("unit") |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent unit handling: uses ax["unit"] at line 54 but ax.get("unit") at line 64. For coordinateSystems axes, the code safely uses .get() to avoid KeyError if "unit" is missing, but for regular axes it directly accesses ax["unit"] which could raise KeyError. Consider using ax.get("unit") for consistency and safety at line 54 as well, or ensure the dict access is safe.
| @@ -0,0 +1,55 @@ | |||
| from dataclasses import dataclass | |||
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing SPDX license headers. All Python source files in this repository should include the standard SPDX license headers at the top:
# SPDX-FileCopyrightText: Copyright (c) Fideus Labs LLC
# SPDX-License-Identifier: MITThis is consistent with other files like py/ngff_zarr/v04/zarr_metadata.py and py/ngff_zarr/v05/zarr_metadata.py.
| from ..v04.zarr_metadata import Axis, Omero, MethodMetadata | ||
|
|
||
| @dataclass | ||
| class coordinateSystem: |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class name coordinateSystem does not follow PEP 8 naming conventions. Class names should use CapWords (PascalCase) convention. This should be renamed to CoordinateSystem to be consistent with other class names like Scale, Translation, and TransformSequence in this file, as well as with existing v04/v05 classes like Axis, Dataset, and Metadata.
| input: Optional[Union[None, str, coordinateSystem]] = None | ||
| output: Optional[Union[None, str, coordinateSystem]] = None |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type annotation Optional[Union[None, str, coordinateSystem]] is redundant. Optional[X] already means Union[X, None], so including None explicitly in the Union creates a double-optional. This should be either Optional[Union[str, coordinateSystem]] or Union[None, str, coordinateSystem] (though the former is more idiomatic).
Co-authored-by: Copilot <[email protected]>
Coded myself a bit into a corner over at #238 , so started from scratch with bringing RFC5-like metadata here. A bunch of tests are still failing, but I think for the time being some feedback on whether the path I pursued in implementing is how you (@thewtex ) would see it happening?
I think the biggest challenge is that right now, the two main supported versions (0.4 and 0.5) are quite similar in their metadata layouts so that a lot of the internal metadata parsing doesn't have to account for the actual version. I.e.,
axesis always undermultiscales > axesand the only difference is the presence of a toplevelomekey. In 0.6, this will be quite different, and dragging the checks for the existence ofaxes, one or morecoordinateSystemsandomethrough the whole code introduces a lot of boilerplate.I don't have a good solution for this problem rn - the only thing I could think of would be a more elaborated metadata modelling with separate parsers for 0.4/0.5/0.6 metadata and, more importantly, conversion routines.
Anyway, would love to hear your thoughts on the matter 👍