Skip to content

Conversation

@jo-mueller
Copy link
Contributor

Coded myself a bit into a corner over at #238 , so started from scratch with bringing RFC5-like metadata here. A bunch of tests are still failing, but I think for the time being some feedback on whether the path I pursued in implementing is how you (@thewtex ) would see it happening?

I think the biggest challenge is that right now, the two main supported versions (0.4 and 0.5) are quite similar in their metadata layouts so that a lot of the internal metadata parsing doesn't have to account for the actual version. I.e., axes is always under multiscales > axes and the only difference is the presence of a toplevel ome key. In 0.6, this will be quite different, and dragging the checks for the existence of axes, one or more coordinateSystems and ome through the whole code introduces a lot of boilerplate.

I don't have a good solution for this problem rn - the only thing I could think of would be a more elaborated metadata modelling with separate parsers for 0.4/0.5/0.6 metadata and, more importantly, conversion routines.

Anyway, would love to hear your thoughts on the matter 👍

@jo-mueller jo-mueller changed the title add basic support for RFC5 transformations WIP: add basic support for RFC5 transformations Nov 26, 2025
Copy link
Collaborator

@thewtex thewtex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jo-mueller beautiful, thank you!!

A few comments inline.



@dataclass
class Dataset:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a generic name -- what is this supposed to represent? Can we make it more specific?

else:
dims = tuple(a["name"] if "name" in a else a for a in metadata["axes"])
from .v04.zarr_metadata import supported_dims
dims = list(reversed(supported_dims))
Copy link
Collaborator

@thewtex thewtex Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want dims to be a tuple (immutable)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This WIP PR introduces basic support for RFC5 (OME-NGFF 0.6) transformations by creating new v06 metadata classes that restructure coordinate system information. The key architectural change moves from a flat axes list to a nested coordinateSystems structure containing axes, with transformations organized as sequences mapping between coordinate systems. The PR modifies reading, writing, and metadata preparation logic to handle this new structure alongside existing v04/v05 formats.

Key changes:

  • Introduces v06 metadata classes with CoordinateSystem and TransformSequence types
  • Updates metadata parsing to conditionally handle coordinateSystems vs axes
  • Modifies transformation handling to use sequence-based transformations with input/output coordinate systems

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.

File Description
py/ngff_zarr/v06/zarr_metadata.py New file defining v06 metadata dataclasses including CoordinateSystem, Scale, Translation, TransformSequence, Dataset, and Metadata for RFC5 support
py/ngff_zarr/to_ngff_zarr.py Adds v06 metadata import and updates _pop_metadata_optionals to handle coordinateSystems structure; adds dimension_names extraction for v06
py/ngff_zarr/to_multiscales.py Changes imports to use v06 metadata classes and modifies dataset creation to use TransformSequence with input/output coordinate systems
py/ngff_zarr/from_ngff_zarr.py Updates metadata reading logic to detect and parse coordinateSystems; adds v06-specific metadata construction and dataset handling

from .rich_dask_progress import NgffProgress, NgffProgressCallback
from .to_ngff_image import to_ngff_image
from .v04.zarr_metadata import Axis, Dataset, Metadata, Scale, Translation
from .v06.zarr_metadata import coordinateSystem, Dataset, Metadata, Scale, Translation, TransformSequence, Axis
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import statement imports only v06 metadata classes, breaking compatibility with v04/v05 workflows. The line now imports from .v06.zarr_metadata import coordinateSystem, Dataset, Metadata, Scale, Translation, TransformSequence, Axis, which shadows the v04 imports that were previously used. This will cause issues when to_multiscales() is called in v04/v05 contexts since the returned Multiscales object will always have v06 Metadata, but downstream code may expect v04/v05 metadata. Consider making this version-aware or importing with aliases to avoid shadowing.

Suggested change
from .v06.zarr_metadata import coordinateSystem, Dataset, Metadata, Scale, Translation, TransformSequence, Axis
from .v04 import zarr_metadata as zarr_metadata_v04
from .v05 import zarr_metadata as zarr_metadata_v05
from .v06 import zarr_metadata as zarr_metadata_v06

Copilot uses AI. Check for mistakes.
Comment on lines +120 to +126
if "axes" in metadata:
dims = [a["name"] if "name" in a else a for a in metadata["axes"]]
elif "coordinateSystems" in metadata:
dims = [a["name"] for a in metadata["coordinateSystems"][0]["axes"]]
else:
dims = tuple(a["name"] if "name" in a else a for a in metadata["axes"])
from .v04.zarr_metadata import supported_dims
dims = list(reversed(supported_dims))
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type inconsistency: dims is constructed as a list at lines 121, 123, and 126, but is likely expected to be a tuple elsewhere in the codebase. In the original code (before changes), line 125 used tuple(reversed(supported_dims)) and line 125 (now deleted) used tuple(...). Consider wrapping the list assignments with tuple() to maintain type consistency, e.g., dims = tuple([a["name"] if "name" in a else a for a in metadata["axes"]]).

Copilot uses AI. Check for mistakes.

metadata_obj = Metadata(
datasets=datasets,
coordinateSystems=[metadata["coordinateSystems"][0]],
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential KeyError: accessing metadata["coordinateSystems"][0] without checking if the key exists or if the list is non-empty. While this code path is only executed when version.startswith("0.6"), and the dims extraction at line 122-123 suggests coordinateSystems exists, there's no guarantee that:

  1. The key still exists in metadata at this point (it could have been popped)
  2. The list is non-empty

Consider adding a check or using .get() with a default value to prevent runtime errors, e.g., coordinateSystems=metadata.get("coordinateSystems", [{}])[:1] if "coordinateSystems" in metadata else [].

Suggested change
coordinateSystems=[metadata["coordinateSystems"][0]],
coordinateSystems=[(metadata.get("coordinateSystems") or [{}])[0]],

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +17
input: Optional[Union[None, str, coordinateSystem]] = None
output: Optional[Union[None, str, coordinateSystem]] = None
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type annotation Optional[Union[None, str, coordinateSystem]] is redundant. Optional[X] already means Union[X, None], so including None explicitly in the Union creates a double-optional. This should be either Optional[Union[str, coordinateSystem]] or Union[None, str, coordinateSystem] (though the former is more idiomatic).

Copilot uses AI. Check for mistakes.
axes = [Axis(name=axis, type=type_dict[axis]) for axis in metadata["axes"]]
elif "coordinateSystems" in metadata:
axes = [
Axis(name=axis["name"], type="space") # Default to space if type not given
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing axis type information handling for v06 metadata. The code defaults all axes to type="space" when reading from coordinateSystems, but this loses important type information (time, channel, space distinctions). The v06 spec likely includes type information in the axes. Consider reading the type from axis.get("type", "space") if available, similar to how v04/v05 handles it.

Suggested change
Axis(name=axis["name"], type="space") # Default to space if type not given
Axis(name=axis["name"], type=axis.get("type", "space")) # Use type if present, else default to space

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +65
if "axes" in metadata_dict:
for ax in metadata_dict["axes"]:
if ax["unit"] is None:
ax.pop("unit")

# Handle RFC 4: Remove orientation if RFC 4 is not enabled
if not is_rfc4_enabled(enabled_rfcs) and "orientation" in ax:
ax.pop("orientation")
# Handle RFC 4: Remove orientation if RFC 4 is not enabled
if not is_rfc4_enabled(enabled_rfcs) and "orientation" in ax:
ax.pop("orientation")

if "coordinateSystems" in metadata_dict:
for cs in metadata_dict["coordinateSystems"]:
for ax in cs["axes"]:
if ax.get("unit") is None:
ax.pop("unit")
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent unit handling: uses ax["unit"] at line 54 but ax.get("unit") at line 64. For coordinateSystems axes, the code safely uses .get() to avoid KeyError if "unit" is missing, but for regular axes it directly accesses ax["unit"] which could raise KeyError. Consider using ax.get("unit") for consistency and safety at line 54 as well, or ensure the dict access is safe.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,55 @@
from dataclasses import dataclass
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing SPDX license headers. All Python source files in this repository should include the standard SPDX license headers at the top:

# SPDX-FileCopyrightText: Copyright (c) Fideus Labs LLC
# SPDX-License-Identifier: MIT

This is consistent with other files like py/ngff_zarr/v04/zarr_metadata.py and py/ngff_zarr/v05/zarr_metadata.py.

Copilot uses AI. Check for mistakes.
from ..v04.zarr_metadata import Axis, Omero, MethodMetadata

@dataclass
class coordinateSystem:
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Class name coordinateSystem does not follow PEP 8 naming conventions. Class names should use CapWords (PascalCase) convention. This should be renamed to CoordinateSystem to be consistent with other class names like Scale, Translation, and TransformSequence in this file, as well as with existing v04/v05 classes like Axis, Dataset, and Metadata.

Copilot uses AI. Check for mistakes.
Comment on lines +25 to +26
input: Optional[Union[None, str, coordinateSystem]] = None
output: Optional[Union[None, str, coordinateSystem]] = None
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type annotation Optional[Union[None, str, coordinateSystem]] is redundant. Optional[X] already means Union[X, None], so including None explicitly in the Union creates a double-optional. This should be either Optional[Union[str, coordinateSystem]] or Union[None, str, coordinateSystem] (though the former is more idiomatic).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants