@joshmoore
Member

@joshmoore joshmoore commented May 16, 2025

This is the implementation of ZEP10 which introduces a generic extensions object.

[Image: screenshot of ZEP10]

☝🏽 👉🏽 Rendered ZEP10 page: https://zeps--67.org.readthedocs.build/en/67/draft/ZEP0010.html

🗓️ Proposed Timeline for ZEP Feedback and Approval

The @zarr-developers/steering-council is proposing the following timeline for community review and approval of this ZEP. The goal is to hold a vote within 30 days, with the policy that non-response will be interpreted as approval.

📍 Timeline

Date Range | Milestone | Description
Day 0 (May 16) | 📝 Initial Draft Published | PR submitted to the ZEP repository. Announcement sent to Zarr community channels (zulip, social media).
Days 1–7 | 💬 Open Community Review | All are invited to read, comment, and suggest changes to the proposal.
Days 8–14 | ✍️ Revisions & Response to Feedback | Proposer integrates feedback, publishes a revised draft, and posts a summary of changes.
Days 15–20 | 📣 Final Call for Feedback | Reminder sent to all stakeholders with the finalized draft and voting instructions.
Days 21–30 | Voting Period | Stakeholders cast their vote via GitHub comments or discussion. Silence will be considered approval.
Day 31 | 📦 Final Resolution | Voting closes. Results are posted, and the ZEP is updated with its final status.
Day 31 | Summer break | Timeline reset for summer vacation, etc. TBD after revision is posted.

If this PR is merged, then Zarr 3.2 will be released and ZEP10 will be moved to the "accepted" status.


✅ Vote Tracking

Please indicate your stance on this ZEP by commenting below.

Name | GitHub Handle | Vote | Notes (optional)
Josh Moore | @joshmoore | Author | Zarr Steering Council
Norman Rzepka | @normanrz | Author | Zarr Steering Council
Alistair Miles | @alimanfoo | | Zarr Steering Council
Ryan Abernathey | @rabernat | | Zarr Steering Council
John Kirkham | @jakirkham | | Zarr Steering Council
Joe Hamman | @jhamman | | Zarr Implementation Council (zarr-python)
Jeremy Maitin-Shepard | @jbms | | Zarr Implementation Council (tensorstore)
Trevor Manz | @manzt | | Zarr Implementation Council (zarr.js)
Fabian Gans | @meggart | | Zarr Implementation Council (Zarr.jl)
Stephan Saalfeld | @axtimwalde | | Zarr Implementation Council (n5)
Ward Fisher | @WardF | | Zarr Implementation Council (netcdf)
Lachlan Deakin | @LDeakin | | Zarr Implementation Council (zarrs)

Legend:
✅ Approve | ❌ Disapprove | 🤔 Abstain | 👍🏽 Endorse (for non-voters) | 👎 Object (for non-voters)

You are welcome to vote or endorse early, but please be aware that subsequent pushes to the PR may substantially change the proposal.


If you have any questions or concerns, please comment on this thread or contact the ZEP editors.

@alimanfoo
Member

Hi @joshmoore, I have a question about the processing model for generic extensions on groups. I can see two options:

Option 1 "isolated" - Generic extensions apply to the node (group or array) on which they are declared and no other nodes.

Option 2 "inherited" - Generic extensions apply to the node on which they are declared and all descendant nodes.

To elaborate a little, under option 1, when an implementation reads the metadata for any given node, then it does not need to read the metadata for any ancestor nodes. I.e., implementations can always assume that each node can safely be accessed and interpreted in isolation from any other nodes.

Under option 2, when an implementation reads the metadata for any given node, it would also have to read the metadata for all ancestor nodes. It would then have to collect all generic extension declarations found on the node and its ancestors, and apply them all in some order, when interpreting the given node.

FWIW I see a number of potential difficulties with option 2, but I'll wait to check first if these two options are clear and if you had an intention regarding one or the other.
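
To make the two options concrete, here is a minimal sketch of how an implementation might collect the applicable extensions under each model. It is not taken from the ZEP: the in-memory `STORE`, the list-of-objects layout of `"extensions"`, and the extension names `ext-a` and `ext-b` are assumptions made purely for illustration.

```python
# Hypothetical in-memory store: node path -> parsed zarr.json metadata.
# The shape of the "extensions" entries is an assumption, not the ZEP10 layout.
STORE = {
    "": {"node_type": "group", "extensions": [{"name": "ext-a"}]},
    "a": {"node_type": "group", "extensions": []},
    "a/b": {"node_type": "array", "extensions": [{"name": "ext-b"}]},
}


def extensions_isolated(path):
    """Option 1: only the node's own metadata document is read."""
    return [e["name"] for e in STORE[path].get("extensions", [])]


def ancestors_and_self(path):
    """Yield node paths from the root ("") down to `path` itself."""
    parts = [p for p in path.split("/") if p]
    for i in range(len(parts) + 1):
        yield "/".join(parts[:i])


def extensions_inherited(path):
    """Option 2: read every ancestor and merge declarations root-first."""
    names = []
    for node in ancestors_and_self(path):
        names.extend(e["name"] for e in STORE[node].get("extensions", []))
    return names


print(extensions_isolated("a/b"))    # one metadata read
print(extensions_inherited("a/b"))   # three metadata reads: "", "a", "a/b"
```

Under option 2 the number of metadata reads grows with the depth of the node, and the merge order (root-first here) is itself something a spec would have to pin down.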

@jbms
Contributor

jbms commented May 16, 2025

The document discusses this a bit under Application to sub-nodes but I agree it merits more discussion.

I agree that inheriting by default is problematic because it forces you to read all ancestor metadata in all cases.

It seems like there are two possible solutions: either explicitly adding an additional extension to add descendants, or adding some generic metadata field, e.g. "depends_on_parent_metadata": true, that says: "you need to read the parent group metadata also". Personally I think I'm in favor of "depends_on_parent_metadata": true, since that avoids needing separate logic for each extension to do essentially the same thing.

Some extensions (like storage transformers) might make it impossible to access the zarr.json metadata of descendants in the normal way at all, e.g. because descendants are contained within a zip file. However, it could still be possible for a user to directly access a child e.g. via some URL syntax like file:/path/to/archive.zip|zip:path/to/array/. In that case, it may or may not be correct to set depends_on_parent_metadata: true and possibly a different field would also be needed.
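
As a rough sketch of the "depends_on_parent_metadata" idea, a reader could walk upward only while the flag is set. The field name comes from the comment above; the store layout, the helper name `metadata_to_read`, and the assumption that the flag chains from parent to grandparent are all hypothetical.

```python
def metadata_to_read(path, store):
    """Return the node paths whose metadata is needed to interpret `path`.

    Walks toward the root only while the current node sets
    "depends_on_parent_metadata": true (assumed to default to false).
    """
    needed = [path]
    current = path
    # The root node ("") has no parent, so the walk stops there.
    while current and store[current].get("depends_on_parent_metadata", False):
        current = current.rpartition("/")[0]  # parent path; "" is the root
        needed.append(current)
    return needed


# Hypothetical hierarchy: only a/b and a/b/c opt in to parent lookups.
store = {
    "": {"node_type": "group"},
    "a": {"node_type": "group"},
    "a/b": {"node_type": "group", "depends_on_parent_metadata": True},
    "a/b/c": {"node_type": "array", "depends_on_parent_metadata": True},
    "a/x": {"node_type": "array"},
}

print(metadata_to_read("a/x", store))    # just the node itself
print(metadata_to_read("a/b/c", store))  # the node plus two ancestors
```

Nodes that do not set the flag keep the "isolated" cost of a single metadata read, which is the main attraction of an opt-in field over inheritance by default.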

@joshmoore
Member Author

Definitely agreed that it's a tricky part of this (and why we might even consider moving this whole conversation thread to its own location). If we don't think we can specify the possible ways that unknown extensions might need to refer to siblings (my assumption), then perhaps we can make use of extensibility itself to allow prototyping different mechanisms (my hope).

@normanrz
Member

My thinking is that we don't need to define the behavior of application to sub nodes in this PR and leave it open to the extension specs to define their own behavior. See https://github.com/zarr-developers/zeps/pull/67/files#r2102632220

@joshmoore
Member Author

A heads up that most of the discussion is currently happening on zarr-developers/zeps#67. I've pushed a few clarification commits there that haven't yet triggered any new commits here.

@joshmoore
Member Author

I had hoped to have some in-person meetings to further socialize the general concept here. They didn't materialize. With the (Northern) summer period ramping up, I don't think the aggressive timeline outlined above will be possible. I will summarize the discussion on zarr-developers/zeps#67, including a few questions that are lingering on my side, with the hope of having an updated version of the ZEP in early August for us to restart the clock.

@LDeakin
Member

LDeakin commented Aug 9, 2025

@joshmoore I realised one potential issue after our chat yesterday about ZEP10, regarding compatibility with Zarr 3.0.

This thread seems to support the following "must_understand" behaviour:

  • "must_understand": true: must understand for reading and writing,
  • "must_understand": false: must understand for writing only.

However, that conflicts with the below snippet from Zarr 3.0 if "must_understand": false generic extensions are top-level fields:

if the value of an unknown feature is an object containing the key-value pair "must_understand": false, it can be ignored.

If "extensions" are introduced as a JSON object instead of an array with "must_understand": true (implicitly is fine), then generic extensions really become an opt-in feature for Zarr 3.2 compatible implementations without breaking Zarr 3.0 compatibility. "extensions" need not be serialised if empty.

{
   "node_type": "group",
   ...
   "extensions": {
       "must_understand": true,
       "consolidated_metadata": {
           "must_understand": false,
           ...
       }
   }
}

Food for thought...
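
To illustrate the compatibility concern, here is a simplified sketch of how a Zarr 3.0 reader might apply the quoted rule to unknown top-level fields. The function names are hypothetical, and the check is reduced to the one rule quoted above; real implementations do more validation.

```python
# Top-level metadata keys defined by Zarr 3.0; anything else is "unknown".
KNOWN_V3_0_FIELDS = {
    "zarr_format", "node_type", "attributes", "shape", "data_type",
    "chunk_grid", "chunk_key_encoding", "fill_value", "codecs",
    "storage_transformers", "dimension_names",
}


def can_ignore(value):
    """Zarr 3.0 rule: an unknown field may be ignored only if its value is
    an object containing "must_understand": false."""
    return isinstance(value, dict) and value.get("must_understand") is False


def v3_0_reader_accepts(metadata):
    """Return True if a 3.0 reader would open the node, silently ignoring
    every unknown field it is allowed to ignore."""
    return all(
        key in KNOWN_V3_0_FIELDS or can_ignore(value)
        for key, value in metadata.items()
    )


# Generic extension as a top-level field: a 3.0 implementation ignores it,
# even for writing, which conflicts with "must understand for writing".
top_level = {
    "zarr_format": 3,
    "node_type": "group",
    "consolidated_metadata": {"must_understand": False},
}

# Same extension nested under "extensions", which is implicitly
# must_understand=true: a 3.0 implementation refuses to open the node.
nested = {
    "zarr_format": 3,
    "node_type": "group",
    "extensions": {"consolidated_metadata": {"must_understand": False}},
}

print(v3_0_reader_accepts(top_level))  # True
print(v3_0_reader_accepts(nested))     # False
```

The second case is what makes the nested layout opt-in: implementations that predate the "extensions" object fail safely instead of ignoring something they would need to understand for writing.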

@jbms
Contributor

jbms commented Aug 13, 2025

That is a good point. A possible solution would be to use "must_understand_for_reading": false and leave "must_understand": false to mean it may be ignored even for writing. That way existing implementations would safely fail to open the zarr node if it specifies "must_understand_for_reading": false, similar to what would happen if they encounter the separate extensions object. We could say that new uses of "must_understand": false are discouraged and registered attributes should be used instead.
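
A small sketch of the decision logic this would imply for an implementation that is aware of the new field. The helper `may_proceed` and its `mode` argument are hypothetical; only the two field names come from the discussion.

```python
def may_proceed(extension, understood, mode):
    """Decide whether an implementation may carry on with `mode`
    ("read" or "write") given one extension declaration.

    - understood extensions never block;
    - "must_understand": false may be ignored for reading and writing;
    - "must_understand_for_reading": false may be ignored for reading only.
    """
    if understood:
        return True
    if extension.get("must_understand") is False:
        return True
    if mode == "read" and extension.get("must_understand_for_reading") is False:
        return True
    return False


read_optional = {"must_understand_for_reading": False}
fully_optional = {"must_understand": False}

print(may_proceed(read_optional, understood=False, mode="read"))    # True
print(may_proceed(read_optional, understood=False, mode="write"))   # False
print(may_proceed(fully_optional, understood=False, mode="write"))  # True
```

An existing Zarr 3.0 implementation knows nothing about "must_understand_for_reading", so it would see an object without "must_understand": false and refuse to open the node, which is the safe failure described above.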

@joshmoore
Member Author

Closing following this conversation: zarr-developers/zeps#67 (comment)

@joshmoore joshmoore closed this Sep 26, 2025