-
Notifications
You must be signed in to change notification settings - Fork 49
account for endian
being possibly None
#787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
are you able to share this dataset? If not, what dtype is this for? I'm wondering if it's a single byte dtype where the configuration should be empty. I'm reluctant to accept this change because an endian of None seems to be out-of-spec. |
Not sure, I'll have to find a public version of it.
these are single-byte dtypes (in my case,
which I assume is the default for this dtype? |
from the v3 spec:
|
but that's describing a JSON object, not an in-memory python object. in zarr python, our in-memory representation of the bytes codec uses endian= |
not exactly the same dataset, but this also fails with that error: from obstore.store import from_url
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr import open_virtual_dataset
url = "https://data-cersat.ifremer.fr/data/sea-surface-temperature/odyssea/l4/glob/nrt/data/v2.1/2021/001/20210101000000-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v02.1-fv01.0.nc"
store = from_url(url)
registry = ObjectStoreRegistry({url: store})
vds = open_virtual_dataset(
url=url,
loadable_variables=[],
parser=HDFParser(),
registry=registry,
)
display(vds)
vds.vz.to_kerchunk("refs.parquet", format="parquet") |
in digging through the zarr code, I noticed that BytesCodec.from_dict({"name": "bytes", "configuration": {}})
BytesCodec.from_dict({"name": "bytes"}) both get the default system endian because that's the default for BytesCodec.from_dict({"name": "bytes", "configuration": {"endian": None}}) (but that would be against the spec, right?) Also: from zarr.codecs import BytesCodec
codec = BytesCodec(endian=None)
roundtripped = BytesCodec.from_dict(codec.to_dict())
assert codec.endian == roundtripped.endian fails for the same reason. @d-v-b, any idea on what the intended behavior should be? |
Not sure if there's a better fix for this, but the `getattr` call basically guarantees that if `codec` does not have the `endian` attribute the last condition is never reached.
the failing RTD build looks unrelated (it fails while creating the env) |
We just ran into this and was about to open the same PR, thanks for working on this @keewis. It'll be great if this gets merged and we get a new release since this is a blocker for all earthaccess virtualizarr workflows. cc @maxrjones |
For some reason a dataset I've been working on produces
BytesCodec
instances whereendian
isNone
, which causes the translation from v3 to v2 metadata to fail.No tests because I didn't check thoroughly on where these should be.
(I've seen that you're planning to migrate to v3 entirely, in which case this won't be necessary anymore. It's such a simple fix, though, that this might still be worth it)
docs/releases.rst
api.rst