How to spatially concatenate (along latitude and longitude) multiple Kerchunk-referenced NetCDF files into a valid mosaic.json #564
PouriaRezz asked this question in Q&A (unanswered)
Replies: 2 comments
-
Just an idea, not tested:

from pathlib import Path

import fsspec
import ujson
from kerchunk.combine import MultiZarrToZarr

# source_directory, pattern and combined_reference are assumed to be defined by the caller
source_directory = Path(source_directory)
reference_file_paths = [str(path) for path in source_directory.glob(pattern)]

mzz = MultiZarrToZarr(
    reference_file_paths,
    concat_dims=["time"],
    identical_dims=["total_latitude", "total_longitude"],
)
multifile_kerchunk = mzz.translate()

# write the combined references to a local JSON file
combined_reference_filename = Path(combined_reference)
local_fs = fsspec.filesystem("file")
with local_fs.open(combined_reference_filename, "wb") as f:
    f.write(ujson.dumps(multifile_kerchunk).encode())
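To check the result, the combined reference can be opened with xarray through fsspec's reference filesystem. This is a minimal sketch, assuming xarray and zarr are installed and that the references point at local files. The helper names and the file name `combined.json` are hypothetical, and the exact keyword arguments may need adjusting depending on the installed zarr version.

```python
def reference_backend_kwargs(reference_path, remote_protocol="file"):
    """Build the keyword arguments xarray's zarr engine needs for a Kerchunk reference."""
    return {
        "consolidated": False,
        "storage_options": {
            "fo": str(reference_path),           # path to the combined reference JSON
            "remote_protocol": remote_protocol,  # where the original NetCDF bytes live
        },
    }


def open_combined(reference_path):
    """Open a Kerchunk reference file lazily as an xarray Dataset."""
    import xarray as xr  # imported here so the helper above has no heavy dependency

    return xr.open_dataset(
        "reference://",
        engine="zarr",
        backend_kwargs=reference_backend_kwargs(reference_path),
    )


# Hypothetical file name; replace it with the path written above.
kwargs = reference_backend_kwargs("combined.json")
```

Calling `open_combined("combined.json")` and printing the dataset's dimension sizes is a quick way to see whether the combined dimensions came out as expected.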
-
To be sure: MZZ should cope with concatenation along multiple dimensions, as in this tiling case, and not just along time. Perhaps you can run

# note that we concatenate on the spatial coords, while time is labeled identical
mzz = MultiZarrToZarr(
    reference_file_paths,
    concat_dims=["latitude", "longitude"],
    identical_dims=["time"],
)
mzz.first_pass()  # populates mzz.coos with values inferred from all the inputs
mzz.store_coords()  # writes output arrays

and see what happens. Also, the logger "kerchunk.combine" might tell you more.
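The "kerchunk.combine" logger mentioned above can be switched on with the standard logging module. This is a minimal sketch, assuming DEBUG output on stderr is wanted; the handler and message format are illustrative choices, not something the library requires.

```python
import logging

# Attach a stream handler so messages from kerchunk.combine are actually printed.
logger = logging.getLogger("kerchunk.combine")
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
```

Run this before calling `first_pass()` so that any messages emitted while the coordinates are being inferred show up in the console.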
-
Hi,
I’m working with a large dataset of NetCDF files, each one covering a distinct spatial tile (no overlaps), but all sharing the same time axis. Each file has shape (281, 168, 168), corresponding to (time, latitude, longitude).
Each NetCDF file has been individually converted to a Kerchunk JSON reference using kerchunk.hdf.SingleHdf5ToZarr.
The structure is always the same across files. For example:
Note:
Goal
I want to build a single valid mosaic.json that allows me to open the full spatial-temporal dataset.
The desired result is a dataset with shape (time=281, latitude=TOTAL_LAT, longitude=TOTAL_LON), where TOTAL_LAT and TOTAL_LON come from the concatenation of all disjoint tiles.
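To make the target shape concrete, here is a minimal sketch of how disjoint tiles map onto one global grid. It is plain Python and does not use Kerchunk. The function name `mosaic_layout` and the tile origins are hypothetical, and it assumes every tile is 168 x 168 and that tiles line up on a regular grid with no gaps.

```python
def mosaic_layout(tile_origins, tile_shape=(168, 168)):
    """Map each tile's (lat0, lon0) origin to its (row, col) slot in the mosaic.

    tile_origins: iterable of (lat0, lon0) pairs, one per tile (made-up values here).
    Returns (total_lat, total_lon, positions), where positions maps origin -> (row, col).
    """
    origins = list(tile_origins)
    lat_starts = sorted({lat for lat, _ in origins})
    lon_starts = sorted({lon for _, lon in origins})
    positions = {
        (lat, lon): (lat_starts.index(lat), lon_starts.index(lon))
        for lat, lon in origins
    }
    total_lat = len(lat_starts) * tile_shape[0]
    total_lon = len(lon_starts) * tile_shape[1]
    return total_lat, total_lon, positions


# Toy 2 x 3 arrangement of tiles; the origin values are invented for illustration.
origins = [
    (40.0, 0.0), (40.0, 10.0), (40.0, 20.0),
    (50.0, 0.0), (50.0, 10.0), (50.0, 20.0),
]
total_lat, total_lon, positions = mosaic_layout(origins)
print((281, total_lat, total_lon))  # prints (281, 336, 504)
```

The (row, col) slot of each tile is the offset its chunks would need in the combined array, which is the bookkeeping a spatial mosaic has to get right.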
What I’ve Tried
kerchunk.combine.MultiZarrToZarr works well for time-based concatenation, but it assumes that coordinate variables like ["latitude", "longitude"] are identical across files, which is not the case here.
Questions