Adding option to return binary data #1285

@JPBergsma

@giovannipizzi @ml-evs

For the upcoming trajectories endpoint, it would be nice to support a binary output format.

I wrote a small test script that returns the Cartesian site coordinates in the HDF5 format, which is much faster than returning them as JSON, although the test case had no overhead from query processing and the like.
If we use FastAPI's JSON converter, it is called for each element of a list, which makes it quite slow. In addition, the numerical values have to be converted to strings, which is not necessary for the HDF5 format.
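To illustrate the string conversion mentioned above, here is a minimal sketch using only the Python standard library. It does not use HDF5 itself. It only contrasts encoding floats as JSON text with dumping their raw 8-byte representation, and the `coords` data is made up for the example.

```python
import json
from array import array

# Hypothetical trajectory frame: 1000 sites with x, y, z coordinates each.
coords = [[i * 0.1, i * 0.2, i * 0.3] for i in range(1000)]

# JSON: every float is converted to its decimal string representation.
json_bytes = json.dumps(coords).encode("utf-8")

# Binary: flatten to float64 values and write their raw bytes, no strings involved.
flat = array("d", (value for site in coords for value in site))
binary_bytes = flat.tobytes()

# Decoding the binary payload restores exactly the same floats.
decoded = array("d")
decoded.frombytes(binary_bytes)
restored = [list(decoded[i:i + 3]) for i in range(0, len(decoded), 3)]

print(len(json_bytes), len(binary_bytes), restored == coords)
```

The binary payload has a fixed size of 8 bytes per value, so a reader can allocate and slice it without parsing any text.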

So I thought it would be good to implement the HDF5 format.
On closer inspection, however, I noticed that the HDF5 format does not seem to support nested dictionaries the way we use them in the data.
A possible solution would be to use a dictionary instead of a list, for example by using the list index or the id/name of each entry as its key.
This would, however, be a deviation from the OPTIMADE standard.
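The list-to-dictionary idea could look roughly like the sketch below. It is pure Python and only computes the hierarchical paths under which values might be stored. The `to_paths` helper and the contents of `response` are hypothetical, and actually writing the paths to an HDF5 file is left out.

```python
def to_paths(node, prefix=""):
    """Flatten nested dicts into {path: leaf}; lists of dicts are keyed by index."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list) and any(isinstance(x, dict) for x in node):
        # A list of dictionaries is replaced by a dictionary whose keys
        # are the list indexes.
        items = ((str(i), x) for i, x in enumerate(node))
    else:
        # Scalars and purely numerical lists are kept as leaves (one dataset each).
        return {prefix or "/": node}
    paths = {}
    for key, value in items:
        paths.update(to_paths(value, f"{prefix}/{key}"))
    return paths


# Hypothetical response fragment, loosely shaped like an OPTIMADE "data" list.
response = {
    "data": [
        {"id": "traj-1",
         "attributes": {"cartesian_site_positions": [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]}},
        {"id": "traj-2",
         "attributes": {"cartesian_site_positions": [[0.5, 0.5, 0.5]]}},
    ]
}

paths = to_paths(response)
for path in paths:
    print(path)
```

Using the index as key keeps the original order recoverable, whereas using the id would make individual entries addressable by name. Either way the client has to rebuild the list to get back to the standard structure.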

I was therefore wondering whether you think this is a problem and, if so, whether you know of other file formats that can efficiently store numerical data in binary form while maintaining the OPTIMADE data structure.
