Category: A2; Team name: NREL-Insightcenter; Dataset: OMol25 Metals subset #244
Checklist
Description
This pull request adds support for the OMol25Metals dataset, a metal-complex subset of the OMol25 benchmark, as a new higher-order molecular dataset in the hypergraph domain for the TAG-DS Topological Deep Learning Challenge 2025.
Dataset and features
OMol25 (https://arxiv.org/abs/2505.08762) is one of the largest publicly available molecular datasets, and this PR focuses on its metal-complex subset. In our integration:
Each sample is a metal-containing molecule with:
Graph-level targets and graph features:
We expose several OMol25 scalar quantities as graph-level features so users can define different regression tasks without rebuilding the dataset:
- y: total energy term (used as the default scalar regression target in our config)
- nl_energy
- spin
- homo_energy
- homo_lumo_gap

In the current configuration, the dataset is set up as a graph-level regression benchmark on a single scalar (y). The other entries above are also stored as graph-level attributes, so users can switch to a different regression target without rebuilding the dataset.
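Because the alternative targets are stored alongside y, switching tasks can be done per sample at load time. The sketch below is a hypothetical transform (SelectGraphTarget is our own name, not part of this PR) that copies one stored attribute into y. It assumes the attributes carry exactly the names listed above and uses a SimpleNamespace with made-up values as a stand-in for a processed sample. Since it relies only on attribute access and copy.copy, the same callable should fit the per-sample transform hook that PyG datasets accept, though we have not tested it against OMol25MetalsDataset.

```python
import copy
from types import SimpleNamespace

# Graph-level attribute names as listed in this PR; "y" holds the total energy by default.
GRAPH_TARGETS = ("y", "nl_energy", "spin", "homo_energy", "homo_lumo_gap")


class SelectGraphTarget:
    """Hypothetical transform: copy one stored graph-level attribute into `y`.

    Assumes the sample exposes the attributes in GRAPH_TARGETS by name.
    """

    def __init__(self, key: str):
        if key not in GRAPH_TARGETS:
            raise KeyError(f"unknown graph-level target: {key!r}")
        self.key = key

    def __call__(self, sample):
        out = copy.copy(sample)  # shallow copy so the stored sample keeps its original y
        out.y = getattr(sample, self.key)
        return out


# Stand-in sample; the values are invented for illustration only.
sample = SimpleNamespace(y=-1.0, nl_energy=-0.5, spin=1.0, homo_energy=-0.2, homo_lumo_gap=0.1)
gap_sample = SelectGraphTarget("homo_lumo_gap")(sample)
print(gap_sample.y, sample.y)
```

Returning a copy rather than mutating in place keeps the default total-energy target intact, so several task variants can be derived from the same processed file.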
These node, edge, face, and graph-level features are stored in the processed data.pt file, so users can feed them directly into higher-order neural networks (for example GCCN-style models, cell-complex networks, or hypergraph architectures) without reconstructing higher-order structure from the raw molecules.
Issue
There is no existing GitHub issue associated with this contribution.
Additional context
Contributors
Data provenance and processing pipeline
https://huggingface.co/facebook/OMol25/blob/main/DATASET.md
Data objects (graph structure).
processed/data.pt file used by OMol25MetalsDataset.
The processing code and documentation are hosted in our OMol25Metals pipeline repository:
ase2pyg.py script): https://github.com/demiqin/omol25_metals
This integration is intended to let users: