@Snopoff Snopoff commented Nov 24, 2025

Checklist

  • My pull request has a clear and explanatory title.
  • My pull request passes the Linting test.
  • I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
  • My PR follows PEP8 guidelines. (refer to comment below)
  • My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
  • I linked to issues and PRs that are relevant to this PR.

Description

This PR adds three conjugated molecule datasets (OCELOTv1, OPV, PCQM4MV2) to TopoBench for hypergraph-based molecular property prediction tasks.

This implementation adapts the work from "Molecular Hypergraph Neural Networks" to the TopoBench framework. It ports the dataset preprocessing and hypergraph construction logic from the original mhnn repository while ensuring full compatibility with TopoBench's standardized data loaders and model interfaces.

Datasets Added

  1. OCELOTv1 - 25,249 organic chromophores with 15 molecular properties
  2. OPV - 80,823 organic photovoltaic molecules with 8 properties (includes a polymer task variant with 44,335 molecules)
  3. PCQM4MV2 - 3.7M molecules with the HOMO-LUMO gap as the prediction target

Hypergraph structure

Conjugated bonds are natural higher-order structures that occur in molecules. To the authors' knowledge, the idea of representing molecules with conjugated bonds as hypergraphs first appeared in "Molecular Hypergraph Neural Networks".

In organic materials, conjugated systems determine key properties such as conductivity or reactivity. Higher-order representations can capture electron delocalization across several atoms, which pairwise graph edges cannot express directly. This PR therefore makes it possible to benchmark such hypergraphs within TopoBench, supporting research in drug discovery and materials science.
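
To illustrate the idea, here is a minimal sketch of one way to turn a bond list into hyperedges. It is not the code in this PR or in the mhnn repository. It assumes that each connected group of conjugated bonds becomes a single hyperedge and that every other bond stays a two-atom hyperedge. The function name `build_hyperedges` and the `(i, j, is_conjugated)` bond format are hypothetical, chosen only for this example; in practice the conjugation flag would come from a cheminformatics toolkit.

```python
from collections import defaultdict


def build_hyperedges(num_atoms, bonds):
    """Group atoms into hyperedges (illustrative sketch only).

    bonds: iterable of (i, j, is_conjugated) tuples with 0-based atom indices.
    Assumption: each connected component of conjugated bonds forms one
    hyperedge; each non-conjugated bond forms a 2-atom hyperedge.
    """
    parent = list(range(num_atoms))  # union-find forest over atoms

    def find(x):
        # Find the component root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    hyperedges = []
    conjugated_atoms = set()
    for i, j, is_conjugated in bonds:
        if is_conjugated:
            parent[find(i)] = find(j)  # merge the two conjugated components
            conjugated_atoms.update((i, j))
        else:
            hyperedges.append(sorted((i, j)))

    # Collect the atoms of each conjugated component into one hyperedge.
    groups = defaultdict(list)
    for atom in sorted(conjugated_atoms):
        groups[find(atom)].append(atom)
    hyperedges.extend(groups.values())
    return sorted(hyperedges)


# Toy molecule: atoms 0-1-2-3 form one conjugated chain, atom 4 hangs off atom 3.
example_bonds = [(0, 1, True), (1, 2, True), (2, 3, True), (3, 4, False)]
example_hyperedges = build_hyperedges(5, example_bonds)
```

In this toy case the three conjugated bonds collapse into a single four-atom hyperedge, while the remaining bond is kept as an ordinary pairwise edge, so the delocalized system is represented as one object rather than three separate edges.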

References

@levtelyatnikov levtelyatnikov added the category-a2 Submission to TDL Challenge 2025: Mission A, Category 2. label Nov 25, 2025
@Snopoff Snopoff changed the title Category: A2; Team name: Snopoff; Dataset: ConjugatedMoleculeDataset Category: A2; Team name: ConjMol; Dataset: ConjugatedMoleculeDataset Nov 25, 2025
@Snopoff Snopoff changed the title Category: A2; Team name: ConjMol; Dataset: ConjugatedMoleculeDataset Category: A2; Team name: PI; Dataset: ConjugatedMoleculeDataset Nov 25, 2025
@Snopoff Snopoff changed the title Category: A2; Team name: PI; Dataset: ConjugatedMoleculeDataset Category: A2; Team name: IgPa; Dataset: ConjugatedMoleculeDataset Nov 25, 2025
