Category: B1; Team name: HT; Dataset: MUTAG #254

henrytsay · 2025-11-26T00:24:40Z

Checklist

My pull request has a clear and explanatory title.
My pull request passes the Linting test.
I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
My PR follows PEP8 guidelines. (refer to comment below)
My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
I linked to issues and PRs that are relevant to this PR.

Description

This PR implements Challenge B1: a scalable data loading pipeline for large-scale inductive learning. It adds an OnDiskPreProcessor class that processes and stores samples on disk, so large datasets can be handled without loading them fully into RAM.

Key Features

OnDiskPreProcessor Implementation (topobench/data/preprocessor/on_disk_preprocessor.py):

Inherits from torch_geometric.data.OnDiskDataset for on-disk storage
Processes raw data samples one at a time, saving each to disk immediately as data_{i}.pt
Supports topological lifting transformations via DataTransform integration
Bypasses memory bottlenecks by avoiding loading entire datasets into RAM
Maintains full compatibility with existing PreProcessor API and split loading methods
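
The per-sample flow described above (transform one sample, write it to data_{i}.pt, read it back on demand) can be sketched roughly as follows. This is not the code from this PR: OnDiskStoreSketch and add_num_edges are hypothetical names, samples are plain dicts, and pickle stands in for whatever serialization the real class uses, so that the sketch runs with the standard library only.

```python
import pickle
import tempfile
from pathlib import Path


class OnDiskStoreSketch:
    """Hypothetical stand-in: one file per sample, loaded lazily by index."""

    def __init__(self, root, transform=None):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.transform = transform  # optional per-sample callable
        self.num_samples = 0

    def _path(self, i):
        return self.root / f"data_{i}.pt"

    def process(self, samples):
        # Consume the iterable one sample at a time, so only a single
        # sample needs to be held in memory at any point.
        for sample in samples:
            if self.transform is not None:
                sample = self.transform(sample)
            # Assumption: pickle is used here only to avoid a torch dependency.
            with open(self._path(self.num_samples), "wb") as f:
                pickle.dump(sample, f)
            self.num_samples += 1

    def __len__(self):
        return self.num_samples

    def __getitem__(self, i):
        if not 0 <= i < self.num_samples:
            raise IndexError(i)
        with open(self._path(i), "rb") as f:
            return pickle.load(f)


def add_num_edges(sample):
    # Toy transform standing in for a topological lifting.
    return {**sample, "num_edges": len(sample["edges"])}


def demo():
    # A generator, so the raw samples never exist as a full list in memory.
    raw = ({"edges": [(0, 1)] * n, "y": n % 2} for n in range(1, 4))
    with tempfile.TemporaryDirectory() as tmp:
        store = OnDiskStoreSketch(tmp, transform=add_num_edges)
        store.process(raw)
        files = sorted(p.name for p in Path(tmp).iterdir())
        return len(store), files, store[2]
```

Calling demo() writes three files and reads only the last one back; the other samples stay on disk until they are indexed.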

Configuration Support:

Added use_on_disk_preprocessing: true flag to dataset configs
Example config: MUTAG_ondisk.yaml
Integrated with topobench/run.py for seamless config-based switching
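
As an illustration of config-based switching, the selection logic might look something like the sketch below. The two stub classes and select_preprocessor are hypothetical, the config is modelled as a plain dict, and the position of use_on_disk_preprocessing at the top level of the dataset config is an assumption; the actual wiring in topobench/run.py may differ.

```python
class InMemoryPreProcessorStub:
    """Hypothetical placeholder for the existing in-memory preprocessor."""


class OnDiskPreProcessorStub:
    """Hypothetical placeholder for the on-disk preprocessor."""


def select_preprocessor(dataset_cfg):
    # Assumption: a missing flag falls back to the in-memory path,
    # so existing configs keep their current behaviour.
    use_on_disk = bool(dataset_cfg.get("use_on_disk_preprocessing", False))
    return OnDiskPreProcessorStub if use_on_disk else InMemoryPreProcessorStub
```

With this shape, a config without the flag selects the in-memory class, and one with use_on_disk_preprocessing: true selects the on-disk class.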

Comprehensive Testing:

23 unit tests in test/data/preprocess/test_on_disk_preprocessor.py
Tests cover: initialization, transforms, split loading, edge cases, and memory efficiency
End-to-end integration tests in test/pipeline/test_on_disk_pipeline.py
Validates model training with OnDiskPreProcessor using GCN on MUTAG dataset
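
One way to check the "one sample at a time" property in a unit test is to feed the writer a generator that records how many files already exist each time the next sample is requested. The sketch below is not taken from the PR's test suite: write_samples is a hypothetical helper, pickle replaces the real serialization, and unittest is used only to keep the example free of third-party dependencies.

```python
import pickle
import tempfile
import unittest
from pathlib import Path


def write_samples(samples, root):
    """Hypothetical helper: write each sample to data_{i}.pt as it arrives."""
    root = Path(root)
    count = 0
    for i, sample in enumerate(samples):
        with open(root / f"data_{i}.pt", "wb") as f:
            pickle.dump(sample, f)
        count += 1
    return count


class TestStreamingWrite(unittest.TestCase):
    def test_each_sample_is_saved_before_next_is_read(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            files_seen = []

            def source():
                for i in range(5):
                    # Count the files on disk at the moment sample i is requested.
                    files_seen.append(len(list(root.glob("data_*.pt"))))
                    yield {"id": i}

            self.assertEqual(write_samples(source(), root), 5)
            # Sample i is only requested after samples 0..i-1 are on disk.
            self.assertEqual(files_seen, [0, 1, 2, 3, 4])

    def test_empty_input_writes_nothing(self):
        with tempfile.TemporaryDirectory() as tmp:
            self.assertEqual(write_samples(iter(()), tmp), 0)
            self.assertEqual(list(Path(tmp).iterdir()), [])
```

A writer that first collected all samples into a list would fail the first test, because no file would exist when the later samples are requested.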

henrytsay added 2 commits November 25, 2025 19:07

add OnDiskPreprocessor

cf3a3e7

Trigger CI

11ddef4

levtelyatnikov added the category-b1 label (Submission to TDL Challenge 2025: Mission B, Category 1) on Nov 26, 2025
