Category: B1 (Bonus); Team name: TG; Dataset: ogbn-products #250
🚀 B1 Bonus: On-Disk Transductive Learning with SQLite-Backed Structure Indexing
📌 Problem Statement
Challenge: Training Topological Neural Networks (TNNs) on large transductive graphs (100K+ nodes) with complex topological structures (cliques, cycles, etc.) faces fundamental memory limitations.
Real-World Impact: Popular benchmarks like ogbn-products (2.4M nodes, 61M edges) are currently infeasible for TNNs due to these memory constraints.
💡 Our Approach: Two-Strategy Solution
We developed two complementary strategies for memory-efficient transductive TNN training:
Strategy 1: Structure-Centric Sampling 🎯
Guarantee: 100% Structure Completeness
Key Innovation: Reverses traditional graph sampling—we sample topological structures (cliques) first, then derive the node set. This guarantees all sampled structures are 100% complete in the batch (no missing nodes).
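A minimal sketch of the structure-first idea (function and parameter names are hypothetical, not the PR's actual API): sample cliques until a node budget is reached, and let the batch node set be the union of the sampled cliques, so every sampled structure is complete by construction.

```python
import random

def sample_structure_batch(cliques, node_budget, seed=0):
    """Sample structures (cliques) first, then derive the node set.

    Because the batch node set is built as the union of the sampled
    cliques, every sampled structure is 100% complete in the batch.
    """
    rng = random.Random(seed)
    order = list(range(len(cliques)))
    rng.shuffle(order)
    batch_nodes, batch_cliques = set(), []
    for idx in order:
        clique = cliques[idx]
        new_nodes = set(clique) - batch_nodes  # only new nodes cost budget
        if len(batch_nodes) + len(new_nodes) > node_budget:
            continue  # skip structures that would overflow the node budget
        batch_nodes |= new_nodes
        batch_cliques.append(clique)
    return sorted(batch_nodes), batch_cliques

nodes, cliques = sample_structure_batch(
    [(0, 1, 2), (2, 3), (4, 5, 6)], node_budget=4, seed=0
)
# Completeness guarantee: every returned clique lies inside the node set.
assert all(set(c) <= set(nodes) for c in cliques)
```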
Strategy 2: Extended Context Sampling 🌐
Near-Complete Structures with Topology-Aware Heuristics
Key Innovation: Uses graph topology (community detection) to sample dense regions, then adds context nodes to increase structure completeness. Distinguishes "core" vs "context" nodes for loss computation.
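The core/context split might look roughly like the following sketch (hypothetical names; the real pipeline detects communities itself, here the community is passed in): a dense community forms the "core", its k-hop neighbourhood is added as "context" to complete boundary-crossing structures, and a mask marks which batch nodes contribute to the loss.

```python
def extended_context_batch(adj, community, hops=1):
    """Build a batch from a dense community plus surrounding context nodes.

    adj: adjacency dict {node: iterable of neighbours}.
    Returns the sorted batch node list and a parallel core mask; loss is
    computed on core nodes only, context nodes just complete structures.
    """
    core = set(community)
    context, frontier = set(), core
    for _ in range(hops):
        nxt = set()
        for u in frontier:
            nxt.update(adj.get(u, ()))
        frontier = nxt - core - context  # newly discovered context nodes
        context |= frontier
    batch = sorted(core | context)
    core_mask = [n in core for n in batch]
    return batch, core_mask

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
batch, core_mask = extended_context_batch(adj, community=[0, 1, 2], hops=1)
# Node 3 is pulled in as context; only 0, 1, 2 are core.
```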
🏗️ Architecture & Workflow
Core Components
Training Workflow
✨ Key Innovations
1. SQLite-Backed Structure Index
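One way such an index could be laid out (a sketch with an assumed schema, not the PR's actual tables): one row per structure plus one row per (structure, node) membership pair, so that the structures fully contained in a batch can be fetched with a single indexed query instead of holding the whole structure index in RAM.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # an on-disk path in a real pipeline
con.executescript("""
    CREATE TABLE structures (id INTEGER PRIMARY KEY, kind TEXT, size INTEGER);
    CREATE TABLE members (struct_id INTEGER, node INTEGER);
    CREATE INDEX idx_members_node ON members (node);
""")

def add_structure(struct_id, kind, nodes):
    con.execute("INSERT INTO structures VALUES (?, ?, ?)",
                (struct_id, kind, len(nodes)))
    con.executemany("INSERT INTO members VALUES (?, ?)",
                    [(struct_id, n) for n in nodes])

add_structure(0, "clique", [0, 1, 2])
add_structure(1, "clique", [2, 3])

def complete_structures(batch_nodes):
    """Ids of structures whose members all lie inside batch_nodes."""
    placeholders = ",".join("?" * len(batch_nodes))
    rows = con.execute(f"""
        SELECT s.id
        FROM structures s JOIN members m ON m.struct_id = s.id
        WHERE m.node IN ({placeholders})
        GROUP BY s.id, s.size
        HAVING COUNT(*) = s.size
    """, list(batch_nodes)).fetchall()
    return sorted(r[0] for r in rows)
```

Membership lives on disk, so memory use scales with the batch, not the graph.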
2. Dual Sampling Strategies
3. Batch-Time Transform Application
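The idea of applying transforms at batch time rather than dataset-construction time can be sketched as a thin wrapper (hypothetical class, assuming an edge-list representation): restrict the graph to the sampled nodes, then run the expensive transform (e.g. a clique lifting) on that small subgraph only.

```python
class BatchTimeTransform:
    """Defer an expensive transform from dataset build time to batch time,
    so only the sampled subgraph is ever materialised in memory."""

    def __init__(self, transform):
        self.transform = transform  # callable(nodes, edges) -> lifted batch

    def __call__(self, batch_nodes, edges):
        node_set = set(batch_nodes)
        # Keep only edges whose endpoints both lie in the sampled node set.
        sub_edges = [(u, v) for u, v in edges
                     if u in node_set and v in node_set]
        return self.transform(batch_nodes, sub_edges)

# Usage with an identity transform, just to show the subgraph restriction:
lift = BatchTimeTransform(lambda nodes, edges: (nodes, edges))
nodes, edges = lift([0, 1, 2], [(0, 1), (1, 3), (2, 0)])
# The edge (1, 3) is dropped because node 3 is outside the batch.
```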
4. Seamless Integration with Existing Pipeline
5. Budget-Aware Sampling
Tutorials
Note: Due to time constraints I could not fully finish this PR. Most things should work, but there may be some small issues. After the challenge is officially over, I will add benchmarks and tests and thoroughly refactor the code.