Category: B2; Team name: NeuroTriangles; Dataset: A123CortexM (Mouse Auditory Cortex) #252
…o check and test the training. To download the dataset, the verify parameter of requests.get() in the function 'download_file_from_link' must be set to False. Note also that the run script currently fails to download the data even with verify set to False
…ll_to_dict and process_mat. I have also modified download_file_from_link by specifying verify=False in requests.get()
…is flag in the config file
- Implement TriangleClassifier utility for extracting and classifying triangles
- Add create_triangle_classification_task() to generate 7-class role predictions
- Support triangle task via dataset loader (task_type='triangle_task')
- Add config section with triangle task settings (7 classes, 3 features)
- Include comprehensive test suite for triangle classification pipeline

Added triangle common-neighbours classification task: predicting the number of neighbours based on triangle intra-data:
- Dataset creation
- Support triangle task via dataset loader (task_type='triangle_common_task')
- Add config section
- Include comprehensive test suite
… for the datasets (used for the triangle options), as the classes are imbalanced. Ideally, class weights should be added to the model, together with a focal loss
…sed the number of samples for downsampling
… TriangleClassifier into utils for reusability and created a new class inheriting from it to specialise it for the a123 task
Hi @marindigen! It seems that the testing error can be easily fixed just by slightly renaming the newly introduced
…ion. Move the test class to the appropriate file. Note that the same changes were made in PR geometric-intelligence#241 (they are duplicated here, as the script wouldn't run otherwise and would require additional adaptation to the old download_file_from_link function).
The CI failed with a "no space left on device" error on the GitHub runner (an infrastructure issue, not code-related), which could be due to the size of the dataset (~2 GB). What could be a possible solution to this?
Dear @marindigen, one possible solution is to mock the data instead of downloading it. Please refer to PR:233 for reference if needed.
Hi again @marindigen! Could you please comment out (or turn to markdown) the content of
Hi @gbg141 and @levtelyatnikov! Thank you for your patience and for accepting the submission as it is! I have converted the tutorial to the markdown file.
Description
This PR is a Category B2 (“Pioneering New TDL Benchmark Tasks”) submission to the TAG-DS Topological Deep Learning Challenge 2025.
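It integrates the Bowen et al. (2024) mouse auditory cortex calcium-imaging dataset into TopoBench as the A123CortexM dataset and builds a small family of topology-aware benchmark tasks:
- graph-level BF-bin classification (`specific_task: classification`),
- triangle role classification (`specific_task: triangle_classification`),
- triangle common-neighbour prediction (`specific_task: triangle_common_neighbors`).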
All of these are driven by a single dataset / loader pair and configured via the `specific_task` parameter in the dataset YAML.

In addition, the PR introduces a generic triangle utility in `topobench.data.utils.triangle_classifier` that can be reused by other datasets.

Dataset and graph construction
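The underlying data come from the Bowen et al. (2024) mouse auditory cortex calcium-imaging recordings.

Each recording session provides, per layer, signal correlations (`SigCorrs`), noise correlations (`NoiseCorrsTrial`), and per-neuron best frequencies (`BFInfo`).

The dataset class `A123CortexMDataset(InMemoryDataset)` in `topobench/data/datasets/a123.py` performs the following steps: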
1. Download & unpack
   Uses `download_file_from_link` to fetch the “Auditory cortex data” archive and extract it under `raw/`.
2. Session / layer extraction
   For each `.mat` file and each layer (1–5), it reads:
   - `SigCorrs` (signal correlation matrix),
   - `NoiseCorrsTrial` (trial-level noise correlations),
   - `BFInfo[layer]["BFval"]` (per-neuron best frequency).
3. BF-bin subgraphs
   Neurons are binned by BF into `n_bins` (default 9). Each (session, layer, BF-bin) with at least `min_neurons` neurons (default from config, 3 for tests) yields one sample: `{session_file, session_id, layer, bf_bin, neuron_indices, corr, noise_corr}`.
4. Graph representation (`_sample_to_pyg_data`)
   Each sample becomes a `torch_geometric.data.Data` graph with:
   - Nodes: neurons in a single (session, layer, BF-bin).
   - Node features `x ∈ ℝ^{n×3}`:
     - `mean_corr`: mean signal correlation to others,
     - `std_corr`: standard deviation of signal correlation,
     - `noise_diag`: diagonal entries of the noise-correlation matrix (per-neuron noise level).
   - Edges: undirected edges between neuron pairs whose signal correlation ≥ `corr_threshold` (configurable; `corr_threshold: 0.2` in the YAML). Edges are constructed from the upper triangle and symmetrised with `to_undirected`.
   - Edge attributes: correlation weights on those edges.
   - Label `y`: integer BF-bin in `[0, num_classes − 1]` (config `num_classes: 9`).
   - Metadata: `session_id`, `layer`.

During `process()`, graphs with no edges are filtered out, and the remaining graphs are collated and stored in `processed/data.pt`.

The dataset behaviour is controlled by the YAML `configs/dataset/graph/a123.yaml`. For CI we restrict to `num_graphs: 10` to keep runtime reasonable; users can increase this for full experiments.

Generic triangle utilities
To make triangle-based benchmarks reusable across datasets, this PR adds `topobench/data/utils/triangle_classifier.py` with the base class `TriangleClassifier`.

The base methods provide triangle enumeration (`enumerate_triangles`) and per-triangle classification with edge weights (`classify_and_weight_triangles`), plus the hooks `_classify_role` and `_role_to_label` (which are intentionally left abstract).

This utility is then specialised for the auditory cortex dataset, but can be reused by other TopoBench datasets to define their own triangle-based tasks.
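To make the division of labour between the base class and its subclasses concrete, here is a minimal, dependency-free sketch. It is not the code from this PR: the class names `TriangleClassifierSketch` and `StrongWeakClassifier`, the adjacency-dict input, and the two-role "strong"/"weak" rule are all assumptions made for illustration (the PR's class operates on NetworkX graphs). Only the method names and the keys of the triangle dict follow the description.

```python
from abc import ABC, abstractmethod
from itertools import combinations


class TriangleClassifierSketch(ABC):
    """Hypothetical stand-in for the base class described above."""

    def enumerate_triangles(self, adj):
        """Return each triangle once, as a sorted (a, b, c) tuple.

        `adj` maps every node to a dict {neighbour: edge_weight}.
        """
        triangles = []
        for a in sorted(adj):
            # Only consider neighbours larger than `a` to avoid duplicates.
            higher = sorted(n for n in adj[a] if n > a)
            for b, c in combinations(higher, 2):
                if c in adj[b]:
                    triangles.append((a, b, c))
        return triangles

    def classify_and_weight_triangles(self, adj):
        """Attach edge weights, a role string and an integer label."""
        records = []
        for a, b, c in self.enumerate_triangles(adj):
            weights = [adj[a][b], adj[b][c], adj[a][c]]
            role = self._classify_role(weights)
            records.append({
                "nodes": (a, b, c),
                "edge_weights": weights,
                "role": role,
                "label": self._role_to_label(role),
            })
        return records

    @abstractmethod
    def _classify_role(self, edge_weights):
        """Dataset-specific: map three edge weights to a role string."""

    @abstractmethod
    def _role_to_label(self, role):
        """Dataset-specific: map a role string to an integer label."""


class StrongWeakClassifier(TriangleClassifierSketch):
    """Toy subclass (assumption): 'strong' if every edge reaches a threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def _classify_role(self, edge_weights):
        return "strong" if min(edge_weights) >= self.threshold else "weak"

    def _role_to_label(self, role):
        return {"weak": 0, "strong": 1}[role]


# Small weighted graph with two triangles: (0, 1, 2) and (1, 2, 3).
adj = {
    0: {1: 0.9, 2: 0.8},
    1: {0: 0.9, 2: 0.7, 3: 0.3},
    2: {0: 0.8, 1: 0.7, 3: 0.4},
    3: {1: 0.3, 2: 0.4},
}
records = StrongWeakClassifier().classify_and_weight_triangles(adj)
```

A dataset that wants its own triangle task would only need to supply the two abstract hooks, which is the reuse pattern the utility is meant to enable.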
A123-specific triangle classifier
In `topobench/data/datasets/a123.py` we define a subclass of `TriangleClassifier`.

This subclass implements domain-specific logic for auditory cortex correlation graphs, with data-appropriate role classes based on the number of neighbours and the edge-weight class.
Tasks
The dataset’s `process()` method always builds the graph dataset and then inspects the `specific_task` parameter to optionally build triangle-level tasks:
1. Graph-level BF-bin classification (`specific_task: classification`)

- `task_level: graph`,
- `num_classes: 9`,
- `loss_type: cross_entropy`.

This is a standard graph classification benchmark on correlation graphs, suitable as a baseline and as input for higher-order liftings.
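For readers who want to see what one of these correlation graphs looks like in practice, the sketch below follows the graph representation described earlier: three node features per neuron and thresholded, symmetrised edges. It uses plain Python lists rather than `torch_geometric`, and the function name `correlation_graph` and the use of the population standard deviation are assumptions, not details taken from the PR.

```python
from statistics import mean, pstdev


def correlation_graph(corr, noise_corr, corr_threshold=0.2):
    """Build node features, edge pairs and edge weights for one sample.

    `corr` and `noise_corr` are square nested lists (n x n) for the neurons
    of a single (session, layer, BF-bin) sample, with n >= 2.
    """
    n = len(corr)
    x = []
    for i in range(n):
        others = [corr[i][j] for j in range(n) if j != i]
        # [mean_corr, std_corr, noise_diag]; population std is an assumption.
        x.append([mean(others), pstdev(others), noise_corr[i][i]])

    edges, edge_attr = [], []
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only
            if corr[i][j] >= corr_threshold:
                # Store both directions so the graph is undirected.
                edges += [(i, j), (j, i)]
                edge_attr += [corr[i][j], corr[i][j]]
    return x, edges, edge_attr


corr = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
noise_corr = [
    [0.9, 0.0, 0.0],
    [0.0, 0.8, 0.0],
    [0.0, 0.0, 0.7],
]
x, edges, edge_attr = correlation_graph(corr, noise_corr)
```

With the default threshold of 0.2, the pair (0, 2) with correlation 0.1 is dropped, so only the edges 0–1 and 1–2 survive. A sample whose `edges` list comes back empty corresponds to the graphs that `process()` filters out.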
2. Triangle role classification (`specific_task: triangle_classification`)

Implemented in:
- `A123CortexMDataset._extract_triangles_from_graphs()`
- `A123CortexMDataset.create_triangle_classification_task()`

Step 1 – Triangle extraction

`_extract_triangles_from_graphs()`:
- Iterates over all graphs in the dataset,
- Builds a NetworkX graph `G` for each (with edge weights from signal correlations),
- Uses `TriangleClassifier.enumerate_triangles(G)` and `classify_and_weight_triangles()` to obtain triangle dicts with:
  - `nodes`: `(a, b, c)`,
  - `edge_weights`: `[w_ab, w_bc, w_ac]`,
  - `role`: role string,
  - `label`: 0–8.

A list of raw triangle records is collected, each with `graph_idx`, `tri` (triangle dict), `G`, and `num_nodes`.

Step 2 – Building the triangle dataset

`create_triangle_classification_task()` converts these into `torch_geometric.data.Data` objects:
- `x ∈ ℝ^{1×3}`: the three edge weights (purely topological/functional – no node features or BF info),
- `y`: integer role label in `{0, …, 8}`,
- metadata: `nodes`, `role`, `graph_idx`.

This defines a triangle-level classification benchmark targeting 2-simplex motif roles.
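Step 2 can be pictured with the following sketch, which turns raw triangle records into one sample per triangle. It is an illustration only: `TriangleSample` and `build_triangle_samples` are hypothetical names, a dataclass stands in for `torch_geometric.data.Data`, and the role string in the example record is a placeholder rather than one of the PR's actual roles.

```python
from dataclasses import dataclass


@dataclass
class TriangleSample:
    x: list          # [[w_ab, w_bc, w_ac]]: one row with three edge weights
    y: int           # integer role label
    nodes: tuple
    role: str
    graph_idx: int


def build_triangle_samples(raw_records, num_classes=9):
    """Convert raw triangle records into per-triangle samples."""
    samples = []
    for rec in raw_records:
        tri = rec["tri"]
        label = tri["label"]
        # Guard against labels outside the configured class range.
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes - 1}]")
        samples.append(
            TriangleSample(
                x=[list(tri["edge_weights"])],
                y=label,
                nodes=tuple(tri["nodes"]),
                role=tri["role"],
                graph_idx=rec["graph_idx"],
            )
        )
    return samples


raw_records = [
    {
        "graph_idx": 0,
        "num_nodes": 5,
        "tri": {
            "nodes": (0, 1, 2),
            "edge_weights": [0.9, 0.7, 0.8],
            "role": "example_role",  # placeholder role name
            "label": 3,
        },
    },
]
samples = build_triangle_samples(raw_records)
```

Note that the features contain only the three edge weights; nothing about node features or BF bins enters `x`, which matches the intent stated above.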
3. Triangle common-neighbour prediction (`specific_task: triangle_common_neighbors`)

Implemented in:
- `A123CortexMDataset.create_triangle_common_neighbors_task()`

Here we focus on a purely structural topological quantity: the number of common neighbours of each triangle.

For each triangle `(a, b, c)` in the raw list:
- Compute the set of common neighbours: the nodes of `G` adjacent to all three of `a`, `b` and `c`.
- Define the label: the exact common-neighbour count, capped at 8.
- Define the features: the node degrees of the triangle vertices in `G`.

So each triangle sample has `x ∈ ℝ^{1×3}` (degrees) and `y ∈ {0, …, 8}` (capped common-neighbour count).

This gives a triangle-level classification task where labels are higher-order topological statistics (coface-like information), and features are structural (degrees), avoiding direct leakage of the label.
Loader and task selection
The loader `A123DatasetLoader(AbstractLoader)` in `topobench/data/loaders/graph/a123_loader.py` does the following:
- Reads `data_name` and `specific_task` from `parameters`.
- Constructs `A123CortexMDataset(root, name, parameters)`.
- Depending on `specific_task`:
  - `classification` → uses the default graph dataset from `processed/data.pt`.
  - `triangle_classification` → loads the triangle dataset from `processed/data_triangles.pt` (if it exists) and assigns it to `self.dataset.data` / `self.dataset.slices`.
  - `triangle_common_neighbors` → loads the triangle CN (common neighbours) dataset from `processed/data_triangles_common_neighbors.pt`.

If the triangle files are missing, the loader emits a clear warning suggesting that the dataset be processed with the appropriate `specific_task`.

This keeps to the one-loader-per-PR rule while making triangle tasks selectable via configuration.
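The task selection amounts to a small lookup from `specific_task` to a processed file. The sketch below shows that dispatch in isolation. The helper `resolve_processed_file` is hypothetical, and returning `None` when a triangle file is missing is an assumption; the description only states that a warning is emitted in that case.

```python
import warnings
from pathlib import Path

# File names as listed in the loader description above.
PROCESSED_FILES = {
    "classification": "data.pt",
    "triangle_classification": "data_triangles.pt",
    "triangle_common_neighbors": "data_triangles_common_neighbors.pt",
}


def resolve_processed_file(processed_dir, specific_task="classification"):
    """Return the processed file for a task, or None if a triangle file is missing."""
    if specific_task not in PROCESSED_FILES:
        raise ValueError(f"unknown specific_task: {specific_task!r}")

    path = Path(processed_dir) / PROCESSED_FILES[specific_task]
    if specific_task != "classification" and not path.exists():
        # Mirrors the loader's warning; the exact wording here is illustrative.
        warnings.warn(
            f"{path.name} not found; make sure the dataset was processed "
            f"with specific_task={specific_task!r}."
        )
        return None
    return path
```

Keeping the mapping in one table means a further triangle task would need one new entry rather than another branch in the loader.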
Tests and pipeline integration
To satisfy the challenge requirements:

Unit tests

`test/data/load/test_a123_dataset.py` checks the datasets built for `triangle_classification` / `triangle_common_neighbors` (e.g. shapes of `x`, valid label ranges, non-empty datasets in test settings).

Pipeline test

`test/pipeline/test_pipeline.py` is extended with a configuration that:
- uses the new dataset (with `specific_task: triangle_classification` and `num_graphs: 10`),
- runs with `max_epochs=2`,
- covers the metrics `train/accuracy`, `val/accuracy`, `test/accuracy`, macro `precision`, `recall`, and `F1`.

This demonstrates that the entire training pipeline runs successfully on the new benchmark task. Performance is not tuned; the goal is compatibility and coverage.
Coverage

Tests exercise:
- `A123CortexMDataset.process`,
- `A123DatasetLoader.load_dataset` for multiple `specific_task` settings,
- the `topobench.data.utils.triangle_classifier` utility.

This helps maintain the ≥93% Codecov target.
Why this is a useful B2 benchmark
This contribution adds:
Key points for TDL:
These are exactly the kind of questions where simplicial networks, cell-complex networks, and hypergraph networks should shine compared to edge-only GNNs:
Because everything is driven through a single YAML (`specific_task` switch) and a reusable triangle utility, the benchmark is also extensible: other datasets can plug into `topobench.data.utils.triangle_classifier` and define their own domain-specific triangle roles or CN-style tasks.

Limitations and future directions

- The CI configuration restricts the dataset to `num_graphs: 10`; full experiments on the whole dataset will likely reveal richer distributions of motif types and CN counts.

Relation to previous work
This PR builds on my earlier contributions to data loading and streaming in TopoBench (previous PR: #241), but focuses on the new TDL benchmark tasks (Category B2), with an emphasis on higher-order structure in functional brain networks.