Category: A1; Team name: Guris; Dataset: ogbn-arxiv and ogbn-products #232

alexsandro-santos · 2025-11-20T23:27:49Z

Checklist

My pull request has a clear and explanatory title.
My pull request passes the Linting test.
I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
My PR follows PEP8 guidelines. (refer to comment below)
My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
I linked to issues and PRs that are relevant to this PR.

Description

This pull request integrates two datasets from the Open Graph Benchmark (OGB) into the TopoBench framework: ogbn-arxiv and ogbn-products. Both datasets are homogeneous, single-label node classification tasks, which makes them fully compatible with TopoBench’s current graph pipeline.

The first dataset, ogbn-arxiv, is a directed citation network of computer science papers on arXiv. Each node is a paper represented by a 128-dimensional feature vector obtained by averaging the word embeddings of its title and abstract, and the task is to predict one of forty subject areas. The second dataset, ogbn-products, is a large undirected co-purchase graph in which nodes represent Amazon products with 100-dimensional features (bag-of-words product descriptions reduced by PCA), and the task is to classify each product into one of forty-seven categories.

To support these datasets, this PR adds a unified dataset loader that handles feature casting and label formatting. Corresponding dataset configuration files have been added under configs/dataset/graph/, following the same structure and conventions used by the existing TopoBench datasets. Although this PR adds full support for ogbn-products, the dataset is intentionally excluded from test/pipeline/test_pipeline.py because of its size.
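For reviewers, here is a minimal sketch of the two steps named above, feature casting and label formatting. It is not the loader added in this PR: `NodeData` and `format_node_data` are hypothetical names, and plain Python lists stand in for tensors so the example has no dependencies. The only outside fact it relies on is that OGB node-level datasets deliver labels with shape (num_nodes, 1).

```python
from dataclasses import dataclass


@dataclass
class NodeData:
    """Hypothetical stand-in for a graph data object (not a TopoBench class)."""

    x: list  # node features, one row per node
    y: list  # labels as delivered by the source, one entry per node


def format_node_data(data: NodeData) -> NodeData:
    """Cast features to float and flatten column-shaped labels to 1-D ints."""
    # Feature casting: every feature value becomes a float.
    x = [[float(value) for value in row] for row in data.x]

    # Label formatting: a (num_nodes, 1) label column becomes one integer
    # class id per node, as a single-label classifier expects.
    y = []
    for label in data.y:
        if isinstance(label, (list, tuple)):
            if len(label) != 1:
                raise ValueError("expected exactly one label per node")
            label = label[0]
        y.append(int(label))

    return NodeData(x=x, y=y)


# Toy input: three nodes, two integer features each, labels as a column.
raw = NodeData(x=[[1, 0], [0, 2], [3, 4]], y=[[5], [0], [39]])
formatted = format_node_data(raw)
print(formatted.x)
print(formatted.y)
```

With PyTorch tensors, the same two steps would typically be written as `x.float()` and `y.squeeze(-1).long()`.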

Additional context

Although the OGB node property prediction (ogbn) collection contains five datasets in total, only ogbn-arxiv and ogbn-products are included in this PR. The remaining three datasets require functionality that is beyond TopoBench’s current architecture. ogbn-mag is a heterogeneous graph containing multiple node and edge types, which would require dedicated hetero-graph support to integrate properly. ogbn-proteins is a multi-label binary classification task with no node features, which does not align with TopoBench’s assumption of single-label node classification on feature-bearing graphs. Finally, ogbn-papers100M is extremely large and cannot feasibly be downloaded or tested in the CI environment. Because these datasets would require substantial structural additions or special-case handling, they are intentionally omitted from this PR to keep its scope focused and aligned with the existing pipeline.
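The selection criteria above can be summarized as a small check. This is only an illustrative sketch: `DatasetTraits` and `blocking_reasons` are hypothetical names that do not exist in TopoBench, and the trait values encode only what this description states. Any trait the text does not discuss is set to `True` as a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetTraits:
    """Hypothetical summary of the properties discussed in this PR."""

    name: str
    homogeneous: bool        # single node type and single edge type
    single_label: bool       # one class id per node
    has_node_features: bool  # nodes carry an input feature vector
    feasible_size: bool      # can be downloaded and tested in CI


def blocking_reasons(dataset: DatasetTraits) -> list:
    """Return the reasons a dataset does not fit the current graph pipeline."""
    reasons = []
    if not dataset.homogeneous:
        reasons.append("heterogeneous graph")
    if not dataset.single_label:
        reasons.append("multi-label task")
    if not dataset.has_node_features:
        reasons.append("no node features")
    if not dataset.feasible_size:
        reasons.append("too large for CI")
    return reasons


# Traits not discussed in the description default to True here (assumption).
OGBN = [
    DatasetTraits("ogbn-arxiv", True, True, True, True),
    # Supported, although left out of the pipeline test because of its size.
    DatasetTraits("ogbn-products", True, True, True, True),
    DatasetTraits("ogbn-mag", False, True, True, True),
    DatasetTraits("ogbn-proteins", True, False, False, True),
    DatasetTraits("ogbn-papers100M", True, True, True, False),
]

supported = [d.name for d in OGBN if not blocking_reasons(d)]
for d in OGBN:
    print(d.name, blocking_reasons(d) or "supported")
```

Keeping the reasons as a list rather than a single boolean makes it easy to see what would have to change in the pipeline before each omitted dataset could be added.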

Co-authored by: @giovanni-br and @alexsandro-santos

Up to date with main repo

giovanni-br and others added 18 commits November 17, 2025 00:17

add ogbn

6948ad5

trying to fix errors

4739265

Merge remote-tracking branch 'upstream/main'

9d1b1a0

Up to date with main repo

update readme

e19fb6f

Document ogbn-arxiv dataset and fix AbstractLoader path

cb1483a

Adapted script name to requirements

6a921a9

making the loader general

2aaa932

fix loader

83f564d

fix loader

6632608

removing problem with interaction with the terminal

a17a44a

Adding ogbn-proteins dataset

39d01ba

Fixing datatype in yaml files

9e3dca9

Fixing dataset names to match repository convention

095988f

Fixing dimensionality issue

1383dd0

Attempt to fix ogbn-arxiv.yaml config

58aeb15

Deleting ogbn-proteins

859afdc

Fixing config files for ogbn datasets

1ccd71b

Removing unnecessary code from ogbn_dataset_loader

ec4553b

alexsandro-santos marked this pull request as ready for review November 20, 2025 23:35

Add test ogbn-arxiv in the pipeline

1061ea8

giovanni-br marked this pull request as draft November 21, 2025 13:43

making it lighter

ff71753

giovanni-br marked this pull request as ready for review November 21, 2025 16:07

Updating README.md

f7b5927

levtelyatnikov added the category-a1 label (Submission to TDL Challenge 2025: Mission A, Category 1.) on Nov 24, 2025

