dleko11 commented Nov 22, 2025

Co-authored-by: luka-benic [email protected]
Co-authored-by: dleko11 [email protected]

Checklist

  • My pull request has a clear and explanatory title.
  • My pull request passes the Linting test.
  • I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
  • My PR follows PEP8 guidelines. (refer to comment below)
  • My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
  • I linked to issues and PRs that are relevant to this PR.

Description

This PR extends TopoBench to support edge-level link prediction on both transductive and inductive graph datasets, and adds a tutorial notebook that illustrates how to use the new functionality.

Concretely, the PR introduces:

  • Edge-level split utilities for link prediction (transductive and inductive).
  • A dynamic negative sampling transform integrated into the dataloading pipeline.
  • A dedicated edge-level readout (LinkPredictionReadOut) for link prediction on top of existing GNN backbones (GCN, GAT).
  • Example dataset configurations for Cora (transductive), MUTAG (inductive), and PPI (inductive, predefined splits).
  • A tutorial notebook showing the full workflow end-to-end.
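As a rough illustration of what an edge-level split involves, the sketch below partitions the positive edges of a single undirected graph into train/val/test sets. It is not the PR's implementation: split_edges is a hypothetical helper written in plain Python, and only the parameter names val_prop and test_prop are borrowed from the split_params described in this PR.

```python
import random


def split_edges(edges, val_prop, test_prop, seed=0):
    """Partition undirected edges into train/val/test positives.

    Hypothetical helper, not part of TopoBench. Each undirected edge is
    stored once as (min(u, v), max(u, v)), so both directions of an edge
    always end up in the same split.
    """
    unique = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    rng = random.Random(seed)
    rng.shuffle(unique)

    n_val = int(len(unique) * val_prop)
    n_test = int(len(unique) * test_prop)

    val = unique[:n_val]
    test = unique[n_val:n_val + n_test]
    train = unique[n_val + n_test:]
    return train, val, test


# Toy graph: some edges appear in both directions, as in an undirected edge_index.
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 4), (4, 0),
         (0, 2), (2, 0), (1, 3), (3, 0), (4, 1), (2, 4)]
train_pos, val_pos, test_pos = split_edges(edges, val_prop=0.2, test_prop=0.2)
```

Deduplicating before shuffling is the design point here: if the two directions of an undirected edge were split independently, a held-out validation edge could leak into training through its reverse direction.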

Key Changes (Code)

  • Edge-level splitting

    • load_edge_transductive_splits for single-graph / transductive datasets (e.g. Cora).
    • load_edge_inductive_splits for multi-graph / inductive datasets (e.g. MUTAG, PPI).
    • Both return DataloadDataset objects with:
      • edge_label_index, edge_label (positive and negative candidate edges),
      • consistent handling of val/test negatives vs train-time negatives.
  • Dynamic negative sampling

    • NegativeSamplingTransform in topobench.transforms.data_manipulations:
      • takes positive edges from edge_label_index,
      • samples fresh negatives via torch_geometric.utils.negative_sampling,
      • rebuilds edge_label_index / edge_label each epoch according to neg_pos_ratio and neg_sampling_method.
  • Edge-level readout

    • LinkPredictionReadOut in topobench.nn.readouts:
      • consumes node embeddings x_0 from the backbone,
      • scores candidate edges via dot products,
      • outputs 2-class logits (no-edge, edge) and attaches labels for the loss/evaluator.
  • PPI dataset support

    • New loader (based on torch_geometric.datasets.PPI) that:
      • loads the predefined train/val/test splits from PyG,
      • combines them into a single dataset with a split_idx mapping,
      • is compatible with the inductive edge-level splitting utilities.
  • Configuration-level support

    • task_level: edge and num_classes: 2 for link prediction.
    • Extended split_params for link prediction:
      • learning_setting (transductive / inductive),
      • val_prop, test_prop, train_prop,
      • is_undirected,
      • neg_pos_ratio (dynamic train negatives),
      • neg_sampling_ratio (static val/test negatives),
      • neg_sampling_method.
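The dynamic negative sampling described above can be pictured with a library-free sketch. The PR itself relies on torch_geometric.utils.negative_sampling; the functions sample_negatives and build_edge_labels below are hypothetical stand-ins that use only the standard library, and they assume neg_pos_ratio means "negatives per positive".

```python
import random


def sample_negatives(pos_edges, num_nodes, neg_pos_ratio, rng):
    """Draw roughly neg_pos_ratio negatives per positive edge.

    Hypothetical stand-in for a negative sampler: node pairs are drawn
    uniformly and rejected if they are self-loops or known positives.
    """
    known = {(min(u, v), max(u, v)) for u, v in pos_edges}
    target = int(len(pos_edges) * neg_pos_ratio)
    negatives = set()
    max_tries = 100 * max(target, 1)  # guard against very dense graphs
    tries = 0
    while len(negatives) < target and tries < max_tries:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(u, v), max(u, v))
        if u != v and pair not in known:
            negatives.add(pair)
        tries += 1
    return sorted(negatives)


def build_edge_labels(pos_edges, neg_edges):
    """Concatenate positives and negatives with labels 1 and 0."""
    edge_label_index = list(pos_edges) + list(neg_edges)
    edge_label = [1] * len(pos_edges) + [0] * len(neg_edges)
    return edge_label_index, edge_label


pos_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
rng = random.Random(0)

# Fresh negatives are drawn on every pass, while the positives stay fixed.
for epoch in range(2):
    neg_edges = sample_negatives(pos_edges, num_nodes=10, neg_pos_ratio=1.0, rng=rng)
    edge_label_index, edge_label = build_edge_labels(pos_edges, neg_edges)
```

Resampling per epoch exposes the model to many different non-edges during training, whereas keeping val/test negatives static keeps evaluation numbers comparable across epochs and runs.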

These changes plug into the existing TopoBench training pipeline without altering the high-level interface (Hydra configs + run.yaml).
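For readers unfamiliar with dot-product link scoring, here is a minimal sketch of the idea behind the edge-level readout. It is not the code of LinkPredictionReadOut: link_logits and edge_probability are hypothetical, the embeddings are plain Python lists rather than tensors, and mapping a scalar score s to the logit pair [0, s] is an assumption about how the 2-class output could be formed (under that assumption, the softmax probability of the "edge" class equals sigmoid(s)).

```python
import math


def link_logits(x_0, edge_label_index):
    """Return one [no-edge, edge] logit pair per candidate edge.

    x_0 holds one embedding per node; each candidate (u, v) is scored by
    the dot product of its endpoint embeddings.
    """
    logits = []
    for u, v in edge_label_index:
        score = sum(a * b for a, b in zip(x_0[u], x_0[v]))
        logits.append([0.0, score])  # assumed score-to-logits mapping
    return logits


def edge_probability(logit_pair):
    """Softmax probability of the 'edge' class for one logit pair."""
    no_edge, edge = logit_pair
    m = max(no_edge, edge)  # subtract the max for numerical stability
    e0, e1 = math.exp(no_edge - m), math.exp(edge - m)
    return e1 / (e0 + e1)


# Nodes 0 and 1 have aligned embeddings; node 2 points the opposite way.
x_0 = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
candidates = [(0, 1), (0, 2)]

logits = link_logits(x_0, candidates)
probs = [edge_probability(pair) for pair in logits]
```

Emitting two logits per edge, rather than a single score, lets link prediction reuse a standard 2-class classification loss and evaluator, which matches the num_classes: 2 setting listed above.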


Tutorial (Usage Example)

To illustrate the new link prediction support, this PR also adds:

  • tutorials/tutorial_link_prediction.ipynb

The notebook demonstrates:

  • Transductive and inductive link prediction setups on the Cora and MUTAG datasets, respectively.
  • How the split utilities, negative sampling transform, and LinkPredictionReadOut interact in practice.
  • Running short GCN/GAT experiments and inspecting basic metrics and visualizations of positive/negative edges in the splits.

The tutorial is an example user guide for the new functionality; all core logic lives in the library code.

Issue

Additional context


luka-benic commented Nov 23, 2025

We fixed some compatibility issues; namely, the PPI dataset class from torch_geometric version 2.8.0 was not compatible with networkx version 2.8.8.

levtelyatnikov added the category-b2 label (Submission to TDL Challenge 2025: Mission B, Category 2) on Nov 24, 2025