Category: A1; Team name: ProofTruth; Dataset: Metamath #249

jableable · 2025-11-25T16:36:50Z

Checklist

My pull request has a clear and explanatory title.
My pull request passes the Linting test.
I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
My PR follows PEP8 guidelines. (refer to comment below)
My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
I linked to issues and PRs that are relevant to this PR.

Description

This pull request adds the Metamath proof-graph dataset for the TAG-DS TopoBench Challenge 2025 (Category A1). The dataset is distributed as a single data.pt file hosted on Hugging Face and contains 20,000 graphs derived from the first 10,000 theorems in the ~45k-theorem Metamath database.

This dataset provides a large-scale, real-world benchmark for graph neural networks on formal reasoning, with deep directed acyclic graph (DAG) structure, long-range dependencies, and a highly non-uniform label distribution reflective of real mathematical theorem usage.

Dataset Structure

Each of the 10k theorems appears twice:

1. Theorem Graph (statement only). A small DAG containing:

hypothesis nodes
a conclusion node
no proof steps

2. Proof Graph (full derivation). A full proof DAG containing:

hypothesis nodes
all intermediate proof steps
the same conclusion node

Train/Val/Test Split Design

All 10,000 theorem-only graphs (indices 0–9,999) are always included in the training set.
Proof graphs (indices 10,000–19,999) undergo fixed random seed 80/10/10 split.
Only proof graphs appear in validation and test splits.

Task: Node-Level Classification

The model must predict, for each node, the justification label (the theorem used at that proof step).

Label space: 3,557 labels (all labels with <5 occurrences in the training set are mapped to an UNK class)
Conclusion masking: In all proof graphs, the final proof step has its embedding zeroed out, forcing the model to infer the correct conclusion from graph context.
Node features: 768‑dimensional CodeBERT embeddings for each Metamath statement.

References:

[1] Metamath Official Site - https://us.metamath.org/index.html
[2] Huggingface Dataset - https://huggingface.co/datasets/jableable/metamath-proof-graphs

jableable added 2 commits November 25, 2025 10:05

Add Metamath dataset, loader, and pipeline test

b92ef0e

Fix lint issues

7ef598f

levtelyatnikov added the category-a1 Submission to TDL Challenge 2025: Mission A, Category 1. label Nov 26, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Category: A1; Team name: ProofTruth; Dataset: Metamath #249

Category: A1; Team name: ProofTruth; Dataset: Metamath #249

Uh oh!

jableable commented Nov 25, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Category: A1; Team name: ProofTruth; Dataset: Metamath #249

Are you sure you want to change the base?

Category: A1; Team name: ProofTruth; Dataset: Metamath #249

Uh oh!

Conversation

jableable commented Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Description

Dataset Structure

Train/Val/Test Split Design

Task: Node-Level Classification

References:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jableable commented Nov 25, 2025 •

edited

Loading