This repository contains a large-scale temporal question answering dataset designed for evaluating and training language models on temporal reasoning tasks. The dataset consists of question-answer pairs with a focus on temporal aspects, covering a wide range of events and entities from 1987 to 2023.
- Size: The dataset comprises 100,228,457 question-answer pairs, making it one of the largest temporal question answering datasets available.
- Question Types: Questions are categorized based on their complexity, including easy and hard questions, each designed to test different levels of temporal reasoning and understanding.
- Content: The dataset covers a diverse range of events and entities, sourced from Wikipedia and Wikidata, ensuring a rich and varied set of questions for evaluation.
- Metadata: Each question-answer pair includes additional metadata, such as entity/event IDs, question difficulty ratings, and temporal attributes, providing valuable information for analysis and model evaluation.
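To illustrate how the per-question metadata can be used, here is a minimal filtering sketch. The record layout below (`question`, `answer`, `difficulty`, `ids`) is an assumed toy schema for illustration only, not the dataset's actual field names:

```python
# Toy sketch: filtering QA pairs by the difficulty metadata described above.
# NOTE: field names (question, answer, difficulty, ids) are illustrative
# assumptions, not the dataset's actual schema.
sample = [
    {"question": "Which event happened first, X or Y?",
     "answer": "X", "difficulty": "easy", "ids": ["Q1", "Q2"]},
    {"question": "Who led the country that hosted event Z in 1990?",
     "answer": "N.N.", "difficulty": "hard", "ids": ["Q3"]},
]

def by_difficulty(records, level):
    """Keep only records whose difficulty rating matches `level`."""
    return [r for r in records if r["difficulty"] == level]

hard_questions = by_difficulty(sample, "hard")
print(len(hard_questions))  # count of hard questions in this toy sample
```

The same pattern extends to the other metadata fields (entity/event IDs, temporal attributes) for building evaluation subsets.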
| Question Type | Total |
|---|---|
| Attribute Event | 83,798 |
| Attribute Entity | 84,079 |
| Attribute Time | 9,454 |
| Comparison Event | 25,353,340 |
| Comparison Entity | 74,678,117 |
| Comparison Time | 54,022,952 |
| Counting Event | 18,325 |
| Counting Entity | 10,798 |
| Counting Time | 12,732 |
| Multi-Hop | 76,933 |
| Unnamed Event | 8,707,123 |
| Total | 100,228,457 |
- Performance Evaluation: The dataset can be used to evaluate the performance of language models on temporal reasoning tasks, including across-time comparison, event/entity detection, and multi-hop reasoning.
- Fine-Tuning: Researchers can leverage this dataset for fine-tuning language models, enhancing their temporal reasoning capabilities and performance on similar tasks.
- Download: The dataset is available on Hugging Face.
- Testing Dataset: A small version for testing purposes is available here.
This project contains Python scripts designed to generate various types of questions based on event data. The scripts read event attributes from a database, construct questions, and store them back in the database.
- Python 3.x
- `psycopg2` for PostgreSQL database interaction
- `requests` for HTTP requests
- `configparser` for reading the database configuration
- `SPARQLWrapper` for executing SPARQL queries
- Clone the repository:
```bash
git clone <repository_url>
cd <repository_folder>
```
- Install the required Python packages:
```bash
pip install psycopg2 requests SPARQLWrapper pandas
```
(`configparser` is part of the Python 3 standard library and needs no separate install.)
- Configure the database connection:
- Create a `database.ini` file with the following format:

```ini
[postgresql]
host=your_host
database=your_database
user=your_user
password=your_password
```
- Ensure your database is set up and populated with the required data.
- Run the question-generation scripts for the desired question type.
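The scripts' database access can be sketched as follows; `load_db_config` is a hypothetical helper name (not necessarily what the scripts use), and the `psycopg2.connect` call is left as a comment so the snippet runs without a live database:

```python
# Sketch: reading database.ini (the format shown above) with configparser
# and passing the values to psycopg2. `load_db_config` is a hypothetical
# helper name for illustration.
from configparser import ConfigParser

def load_db_config(path="database.ini", section="postgresql"):
    """Return the connection parameters from an INI file as a dict."""
    parser = ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError(f"config file not found: {path}")
    if not parser.has_section(section):
        raise KeyError(f"section [{section}] missing in {path}")
    return dict(parser.items(section))

# Usage (requires a real database.ini and a running PostgreSQL server):
# import psycopg2
# conn = psycopg2.connect(**load_db_config())
```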
Please cite our paper if it helps your research:
```bibtex
@inproceedings{gruber-etal-2025-complextempqa,
    title = "{C}omplex{T}emp{QA}: A 100m Dataset for Complex Temporal Question Answering",
    author = {Gruber, Raphael and
      Abdallah, Abdelrahman and
      F{\"a}rber, Michael and
      Jatowt, Adam},
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.463/",
    pages = "9111--9123",
    ISBN = "979-8-89176-332-6",
}
```

