This repository provides the reproducible code for our AAAI 2026 paper "Expressive Temporal Specifications for Reward Monitoring".
Designing informative reward functions is one of the main bottlenecks in reinforcement learning: sparse or badly shaped rewards slow down learning and can push agents towards undesirable behaviours.
This work uses quantitative Linear Temporal Logic on finite traces (LTLf[F]) to specify reward monitors, which are automatically synthesised and composed with the environment to supply informative reward signals during training.
Key advantages:
- Expressive specifications: temporal logic over finite traces with quantitative operators, supporting reachability and safety objectives as well as non-Markovian goals, all expressed in a formal yet simple and concise manner.
- Automatic monitor synthesis: monitors are constructed in time linear in the size of the formula, with correctness and complexity guarantees.
- Safety-aware monitoring: safety specifications can permanently clamp rewards after a violation, blocking positive returns from other monitors.
- Algorithm-agnostic integration: monitors are composed with the environment as a product MDP, so any standard RL algorithm can be trained on the extended environment; see the sketch below.
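As a rough illustration of the product construction, the sketch below wraps a Gymnasium environment so that a monitor advances in lockstep with the environment and supplies the reward. The `Monitor` interface, the `labels_of` labelling function, and the clamping rule are hypothetical stand-ins rather than this repository's actual API; see `src/reward_monitors/` and `src/environments/` for the real implementations.

```python
# Minimal sketch of the product-MDP composition. `Monitor` is a
# hypothetical interface with reset(), step(labels) -> float, and a
# `violated` flag -- NOT this repository's actual API.
import gymnasium as gym


class MonitoredEnv(gym.Wrapper):
    def __init__(self, env: gym.Env, monitor, labels_of):
        super().__init__(env)
        self.monitor = monitor      # monitor synthesised from an LTLf[F] formula
        self.labels_of = labels_of  # maps an observation to its atomic propositions

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.monitor.reset()        # the monitor state is part of the product state
        return obs, info

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # The monitor, not the base environment, produces the reward signal.
        reward = self.monitor.step(self.labels_of(obs))
        if self.monitor.violated:
            # After a safety violation, clamp the reward so that no further
            # positive return can accrue.
            reward = min(reward, 0.0)
        return obs, reward, terminated, truncated, info
```

Because the composition only touches the environment interface, off-the-shelf algorithms (e.g. the Q-learning and PPO implementations under `src/rl/`) can be trained on the wrapped environment unchanged.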
- Clone the repository with git (including submodules):

```sh
git clone --recurse-submodules --shallow-submodules https://github.com/nightly/quantitative-reward-monitoring
```

- Ensure GraphViz is installed on your system (used for exporting visualisations).
- Install the Python dependencies with pip:

```sh
pip install -r requirements.txt
```

  or, preferably, run `uv sync` if you have Astral's uv installed.
```
quantitative-reward-monitoring/
├── experiments/                      # Jupyter notebooks and experiment scripts
│   ├── visualise/                    # Plotting utilities and visualisations
├── src/
│   ├── environments/                 # Wrappers and helpers for different RL environments
│   │   ├── toy/                      # Cliff Walking, Frozen Lake, Taxi gridworlds
│   │   ├── safety_gridworlds/        # Conveyor Belt, Island Navigation, Sokoban tasks
│   │   ├── classic_control/          # Acrobot, Cartpole, Mountain Car, Pendulum
│   │   ├── box2d/                    # Bipedal Walker, Lunar Lander
│   ├── reward_monitors/
│   │   ├── bool/                     # Boolean monitor semantics
│   │   │   ├── syntax/               # Safe LTLf checking (syntactic)
│   │   │   └── …
│   │   ├── quantitative/             # Quantitative monitor semantics
│   │   │   ├── syntax/               # Safe LTLf[F] checking (syntactic)
│   │   │   └── …
│   │   └── …                         # Shared utilities for monitors
│   ├── rl/
│   │   ├── q_learning.py             # Tabular Q-learning implementation
│   │   ├── ppo_continuous_action.py  # PPO for continuous actions
│   │   └── …                         # Other RL algorithms & helpers (including convergence checking)
│   └── core/                         # Core utilities
└── …
```
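To give a flavour of the quantitative semantics in `reward_monitors/quantitative/`, the snippet below evaluates "eventually" and "always" over a finite trace using the standard max/min reading of quantitative LTLf. It is a generic, textbook-style illustration over assumed [0, 1]-valued propositions, not this repository's implementation.

```python
# Generic illustration of quantitative LTLf-style semantics on a finite
# trace: for a state formula phi, F(phi) takes the maximum value of phi
# along the trace and G(phi) the minimum. Not this repository's code.
from typing import Callable, Sequence

State = dict  # a state assigns a value in [0, 1] to each atomic proposition


def eventually(phi: Callable[[State], float], trace: Sequence[State]) -> float:
    """Value of F(phi): the best value phi ever attains (reachability)."""
    return max(phi(state) for state in trace)


def always(phi: Callable[[State], float], trace: Sequence[State]) -> float:
    """Value of G(phi): the worst value phi ever attains (safety)."""
    return min(phi(state) for state in trace)


# Hypothetical trace: how close the agent gets to the goal, and how far
# it stays from the cliff, at each step.
trace = [
    {"goal": 0.2, "safe": 1.0},
    {"goal": 0.7, "safe": 1.0},
    {"goal": 1.0, "safe": 0.4},
]
print(eventually(lambda s: s["goal"], trace))  # 1.0 -- the goal is reached
print(always(lambda s: s["safe"], trace))      # 0.4 -- weakest safety margin
```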
If you find the paper or code useful in your publications, please cite it using the following BibTeX entry:
```bibtex
@article{adalat2025expressive,
  title={Expressive Temporal Specifications for Reward Monitoring},
  author={Adalat, Omar and Belardinelli, Francesco},
  journal={arXiv preprint arXiv:2511.12808},
  year={2025}
}
```

Some of the original code and ideas were contributed by:
- Jialue Xu
- Alexander Philip Rader