Quantitative Reward Monitors for Reinforcement Learning

This repository provides the reproducible code for our AAAI 2026 paper, "Expressive Temporal Specifications for Reward Monitoring".

Designing informative reward functions is one of the main bottlenecks in reinforcement learning: sparse or poorly shaped rewards slow learning and can push agents towards undesirable behaviours.

This work uses quantitative Linear Temporal Logic on finite traces ($\mathrm{LTL}_f[\mathcal{F}]$) to define Quantitative Reward Monitors (QRMs), a class of finite-state machines with registers. Each monitor is synthesized from a temporal specification and, at every time step, returns a scalar "degree of satisfaction" of its formula rather than a single Boolean verdict. This provides dense, temporally aware feedback that guides agents more efficiently towards the specified behaviour.
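For intuition only (this is not the repository's API; all names below are invented), two single-operator monitors can be sketched as registers updated at every step: an "eventually" monitor keeps a running max of its subformula's value, while a "globally" monitor keeps a running min:

```python
from dataclasses import dataclass

@dataclass
class EventuallyMonitor:
    """Hypothetical sketch of a QRM for F(phi): the register stores the
    best degree of satisfaction of phi seen so far, so the output is a
    running max rather than a one-off Boolean verdict."""
    best: float = float("-inf")  # register

    def step(self, phi_value: float) -> float:
        self.best = max(self.best, phi_value)
        return self.best

@dataclass
class GloballyMonitor:
    """Dual sketch for G(phi): the register stores the worst value seen,
    so a single deep violation permanently lowers the emitted reward."""
    worst: float = float("inf")  # register

    def step(self, phi_value: float) -> float:
        self.worst = min(self.worst, phi_value)
        return self.worst
```

Fed the trace of values 0.1, 0.6, 0.3 for phi, the eventually-monitor emits 0.1, 0.6, 0.6: the agent receives graded progress signals instead of waiting for a terminal 0/1 outcome.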

Key advantages:

  • Expressive specifications: temporal logic over finite traces with quantitative operators, supporting reachability, safety, and non-Markovian objectives in a formal yet concise notation.
  • Automatic monitor synthesis: monitors are constructed in time linear in the size of the formula, with correctness and complexity guarantees.
  • Safety-aware monitoring: safety specifications can permanently clamp rewards after a violation, blocking positive returns from other monitors.
  • Algorithm-agnostic integration: monitors are composed with the environment as a product MDP, so any standard RL algorithm can be used on the extended environment (see the sketch below).
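A minimal sketch of this composition, assuming a Gymnasium-style environment and monitors like the ones above (the wrapper, its labeller, and the clamping rule are illustrative assumptions, not the repository's actual interface):

```python
import gymnasium as gym

class MonitoredEnv(gym.Wrapper):
    """Hypothetical product-MDP wrapper: the agent observes the joint
    (environment, monitor) state and is rewarded by the monitor, so any
    off-the-shelf RL algorithm can run on the wrapped environment."""

    def __init__(self, env, monitor, labeller, violation_threshold=0.0):
        super().__init__(env)
        self.monitor = monitor        # e.g. GloballyMonitor() from above
        self.labeller = labeller      # maps raw observations to phi's value
        self.violation_threshold = violation_threshold
        self.violated = False

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.violated = False  # a full version would also reset the monitor's registers
        return (obs, 0.0), info

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)  # env reward discarded
        value = self.monitor.step(self.labeller(obs))
        reward = value
        # Safety clamping: once the monitor dips below the threshold,
        # positive returns are permanently blocked for the rest of the episode.
        if value < self.violation_threshold:
            self.violated = True
        if self.violated:
            reward = min(reward, 0.0)
        # Pairing the observation with the monitor's value keeps the product Markovian.
        return (obs, value), reward, terminated, truncated, info
```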

Setup instructions

  1. Clone with git: `git clone --recurse-submodules --shallow-submodules https://github.com/nightly/quantitative-reward-monitoring`
  2. Ensure Graphviz is installed on your system (used for exporting visualisations)
  3. Install the dependencies with `pip install -r requirements.txt` or, preferably, `uv sync` if you have Astral's uv installed

Code structure

quantitative-reward-monitoring/
├── experiments/               # Jupyter notebooks and experiment scripts
│   ├── visualise/            # Plotting utilities and visualisations
├── src/ 
│   ├── environments/         # Wrappers and helpers for different RL environments
│   │   ├── toy/              # Cliff Walking, Frozen Lake, Taxi gridworlds
│   │   ├── safety_gridworlds/# Conveyor Belt, Island Navigation, Sokoban tasks
│   │   ├── classic_control/  # Acrobot, Cartpole, Mountain Car, Pendulum
│   │   ├── box2d/            # Bipedal Walker, Lunar Lander
│   ├── reward_monitors/
│   │   ├── bool/             # Boolean monitor semantics
│   │   │   ├── syntax/       # Safe LTLf checking (syntactically)
│   │   │   └── …
│   │   ├── quantitative/     # Quantitative monitor semantics
│   │   │   ├── syntax/       # Safe LTLf[F] checking (syntactically)
│   │   │   └── …
│   │   └── …                 # Shared utilities for monitors
│   ├── rl/                  
│   │   ├── q_learning.py     # Tabular Q‑learning implementation
│   │   ├── ppo_continuous_action.py  # PPO for continuous actions
│   │   └── …                 # Other RL algorithms & helpers (including convergence checking)
│   └── core/                 # Core utilities
└── …
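The syntax/ directories above perform the syntactic safety checks. As a rough sketch of what such a check can look like (the tuple-based formula encoding here is an assumption, not the repository's representation): a formula in negation normal form is syntactically safe when its temporal operators are limited to Next, Globally and Release, i.e. it contains no Eventually or Until:

```python
SAFE_TEMPORAL = {"X", "G", "R"}   # Next, Globally, Release
UNSAFE_TEMPORAL = {"F", "U"}      # Eventually, Until (co-safe operators)

def is_syntactically_safe(formula) -> bool:
    """Check that an NNF formula, encoded as nested (op, *children)
    tuples, uses only safe temporal operators."""
    op, *children = formula
    if op == "atom":
        return True               # literals are safe
    if op in UNSAFE_TEMPORAL:
        return False              # F/U leave the safe fragment
    return all(is_syntactically_safe(c) for c in children)

# "Always stay in the safe zone" is safe; "eventually reach the goal" is not.
print(is_syntactically_safe(("G", ("atom", "safe_zone"))))  # True
print(is_syntactically_safe(("F", ("atom", "goal"))))       # False
```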

Cite

If you use the paper or code in your publications, please cite it with the following BibTeX entry:

@article{adalat2025expressive,
  title={Expressive Temporal Specifications for Reward Monitoring},
  author={Adalat, Omar and Belardinelli, Francesco},
  journal={arXiv preprint arXiv:2511.12808},
  year={2025}
}

Credits

Some of the original code & ideas were contributed by:

  • Jialue Xu
  • Alexander Philip Rader
