PIRO: Proximal Inverse Reward Optimization

This repository contains the implementation of PIRO (Proximal Inverse Reward Optimization), a stable inverse reinforcement learning (IRL) algorithm.

🛠 Installation

Requirements

  • Python 3.7+
  • PyTorch 1.5+
  • OpenAI Gym
  • MuJoCo
  • Gymnasium Robotics

Setup

pip install -r requirements.txt
pip install ruamel.yaml
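
To verify the core dependencies after installation, a quick check such as the following can be run (this script is not part of the repository; the environment ID is an assumption and must match your installed Gym/MuJoCo versions):

# sanity_check.py - optional: confirm that PyTorch, Gym, and MuJoCo load
import torch
import gym

print("PyTorch version:", torch.__version__)
env = gym.make("Hopper-v3")  # any MuJoCo task registered by your Gym install
print("Observation space:", env.observation_space)
print("Action space:", env.action_space)
env.close()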

📁 File Structure

  • PIRO implementation: train/
    • trainPIRO.py - Main PIRO training script
    • trainML.py - ML-IRL training script (one of the baselines)
    • models/ - Reward function models
  • SAC agent: common/
  • Environments: envs/
  • Configurations: configs/
  • Utilities: utils/
  • Baseline methods: baselines/

🚀 Usage

Environment Setup

Before running experiments, set the Python path:

export PYTHONPATH=${PWD}:$PYTHONPATH

Expert Data

During the review process, expert demonstrations from Hugging Face and the D4RL benchmark can be used to reproduce our results. The expert trajectories collected in this study will be released after the review process concludes.
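
As a rough illustration of the first option, D4RL expert data can be pulled directly through the d4rl package (a minimal sketch, assuming d4rl is installed; the dataset name is an example, and converting these arrays into the trajectory format expected by the training scripts is left to the reader):

# load_d4rl_expert.py - illustrative only, not part of the repository
import gym
import d4rl  # importing registers the D4RL environments

env = gym.make("hopper-expert-v2")
dataset = env.get_dataset()  # dict of numpy arrays
print("observations:", dataset["observations"].shape)
print("actions:", dataset["actions"].shape)
print("terminals:", dataset["terminals"].sum(), "episode ends")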

Alternatively, generate your own expert data:

# Train expert policy
python common/train_gd.py configs/samples/experts/{env}.yml

# Collect expert demonstrations  
python common/collect.py configs/samples/experts/{env}.yml

# Collect Minari dataset
python common/collect_robotic.py

where {env} is one of: hopper, walker2d, halfcheetah, ant, etc.

Training PIRO

Train PIRO on MuJoCo and Gymnasium Robotics environments:

python train/trainPIRO.py configs/samples/agents/{env}.yml

📊 Results

Training logs and models are saved in the logs/ directory with the following structure:

logs/{environment}/exp-{expert_episodes}/{method}/{timestamp}/
├── progress.csv          # Training metrics
├── model/               # Saved reward models  
├── variant.json         # Configuration used
└── plt/                # Plots and visualizations
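
Training curves can be inspected directly from progress.csv (a hedged sketch, assuming pandas is available; the run directory is a placeholder and the logged column names depend on the method and environment):

# inspect_run.py - illustrative only
import pandas as pd

run_dir = "logs/hopper/exp-16/PIRO/<timestamp>"  # placeholder run directory
df = pd.read_csv(f"{run_dir}/progress.csv")

print(df.columns.tolist())  # metrics recorded for this run
print(df.tail())            # most recent logged values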

🎯 Configuration

Experiments are configured using YAML files in configs/. Key parameters:

  • Environment settings: Environment name, state indices, episode length
  • Algorithm settings: Learning rates, network architectures, training iterations
  • Evaluation settings: Number of evaluation episodes, metrics to track
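
A config can be loaded and inspected with ruamel.yaml, which the setup instructions install (a minimal sketch; the config path follows the pattern used above, and the available keys depend on the particular file):

# show_config.py - illustrative only
from ruamel.yaml import YAML

yaml = YAML(typ="safe")
with open("configs/samples/agents/hopper.yml") as f:
    cfg = yaml.load(f)

print(list(cfg.keys()))  # top-level parameter groups in this config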

🔬 Baseline Comparisons

The baselines/ directory contains implementations of several IRL methods for comparison:

  • GAIL, AIRL, BC, IQ-Learn, and others
  • Each baseline includes its own configuration and training scripts

Our implementation draws inspiration from the structural design of the ML-IRL framework proposed by Zeng et al. [2023], but includes significant modifications tailored to our method and experiments.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
