Deep Reinforcement Learning Presentations

This repository contains lecture materials from two presentations I delivered at the Faculty of Informatics, Masaryk University, as part of the IV125 Formela Lab Seminar focused on advanced reinforcement learning topics.

Presentations

1. Discovered Policy Optimisation (DPO)

presented on 2025-03-07

Mirror Learning Framework: PPO paradox, drift functions, theoretical foundation
Meta-Learning Discovery: Evolution strategies, automatic algorithm discovery
DPO Formula: From an open-form to a closed-form formula, outperforms PPO
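
To make the Mirror Learning view of PPO concrete, here is a minimal sketch (not taken from the slides) of how the PPO clipped objective can be rewritten as an unclipped surrogate minus a non-negative drift term. The function names and the default `eps=0.2` are illustrative assumptions; the drift shown is the PPO drift, not the DPO one.

```python
def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(x, hi))


def relu(x):
    return max(0.0, x)


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard per-sample PPO clipped surrogate."""
    return min(ratio * advantage, clip(ratio, 1 - eps, 1 + eps) * advantage)


def ppo_drift(ratio, advantage, eps=0.2):
    """PPO drift term: zero inside the clipping range, non-negative outside."""
    return relu((ratio - clip(ratio, 1 - eps, 1 + eps)) * advantage)


def mirror_objective(ratio, advantage, eps=0.2):
    """Unclipped surrogate minus drift; matches the clipped objective."""
    return ratio * advantage - ppo_drift(ratio, advantage, eps)


if __name__ == "__main__":
    for ratio in (0.5, 1.0, 1.5):
        for advantage in (-1.0, 1.0):
            print(ratio, advantage,
                  ppo_clip_objective(ratio, advantage),
                  mirror_objective(ratio, advantage))
```

Under this view, swapping `ppo_drift` for a different valid drift function yields a different algorithm within the same framework, which is the space the meta-learning search explores.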

2. Group Relative Policy Optimisation (GRPO)

presented on 2025-05-16

GRPO vs PPO: Value model elimination, group-relative advantage
DeepSeekMath: Base model, continual pre-training, SFT, GRPO
DeepSeek-R1: Pure RL, performance comparable to OpenAI o1, mathematical reasoning
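
As an illustration of the group-relative advantage, the sketch below normalises the rewards of several sampled answers to the same prompt by the group mean and standard deviation, so no value model is needed. It assumes outcome-level rewards (one scalar per answer); the function name, the population standard deviation, and the `eps` stabiliser are choices made for this example rather than details from the presentation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against its own group of sampled answers.

    rewards: one scalar reward per answer sampled for the same prompt.
    eps: small constant (an assumption here) that avoids division by zero
         when every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers: two correct (reward 1), two incorrect (reward 0).
    print(group_relative_advantages([1, 0, 0, 1]))
```

Answers that score above the group average get a positive advantage and those below get a negative one; a group with identical rewards yields all-zero advantages and thus no learning signal.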

Disclaimer: Some figures are reproduced from original research papers for educational purposes. All sources are properly cited.
