This repository contains lecture materials from two presentations I delivered at the Faculty of Informatics, Masaryk University, as part of the IV125 Formela Lab Seminar focused on advanced reinforcement learning topics.
Presented on 2025-03-07:
- Mirror Learning Framework: PPO paradox, drift functions, theoretical foundation
- Meta-Learning Discovery: Evolution strategies, automatic algorithm discovery
- DPO Formula: From open to closed-form formula, reported to outperform PPO
Presented on 2025-05-16:
- GRPO vs PPO: Value model elimination, group-relative advantage
- DeepSeekMath: Base model, continual pre-training, SFT, GRPO
- DeepSeek-R1: Pure RL, performance comparable to OpenAI o1, mathematical reasoning
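
As a rough illustration of the group-relative advantage listed under GRPO above, the sketch below normalizes the rewards of several completions sampled for the same prompt by the group's mean and standard deviation, which is what lets GRPO drop the separate value model that PPO uses as a baseline. The function name `group_relative_advantages`, the `eps` stabilizer, the choice of population standard deviation, and the example rewards are assumptions made for illustration; they are not taken from the lecture slides or from any specific implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Hypothetical helper: `rewards` holds one scalar reward per completion
    sampled for the same prompt. The group mean acts as the baseline,
    so no learned value model is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some code uses sample std
    # eps (assumed) avoids division by zero when all rewards are identical
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up example: four completions for one prompt, reward 1.0 if correct
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average get a positive advantage and those below get a negative one, so the policy update favors answers that are better than their siblings rather than better than a learned value estimate.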
Disclaimer: Some figures are reproduced from original research papers for educational purposes. All sources are properly cited.