RL: Add GRPO as Policy to the Pool of Objects

**Description/Motivation**
GRPO (Group Relative Policy Optimization) is a recent RL algorithm introduced by the DeepSeek team (https://arxiv.org/abs/2402.03300). It is not yet available in Stable-Baselines3, so having GRPO in our library would be an advantage.
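
For context, the core idea that sets GRPO apart from PPO is that it drops the learned value function and instead normalizes each sampled output's reward against the other outputs sampled for the same prompt. Below is a minimal sketch of that group-relative advantage step only, not a proposed implementation for this library. The function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of sampled outputs.

    Hypothetical helper: `rewards` holds one scalar reward per output
    sampled for the same prompt. Each advantage is the reward minus the
    group mean, divided by the group standard deviation.
    """
    mean = fmean(rewards)
    # Assumption: population standard deviation; eps avoids division by
    # zero when every output in the group receives the same reward.
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: a group of four outputs, two rewarded and two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rewarded outputs end up with positive advantages and the others with negative ones, so the policy update can proceed without a critic network.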


**Task list**
- [ ] 1. Do this
- [ ] 2. Do that


**Related issues**
#...


**Cross references**
...