RL: Add GRPO as Policy to the Pool of Objects #1130

@steveyuwono

Description/Motivation
GRPO (Group Relative Policy Optimization) is a recent RL algorithm introduced by the DeepSeek team (https://arxiv.org/abs/2402.03300). It is not currently available in Stable-Baselines3, so having GRPO in our library would be an advantage.
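
For context, the main difference from PPO described in the paper is that GRPO drops the learned value function and instead scores each sampled output relative to the other outputs in its group. A minimal sketch of that advantage step, assuming one scalar outcome reward per sample and NumPy as the only dependency; the function name `group_relative_advantages` and the `eps` parameter are illustrative and not part of any existing library API:

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` is expected to have shape (num_groups, group_size):
    one row per prompt/state, one column per sampled output.
    This is a hypothetical helper, not an existing library function.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)  # per-group baseline
    std = rewards.std(axis=1, keepdims=True)    # per-group scale
    # eps avoids division by zero when all rewards in a group are equal
    return (rewards - mean) / (std + eps)


# One group of four samples, two of which received a reward of 1.
advantages = group_relative_advantages([[1.0, 0.0, 0.0, 1.0]])
```

These advantages would then feed a PPO-style clipped objective with a KL penalty, which is the part an actual implementation in our library would still need to define.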

Task list

  • 1. Do this
  • 2. Do that

Related issues
#...

Cross references
...

Metadata

Labels

RL: Reinforcement Learning
enhancement: New feature or request
