Prioritized experience replay #1622
Conversation
- Created SumTree (to be finalized)
- Started PrioritizedReplayBuffer: constructor and `sample` method (to be tested)
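For context, here is a minimal, self-contained sketch of proportional prioritized sampling as described in Schaul et al. (2015); all names and hyperparameter values are illustrative placeholders, not the PR's actual code:

```python
import numpy as np

# Conceptual sketch of proportional prioritized sampling (Schaul et al., 2015).
# Illustrative only: names, sizes and hyperparameters are placeholders.
rng = np.random.default_rng(0)
alpha, beta, batch_size = 0.6, 0.4, 32
td_errors = rng.normal(size=1_000)                # stand-in for stored TD errors

priorities = np.abs(td_errors) + 1e-6             # p_i = |delta_i| + eps
probs = priorities ** alpha
probs /= probs.sum()                              # P(i) = p_i^alpha / sum_k p_k^alpha

indices = rng.choice(len(probs), size=batch_size, p=probs)
weights = (len(probs) * probs[indices]) ** -beta  # importance-sampling correction
weights /= weights.max()                          # normalize so the largest weight is 1
```

The SumTree is what makes this sampling and the priority updates O(log N) per draw, instead of the O(N) normalization done naively above.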
@araffin could you (or anyone) please have a look at the two pytype errors? I don't quite understand how to fix them.
Thanks @araffin!
To be consistent with the rest of the buffers, and because PyTorch is not needed here (no GPU computation is required).
Hello @araffin,
I've added the list of Rainbow extensions, specifying which ones are currently implemented in the library.
Yes, probably, but the most important thing for now is to test the implementation (performance tests, checking that we can reproduce the results from the paper), document it, and add additional tests/docs (for the SumTree, for instance).
Just a comment: I've tested this implementation with QR-DQN and a VecEnv with multiple environments, but it fails because of the missing multi-env support. Good job starting the work on it, though! I hope it will be merged soon! 👍
I've just tried validating the implementation on blind cliffwalk and it seems much slower (~an order of magnitude) than the uniform replay buffer. The results below are for a single seed. Not sure why this is. The details of blind cliffwalk are a bit vague in the paper (and no code is available), but I've tried to implement it as close to the description as possible. Code for the test is in this gist:
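For context, this is one possible reading of the Blind Cliffwalk environment from the PER paper; the paper only gives a textual description, so the reward/termination details, the choice of "correct" action per state, and all names below are assumptions:

```python
class BlindCliffwalk:
    """Chain of n states with two actions. Only the 'correct' action advances the
    chain; the wrong one ends the episode with reward 0. Reaching the last state
    yields reward 1. Which action is 'correct' in each state is an assumption here."""

    def __init__(self, n_states: int = 16) -> None:
        self.n_states = n_states
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, float, bool]:
        correct_action = self.state % 2          # assumed convention: alternates per state
        if action != correct_action:
            return self.state, 0.0, True         # wrong action: terminate without reward
        self.state += 1
        done = self.state == self.n_states
        return self.state, float(done), done     # reward 1 only on the final transition
```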
weights = (self.size() * probs) ** -self.beta
weights = weights / weights.max()
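For context (not part of the diff): these two lines compute the importance-sampling weights from the PER paper, w_i = (N * P(i))^(-beta), and then normalize by the largest weight so the correction only ever scales updates downwards, which keeps training stable.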
# TODO: add proper support for multi env
How could we add proper support for multiple envs? Are there any ideas? Could the random line below work?
Not sure yet. The random line below might work, but we need to check first that it doesn't affect performance.
araffin/sbx@b5ce091 should be better, see araffin/sbx#50
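To make the multi-env question more concrete, here is one possible scheme, purely as an illustration (it is not what this PR or the linked sbx commit does): keep one priority per (buffer position, env) pair and sample flattened indices proportionally:

```python
import numpy as np

# Hypothetical sketch: one priority per (position, env) pair, sampled jointly.
rng = np.random.default_rng(0)
buffer_size, n_envs, batch_size = 1_000, 4, 32
priorities = rng.random((buffer_size, n_envs))     # placeholder priorities

probs = priorities.ravel() / priorities.sum()
flat_indices = rng.choice(buffer_size * n_envs, size=batch_size, p=probs)
pos, env_idx = np.unravel_index(flat_indices, (buffer_size, n_envs))
# `pos` indexes the buffer position, `env_idx` selects which parallel env's transition to use.
```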
Some update on my side: I just added CNN support for SBX (SB3 + Jax) DQN, and it is 10x faster than the PyTorch equivalent: araffin/sbx#49. That should allow us to test and debug things more quickly on Atari (~1h40 for 10M steps instead of 15h =D). Perf report: https://wandb.ai/openrlbenchmark/sbx?nw=nwuseraraffin (ongoing)
Some additional update: when trying to plug the PER implementation from this PR into the Jax DQN implementation, the experience replay was the bottleneck (by a good margin, making things 40x slower...), so I investigated different ways to speed things up. After playing with many different implementations (pure Python, NumPy, Jax, Jax jitted, ...), I decided to re-use the SB2 "SegmentTree" vectorized implementation and also to implement proper multi-env support. (Still debugging, but at least I've got the first sign of life, and this implementation is much faster.)
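For illustration, here is a small NumPy sketch of what a flat, batch-vectorized sum tree can look like; it is only meant to show where the vectorization speedup comes from, not to reproduce the SB2 SegmentTree or the sbx code:

```python
import numpy as np

class FlatSumTree:
    """Binary sum tree stored in a flat array: leaves hold priorities, node i is the
    sum of its children 2i and 2i+1. Retrieval descends the tree for a whole batch
    of query values at once, which is where the vectorization speedup comes from."""

    def __init__(self, capacity: int) -> None:
        self.capacity = 1
        while self.capacity < capacity:          # round up to a power of two
            self.capacity *= 2
        self.nodes = np.zeros(2 * self.capacity)

    def update(self, leaf_indices: np.ndarray, priorities: np.ndarray) -> None:
        self.nodes[leaf_indices + self.capacity] = priorities
        idx = np.unique((leaf_indices + self.capacity) // 2)
        while idx[0] >= 1:                       # recompute parents level by level
            self.nodes[idx] = self.nodes[2 * idx] + self.nodes[2 * idx + 1]
            idx = np.unique(idx // 2)

    def retrieve(self, values: np.ndarray) -> np.ndarray:
        """Map each value in [0, total) to the leaf whose prefix-sum interval contains it."""
        idx = np.ones(len(values), dtype=np.int64)       # all queries start at the root
        values = values.astype(np.float64)
        while idx[0] < self.capacity:                    # descend until every query hits a leaf
            left_sum = self.nodes[2 * idx]
            go_right = values >= left_sum
            values -= np.where(go_right, left_sum, 0.0)
            idx = 2 * idx + go_right
        return idx - self.capacity

    @property
    def total(self) -> float:
        return float(self.nodes[1])
```

Usage would be along the lines of `tree.update(indices, priorities)` after computing new TD errors, and `tree.retrieve(np.random.uniform(0, tree.total, batch_size))` to draw a prioritized batch in a single vectorized call.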
Hey @araffin, it is great to hear that. Does SBX/Jax alone bring that much of a speed improvement? If you think it is ready for testing, I can give it a try; just let me know when it is ready. :)
With the right parameters (see the exact command-line arguments for the RL Zoo in the OpenRL Benchmark organization runs on W&B), yes, around 10x faster.
The SBX version is ready to be tested, but so far I haven't managed to see any gain from the PER. I also experienced some explosion in the Q-function values when using multiple envs (so there is probably a bug there).
When I tested this PR I also noticed an explosion in the loss; at the time I thought it was due to my tweaking here and there. I also noticed that it doesn't give me any advantage over a normal buffer (and I used Double DQN, and even tried dueling). However, I tried adding an N-step buffer, which had a strong effect on learning. AFAIK, N-step (multi-step) returns are also part of Rainbow and account for a substantial part of its success. The key parts are the distributional, PER and N-step components, as far as I understand the concept. The others are somewhat task-specific and can even be detrimental.
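Since n-step returns come up here, a generic sketch of the n-step target (the function name is made up and this is not this PR's code):

```python
import numpy as np

def n_step_target(rewards: np.ndarray, dones: np.ndarray, bootstrap_value: float, gamma: float = 0.99) -> float:
    """Compute sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * V(s_{t+n}),
    truncating (no bootstrap) if a terminal state is reached within the n steps."""
    target, discount = 0.0, 1.0
    for reward, done in zip(rewards, dones):
        target += discount * reward
        discount *= gamma
        if done:
            return target                     # episode ended: do not bootstrap
    return target + discount * bootstrap_value
```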
Hi @araffin @AlexPasqua,
Instead of sampling transitions non-uniformly according to their priorities, we can keep uniform sampling and reweight each sample's loss by its priority, so that the expected gradient matches the prioritized one. This means we can avoid managing a sorted buffer and the associated complexity, while still converging to the same gradient. I've already implemented this approach. If you find it relevant, I'd be happy to open a PR.
Update: cf #2166
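For illustration, a minimal sketch of the loss-reweighting idea under my own assumptions (in particular the batch-mean normalization); it is not necessarily what #2166 implements:

```python
import torch

def priority_weighted_td_loss(td_errors: torch.Tensor, alpha: float = 0.6, eps: float = 1e-6) -> torch.Tensor:
    """Batch is sampled uniformly; each element's squared TD error is scaled by a
    detached weight proportional to |delta|^alpha, normalized by the batch mean
    (an approximation of the buffer-wide normalizer), so the expected gradient
    mimics prioritized sampling without maintaining a sum-tree buffer."""
    with torch.no_grad():
        priorities = (td_errors.abs() + eps) ** alpha
        weights = priorities / priorities.mean()
    return (weights * td_errors.pow(2)).mean()
```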
Yes, I currently don't have much time to work on this specific feature, so if someone could take it on, as you were doing in the previous commits and comments, that would be great. Of course, it's also possible to consider implementing only the prioritized approximation loss in #2166 instead.
Ok, nice.
Thanks for the PR, but for now I would prefer to have a full Rainbow implementation first. One of the main blockers currently is to check that the prioritized replay buffer is correctly implemented and to make it faster (I tried with Jax in araffin/sbx#50 but couldn't get good results so far, see #1622 (comment)).
Sure, I got you. Maybe we can add this equivalent loss under a completely different name? That way, we keep 'Prioritized Replay Buffer' reserved for the full Rainbow implementation, but we can still use this PAL method. It's a very useful feature that is really missing from SB3 right now: it introduces prioritization while keeping a reasonable training time. (And personally, I really need it for my research work, so I'm eager to have it in! haha)

Description
Implementation of a prioritized replay buffer for DQN.
Closes #1242
Motivation and Context
In accordance with #1242
Types of changes
Checklist
- `make format` (required)
- `make check-codestyle` and `make lint` (required)
- `make pytest` and `make type` both pass. (required)
- `make doc` (required)

Note: You can run most of the checks using `make commit-checks`.
Note: we are using a maximum length of 127 characters per line.