[Draft]: Add policy entropy metric #214

SamComber · 2025-08-18T15:50:52Z

It would be good to have an option for tracking potential policy entropy collapse. Entropy exhaustion typically indicates increasingly deterministic token decoding and reduced exploration (see https://arxiv.org/pdf/2505.22617).

WIP

Still need to optimise this so the entropy calculation doesn't create B*L*V intermediate tensors (roughly 3x the memory required right now). I will likely batch it, and possibly emit a warning when entropy calculation is enabled.
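For reference, a minimal sketch of what batching the calculation could look like. This is not the code in this PR: the name `chunked_entropy` and the default `chunk_size` are hypothetical, and it assumes the entropy is only reported as a metric (no gradients needed). Rows are processed in chunks, so the softmax temporaries are `chunk_size x V` rather than `B*L x V`.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()  # assumption: entropy is a logged metric, so no autograd graph is needed
def chunked_entropy(logits: torch.Tensor, chunk_size: int = 128) -> torch.Tensor:
    """Per-token entropy for logits of shape (B, L, V), computed chunk by chunk."""
    vocab_size = logits.shape[-1]
    flat = logits.reshape(-1, vocab_size)  # (B*L, V)
    entropies = []
    for chunk in flat.split(chunk_size, dim=0):
        # log_softmax is numerically stable; upcast so bf16/fp16 logits do not lose precision
        logp = F.log_softmax(chunk.float(), dim=-1)
        # H = -sum_v p_v * log p_v, with p = exp(logp)
        entropies.append(-(logp.exp() * logp).sum(dim=-1))
    return torch.cat(entropies).reshape(logits.shape[:-1])  # (B, L)


if __name__ == "__main__":
    example = torch.randn(2, 6, 50)
    print(chunked_entropy(example, chunk_size=4).shape)
```

A smaller `chunk_size` lowers peak memory at the cost of more kernel launches, so the right value would need to be tuned against the actual vocabulary size.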

qgallouedec · 2025-08-19T16:14:45Z

verifiers/trainers/grpo_trainer.py

            logits = logits / self.temperature
+
+            if compute_entropy:
+                entropy = entropy_from_logits_memory_efficient(logits, chunk_size=32)


Hey! If you want something aligned with trl, we have entropy_from_logits, which you can import directly:

https://github.com/cyyever/trl/blob/2ddf8010881e3bdf215be28ed6fd6f3a8ae2bcf5/trl/trainer/utils.py#L1469

It does the same thing as the code here.

Oh this is perfect! Thanks for the suggestion

Still getting OOMs with chunk size 1 here, in places where we definitely shouldn't be. Hmm, @qgallouedec, does entropy_from_logits perhaps need to run under torch.no_grad() internally?
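To illustrate why torch.no_grad() could matter here, a small self-contained sketch. It says nothing about what entropy_from_logits does internally; `token_entropy` is a hypothetical stand-in. When the input logits require grad, autograd tracks the entropy computation and keeps the tensors it needs for the backward pass, even if the computation is chunked. Under torch.no_grad() no graph is recorded.

```python
import torch
import torch.nn.functional as F


def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in: per-token entropy over the last (vocab) dimension."""
    logp = F.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1)


# Logits that are part of an autograd graph, as they would be in a training step
logits = torch.randn(2, 4, 8, requires_grad=True)

# Grad mode on: the result is attached to the graph, so intermediates stay alive
tracked = token_entropy(logits)

# Grad mode off: same values, but nothing is saved for backward
with torch.no_grad():
    untracked = token_entropy(logits)

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
```

If the entropy is only logged, calling the function inside a torch.no_grad() block (or on logits.detach()) at the call site would be one way to rule this out as the cause of the OOMs.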

SamComber marked this pull request as draft August 18, 2025 15:52

SamComber force-pushed the add-policy-entropy branch from d18f7ec to 08eab04 Compare August 18, 2025 15:53

SamComber changed the title ~~Add policy entropy~~ WIP (need to test thoroughly): Add policy entropy Aug 18, 2025

SamComber changed the title ~~WIP (need to test thoroughly): Add policy entropy~~ WIP: Add policy entropy Aug 18, 2025

SamComber changed the title ~~WIP: Add policy entropy~~ [Draft]: Add policy entropy Aug 18, 2025

SamComber changed the title ~~[Draft]: Add policy entropy~~ [Draft]: Add policy entropy metric Aug 18, 2025

SamComber force-pushed the add-policy-entropy branch 3 times, most recently from bec0673 to 16a14d4 Compare August 18, 2025 19:54

qgallouedec reviewed Aug 19, 2025

Add policy entropy reporting

8f4d018

SamComber force-pushed the add-policy-entropy branch from 16a14d4 to 8f4d018 Compare August 19, 2025 16:37
