Commit bec0673

Gather per-token entropy before computing stats
1 parent fc6152d commit bec0673

1 file changed: 1 addition, 0 deletions


verifiers/trainers/grpo_trainer.py

Lines changed: 1 addition & 0 deletions
@@ -1327,6 +1327,7 @@ def compute_loss( # type: ignore
 if self.log_policy_entropy:
     masked_entropy = per_token_entropy * completion_mask
     total_completion_tokens = completion_mask.sum()
+
     if total_completion_tokens > 0:
         mean_entropy = masked_entropy.sum() / total_completion_tokens
         gathered_entropy = self.accelerator.gather_for_metrics(mean_entropy)
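
The masked-mean computation in the hunk above can be sketched without tensors. This is a minimal plain-Python illustration, not the trainer's code: `masked_mean_entropy` is a hypothetical helper, nested lists stand in for the `per_token_entropy` and `completion_mask` tensors, and the cross-process `gather_for_metrics` step is left out.

```python
from typing import Optional, Sequence


def masked_mean_entropy(
    per_token_entropy: Sequence[Sequence[float]],
    completion_mask: Sequence[Sequence[int]],
) -> Optional[float]:
    """Mean entropy over unmasked completion tokens, or None if there are none."""
    masked_sum = 0.0
    total_completion_tokens = 0
    for entropy_row, mask_row in zip(per_token_entropy, completion_mask):
        for entropy, mask in zip(entropy_row, mask_row):
            # Masked positions (mask == 0) contribute nothing to either sum.
            masked_sum += entropy * mask
            total_completion_tokens += mask
    # Same guard as the diff: skip the division when no completion tokens exist.
    if total_completion_tokens > 0:
        return masked_sum / total_completion_tokens
    return None


entropy = [[0.5, 1.5, 2.0], [1.0, 3.0, 0.0]]
mask = [[1, 1, 0], [1, 0, 0]]
print(masked_mean_entropy(entropy, mask))  # (0.5 + 1.5 + 1.0) / 3 = 1.0
```

Dividing by the number of unmasked tokens, rather than by the full sequence length, keeps padding from pulling the logged entropy toward zero.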
