Conversation


@ysjprojects commented Sep 5, 2025

What does this PR do?

The GRPO trainer reads self.current_gradient_accumulation_steps in its _compute_loss method, but that attribute is only set once the training loop is underway. With eval_on_start=True, evaluation runs before the training loop starts, so the attribute does not exist yet and _compute_loss raises an AttributeError.

Fixes #4010
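
A minimal sketch of the failure and of the kind of fix described above (hypothetical class names and defaults, not the actual diff):

```python
# Hypothetical sketch only, not the actual TRL code or the PR diff.
# It reproduces the shape of the bug and shows one way to initialize the
# attribute before eval_on_start=True can trigger an evaluation pass.

class BuggyTrainer:
    def _compute_loss(self, loss: float) -> float:
        # Reads an attribute that is only set inside the training loop.
        return loss / self.current_gradient_accumulation_steps


class FixedTrainer(BuggyTrainer):
    def __init__(self, gradient_accumulation_steps: int = 1):
        # Give the attribute a default before any evaluation can run,
        # so eval_on_start=True no longer hits an AttributeError.
        self.current_gradient_accumulation_steps = gradient_accumulation_steps


try:
    BuggyTrainer()._compute_loss(1.0)
except AttributeError as err:
    print("reproduces #4010:", err)

print(FixedTrainer(4)._compute_loss(1.0))  # 0.25
```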

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qgallouedec (Member) commented:

I think this fix should be for transformers' Trainer instead, no?

@konstantinjdobler commented:

@qgallouedec In transformers.Trainer this is fine because current_gradient_accumulation_steps is never accessed during evaluation. However, in trl.GRPOTrainer we do access it there (see #4010).

@konstantinjdobler commented:

But I think the eval loss reporting might be broken in any case, even with the fix in this PR, which only avoids the exception. As far as I know we don't use gradient accumulation in eval (why would we?), yet we still divide eval losses by the current_gradient_accumulation_steps set during the previous train step. At the very least this becomes a bug if the value set during the last train step is smaller than it was for earlier evals (e.g. with drop_last=False, where the final accumulation window can be shorter).
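
A sketch of the guard this would imply (illustrative names, assuming the divisor is applied in _compute_loss; not the real GRPOTrainer code): only apply the accumulation divisor while training and use 1 during evaluation, so the reported eval loss is not scaled by whatever the last train step happened to set.

```python
# Illustrative sketch only; attribute and method names follow the discussion
# above but this is not the actual trl.GRPOTrainer implementation.

class GRPOTrainerSketch:
    def __init__(self, gradient_accumulation_steps: int = 1):
        self.current_gradient_accumulation_steps = gradient_accumulation_steps

    def _compute_loss(self, loss: float, training: bool) -> float:
        # The accumulation divisor only makes sense while training. During
        # evaluation, dividing by the value left over from the last train step
        # (which can be smaller with drop_last=False) would skew the eval loss.
        divisor = self.current_gradient_accumulation_steps if training else 1
        return loss / divisor


trainer = GRPOTrainerSketch(gradient_accumulation_steps=4)
print(trainer._compute_loss(1.0, training=True))   # 0.25, scaled for accumulation
print(trainer._compute_loss(1.0, training=False))  # 1.0, eval loss left unscaled
```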

Development

Successfully merging this pull request may close this issue: current_gradient_accumulation_steps is undefined when eval_on_start==True (#4010)