Add L2NormHook and use it in megatron.py #599
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff            @@
##   feature/compress     #599  +/- ##
=======================================
  Coverage    74.37%   74.37%
=======================================
  Files          182      182
  Lines        18219    18219
=======================================
  Hits         13550    13550
  Misses        4669     4669
max_size = num_heads_per_group_max * num_query_groups_max * self.config.kv_channels
activation_hook = L2NormHook(max_size=max_size)
self._register_temp_attribute("_activation_hook", activation_hook)
# TODO: why is hook_handle removed manually in export() instead of via _register_temp_attribute?
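For readers skimming the diff: the PR itself defines `L2NormHook`, and the sketch below is only an illustration of the general pattern, not the actual modelopt implementation. The class name matches the PR, but the internals (accumulating per-channel squared-L2 activation scores into a fixed-width buffer of `max_size`) are an assumption about its purpose.

```python
import torch
import torch.nn as nn

class L2NormHook:
    """Illustrative sketch only -- the real L2NormHook is defined in this PR.

    Accumulates per-channel squared-L2 activation scores. `max_size` fixes
    the accumulator width (assumed here) so attention groups of different
    sizes can share one buffer.
    """

    def __init__(self, max_size: int):
        self._scores = torch.zeros(max_size)

    def __call__(self, module: nn.Module, args: tuple, output: torch.Tensor) -> None:
        out = output.detach().float()
        # Sum of squares over every dim except the channel (last) dim.
        per_channel = out.pow(2).sum(dim=tuple(range(out.dim() - 1)))
        self._scores[: per_channel.numel()] += per_channel.cpu()

# Usage: attach to a module's forward pass; keep the handle for later cleanup.
layer = nn.Linear(8, 16)
hook = L2NormHook(max_size=16)
handle = layer.register_forward_hook(hook)
layer(torch.randn(2, 8))
handle.remove()
```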
Even if we register hook_handle as a temp attribute, we still need to call hook_handle.remove() to remove the hook, so nothing changes. The temp attribute will be removed from the model, i.e. the self.hook_handle reference will be dropped, but that still doesn't remove the actual PyTorch hook added to the forward pass.
I understand now.
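A minimal standalone sketch of the point made above (the layer and variable names here are hypothetical, not code from this PR): dropping a Python reference to the handle, which is all that clearing a temp attribute does, leaves the hook registered; only `handle.remove()` detaches it.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
calls = []
handle = layer.register_forward_hook(lambda mod, args, out: calls.append(1))

# Dropping a reference (what removing a temp attribute amounts to)
# leaves the hook registered -- the module holds its own reference
# internally in _forward_hooks.
alias = handle
del alias
layer(torch.randn(1, 4))
assert len(calls) == 1  # hook still fired

# Only an explicit remove() detaches the hook from the module.
handle.remove()
layer(torch.randn(1, 4))
assert len(calls) == 1  # hook did not fire again
```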
What does this PR do?
This is the first step towards reusing the activation-scores logic across Minitron and Puzzle. Next steps:
Questions: