
Conversation

@danielkorzekwa

What does this PR do?

  • Add L2NormHook and use it in megatron.py (a sketch of the idea follows below)
  • Using L2NormHook removes the code duplication between _DynamicSelfAttention and _DynamicMLP
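
For context, a minimal sketch of the kind of callable forward hook this refers to. This is a hypothetical illustration, assuming a per-channel L2 reduction and a pre-allocated score buffer; the field names and exact reduction are assumptions, not the actual L2NormHook implementation:

import torch

class L2NormHook:
    # Hypothetical sketch: a callable forward hook that accumulates
    # per-channel L2 activation scores into a buffer of size max_size.
    def __init__(self, max_size: int):
        self.max_size = max_size
        self._scores = torch.zeros(max_size)

    def __call__(self, module, inputs, output):
        # Some Megatron modules return (hidden_states, bias); take the tensor.
        acts = output[0] if isinstance(output, tuple) else output
        acts = acts.detach().float()
        # L2 norm per channel, reduced over the batch/sequence dimensions.
        norms = acts.pow(2).sum(dim=tuple(range(acts.dim() - 1))).sqrt()
        self._scores = self._scores.to(norms.device)
        self._scores[: norms.numel()] += norms

Since _DynamicSelfAttention and _DynamicMLP differ only in where the hook is attached and what max_size is, one such class can serve both.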

This is the first step towards reusing the activation-scoring logic across Minitron and Puzzle. Next steps:

  • complete the redesign of megatron.py - move the remaining activation-hook logic to hooks.py
  • then combine that hooks.py with the similar hooks.py functionality in Puzzle (modelopt/torch/_compress/activation_scoring/activation_hooks/hooks.py)

Questions:

  • why, both before and after this redesign, do we store temporary variables in two different ways (_register_temp_attribute and self.hook_handle)?
self._register_temp_attribute("_activation_hook", activation_hook)
# TODO: why is hook_handle removed manually in export() instead of via _register_temp_attribute?
self.hook_handle = self.linear_fc2.register_forward_hook(activation_hook)

Signed-off-by: Daniel Korzekwa <[email protected]>
@danielkorzekwa requested a review from a team as a code owner on November 22, 2025 10:32

codecov bot commented Nov 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.37%. Comparing base (f10be0d) to head (5aaaf1a).
⚠️ Report is 2 commits behind head on feature/compress.

Additional details and impacted files
@@                Coverage Diff                @@
##           feature/compress     #599   +/-   ##
=================================================
  Coverage             74.37%   74.37%           
=================================================
  Files                   182      182           
  Lines                 18219    18219           
=================================================
  Hits                  13550    13550           
  Misses                 4669     4669           


max_size = num_heads_per_group_max * num_query_groups_max * self.config.kv_channels
activation_hook = L2NormHook(max_size=max_size)
self._register_temp_attribute("_activation_hook", activation_hook)
# TODO: why is hook_handle removed manually in export() instead of via _register_temp_attribute?
Collaborator

Even if we register hook_handle as a temp attribute, we still need to call hook_handle.remove() to detach the hook, so nothing would change. The temp attribute is removed from the model, i.e. the self.hook_handle reference is dropped, but that still doesn't remove the actual PyTorch hook registered on the forward pass.
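
A small, self-contained demonstration of this point (a standalone example, not code from the PR):

import torch
import torch.nn as nn

calls = []

def activation_hook(module, inputs, output):
    calls.append(output.detach().norm().item())

layer = nn.Linear(4, 4)
layer.hook_handle = layer.register_forward_hook(activation_hook)

layer(torch.randn(2, 4))
assert len(calls) == 1  # the hook fired

saved = layer.hook_handle
del layer.hook_handle   # drop the attribute, as removing a temp attribute would
layer(torch.randn(2, 4))
assert len(calls) == 2  # the hook STILL fires: it lives in layer._forward_hooks

saved.remove()          # RemovableHandle.remove() is what actually detaches it
layer(torch.randn(2, 4))
assert len(calls) == 2  # no longer fires, hence the explicit remove() in export()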

Author

I understand now.

danielkorzekwa and others added 4 commits November 26, 2025 12:39
Co-authored-by: Keval Morabia <[email protected]>
Signed-off-by: Daniel Korzekwa <[email protected]>
@danielkorzekwa merged commit 8c9cdd4 into feature/compress on Nov 26, 2025
21 checks passed
@danielkorzekwa deleted the dkorzekwa/pruning_scores_redesign_new branch on November 26, 2025 16:39