Add QK norm to Causal Self Attention Block #414
Merged
Commits (25, showing changes from all commits):
- 99cca4c: initial changes adding qk norm
- 2f3598d (CYHSM): feat: added qk norm
- d918c87 (CYHSM): added config files
- f182727 (CYHSM): feat: added qk norm to attention block
- 6a53a31 (CYHSM): Merge: main into current branch
- 693c57d: fix: Fix spelling mistake
- f48aa0c: Fix: Adapt configts to latest changes
- 941be96: fix: Reverse removal of RMSNorm as its imported in the test suite
- 329beb0 (CYHSM): Fix: Adapt configs to latest configs
- f1152e5: fix: Compute peak memory correctly on cpu device
- 5ab3ecf: test: Require A100 GPUs for mfu test
- 4571ffe: test: Adapt tests to latest changes
- e8e8126: test(parallelism): Adapted fsdp2+tp+tt test to recent changes.
- ce00ba3 (BlueCrescent): test: add test for qk norm
- e692e39 (CYHSM): Merge branch 'Modalities:main' into qk_norm
- 61b4a2c (CYHSM): fixing rebase
- 7ce6d50 (CYHSM): fix: fix tests by making RMSNorm backward compatible
- 28186f7 (CYHSM): Update tests/fsdp2_parallelization/pipeline_parallelism/test_pp_fwd_b…
- f375c2a (rrutmann): Update src/modalities/models/parallelism/pipeline_parallelism.py
- 0c4eee8 (rrutmann): fix: merge pp branch into qk branch
- f3bfb58 (CYHSM): added compiled model configs for throughput tests
- 6c625cd (CYHSM): Merge remote-tracking branch 'upstream/main' into qk_norm
- f25e3d3 (CYHSM): additional fixes after review
- 36769ee (CYHSM): removed config files from PR
- 5fed18e (CYHSM): fix: apply pre-commit
Would there be a use case for applying other norms as the q and k norm? If not, I would hardcode RMSNorm here.
LayerNorm can also be used for QK-norm, so I would leave it as is. This also keeps the experiments consistent: when we compared LayerNorm vs. RMSNorm, all layers were switched to the norm under test. Only for an experiment where we want LayerNorm everywhere except on the QK values would we need to change that.
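For context, QK-norm normalizes the query and key projections per attention head before the dot product, bounding the attention-logit magnitude and stabilizing training. A minimal PyTorch sketch of the idea discussed above — this is not the Modalities implementation; the class and parameter names (`CausalSelfAttentionWithQKNorm`, `qk_norm_cls`) are illustrative — with the norm class left injectable so either `nn.LayerNorm` or an RMS norm can be used:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttentionWithQKNorm(nn.Module):
    """Illustrative causal self-attention block with QK-norm.

    The norm applied to q and k is injectable (e.g. nn.LayerNorm, or
    nn.RMSNorm on PyTorch >= 2.4), mirroring the configurability
    discussed in the review.
    """

    def __init__(self, d_model: int, n_heads: int, qk_norm_cls=nn.LayerNorm):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)
        # QK-norm acts over the per-head feature dimension.
        self.q_norm = qk_norm_cls(self.head_dim)
        self.k_norm = qk_norm_cls(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=-1)
        # (B, T, C) -> (B, n_heads, T, head_dim)
        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        # QK-norm: normalize queries and keys before the attention logits.
        q, k = self.q_norm(q), self.k_norm(k)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(B, T, C))
```

Passing `qk_norm_cls=nn.RMSNorm` would give the RMS variant while leaving the rest of the block unchanged, which is the kind of swap the configurable-norm design keeps cheap.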