Distillation example #758
base: main
3.test_cases/pytorch/distillation/kubernetes/distill.yaml-template
Thank you for this contribution. See comments.
Also, please consider adding a Slurm submission script.
See comments. The Docker image does not build because it cannot find `setup.sh`, and there is no `setup.sh` under `src/`. Please go through a clean run in a new environment and check whether anything else breaks. Waiting on your update to move this forward.
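One way to catch this kind of failure before building is to check that every `COPY`/`ADD` source in the Dockerfile exists in the build context. The following is a minimal sketch, not part of this PR: the function name `missing_copy_sources` is hypothetical, and it deliberately skips wildcard sources, JSON-form instructions, URLs, multi-stage `--from=` copies, and line continuations.

```python
import shlex
from pathlib import Path


def missing_copy_sources(dockerfile_text: str, context_dir: Path) -> list[str]:
    """Return COPY/ADD source paths that are absent from the build context.

    Hypothetical helper: handles only the simple shell form
    `COPY [--flag=...] <src>... <dest>`.
    """
    missing = []
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        head, _, rest = line.partition(" ")
        if head.upper() not in ("COPY", "ADD"):
            continue
        rest = rest.strip()
        if rest.startswith("["):
            continue  # JSON (exec) form is not handled in this sketch
        parts = shlex.split(rest)
        if any(p.startswith("--from=") for p in parts):
            continue  # copies from another build stage, not from the context
        args = [p for p in parts if not p.startswith("--")]
        if len(args) < 2:
            continue
        for src in args[:-1]:  # the last argument is the destination
            if src.startswith(("http://", "https://")):
                continue  # ADD can fetch URLs; nothing to check locally
            if any(ch in src for ch in "*?["):
                continue  # wildcard sources are not expanded here
            # A leading slash in a source is still relative to the context.
            if not (context_dir / src.lstrip("/")).exists():
                missing.append(src)
    return missing
```

Running it against the Dockerfile with the repository checkout as `context_dir` would list `src/setup.sh` if the file is missing, assuming the Dockerfile references it under that path.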
Thank you for the PR Satya! Is this ready for review?
- Flash Attention 2.7.4
- DeepSpeed

### Installation
Provide step-by-step instructions to follow in the README. If you are taking a dependency on the FSDP setup instructions, you could instead link to the FSDP Kubernetes example if it is too burdensome to maintain identical README instructions (this might be a good idea, to reduce maintenance overhead).
Issue #, if available:
Distillation example (7B Arcee model distilled onto 1.5B Qwen model)
Description of changes:
Added a new folder within 3.test_cases/pytorch.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
@nadknish and @nghtm