Commit d025471
feat: enable knowledge distillation
Many forms of model training exist. One popular form is knowledge distillation, where a student model learns to match the output distributions of a teacher model. This commit introduces support for knowledge distillation in the training library.
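As a rough illustration of the technique, a distillation objective is commonly computed as the KL divergence between temperature-softened teacher and student distributions. The sketch below is illustrative only and is not the commit's actual implementation; the function name, temperature value, and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the temperature, then measure how far
    # the student's distribution is from the teacher's via KL divergence.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Hypothetical usage with random logits (batch of 8, vocabulary of 100):
student_logits = torch.randn(8, 100)
teacher_logits = torch.randn(8, 100)
loss = distillation_loss(student_logits, teacher_logits)
```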
This commit also exposes the `weight_decay` hyperparameter, which is often used to help deep learning models generalize.
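In PyTorch, exposing `weight_decay` typically means threading it through to the optimizer constructor. The snippet below is a minimal sketch under that assumption; the choice of `AdamW`, the learning rate, and the decay value are illustrative rather than taken from this commit.

```python
import torch

# weight_decay applies an L2-style penalty that discourages large weights,
# which tends to improve generalization.
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
```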
Lastly, this commit changes usages of `torch.distributed` to the `dist` alias, since the module is used throughout the codebase.
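The shorthand referenced here is the standard alias import; the calls shown in the comments are examples of how subsequent code reads with it, not necessarily the calls touched by this commit.

```python
import torch.distributed as dist

# With the alias in place, distributed calls read, for example:
#   dist.init_process_group(backend="nccl")
#   dist.all_reduce(tensor)
```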
Signed-off-by: Oleg S <[email protected]>

1 parent 8e6be0d · commit d025471
2 files changed: +227 −52 lines changed

[Diff content not captured in this view; the visible markers show added lines 124–133 and 192–196 in one of the changed files.]