🖥 Benchmarking transformers w/ HF Trainer on RTX-3090
We are going to use a special benchmarking tool that will do all the work for us: see #14934.
This is the index post; the specific benchmarks are in their own posts below (a minimal sketch of the corresponding `TrainingArguments` knobs follows the index):
- fp16 vs bf16 vs tf32 vs fp32
- gradient accumulation steps
- gradient checkpointing
- batch size
- optimizers
- combining winning strategies: ~2x speed improvement!
- RTX-3090 vs A100
See also the same benchmarks for A100
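
To make the index above more concrete, here is a minimal sketch of the `TrainingArguments` knobs that each benchmarked dimension maps to. This is an illustration only, not the actual benchmark command: the model, dataset, and values are placeholders, and the mixed-precision flags assume an Ampere-class GPU such as the RTX-3090.

```python
# Minimal sketch (illustration only): the TrainingArguments knobs corresponding
# to the dimensions benchmarked in the posts below. The values are placeholders,
# not the settings used in the actual benchmarks.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="output_dir",
    do_train=True,
    # precision: fp32 is the default; fp16/bf16 enable mixed precision,
    # tf32 toggles TF32 matmuls (Ampere GPUs such as the RTX-3090)
    fp16=False,
    bf16=True,
    tf32=True,
    # batch size and gradient accumulation
    per_device_train_batch_size=32,
    gradient_accumulation_steps=1,
    # gradient checkpointing trades extra compute for lower memory use
    gradient_checkpointing=False,
    # optimizer choice, e.g. "adamw_hf", "adamw_torch", "adafactor", "adamw_apex_fused"
    optim="adamw_torch",
)
```

Each post below then varies one of these dimensions at a time, and the "combining winning strategies" post combines the best-performing settings.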
TODO:
- other suggestions?
Note that each benchmark was run only once; multiple runs and averaging would probably give slightly different results. The purpose here, though, is to see rough relative differences rather than exact numbers.