Open
Labels: Benchmarks (Issues related to Memory regressions in tests and scripts), WIP (Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress)
Description
This issue is to document the important transformers
benchmarks in one place, so that they are easy to find.
To add a new benchmark entry, post it in an Issue (either its own Issue or a comment on an existing one) and then link to it from here. If you have edit rights, please add the link directly to this post; otherwise, please add a note in the comments and I will update this post.
Please do not post actual benchmarks in the comments of this Issue; it is only an index.
Thank you!
Fastest speed combinations
- Precision: fp16 vs. bf16 vs. tf32 vs. fp32
- Batch size / gradient accumulation steps
- Gradient checkpointing
- Optimizers:
  - Adam: torch vs. apex vs. HF vs. Adafactor: RTX-3090, A100
  - re-run of the above a year later with the same list of optimizers, plus BNB's 8-bit optimizer and fused torch AdamW: PCIe 80GB A100
- Network / Interconnects:
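When producing entries for an index like this, the measurement harness matters as much as the setting being compared. Below is a minimal, stdlib-only sketch of such a harness (hypothetical helper names, not part of transformers or any linked benchmark): warm-up iterations are discarded and the median of several timed runs is reported, which is more robust to outliers than a single measurement.

```python
import statistics
import time


def benchmark(fn, *, warmup=2, repeats=5):
    """Return the median wall-clock seconds per call of `fn`.

    Warm-up calls are run first and discarded (caches, JIT, allocator
    effects), then `repeats` timed runs are taken and the median returned.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Stand-in workload; in a real benchmark this would be a training step
# under a given precision / batch-size / optimizer combination.
def run(n):
    return sum(i * i for i in range(n))


results = {
    "config-a": benchmark(lambda: run(10_000)),
    "config-b": benchmark(lambda: run(100_000)),
}
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e3:.2f} ms")
```

The same pattern applies whichever axis above is being varied; only the body of the timed callable changes.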