Open
Description
Which "accuracy" metric(s) should we use for LLM benchmarking?
- MMLU: the first benchmark people usually choose. It covers several fields with multiple-choice questions.
  - @mohitmundhragithub please point out where/how MLPerf Client uses this
  - running the full set will most likely take several hours on Android devices
  - hence, the other choice is TinyMMLU
- TinyMMLU: 100 questions only
- Other tasks such as summarization and Q/A
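
For reference, a minimal sketch of how accuracy on a multiple-choice set such as MMLU or TinyMMLU could be computed. This is not how MLPerf Client does it; `Question`, `predict`, `accuracy`, and the `score_fn` callback are hypothetical names, and `score_fn` stands in for whatever per-choice score the model gives (for example a log-likelihood of the choice given the prompt).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Question:
    """One multiple-choice item (hypothetical layout, not a real dataset schema)."""
    prompt: str
    choices: List[str]
    answer: int  # index of the correct choice


# Assumption: the model exposes some score for (prompt, choice); higher is better.
ScoreFn = Callable[[str, str], float]


def predict(question: Question, score_fn: ScoreFn) -> int:
    """Return the index of the highest-scoring choice (first one wins on ties)."""
    scores = [score_fn(question.prompt, choice) for choice in question.choices]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(questions: Sequence[Question], score_fn: ScoreFn) -> float:
    """Fraction of questions where the predicted choice matches the answer."""
    if not questions:
        raise ValueError("need at least one question")
    correct = sum(predict(q, score_fn) == q.answer for q in questions)
    return correct / len(questions)
```

The same function works for the full set or a 100-question subset; only the list of questions changes, so the runtime on device scales with the number of questions times the number of choices scored.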