"Accuracy" metric for LLM model(s) #986

Which "accuracy" metric(s) should we use for LLM benchmarking?

- MMLU: usually the first choice. It covers many fields (57 subjects) with multiple-choice questions.
  - @mohitmundhragithub, please point out where/how MLPerf Client uses this.
  - Running the full set will most likely take several hours on Android devices.
  - Hence, another option is TinyMMLU.
  - TinyMMLU: only 100 questions.
- Other tasks, such as summarization and Q/A.
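
For the MMLU/TinyMMLU option, "accuracy" usually means the fraction of multiple-choice questions where the model picks the gold letter (A to D). Below is a minimal sketch, assuming the model's raw generated text is available per question; the `Item` record, the helper names, and the sample items are hypothetical, and the letter-extraction regex is a naive heuristic. Many evaluation harnesses instead score by comparing the model's likelihood of each option, and the tinyBenchmarks authors pair TinyMMLU with IRT-based estimators of full-MMLU accuracy, so plain accuracy on 100 items is only a rough proxy.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Item:
    # Hypothetical record for one multiple-choice question.
    subject: str       # e.g. "anatomy"
    answer: str        # gold letter, one of "A".."D"
    model_output: str  # raw text generated by the model


def extract_choice(text: str) -> Optional[str]:
    """Return the first standalone uppercase A-D letter, or None.

    Naive heuristic (assumption): a sentence starting with the
    article "A" would be misread as option A.
    """
    match = re.search(r"\b([ABCD])\b", text)
    return match.group(1) if match else None


def mc_accuracy(items: List[Item]) -> Tuple[float, Dict[str, float]]:
    """Overall accuracy over all items plus per-subject accuracy."""
    total_correct = 0
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for item in items:
        pred = extract_choice(item.model_output)
        correct = int(pred == item.answer)  # unparseable output counts as wrong
        total_correct += correct
        per_subject[item.subject][0] += correct
        per_subject[item.subject][1] += 1
    overall = total_correct / len(items) if items else 0.0
    by_subject = {s: c / n for s, (c, n) in per_subject.items()}
    return overall, by_subject


# Hypothetical usage with made-up model outputs.
items = [
    Item("anatomy", "B", "Answer: B"),
    Item("anatomy", "C", "The answer is (A)"),
    Item("astronomy", "D", "D"),
    Item("astronomy", "A", "I am not sure."),
]
overall, by_subject = mc_accuracy(items)
print(overall, by_subject)
```

Counting unparseable answers as wrong keeps the denominator fixed, so a run on a device that truncates or garbles outputs is penalized rather than silently scored on fewer questions.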
