"Accuracy" metric for LLM model(s) #986

@freedomtan

Which "accuracy" metric(s) should we use for LLM benchmarking?

  • MMLU: the first benchmark most people choose. It covers several fields with multiple-choice questions.
    • @mohitmundhragithub, could you point to where and how MLPerf Client uses this?
    • Running the full set will most likely take several hours on Android devices.
    • Hence the other option is TinyMMLU, which has only 100 questions.
  • Other tasks, such as summarization and Q/A
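
For the multiple-choice options above (MMLU, TinyMMLU), "accuracy" would presumably mean the fraction of questions where the model picks the correct choice. Below is a minimal sketch of that scoring, assuming four choices labeled A to D and free-form model output. The helper names `extract_choice` and `multiple_choice_accuracy` are hypothetical, and the parsing rule is only an illustration, not how MLPerf Client or any existing harness does it.

```python
import re
from typing import Optional, Sequence


def extract_choice(output: str) -> Optional[str]:
    """Return the first standalone uppercase A-D letter in the output, or None.

    Assumption: the model answers with an uppercase choice letter somewhere
    in its text. Lowercase letters are ignored so that the article "a" is
    not mistaken for choice A.
    """
    match = re.search(r"\b([ABCD])\b", output)
    return match.group(1) if match else None


def multiple_choice_accuracy(outputs: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of outputs whose extracted choice equals the reference letter."""
    if len(outputs) != len(answers):
        raise ValueError("outputs and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    # An output with no recognizable choice (None) counts as incorrect.
    correct = sum(extract_choice(o) == a for o, a in zip(outputs, answers))
    return correct / len(answers)


if __name__ == "__main__":
    # Made-up outputs purely for illustration; not real benchmark data.
    outputs = ["B", "The answer is C.", "A) Paris", "I don't know"]
    answers = ["B", "C", "D", "A"]
    print(multiple_choice_accuracy(outputs, answers))
```

Scoring like this is cheap next to generation, so the several-hour cost on Android devices would come from running the model over the question set. That is where a 100-question subset helps.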
