Throughputs of Long Sequences #12608 #1985
Unanswered · simmonssong asked this question in Q&A · Replies: 1 comment
-
Interesting catch. Yeah, some infrastructure optimizations in llama.cpp (like tiled KV-cache handling) do boost throughput on long sequences. We ran into this when stress-testing long-form prompts with nested reasoning: throughput went up, but logic fidelity went sideways. Anyway, great question. If you're testing semantic drift under load too, happy to swap notes; some of those failure patterns are... spooky.
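Beyond any specific kernel optimization, part of the effect can be plain amortization: a fixed per-request setup cost gets spread over more tokens as the prompt grows. A toy model of that (the overhead and rate numbers below are hypothetical, not measured from llama.cpp):

```python
# Toy model: fixed per-call overhead (e.g. graph setup, dispatch) amortized
# over more tokens makes measured tokens/sec rise with prompt length.
# OVERHEAD_S and RATE_TOK_S are illustrative assumptions.

OVERHEAD_S = 0.5      # fixed cost per request, in seconds (hypothetical)
RATE_TOK_S = 1000.0   # steady-state prompt-processing rate (hypothetical)

def throughput(n_tokens: int) -> float:
    """Measured tokens/sec for a prompt of n_tokens under the toy model."""
    total_time = OVERHEAD_S + n_tokens / RATE_TOK_S
    return n_tokens / total_time

for n in (128, 512, 2048, 8192):
    print(f"{n:5d} tokens -> {throughput(n):7.1f} tok/s")
```

Measured throughput climbs toward the steady-state rate as the prompt lengthens, which matches the qualitative trend in the question even before any cache-layout effects kick in.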
-
Hi, I am testing the throughput of input sequences of different lengths. I found that throughput increases with sequence length across several models and quantizations. Is this caused by built-in infrastructure optimizations in llama.cpp?
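For comparison, one way to run this kind of sweep is llama.cpp's bundled `llama-bench` tool, which reports prompt-processing throughput per prompt length (the model path below is a placeholder; adjust flags to your build):

```shell
# Benchmark prompt processing at several prompt lengths.
# -p takes a comma-separated list of prompt lengths,
# -n 0 skips token generation, -r 3 averages over 3 repetitions.
# ./models/model.gguf is a placeholder path, not a real file.
llama-bench -m ./models/model.gguf -p 128,512,2048,8192 -n 0 -r 3
```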