fix: wrong handling chunked response in streaming mode and concurrent session #431
This fixes an issue where perf_analyzer failed to process chunked responses from a server that has HTTP SSE enabled.

When `--session-concurrency` and `--service-kind openai` are specified together for input payloads that include `"stream": true`, PA failed while parsing a response that is a delta response with the SSE prefix `data:`, like below. After fixing this parsing issue, another error occurred: `what(): std::future_error: Promise already satisfied`. This happened because PA did not properly account for SSE responses combined with multi-turn payloads built from the chat history.

This PR fixes both problems.
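As an illustration only (this is not the code changed in this PR, and it is written in Python rather than PA's C++), the two failure modes can be sketched as follows. The sketch assumes OpenAI-style chat completion deltas, assumes the stream ends with a `data: [DONE]` event, and assumes each chunk contains whole lines. `parse_sse_chunk` and `StreamingResponse` are hypothetical names. A Python `Future` stands in for `std::promise`: calling `set_result` twice raises `InvalidStateError`, which is roughly analogous to `Promise already satisfied`.

```python
import json
from concurrent.futures import Future

SSE_PREFIX = "data:"
DONE_MARKER = "[DONE]"  # assumed end-of-stream sentinel


def parse_sse_chunk(chunk: str) -> tuple[list[dict], bool]:
    """Split one HTTP chunk into decoded SSE events.

    Returns (events, done); done is True once the end marker is seen.
    A single chunk may carry several `data:` events.
    """
    events = []
    done = False
    for line in chunk.splitlines():
        line = line.strip()
        if not line.startswith(SSE_PREFIX):
            continue  # skip blank separators and non-data fields
        payload = line[len(SSE_PREFIX):].strip()
        if payload == DONE_MARKER:
            done = True
            break
        events.append(json.loads(payload))
    return events, done


class StreamingResponse:
    """Accumulates delta content for one turn (hypothetical helper)."""

    def __init__(self) -> None:
        self.text = ""
        self.completed: Future = Future()

    def on_chunk(self, chunk: str) -> None:
        events, done = parse_sse_chunk(chunk)
        for event in events:
            for choice in event.get("choices", []):
                self.text += choice.get("delta", {}).get("content") or ""
        # Complete the future exactly once, even if the end of the
        # stream is signalled more than once.
        if done and not self.completed.done():
            self.completed.set_result(self.text)
```

In a multi-turn session, the next turn's payload would be sent only after `completed` resolves, so each turn needs its own future rather than reusing one across turns.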
FYI, the result with `--concurrency-range` instead of `--session-concurrency` for the same payload, excluding `"session_id":`, is below. No issue occurred.

Note that the example input payloads that include `"stream": true` are generated via `genai-perf` like below.