-
Notifications
You must be signed in to change notification settings - Fork 16
Implement audio chat provider #227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d285a78
to
c9fa7c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just nitpicks :)
a8ac1af
to
264930e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
9adb87e
to
720d619
Compare
A few new things:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting it into two functions is a good idea, and I have some feedback on the audio id portion.
720d619
to
688d951
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
throw new RuntimeException($serviceName . ' text to speech generation failed with: ' . $e->getMessage()); | ||
} | ||
} else { | ||
$output = base64_decode($message['audio']['data']); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this audio data expires at a certain time expires_at
. Should we check that and fallback to the above TTS approach to get the new audio and audio id?
Signed-off-by: Julien Veyssier <[email protected]>
Signed-off-by: Julien Veyssier <[email protected]>
Signed-off-by: Julien Veyssier <[email protected]>
Signed-off-by: Julien Veyssier <[email protected]>
Signed-off-by: Julien Veyssier <[email protected]>
…input if using chat endpoint Signed-off-by: Julien Veyssier <[email protected]>
Signed-off-by: Julien Veyssier <[email protected]>
…he server task type is available, handle the case when the chat endpoint does not return audio Signed-off-by: Julien Veyssier <[email protected]>
Signed-off-by: Julien Veyssier <[email protected]>
… multimodal model Signed-off-by: Julien Veyssier <[email protected]>
Signed-off-by: Julien Veyssier <[email protected]>
6a48831
to
3ea9b77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than what @kyteinsky said everything else LGTM.
@kyteinsky @lukasdotcom The expiration is checked where the history array is built. Currently the only place where we do that is when the assistant schedules an audio chat task, in https://github.com/nextcloud/assistant/pull/292/files#diff-bacc1ea613cd8effdacf1a14dd4d819266ed4315d83dc81e94cad93b8d071ff4R591-R592 In integration_openai, we won't receive the expiration date in the history input, we can't check it here. |
Requires nextcloud/assistant#291 (for the mp3 recording).
Related with nextcloud/server#53759
If connected to OpenAI, we can use the chat completion endpoint with the
gpt-4o-audio-preview
model which accepts audio messages as input.If not connected to OpenAI, it's safer to split the process in: STT -> LLM -> TTS.