Improve Japanese TTS/STT UX and dependency handling #260
Context
MLX Audio’s Japanese TTS/STT UX had several rough edges: dependencies caused runtime crashes, the TTS language picker allowed unsupported combinations, the download button worked only intermittently, and transcript playback defaulted to a placeholder clip. This PR focuses on stabilising Japanese support so users can reliably synthesize and transcribe in Japanese.
Description
dictionaries, and guard language requests).
and generate real downloads.
improve the transcript playback experience.
Changes in the codebase
pyopenjtalk, fugashi[unidic-lite]), alias save_weights for older mlx_lm, bootstrap fugashi/MeCab env vars, and
document the behavior.
auto-select matching Kokoro voices, and implement a working audio download button.
language for auto-detect STT, persist the uploaded audio data URL, feed it into the transcript player, and fix the
playback timer.
Changes outside the codebase
UniDic-lite). The server now configures MeCab automatically; no manual OS-level install steps are required.
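As a rough illustration of what configuring MeCab automatically could look like, here is a minimal sketch. It is not the code from this PR: the helper name `bootstrap_mecab_env` is hypothetical, and it assumes the relevant variable is `MECABRC` and that the dictionary directory contains a `mecabrc` file. The PR text does not say which variables the server actually sets.

```python
import os
from pathlib import Path
from typing import MutableMapping, Optional


def bootstrap_mecab_env(
    dicdir: str, env: Optional[MutableMapping[str, str]] = None
) -> bool:
    """Point MeCab at a bundled dictionary unless one is already configured.

    Hypothetical helper. Returns True only if this call set MECABRC.
    """
    env = os.environ if env is None else env

    # Respect an existing user or system configuration.
    if env.get("MECABRC"):
        return False

    # Assumption: the dictionary directory ships its own mecabrc file.
    mecabrc = Path(dicdir) / "mecabrc"
    if not mecabrc.is_file():
        return False

    env["MECABRC"] = str(mecabrc)
    return True
```

At server startup this might be called with the dictionary path from the installed package, for example `bootstrap_mecab_env(unidic_lite.DICDIR)`, assuming `unidic_lite` exposes that attribute in the installed version. Leaving an existing `MECABRC` untouched keeps custom MeCab setups working.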
Additional information
(e.g., Marvis → English only).
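The language guard mentioned above can be sketched as a lookup from model to supported languages. This is an illustrative sketch only: the table, `guard_language`, and `UnsupportedLanguageError` are hypothetical names, and apart from the Marvis → English-only rule stated in this PR, the entries are assumptions rather than the project's actual support matrix.

```python
from typing import Dict, FrozenSet

# Illustrative table. Only the Marvis entry is taken from the PR description;
# the Kokoro entry is an assumption and not a complete list.
SUPPORTED_LANGUAGES: Dict[str, FrozenSet[str]] = {
    "marvis": frozenset({"en"}),
    "kokoro": frozenset({"en", "ja"}),
}


class UnsupportedLanguageError(ValueError):
    """Raised when a model is asked for a language it does not support."""


def guard_language(model: str, language: str) -> str:
    """Return `language` if `model` supports it, otherwise raise."""
    allowed = SUPPORTED_LANGUAGES.get(model.strip().lower())
    if allowed is None:
        # Unknown model: no restriction is known, so pass the request through.
        return language
    if language not in allowed:
        raise UnsupportedLanguageError(
            f"{model} does not support language {language!r}; "
            f"supported: {sorted(allowed)}"
        )
    return language
```

Rejecting the request on the server as well as filtering the picker in the UI means an unsupported combination fails with a clear message instead of a runtime crash deeper in the pipeline.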
Checklist