Description
DO NOT DELETE THIS! Please take the time to fill this out properly. I am not able to help you if I do not know what you are executing and what error messages you are getting. If you are having problems with a specific video make sure to include the video id.
To Reproduce
Steps to reproduce the behavior:
- Fetch video EbF3XRxISxc
- Hosted on Google Cloud Run
- Proxied through Bright Data Web Unlocker
What code / cli command are you executing?
For example: I am running
def fetch_captions(self, video_id: str) -> str | None:
    try:
        transcripts = self._youtube_transcript_api.fetch(video_id, preserve_formatting=True)
        return "\n".join(snippet.text for snippet in transcripts)
    except YouTubeTranscriptApiException:
        self._logger.exception(f"YouTubeTranscriptApiException for video {video_id}")
        return None
    except Exception:
        self._logger.exception(f"Error fetching captions for video ID {video_id}")
        return None
Which Python version are you using?
Python 3.11
Which version of youtube-transcript-api are you using?
youtube-transcript-api 1.2.1
Expected behavior
Describe what you expected to happen.
Parse to succeed (with English transcript)
Actual behaviour
Describe what is happening instead of the Expected behavior. Add error messages if there are any.
The HTTP request is succeeding but I'm seeing the following error:
2025-07-28 06:01:05 [ClosedCaptions] Error fetching captions for video ID EbF3XRxISxc
Traceback (most recent call last):
  File "/worker-a/workera/ingest/ingestion/webpage/platform/youtube/closed_captions.py", line 20, in fetch_captions
    .fetch(video_id)
     ^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 353, in fetch
    return TranscriptList.build(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 218, in build
    translation_languages = [
                            ^
  File "/opt/venv/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 220, in <listcomp>
    language=translation_language["languageName"]["runs"][0]["text"],
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'runs'
The raw JSON response is of the form:
{
  ...
  "translationLanguages": [
    {
      "languageCode": "ar",
      "languageName": {
        "simpleText": "Arabic"
      }
    },
    {
      "languageCode": "zh-Hant",
      "languageName": {
        "simpleText": "Chinese (Traditional)"
      }
    },
    {
      "languageCode": "nl",
      "languageName": {
        "simpleText": "Dutch"
      }
    },
    {
      "languageCode": "en",
      "languageName": {
        "simpleText": "English"
      },
      "translationSourceTrackIndices": [
        3
      ]
    },
    {
      "languageCode": "fr",
      "languageName": {
        "simpleText": "French"
      }
    },
    {
      "languageCode": "de",
      "languageName": {
        "simpleText": "German"
      }
    },
    {
      "languageCode": "hi",
      "languageName": {
        "simpleText": "Hindi"
      }
    },
    {
      "languageCode": "id",
      "languageName": {
        "simpleText": "Indonesian"
      }
    },
    {
      "languageCode": "it",
      "languageName": {
        "simpleText": "Italian"
      }
    },
    {
      "languageCode": "ja",
      "languageName": {
        "simpleText": "Japanese"
      }
    },
    {
      "languageCode": "ko",
      "languageName": {
        "simpleText": "Korean"
      }
    },
    {
      "languageCode": "pt",
      "languageName": {
        "simpleText": "Portuguese"
      }
    },
    {
      "languageCode": "ru",
      "languageName": {
        "simpleText": "Russian"
      }
    },
    {
      "languageCode": "es",
      "languageName": {
        "simpleText": "Spanish"
      }
    },
    {
      "languageCode": "th",
      "languageName": {
        "simpleText": "Thai"
      }
    },
    {
      "languageCode": "tr",
      "languageName": {
        "simpleText": "Turkish"
      }
    },
    {
      "languageCode": "uk",
      "languageName": {
        "simpleText": "Ukrainian"
      }
    },
    {
      "languageCode": "vi",
      "languageName": {
        "simpleText": "Vietnamese"
      }
    }
  ],
  "defaultAudioTrackIndex": 0,
  "defaultTranslationSourceTrackIndices": [
    1
  ]
}
Note that there is no runs field under languageName (with text fields inside); instead there are simpleText fields.
Is this a new type of caption the library needs to be taught about?
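For illustration, here is a minimal sketch of a lookup that tolerates both shapes. The helper name language_name_text is hypothetical and not part of youtube-transcript-api. The runs shape is inferred only from the failing line in the traceback, and the sample entries are trimmed from the response above, so treat this as one possible approach rather than the library's fix.

```python
from typing import Any


def language_name_text(language_name: dict[str, Any]) -> str:
    """Return the display name from a languageName object.

    Hypothetical helper for illustration; not part of youtube-transcript-api.
    """
    # Shape the failing line expects: {"runs": [{"text": "English"}]}
    runs = language_name.get("runs")
    if runs:
        return runs[0]["text"]
    # Shape seen in the response above: {"simpleText": "English"}
    if "simpleText" in language_name:
        return language_name["simpleText"]
    raise KeyError("languageName has neither 'runs' nor 'simpleText'")


# Sample entries: the first mirrors the response above, the second is the
# shape assumed from the traceback.
translation_languages = [
    {"languageCode": "ar", "languageName": {"simpleText": "Arabic"}},
    {"languageCode": "en", "languageName": {"runs": [{"text": "English"}]}},
]

parsed = [
    {
        "language": language_name_text(entry["languageName"]),
        "language_code": entry["languageCode"],
    }
    for entry in translation_languages
]
print(parsed)
```

Checking runs first keeps the current behavior for responses that still use that shape and only falls back to simpleText when runs is missing.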