Unable to handle new simpleText caption type #503

@samkillin

To Reproduce

Steps to reproduce the behavior:

  • Fetch video EbF3XRxISxc
  • Hosted on Google Cloud Run
  • Proxied through Bright Data Web Unlocker

What code / cli command are you executing?

def fetch_captions(self, video_id: str) -> str | None:
    try:
        transcripts = self._youtube_transcript_api.fetch(video_id, preserve_formatting=True)
        return "\n".join(snippet.text for snippet in transcripts)
    except YouTubeTranscriptApiException:
        self._logger.exception(f"YouTubeTranscriptApiException for video {video_id}")
        return None
    except Exception:
        self._logger.exception(f"Error fetching captions for video ID {video_id}")
        return None

Which Python version are you using?

Python 3.11

Which version of youtube-transcript-api are you using?

youtube-transcript-api 1.2.1

Expected behavior

Parsing should succeed and return the English transcript.

Actual behaviour

The HTTP request is succeeding but I'm seeing the following error:

2025-07-28 06:01:05 [ClosedCaptions] Error fetching captions for video ID EbF3XRxISxc
Traceback (most recent call last):
  File "/worker-a/workera/ingest/ingestion/webpage/platform/youtube/closed_captions.py", line 20, in fetch_captions
    .fetch(video_id)
     ^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 353, in fetch
    return TranscriptList.build(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 218, in build
    translation_languages = [
                            ^
  File "/opt/venv/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 220, in <listcomp>
    language=translation_language["languageName"]["runs"][0]["text"],
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'runs'

The raw JSON response is of the form:

{
  ...
  "translationLanguages": [
    {
      "languageCode": "ar",
      "languageName": {
        "simpleText": "Arabic"
      }
    },
    {
      "languageCode": "zh-Hant",
      "languageName": {
        "simpleText": "Chinese (Traditional)"
      }
    },
    {
      "languageCode": "nl",
      "languageName": {
        "simpleText": "Dutch"
      }
    },
    {
      "languageCode": "en",
      "languageName": {
        "simpleText": "English"
      },
      "translationSourceTrackIndices": [
        3
      ]
    },
    {
      "languageCode": "fr",
      "languageName": {
        "simpleText": "French"
      }
    },
    {
      "languageCode": "de",
      "languageName": {
        "simpleText": "German"
      }
    },
    {
      "languageCode": "hi",
      "languageName": {
        "simpleText": "Hindi"
      }
    },
    {
      "languageCode": "id",
      "languageName": {
        "simpleText": "Indonesian"
      }
    },
    {
      "languageCode": "it",
      "languageName": {
        "simpleText": "Italian"
      }
    },
    {
      "languageCode": "ja",
      "languageName": {
        "simpleText": "Japanese"
      }
    },
    {
      "languageCode": "ko",
      "languageName": {
        "simpleText": "Korean"
      }
    },
    {
      "languageCode": "pt",
      "languageName": {
        "simpleText": "Portuguese"
      }
    },
    {
      "languageCode": "ru",
      "languageName": {
        "simpleText": "Russian"
      }
    },
    {
      "languageCode": "es",
      "languageName": {
        "simpleText": "Spanish"
      }
    },
    {
      "languageCode": "th",
      "languageName": {
        "simpleText": "Thai"
      }
    },
    {
      "languageCode": "tr",
      "languageName": {
        "simpleText": "Turkish"
      }
    },
    {
      "languageCode": "uk",
      "languageName": {
        "simpleText": "Ukrainian"
      }
    },
    {
      "languageCode": "vi",
      "languageName": {
        "simpleText": "Vietnamese"
      }
    }
  ],
  "defaultAudioTrackIndex": 0,
  "defaultTranslationSourceTrackIndices": [
    1
  ]
}

Note that languageName has no runs field (a list of entries, each with a text field), which is what the library indexes into. Each entry has a simpleText field instead.

Is this a new type of caption the library needs to be taught about?
