YouTube Transcript API: Developer Guide and Free Alternatives
Learn how to access YouTube transcripts programmatically. This developer guide covers the official YouTube Data API limitations, free third-party libraries like youtube-transcript-api (Python), and REST API alternatives for extracting video captions.
Key Takeaways
- youtube-transcript-api (Python) is the easiest solution—free, no API key needed
- Official YouTube Data API requires OAuth and only works for videos you own
- yt-dlp provides command-line batch processing capabilities
- Add delays between requests to avoid rate limiting
- About 85% of YouTube videos have auto-generated captions available
- You can build your own REST API wrapper using FastAPI or Flask
The easiest way to get YouTube transcripts programmatically is using the youtube-transcript-api Python library—it's free, requires no API key, and works with any video that has captions. The official YouTube Data API can access captions but requires OAuth authentication and only works for videos you own. This guide covers both approaches plus alternative methods.
YouTube Transcript API Options Overview
| Method | Auth Required | Rate Limits | Languages | Best For |
|---|---|---|---|---|
| youtube-transcript-api (Python) | None | Reasonable | All | Most developers |
| YouTube Data API v3 | OAuth | 10,000 units/day | All | Video owners only |
| yt-dlp (CLI) | None | None built in (YouTube may still throttle) | All | Batch processing |
| Web scraping | None | Risky | English | Last resort |
Method 1: youtube-transcript-api (Python)
The most popular solution for extracting YouTube transcripts programmatically. Open-source, free, and requires no API keys.
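The library's calls take a video ID (such as dQw4w9WgXcQ) rather than a full watch URL. The sketch below shows one way to pull the ID out of a few common URL shapes using only the standard library. The helper name `extract_video_id`, the list of accepted hosts, and the 11-character ID pattern are assumptions made for illustration; they are not part of youtube-transcript-api.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet. This matches
# the IDs seen in practice but is not a documented guarantee.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")

# Assumption: the hosts this sketch recognizes (hypothetical, not exhaustive).
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url_or_id: str) -> Optional[str]:
    """Return the video ID from a bare ID or a common YouTube URL, else None."""
    candidate = url_or_id.strip()
    if _ID_PATTERN.fullmatch(candidate):
        return candidate  # already a bare ID

    parsed = urlparse(candidate)
    host = (parsed.hostname or "").lower()
    video_id = ""
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # Watch pages: https://www.youtube.com/watch?v=<id>
            video_id = parse_qs(parsed.query).get("v", [""])[0]
        else:
            # Embed and Shorts pages: /embed/<id> or /shorts/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) == 2 and parts[0] in {"embed", "shorts"}:
                video_id = parts[1]

    return video_id if _ID_PATTERN.fullmatch(video_id) else None
```

Returning None instead of raising keeps the helper easy to use when filtering a mixed list of links before requesting transcripts.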
Installation
pip install youtube-transcript-api
Basic Usage
from youtube_transcript_api import YouTubeTranscriptApi
# Get transcript for a video
video_id = "dQw4w9WgXcQ"
transcript = YouTubeTranscriptApi.get_transcript(video_id)
# Print each segment
for segment in transcript:
    print(f"[{segment['start']:.2f}] {segment['text']}")
Output Format
[
    {'text': 'Hey there', 'start': 0.0, 'duration': 1.5},
    {'text': 'welcome to the video', 'start': 1.5, 'duration': 2.0},
    {'text': 'today we are going to', 'start': 3.5, 'duration': 2.5},
    # ... more segments
]
Getting Transcripts in Different Languages
from youtube_transcript_api import YouTubeTranscriptApi
# Get Spanish transcript
transcript_es = YouTubeTranscriptApi.get_transcript(
    video_id,
    languages=['es']
)
# Try multiple languages in priority order (falls back to the next one)
transcript = YouTubeTranscriptApi.get_transcript(
    video_id,
    languages=['en', 'en-US', 'en-GB']
)
List Available Transcripts
from youtube_transcript_api import YouTubeTranscriptApi
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
for transcript in transcript_list:
    print(f"Language: {transcript.language}")
    print(f"Language code: {transcript.language_code}")
    print(f"Is generated: {transcript.is_generated}")
    print(f"Is translatable: {transcript.is_translatable}")
    print("---")
Translate Transcripts
# Get English transcript and translate to Spanish
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
transcript = transcript_list.find_transcript(['en'])
translated = transcript.translate('es')
translated_segments = translated.fetch()  # a list of segments, not a single string
Formatting Options
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import (
    TextFormatter,
    SRTFormatter,
    JSONFormatter
)
transcript = YouTubeTranscriptApi.get_transcript(video_id)
# Plain text
text_formatter = TextFormatter()
text_output = text_formatter.format_transcript(transcript)
# SRT format (for subtitles)
srt_formatter = SRTFormatter()
srt_output = srt_formatter.format_transcript(transcript)
# JSON format
json_formatter = JSONFormatter()
json_output = json_formatter.format_transcript(transcript)
Error Handling
from youtube_transcript_api import YouTubeTranscriptApi
# The exception classes are exported by the package itself, so the private
# _errors module does not need to be imported directly.
from youtube_transcript_api import (
    TranscriptsDisabled,
    NoTranscriptFound,
    VideoUnavailable
)
try:
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
except TranscriptsDisabled:
    print("Transcripts are disabled for this video")
except NoTranscriptFound:
    print("No transcript found for requested language")
except VideoUnavailable:
    print("Video is unavailable (private, deleted, etc.)")
except Exception as e:
    print(f"An error occurred: {e}")
Batch Processing Multiple Videos
from youtube_transcript_api import YouTubeTranscriptApi
video_ids = ["video1_id", "video2_id", "video3_id"]
# get_transcripts returns a tuple: a dict of {video_id: transcript_data}
# and a list of video IDs that could not be retrieved
transcripts, unretrievable = YouTubeTranscriptApi.get_transcripts(
    video_ids,
    languages=['en']
)
for video_id, transcript in transcripts.items():
    print(f"Video: {video_id}")
    print(f"Segments: {len(transcript)}")
Method 2: YouTube Data API v3
The official API provides caption access but with significant limitations.
Requirements
- Google Cloud project
- YouTube Data API enabled
- OAuth 2.0 credentials
- Video must be owned by authenticated user OR have downloadable captions
Setup Steps
pip install google-api-python-client google-auth-oauthlib
List Captions for a Video
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
SCOPES = ['https://www.googleapis.com/auth/youtube.force-ssl']
# Authenticate
flow = InstalledAppFlow.from_client_secrets_file(
    'client_secret.json', SCOPES
)
credentials = flow.run_local_server()
# Build API client
youtube = build('youtube', 'v3', credentials=credentials)
# List captions
request = youtube.captions().list(
    part="snippet",
    videoId="VIDEO_ID"
)
response = request.execute()
for caption in response['items']:
    print(f"Track: {caption['snippet']['name']}")
    print(f"Language: {caption['snippet']['language']}")
    print(f"ID: {caption['id']}")
Download Caption Track
# Download specific caption track
request = youtube.captions().download(
    id="CAPTION_TRACK_ID",
    tfmt="srt"  # or "sbv", "vtt"
)
caption_content = request.execute()
Limitations of Official API
| Limitation | Details |
|---|---|
| OAuth required | Can't use simple API key |
| Video ownership | Only download captions for videos you own |
| Quota usage | 50 units per caption list, 200 per download |
| Daily quota | 10,000 units/day (free tier) |
| No auto-captions | Can't download auto-generated captions via API |
When to Use Official API
✅ Managing captions on your own videos
✅ Building YouTube management tools
✅ Enterprise applications with quota needs
✅ Need official support/documentation
❌ Extracting transcripts from any video
❌ Simple transcript extraction projects
❌ Rate-limit sensitive applications
Method 3: yt-dlp (Command Line)
For batch processing or when you prefer command-line tools.
Installation
# pip
pip install yt-dlp
# Homebrew (Mac)
brew install yt-dlp
# Chocolatey (Windows)
choco install yt-dlp
Download Subtitles
# Download auto-generated English subtitles
yt-dlp --write-auto-sub --sub-lang en --skip-download "VIDEO_URL"
# Download manual subtitles
yt-dlp --write-sub --sub-lang en --skip-download "VIDEO_URL"
# Convert to SRT format
yt-dlp --write-auto-sub --convert-subs srt --skip-download "VIDEO_URL"
# Download all available subtitles
yt-dlp --all-subs --skip-download "VIDEO_URL"
Python Integration
import yt_dlp
def get_subtitles(video_url):
    """Return the subtitle tracks yt-dlp finds for a video, without downloading it."""
    ydl_opts = {
        'writeautomaticsub': True,
        'writesubtitles': True,
        'subtitleslangs': ['en'],
        'skip_download': True,
        'outtmpl': '%(id)s',
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        # download=False: only metadata is extracted; no files are written
        info = ydl.extract_info(video_url, download=False)
    # Get subtitle data
    subtitles = info.get('subtitles', {})
    auto_captions = info.get('automatic_captions', {})
    return {
        'manual': subtitles,
        'auto': auto_captions,
        'title': info.get('title'),
        'duration': info.get('duration')
    }
# Usage
result = get_subtitles("https://youtube.com/watch?v=VIDEO_ID")
print(result)
Method 4: REST API Alternatives
Some services offer REST APIs for transcript extraction.
Building Your Own API
You can wrap the youtube-transcript-api in a Flask/FastAPI server:
from fastapi import FastAPI, HTTPException
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api import (
    TranscriptsDisabled,
    NoTranscriptFound
)
app = FastAPI()
# Plain "def" rather than "async def": get_transcript() is a blocking call, and
# FastAPI runs synchronous endpoints in a thread pool so they don't stall the event loop.
@app.get("/transcript/{video_id}")
def get_transcript(video_id: str, lang: str = "en"):
    try:
        transcript = YouTubeTranscriptApi.get_transcript(
            video_id,
            languages=[lang, 'en']
        )
        return {
            "video_id": video_id,
            "language": lang,
            "segments": transcript,
            "text": " ".join([s['text'] for s in transcript])
        }
    except TranscriptsDisabled:
        raise HTTPException(404, "Transcripts disabled for this video")
    except NoTranscriptFound:
        raise HTTPException(404, "No transcript found")
    except Exception as e:
        raise HTTPException(500, str(e))
@app.get("/transcript/{video_id}/languages")
def list_languages(video_id: str):
    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        languages = []
        for t in transcript_list:
            languages.append({
                "code": t.language_code,
                "name": t.language,
                "is_generated": t.is_generated,
                "is_translatable": t.is_translatable
            })
        return {"video_id": video_id, "languages": languages}
    except Exception as e:
        raise HTTPException(500, str(e))
Run the API
pip install fastapi uvicorn
uvicorn main:app --reload
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /transcript/{video_id} | GET | Get transcript text |
| /transcript/{video_id}?lang=es | GET | Get transcript in Spanish |
| /transcript/{video_id}/languages | GET | List available languages |
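A small client for these endpoints might look like the sketch below. It uses only the standard library and assumes the server is running locally on uvicorn's default address, http://127.0.0.1:8000. The function names `transcript_url`, `languages_url`, and `fetch_json` are hypothetical helpers, not part of any library mentioned in this guide.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the API from this section is served on uvicorn's default host and port.
BASE_URL = "http://127.0.0.1:8000"


def transcript_url(video_id: str, lang: Optional[str] = None,
                   base_url: str = BASE_URL) -> str:
    """Build the URL for GET /transcript/{video_id}, with an optional ?lang= parameter."""
    url = f"{base_url}/transcript/{quote(video_id, safe='')}"
    if lang:
        url += "?" + urlencode({"lang": lang})
    return url


def languages_url(video_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /transcript/{video_id}/languages."""
    return f"{base_url}/transcript/{quote(video_id, safe='')}/languages"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `fetch_json(transcript_url("dQw4w9WgXcQ", lang="es"))` would return the dictionary produced by the transcript endpoint, including its "segments" and "text" fields.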
Rate Limits and Best Practices
youtube-transcript-api Rate Limits
The library doesn't have official rate limits, but YouTube may throttle:
- Reasonable use: 100-500 requests/hour
- Heavy use: May trigger CAPTCHAs
- Best practice: Add delays between requests
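To turn the hourly guidance above into a concrete delay, divide one hour by the request budget. The sketch below does that arithmetic; the function names are illustrative, and the 100-500 requests/hour range is this guide's estimate rather than a limit published by YouTube.

```python
def min_delay_seconds(requests_per_hour: int) -> float:
    """Smallest fixed delay between requests that stays within an hourly budget."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return 3600 / requests_per_hour


def estimated_batch_minutes(video_count: int, requests_per_hour: int) -> float:
    """Rough wall-clock time for a batch, counting only the delays between requests."""
    return video_count * min_delay_seconds(requests_per_hour) / 60


# 500 requests/hour works out to one request every 7.2 seconds;
# 100 requests/hour works out to one request every 36 seconds.
print(min_delay_seconds(500), min_delay_seconds(100))
```

By the same arithmetic, a fixed 1-second delay permits up to 3,600 requests per hour, so longer delays are the safer choice for large batches.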
import time
from youtube_transcript_api import YouTubeTranscriptApi
video_ids = ["id1", "id2", "id3"]  # ... more video IDs
transcripts = []
for video_id in video_ids:
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        transcripts.append((video_id, transcript))
    except Exception as e:
        print(f"Error for {video_id}: {e}")
    time.sleep(1)  # 1 second delay between requests
YouTube Data API Quotas
| Operation | Cost (units) |
|---|---|
| captions.list | 50 |
| captions.download | 200 |
| captions.insert | 400 |
| captions.update | 450 |
| captions.delete | 50 |
Daily quota: 10,000 units (free tier)
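The unit costs above make it straightforward to estimate how much work fits into a day. The sketch below encodes the table as a dictionary and computes the cost of a workflow; the names `QUOTA_COSTS`, `quota_cost`, and `videos_per_day` are illustrative and not part of the Google client library.

```python
from typing import Dict

# Unit costs and daily quota as listed in the table above.
QUOTA_COSTS: Dict[str, int] = {
    "captions.list": 50,
    "captions.download": 200,
    "captions.insert": 400,
    "captions.update": 450,
    "captions.delete": 50,
}
DAILY_QUOTA = 10_000


def quota_cost(operations: Dict[str, int]) -> int:
    """Total units for a mapping of operation name to number of calls."""
    return sum(QUOTA_COSTS[name] * count for name, count in operations.items())


def videos_per_day(ops_per_video: Dict[str, int], daily_quota: int = DAILY_QUOTA) -> int:
    """How many videos can be processed per day given the calls needed for each one."""
    return daily_quota // quota_cost(ops_per_video)


# Listing the tracks and downloading one of them costs 50 + 200 = 250 units per video,
# so the free tier covers 10,000 // 250 = 40 videos per day.
per_video = {"captions.list": 1, "captions.download": 1}
print(quota_cost(per_video), videos_per_day(per_video))
```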
Error Handling Best Practices
# Requires: pip install tenacity
import logging
from tenacity import retry, stop_after_attempt, wait_exponential
from youtube_transcript_api import YouTubeTranscriptApi
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def get_transcript_with_retry(video_id):
    """Get transcript with automatic retry on failure."""
    try:
        return YouTubeTranscriptApi.get_transcript(video_id)
    except Exception as e:
        logger.warning(f"Attempt failed for {video_id}: {e}")
        raise
# Usage
try:
    transcript = get_transcript_with_retry("VIDEO_ID")
except Exception as e:
    logger.error(f"All attempts failed: {e}")
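If adding a dependency is not an option, the same retry idea can be sketched with the standard library alone. The version below is a simplified stand-in, not tenacity's exact wait schedule: the names `backoff_delays` and `call_with_retry` are hypothetical, and the `retry_on` parameter is an added assumption that lets permanent failures (for example, disabled transcripts) surface immediately instead of being retried.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 2.0, cap: float = 10.0) -> List[float]:
    """Waits between attempts: base, 2*base, 4*base, ... each capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts - 1)]


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base: float = 2.0,
    cap: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying with capped exponential backoff on the given exceptions."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts, base, cap)
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

A transcript request could be wrapped as `call_with_retry(lambda: YouTubeTranscriptApi.get_transcript(video_id))`. The `sleep` parameter exists so the waiting can be replaced in tests.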
Conclusion
Quick start:
pip install youtube-transcript-api
from youtube_transcript_api import YouTubeTranscriptApi
transcript = YouTubeTranscriptApi.get_transcript("VIDEO_ID")
Need a no-code solution? Try NoteLM.ai's YouTube Transcript Generator for instant transcript extraction without any programming.
Written By
The NoteLM team specializes in AI-powered video summarization and learning tools. We are passionate about making video content more accessible and efficient for learners worldwide.