Mourad GHAFIRI

youtube-mcp-server - MCP server for YouTube video transcription and metadata.

A powerful Model Context Protocol (MCP) server for YouTube video transcription and metadata extraction.

Mourad GHAFIRI
A powerful Model Context Protocol (MCP) server for YouTube video transcription and metadata extraction. This server provides advanced tools for AI agents to retrieve video metadata and generate high-quality transcriptions with native language support.

🌟 Features

- Metadata Extraction: Retrieve comprehensive video details (title, description, views, duration, etc.) without downloading the video.
- Smart Transcription:
  - In-Memory Processing: fast, efficient, and disk-I/O free pipeline.
  - VAD (Voice Activity Detection): uses Silero VAD for precise segmentation.
  - Multilingual Support: supports 99 languages.
  - Translation: Transcribe to any supported language.
- Caching: Intelligent file-based caching to avoid redundant processing.
- Optimized Performance:
  - Uses yt-dlp for robust extraction.
  - Hardware acceleration (MPS/CUDA) for Whisper inference.
  - Parallel processing for transcription segments.

Jacob Ortony

Very neat project. I can't help but think that Whisper introduces complexity, time, and processing that could be avoided by using Google Cloud services directly (if YouTube is the target). Using Vertex to perform automatic language detection also seems independently valuable.

yama

The multilingual transcription with native language support is a nice touch. I'm curious about the caching strategy - when processing videos in batches, does the cache handle different language combinations for the same video separately? For instance, if you transcribe to English first and later need Japanese output.

Mourad GHAFIRI

@yamamoto7 When you transcribe a video to English, the result is cached as English; when you later ask for Chinese, that result is cached separately as Chinese. Each one is saved under the folder « transcriptions/video_id/{language} ».
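
To illustrate the per-language layout described above, here is a minimal sketch of how such a cache lookup could work. It is not the project's actual code: the function names, the `transcript.txt` file name, and the `transcribe` callback are all hypothetical; only the `transcriptions/video_id/{language}` folder structure comes from this thread.

```python
from pathlib import Path
from typing import Callable


def cache_path(root: Path, video_id: str, language: str) -> Path:
    # Folder layout from the thread: transcriptions/<video_id>/<language>/
    # The file name "transcript.txt" is an assumption made for this sketch.
    return root / "transcriptions" / video_id / language / "transcript.txt"


def get_or_transcribe(
    root: Path,
    video_id: str,
    language: str,
    transcribe: Callable[[str, str], str],
) -> str:
    """Return the cached transcript for (video_id, language), or create it."""
    path = cache_path(root, video_id, language)
    if path.exists():
        # Cache hit: this language was already produced for this video.
        return path.read_text(encoding="utf-8")
    # Cache miss: run the (hypothetical) transcriber and store the result.
    text = transcribe(video_id, language)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text
```

Because the language is part of the path, an English and a Japanese transcript of the same video never overwrite each other, and a repeated request for either language is served from disk.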

I don’t use another LLM to translate the original transcriptions, as that would need more resources to run the MCP server and might not be accurate for segments!

The project is under the MIT license, and any contribution is highly welcome.

Thank you