← All work

Open-source CLI

Transcripter

The URL-to-text tool that should already exist. Paste a link, get the words as a .txt, fully local on your own machine.

Stack
Python, MLX, yt-dlp
Runs
Locally, no API keys
License
Open source
A real session: one command turns a NASA video into a transcript file, transcribed locally with Whisper.
faster than real time
~5x
platforms supported
4
API keys required
0

The problem

I have a bookmark graveyard. A Twitter pile I will never scroll back through, an Instagram saved folder three hundred reels deep, a YouTube watch-later that is mostly where videos go to quietly die. I save each one for a reason: a technique, an argument, a half-decent recipe.

And more and more, the thing I want to do with a saved post is hand it to an AI agent. But you can’t paste a TikTok into a chat. The agent can’t watch the video or read the auto-captions, so the one bit I saved it for stays locked in a format nothing else can read. So: paste a link, get the words. That’s the whole tool.

It glues proven parts together

No reinventing. yt-dlp downloads the audio and pulls the caption or post text from over 1800 sites. youtube-transcript-api grabs real YouTube captions with no download when they exist.

For everything else, mlx-whisper transcribes the speech locally on the Apple Silicon GPU, with faster-whisper as the CPU fallback elsewhere. Nothing is uploaded, nothing costs money, and no key goes in a config file. The downloaded audio lives in a temp folder for one transcription, then it’s gone.

Diagram showing how transcripter routes each platform: YouTube to real captions or audio, TikTok and Instagram and X video to audio plus Whisper, and X text posts to the public embed endpoint

Each platform takes the cheapest route that works: real captions when they exist, local Whisper when they don’t.

Example of the .txt file transcripter produces, showing the title, URL, platform, uploader, source provenance, the caption, and the transcript body

The output is a plain .txt: metadata, provenance, caption, and the transcript, ready to drop into an agent.

One command, a clean file

Run transcripter "<url>" and it writes transcripts/<platform>-<slug>.txt. Add -o to open it when done, -m medium when accuracy matters more than speed, or --whisper to force a local transcription even when captions already exist.

Every file records where its text came from, so you always know whether you got real captions or a Whisper transcription. The graveyard turns back into a library.

Measured on a real run

Forcing local transcription on a 143-second NASA video, the whole run, download plus Whisper on the GPU, finished end to end in 28.7 seconds. That is roughly five times faster than the audio plays, on the default small model with the model already cached.

Python mlx-whisper faster-whisper yt-dlp youtube-transcript-api Apple Silicon GPU
← All work Next project

Quill →