Open-source CLI
Transcripter
The URL-to-text tool that should already exist. Paste a link, get the words as a .txt, fully local on your own machine.
- ~5x faster than real time
- 4 platforms supported
- 0 API keys required
The problem
I have a bookmark graveyard. A Twitter pile I will never scroll back through, an Instagram saved folder three hundred reels deep, a YouTube watch-later that is mostly where videos go to quietly die. I save each one for a reason: a technique, an argument, a half-decent recipe.
And more and more, the thing I want to do with a saved post is hand it to an AI agent. But you can’t paste a TikTok into a chat. The agent can’t watch the video or read the auto-captions, so the one bit I saved it for stays locked in a format nothing else can read. So: paste a link, get the words. That’s the whole tool.
It glues proven parts together
No reinventing. yt-dlp downloads the audio and pulls the caption or post text from over 1,800 sites. youtube-transcript-api grabs real YouTube captions, with no download, when they exist.
For everything else, mlx-whisper transcribes the speech locally on the Apple Silicon GPU, with faster-whisper as the CPU fallback elsewhere. Nothing is uploaded, nothing costs money, and no key goes in a config file. The downloaded audio lives in a temp folder for one transcription, then it’s gone.
Each platform takes the cheapest route that works: real captions when they exist, local Whisper when they don’t.
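Here is a minimal sketch of that cheapest-route rule. It is not the tool's actual code: `pick_route` and both callables are hypothetical names, passed in as plain functions so the logic runs without a network connection or a model.

```python
from typing import Callable, Optional, Tuple


def pick_route(
    url: str,
    fetch_captions: Callable[[str], Optional[str]],
    transcribe_locally: Callable[[str], str],
    force_whisper: bool = False,
) -> Tuple[str, str]:
    """Return (text, source), where source is "captions" or "whisper".

    fetch_captions and transcribe_locally are hypothetical stand-ins for
    the caption lookup and the local Whisper step described above.
    """
    if not force_whisper:
        text = fetch_captions(url)
        if text:  # None or an empty string means no usable captions
            return text, "captions"
    # Either captions were missing or the caller forced a local transcription.
    return transcribe_locally(url), "whisper"
```

Returning the source alongside the text is one simple way to keep track of where the words came from, and the `force_whisper` argument mirrors what the `--whisper` flag is described as doing.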
The output is a plain .txt: metadata, provenance, caption, and the transcript, ready to drop into an agent.
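To make that file layout concrete, here is a rough sketch of how such a file could be assembled. The field names, section labels, and ordering are my assumptions for illustration, not the exact format the tool writes.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    # All field names here are hypothetical, chosen for the sketch.
    url: str
    platform: str
    title: str
    source: str   # provenance: "captions" or "whisper"
    caption: str  # post text or description; may be empty
    text: str     # the transcript itself


def render_txt(t: Transcript) -> str:
    """Render metadata, provenance, caption, and transcript as plain text."""
    header = "\n".join([
        f"url: {t.url}",
        f"platform: {t.platform}",
        f"title: {t.title}",
        f"source: {t.source}",
    ])
    sections = [header]
    if t.caption.strip():
        sections.append("CAPTION\n" + t.caption.strip())
    sections.append("TRANSCRIPT\n" + t.text.strip())
    # Blank line between sections, single trailing newline at the end.
    return "\n\n".join(sections) + "\n"
```

Plain key-value lines at the top keep the file readable by a person and easy for an agent to parse without any special tooling.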
One command, a clean file
Run transcripter "<url>" and it writes transcripts/<platform>-<slug>.txt. Add -o to open it when done, -m medium when accuracy matters more than speed, or --whisper to force a local transcription even when captions already exist.
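The command line above could be wired up roughly like this. The three flags and the small default model come from this page. Everything else is an assumption for the sketch: the `dest` names, the help strings, and especially the idea that the slug is derived from the title.

```python
import argparse
import re
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Parser sketch covering only the flags mentioned on this page."""
    p = argparse.ArgumentParser(prog="transcripter")
    p.add_argument("url", help="link to the post or video")
    p.add_argument("-o", dest="open_after", action="store_true",
                   help="open the .txt when done")
    p.add_argument("-m", dest="model", default="small",
                   help="Whisper model size, e.g. medium")
    p.add_argument("--whisper", action="store_true",
                   help="force local transcription even if captions exist")
    return p


def slugify(title: str, max_len: int = 60) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def output_path(platform: str, title: str,
                root: Path = Path("transcripts")) -> Path:
    """Build transcripts/<platform>-<slug>.txt (slug source is assumed)."""
    return root / f"{platform}-{slugify(title)}.txt"
```

With this sketch, `output_path("youtube", "Hello, World!")` gives `transcripts/youtube-hello-world.txt`.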
Every file records where its text came from, so you always know whether you got real captions or a Whisper transcription. The graveyard turns back into a library.
Measured on a real run
With local transcription forced on a 143-second NASA video, the whole run (download plus Whisper on the GPU) finished in 28.7 seconds end to end. That is roughly five times faster than the audio plays, on the default small model with the model already cached.
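The arithmetic behind that figure is just audio length divided by wall-clock time. A tiny sketch, with `realtime_factor` as a hypothetical helper name:

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


factor = realtime_factor(143, 28.7)
print(f"{factor:.2f}x real time")  # prints "4.98x real time"
```

Note that the 28.7 seconds includes the download, so the transcription step on its own ran somewhat faster than this ratio suggests.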