Vox: local speech-to-text for macOS

Vox turns speech into text using models that run on your machine. Dictation, file transcription, and cleanup all stay local. No audio leaves your device. No subscription, no account.

Hold Alt+Space (or any hotkey you specify) in any app, speak, release. Vox can paste the raw transcript or clean it up into an email, list, or translation before it lands at your cursor.

$5, one-time purchase. All future updates included.

$ hold alt+space, say something, release
listening...
> "write an email to sarah about the deploy tomorrow morning"
pasted to cursor

Features

System-wide hotkey

Alt+Space (or any hotkey you specify) works in any app. Hold to record or use toggle mode. Vox can copy the transcript, auto-paste it, and stay out of your way.

Multiple models

Whisper gives you 99 languages, auto-detection, and translation to English. Parakeet is fast for English and a smaller set of European languages. Download, delete, and switch models inside the app.

Local AI cleanup

Vox can remove filler, resolve spoken corrections, format emails and bullet lists, or translate before output. It all runs on your machine. No API key, no cloud AI step.
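To give a feel for what "remove filler" means in practice, here is a deliberately simple sketch. It is not how Vox does it: Vox's cleanup step runs a local AI model, while this example uses a plain regular expression. The filler list and the function name `strip_fillers` are made up for illustration.

```python
import re

# Hypothetical filler list. Vox's real cleanup uses a local AI model, not a regex.
FILLERS = ("um", "uh", "erm", "you know")

def strip_fillers(text: str) -> str:
    """Remove standalone filler words and tidy the leftover spacing."""
    # \b keeps "um" from matching inside words like "umbrella".
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse any double spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(strip_fillers("um, send the uh report to Sarah"))
```

A word list like this breaks down quickly ("you know" is sometimes meant literally), which is presumably why a model-based cleanup step is the better fit for real dictation.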

File transcription

Drag in MP3, WAV, M4A, OGG, FLAC, WebM, or MP4 files. Vox queues them, shows progress, and gives you the same local processing as live dictation.
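As a rough illustration of the queueing idea, the sketch below filters dropped files by the formats listed above and reports progress. Only the extension list comes from this page; `build_queue` and `progress` are hypothetical names, and Vox's internals may look nothing like this.

```python
from collections import deque
from pathlib import Path

# Extensions taken from the format list above; everything else is illustrative.
SUPPORTED = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm", ".mp4"}

def build_queue(paths):
    """Return (queue, skipped): supported files in drop order, plus the rest."""
    queue, skipped = deque(), []
    for p in map(Path, paths):
        # Compare lowercased suffixes so "TALK.MP3" is accepted too.
        (queue if p.suffix.lower() in SUPPORTED else skipped).append(p)
    return queue, skipped

def progress(done: int, total: int) -> str:
    """Format a simple 'done/total (pct%)' progress label."""
    pct = 100 * done // total if total else 100
    return f"{done}/{total} ({pct}%)"

queue, skipped = build_queue(["standup.MP3", "notes.txt", "interview.flac"])
print([p.name for p in queue], [p.name for p in skipped], progress(1, len(queue)))
```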

Custom vocabulary

Teach Vox names, acronyms, and terms you use every day. Keep product names, email addresses, and technical words from getting mangled.
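One simple way to picture a custom vocabulary is as a lookup table applied to the transcript after recognition. The sketch below assumes that approach; Vox may instead bias the speech model itself, and the entries and the name `apply_vocabulary` are invented for the example.

```python
import re

# Hypothetical entries: what the recognizer heard -> what you actually meant.
VOCAB = {
    "cube control": "kubectl",
    "post gress": "Postgres",
    "sara": "Sarah",
}

def apply_vocabulary(text: str, vocab: dict = VOCAB) -> str:
    """Replace misheard phrases with preferred spellings, longest phrase first."""
    for heard in sorted(vocab, key=len, reverse=True):
        replacement = vocab[heard]
        pattern = r"\b" + re.escape(heard) + r"\b"
        # A function replacement avoids backslash surprises in the target text.
        text = re.sub(pattern, lambda _m: replacement, text, flags=re.IGNORECASE)
    return text

print(apply_vocabulary("ask sara to run cube control on post gress"))
```

Matching longest phrases first keeps a short entry from rewriting part of a longer one before the longer one gets its turn.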

Works offline

Once your models are installed, dictation, file transcription, translation, and cleanup keep working without a connection.

Privacy

I know every app says "we care about your privacy." So here's the actual architecture instead of a trust-me statement:

  • Mic audio and file transcription stay on your machine. Whisper, Parakeet, and the local AI cleanup step all run locally.
  • There is no cloud speech-to-text or cloud AI step. Audio and transcripts are not sent to an external API.
  • Network use is limited to model downloads, license activation, occasional online license validation, and app updates. No analytics, no telemetry, no crash reports.
  • History, vocabulary, and settings stay on your device.

That is the whole point.

System requirements

macOS 12+ on Apple Silicon or Intel. Universal .dmg.

Needs a modern multi-core CPU, 2 GB of free RAM, and enough disk space for the app plus whichever speech model you pick (75 MB to 3 GB). No GPU required.

Built-in mics, USB mics, Bluetooth mics, and audio interfaces all work. macOS will ask for Accessibility permission if you want the global hotkey and auto-paste.

Setup

No account to create. No API keys to find. First launch walks you through language, model download, hotkey, and the basics.

  1. Buy Vox and activate once. Pay $5, get your Lemon receipt with the download and license key, then paste the key and your purchase email into Vox.
  2. Grab a model. Pick Whisper for language coverage or Parakeet for fast English. Download inside the app and switch later if you want.
  3. Hold Alt+Space (or your custom hotkey) and talk. Vox can paste the raw transcript or clean it up locally before it lands in the active app.

$5, works offline, updates forever.