Published on June 24, 2025 12:30 AM GMT
I've been a heavy user of dictation, off and on, as my wrists havegotten better and worse. I've mostly used the built-in Mac andAndroid recognition: Mac isn't great, Android is pretty good, neitherhas improved much over the past ~5y despite large improvements in whatshould be possible. OpenAI has an open speech recognition model, whisper, and I wanted tohave a go at running it on my Mac.
It looks like for good local performance the best version is whisper.cpp, whichis a plain C/C++ implementation with support for Mac's ML hardware.To get this installed I needed to install XCode (not just the commandline tools, since I needed coremlc
) and then run:
$ sudo xcodebuild -license$ git clone https://github.com/ggerganov/whisper.cpp$ cd whisper.cpp$ python3.11 -m venv whisper_v3.11$ source whisper_v3.11/bin/activate$ pip install "numpy<2"$ pip install ane_transformers$ pip install openai-whisper$ pip install coremltools$ brew install sdl2$ sh ./models/download-ggml-model.sh large-v3-turbo$ PATH="$PATH:/Applications/Xcode.app/Contents/Developer/usr/bin" \ ./models/generate-coreml-model.sh large-v3-turbo$ cmake -B build -DWHISPER_COREML=1 -DWHISPER_SDL2=ON$ cmake --build build -j --config Release
Note that both older (3.10) and newer (3.13) Python versions gave compilation errors.
While I don't know if these are the ideal arguments, I've been using:
$ ~/code/whisper.cpp/build/bin/whisper-stream \ --capture 1 \ --model ~/code/whisper.cpp/models/ggml-large-v3-turbo.bin \ -t 8 --flash-attn --keep-context --keep 1000 \ --file output.txt
By default the output is quite repetitive. For example I dictated:
It looks like for good local performance, the best version to use iswhisper.cpp, which is a plain C/C++ implementation with support forMac's machine learning hardware. To get this installed, I needed toinstall Xcode (not just the command line tools since I neededcoremlc), and then run a whole bunch of commands.
The output was:
It looks like for good local performance the best versionto use is whisper.cpp, which is a plain C++
to use is whisper.cpp which is a plain C/C++ implementation withsupport for maximum
machine learning hardware. To get this insight into the
machine learning hardware. To get this installed, I needed toinstall Xcode, not just the computer
not just the command line tools since I needed core MLC.
not just the command line tools since I needed core MLC, andthen run a whole bunch of commands.
The fix for not enough LLM these days is often more LLM, so I usedClaude Sonnet 4 to clean it up with the prompt "Could you clean upthis Whisper transcript? The things that look like repetitionsaren't, it's just that whisper does a lot of 'restating' as it updatesit's view of what was said." This gave me:
It looks like for good local performance, the best version to use iswhisper.cpp, which is a plain C/C++ implementation with support formaximum machine learning hardware. To get this installed, I needed toinstall Xcode, not just the command line tools since I needed Core ML,and then run a whole bunch of commands.
This is very good! The only two things it seems to have gotten'wrong' are "maximum" for "Mac's" and commas where I was thinkingparens. And neither of these are very wrong: "Mac's" comes outverbally as "max" and "maximum" is also plausible in context; thecommas read fine, perhaps better than my parens.
I set this up a couple weeks ago, and have generally been finding thisquite useful.
Comment via: facebook, mastodon, bluesky, substack
Discuss