LessWrong
Local Speech Recognition with Whisper

 

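The author shares his experience using whisper.cpp for speech transcription on a Mac. He walks through installing whisper.cpp, including the required software and dependencies, and compiling and running the code. He also describes a problem he ran into, repetitive output, and how he used Claude Sonnet 4 to clean up the transcript. Overall he is satisfied with whisper.cpp's transcription quality and considers it a useful tool.

🎙️ The author tried running OpenAI's whisper model for speech transcription on a Mac and chose whisper.cpp, a C/C++ implementation that supports the Mac's ML hardware.

🛠️ To install whisper.cpp, he records each step: installing Xcode, cloning the code with git, creating a Python virtual environment, installing dependencies (numpy, ane_transformers, openai-whisper, coremltools), and compiling.

⚙️ He shares the command he uses to run whisper-stream and notes that the default output can be repetitive. To address this, he used Claude Sonnet 4 to clean up the transcript, which gave better results.

✅ After cleanup, he is satisfied with whisper.cpp's transcription on the Mac. He notes two small flaws, "maximum" in place of "Mac's" and commas where he intended parentheses, but finds it useful overall.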

Published on June 24, 2025 12:30 AM GMT

I've been a heavy user of dictation, off and on, as my wrists have gotten better and worse. I've mostly used the built-in Mac and Android recognition: Mac isn't great, Android is pretty good, neither has improved much over the past ~5y despite large improvements in what should be possible. OpenAI has an open speech recognition model, whisper, and I wanted to have a go at running it on my Mac.

It looks like for good local performance the best version is whisper.cpp, which is a plain C/C++ implementation with support for Mac's ML hardware. To get this installed I needed to install Xcode (not just the command line tools, since I needed coremlc) and then run:

$ sudo xcodebuild -license
$ git clone https://github.com/ggerganov/whisper.cpp
$ cd whisper.cpp
$ python3.11 -m venv whisper_v3.11
$ source whisper_v3.11/bin/activate
$ pip install "numpy<2"
$ pip install ane_transformers
$ pip install openai-whisper
$ pip install coremltools
$ brew install sdl2
$ sh ./models/download-ggml-model.sh large-v3-turbo
$ PATH="$PATH:/Applications/Xcode.app/Contents/Developer/usr/bin" \
    ./models/generate-coreml-model.sh large-v3-turbo
$ cmake -B build -DWHISPER_COREML=1 -DWHISPER_SDL2=ON
$ cmake --build build -j --config Release

Note that both older (3.10) and newer (3.13) Python versions gave compilation errors.
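Because the Python version turned out to matter, a small preflight check can fail early instead of partway through the build. The following is a minimal sketch, not part of the original setup: the function names `missing_tools` and `is_known_good_python` are hypothetical, and treating only 3.11 as known-good is an assumption based on the versions mentioned above (3.12 is simply untested here).

```shell
#!/bin/sh
# Preflight sketch for the build steps above.
# Both function names are hypothetical, chosen for illustration.

# Print each named command that is not on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed only for Python 3.11.x, the one version reported to work here.
# Assumption: anything else (including the untested 3.12) is rejected.
is_known_good_python() {
    case "$1" in
        3.11|3.11.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use before starting the build (left commented out so that
# sourcing this file has no side effects):
#   missing_tools git cmake brew python3.11 xcodebuild
#   ver=$(python3.11 -c 'import platform; print(platform.python_version())')
#   is_known_good_python "$ver" || echo "untested Python version: $ver" >&2
```

If `missing_tools` prints nothing and the version check passes, the commands above at least have the tools they name.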

While I don't know if these are the ideal arguments, I've been using:

$ ~/code/whisper.cpp/build/bin/whisper-stream \
    --capture 1 \
    --model ~/code/whisper.cpp/models/ggml-large-v3-turbo.bin \
    -t 8 --flash-attn --keep-context --keep 1000 \
    --file output.txt

By default the output is quite repetitive. For example I dictated:

It looks like for good local performance, the best version to use is whisper.cpp, which is a plain C/C++ implementation with support for Mac's machine learning hardware. To get this installed, I needed to install Xcode (not just the command line tools since I needed coremlc), and then run a whole bunch of commands.

The output was:

It looks like for good local performance the best version to use is whisper.cpp, which is a plain C++
to use is whisper.cpp which is a plain C/C++ implementation with support for maximum
machine learning hardware. To get this insight into the
machine learning hardware. To get this installed, I needed to install Xcode, not just the computer
not just the command line tools since I needed core MLC.
not just the command line tools since I needed core MLC, and then run a whole bunch of commands.

The fix for not enough LLM these days is often more LLM, so I used Claude Sonnet 4 to clean it up with the prompt "Could you clean up this Whisper transcript? The things that look like repetitions aren't, it's just that whisper does a lot of 'restating' as it updates it's view of what was said." This gave me:

It looks like for good local performance, the best version to use is whisper.cpp, which is a plain C/C++ implementation with support for maximum machine learning hardware. To get this installed, I needed to install Xcode, not just the command line tools since I needed Core ML, and then run a whole bunch of commands.

This is very good! The only two things it seems to have gotten 'wrong' are "maximum" for "Mac's" and commas where I was thinking parens. And neither of these is very wrong: "Mac's" comes out verbally as "max" and "maximum" is also plausible in context; the commas read fine, perhaps better than my parens.
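For comparison, part of the restating can be collapsed without an LLM. The sketch below is a crude heuristic and not what was used for the result above: it assumes a draft line and its restatement start with the same three words, and drops the earlier one. The function name `collapse_restatements` is hypothetical.

```shell
#!/bin/sh
# Heuristic sketch: drop a line when the next line starts with the same
# first three words (case-insensitive), treating it as a superseded draft.
# The function name is hypothetical; reads stdin, writes stdout.
collapse_restatements() {
    awk '
        # First three words of a line, lowercased; "" if fewer than three.
        function key(line,    w, n) {
            n = split(tolower(line), w)
            if (n < 3) return ""
            return w[1] " " w[2] " " w[3]
        }
        # Print the previous line unless the current line restates it.
        NR > 1 {
            if (key(prev) == "" || key(prev) != key($0)) print prev
        }
        { prev = $0 }
        # The final line has nothing after it, so it is always kept.
        END { if (NR > 0) print prev }
    '
}

# Example use on the transcript file written by whisper-stream
# (left commented out so that sourcing this file has no side effects):
#   collapse_restatements < output.txt
```

On the six raw output lines above this drops the two drafts (the "insight into the" line and the shorter "core MLC." line) and keeps four. It does not merge overlaps that start mid-line, such as "to use is whisper.cpp", or fix "not just the computer", which is where the LLM cleanup does better.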

I set this up a couple weeks ago, and have generally been finding this quite useful.

