Agent skill
use-local-whisper
Use when the user wants local voice transcription instead of OpenAI Whisper API. Switches to whisper.cpp running on Apple Silicon. WhatsApp only for now. Requires voice-transcription skill to be applied first.
Install this agent skill to your Project
npx add-skill https://github.com/qwibitai/nanoclaw/tree/main/.claude/skills/use-local-whisper
SKILL.md
Use Local Whisper
Switches voice transcription from OpenAI's Whisper API to local whisper.cpp. Runs entirely on-device — no API key, no network, no cost.
Channel support: Currently WhatsApp only. The transcription module (src/transcription.ts) uses Baileys types for audio download. Other channels (Telegram, Discord, etc.) would need their own audio-download logic before this skill can serve them.
Note: The Homebrew package is whisper-cpp, but the CLI binary it installs is whisper-cli.
Prerequisites
voice-transcriptionskill must be applied first (WhatsApp channel)- macOS with Apple Silicon (M1+) recommended
whisper-cppinstalled:brew install whisper-cpp(provides thewhisper-clibinary)ffmpeginstalled:brew install ffmpeg- A GGML model file downloaded to
data/models/
Phase 1: Pre-flight
Check if already applied
Check if src/transcription.ts already uses whisper-cli:
grep 'whisper-cli' src/transcription.ts && echo "Already applied" || echo "Not applied"
If already applied, skip to Phase 3 (Verify).
Check dependencies are installed
whisper-cli --help >/dev/null 2>&1 && echo "WHISPER_OK" || echo "WHISPER_MISSING"
ffmpeg -version >/dev/null 2>&1 && echo "FFMPEG_OK" || echo "FFMPEG_MISSING"
If missing, install via Homebrew:
brew install whisper-cpp ffmpeg
Check for model file
ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL"
If no model exists, download the base model (148MB, good balance of speed and accuracy):
mkdir -p data/models
curl -L -o data/models/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
For better accuracy at the cost of speed, use ggml-small.bin (466MB) or ggml-medium.bin (1.5GB).
Phase 2: Apply Code Changes
Ensure WhatsApp fork remote
git remote -v
If whatsapp is missing, add it:
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
Merge the skill branch
git fetch whatsapp skill/local-whisper
git merge whatsapp/skill/local-whisper || {
git checkout --theirs package-lock.json
git add package-lock.json
git merge --continue
}
This modifies src/transcription.ts to use the whisper-cli binary instead of the OpenAI API.
Validate
npm run build
Phase 3: Verify
Ensure launchd PATH includes Homebrew
The NanoClaw launchd service runs with a restricted PATH. whisper-cli and ffmpeg are in /opt/homebrew/bin/ (Apple Silicon) or /usr/local/bin/ (Intel), which may not be in the plist's PATH.
Check the current PATH:
grep -A1 'PATH' ~/Library/LaunchAgents/com.nanoclaw.plist
If /opt/homebrew/bin is missing, add it to the <string> value inside the PATH key in the plist. Then reload:
launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist
Build and restart
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
Test
Send a voice note in any registered group. The agent should receive it as [Voice: <transcript>].
Check logs
tail -f logs/nanoclaw.log | grep -i -E "voice|transcri|whisper"
Look for:
Transcribed voice message— successful transcriptionwhisper.cpp transcription failed— check model path, ffmpeg, or PATH
Configuration
Environment variables (optional, set in .env):
| Variable | Default | Description |
|---|---|---|
WHISPER_BIN |
whisper-cli |
Path to whisper.cpp binary |
WHISPER_MODEL |
data/models/ggml-base.bin |
Path to GGML model file |
Troubleshooting
"whisper.cpp transcription failed": Ensure both whisper-cli and ffmpeg are in PATH. The launchd service uses a restricted PATH — see Phase 3 above. Test manually:
ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav /tmp/test.wav -y
whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps -nt
Transcription works in dev but not as service: The launchd plist PATH likely doesn't include /opt/homebrew/bin. See "Ensure launchd PATH includes Homebrew" in Phase 3.
Slow transcription: The base model processes ~30s of audio in <1s on M1+. If slower, check CPU usage — another process may be competing.
Wrong language: whisper.cpp auto-detects language. To force a language, you can set WHISPER_LANG and modify src/transcription.ts to pass -l $WHISPER_LANG.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
capabilities
Show what this NanoClaw instance can do — installed skills, available tools, and system info. Read-only. Use when the user asks what the bot can do, what's installed, or runs /capabilities.
status
Quick read-only health check — session context, workspace mounts, tool availability, and task snapshot. Use when the user asks for system status or runs /status.
slack-formatting
Format messages for Slack using mrkdwn syntax. Use when responding to Slack channels (folder starts with "slack_" or JID contains slack identifiers).
agent-browser
Browse the web for any task — research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.
add-voice-transcription
Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.
add-whatsapp
Add WhatsApp as a channel. Can replace other channels entirely or run alongside them. Uses QR code or pairing code for authentication.
Didn't find tool you were looking for?