use-local-whisper

Use when the user wants local voice transcription instead of the OpenAI Whisper API. Switches to whisper.cpp running on Apple Silicon. WhatsApp only for now. Requires the voice-transcription skill to be applied first.

Install this agent skill in your project:

npx add-skill https://github.com/qwibitai/nanoclaw/tree/main/.claude/skills/use-local-whisper

SKILL.md

Use Local Whisper

Switches voice transcription from OpenAI's Whisper API to local whisper.cpp. Transcription runs entirely on-device: no API key, no network calls, and no per-request cost.

Channel support: Currently WhatsApp only. The transcription module (src/transcription.ts) uses Baileys types for audio download. Other channels (Telegram, Discord, etc.) would need their own audio-download logic before this skill can serve them.

Note: The Homebrew package is whisper-cpp, but the CLI binary it installs is whisper-cli.

Prerequisites

The voice-transcription skill must already be applied (WhatsApp channel)
macOS with Apple Silicon (M1+) recommended
whisper-cpp installed: brew install whisper-cpp (provides the whisper-cli binary)
ffmpeg installed: brew install ffmpeg
A GGML model file downloaded to data/models/

Phase 1: Pre-flight

Check if already applied

Check if src/transcription.ts already uses whisper-cli:

bash

grep -q 'whisper-cli' src/transcription.ts && echo "Already applied" || echo "Not applied"

If already applied, skip to Phase 3 (Verify).

Check dependencies are installed

bash

whisper-cli --help >/dev/null 2>&1 && echo "WHISPER_OK" || echo "WHISPER_MISSING"
ffmpeg -version >/dev/null 2>&1 && echo "FFMPEG_OK" || echo "FFMPEG_MISSING"

If missing, install via Homebrew:

bash

brew install whisper-cpp ffmpeg
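The two checks above can be folded into one re-runnable helper that installs only what is missing. This is a minimal bash sketch; the function names missing_tools and install_missing are hypothetical, and it relies only on command -v and brew install.

```bash
#!/usr/bin/env bash
# Print each given command that is not on PATH, one per line.
# Returns 0 when nothing is missing, 1 otherwise.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Install only the missing dependencies. The Homebrew package name
# (whisper-cpp) differs from the binary it provides (whisper-cli).
install_missing() {
  local missing pkgs=()
  missing=$(missing_tools whisper-cli ffmpeg) && return 0
  case "$missing" in *whisper-cli*) pkgs+=(whisper-cpp) ;; esac
  case "$missing" in *ffmpeg*) pkgs+=(ffmpeg) ;; esac
  brew install "${pkgs[@]}"
}
```

Running install_missing a second time is a no-op once both binaries are on PATH.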

Check for model file

bash

ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL"

If no model exists, download the base model (148 MB, a good balance of speed and accuracy):

bash

mkdir -p data/models
curl -L -o data/models/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"

For better accuracy at the cost of speed, use ggml-small.bin (466 MB) or ggml-medium.bin (1.5 GB), and point WHISPER_MODEL at the file you downloaded (see Configuration).
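A small sketch for fetching a different model size, assuming the small and medium files live under the same Hugging Face path as ggml-base.bin. The helper names model_url and download_model are hypothetical, and only the three sizes named in this guide are accepted.

```bash
#!/usr/bin/env bash
# Print the download URL for a named GGML model (base, small or medium).
# Assumes the same URL pattern as the ggml-base.bin download above.
model_url() {
  case "$1" in
    base|small|medium)
      echo "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-$1.bin"
      ;;
    *)
      echo "unknown model: $1 (expected base, small or medium)" >&2
      return 1
      ;;
  esac
}

# Download the model into data/models/ and print the saved path.
download_model() {
  local name="${1:-base}" url dest
  url=$(model_url "$name") || return 1
  dest="data/models/ggml-$name.bin"
  mkdir -p data/models
  # -f makes curl fail on HTTP errors instead of saving an error page
  curl -fL -o "$dest" "$url" && echo "$dest"
}
```

For example, download_model small saves data/models/ggml-small.bin; then set WHISPER_MODEL=data/models/ggml-small.bin in .env.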

Phase 2: Apply Code Changes

Ensure WhatsApp fork remote

bash

git remote -v

If the whatsapp remote is not listed, add it:

bash

git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
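To make this step safe to re-run, the check and the add can be combined. A sketch with a hypothetical ensure_remote helper, using git remote get-url to test whether the remote already exists:

```bash
#!/usr/bin/env bash
# Add a git remote only if it is not configured yet. If it already exists
# with a different URL, print a warning and leave it unchanged.
ensure_remote() {
  local name="$1" url="$2" current
  if current=$(git remote get-url "$name" 2>/dev/null); then
    if [ "$current" != "$url" ]; then
      echo "remote '$name' already points at $current" >&2
    fi
    return 0
  fi
  git remote add "$name" "$url"
}
```

Usage: ensure_remote whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git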

Merge the skill branch

bash

git fetch whatsapp skill/local-whisper
git merge whatsapp/skill/local-whisper || {
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}

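This modifies src/transcription.ts to use the whisper-cli binary instead of the OpenAI API. The fallback above only resolves a conflict in package-lock.json (keeping the skill branch's version); if other files conflict, resolve them manually, then run git merge --continue.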

Validate

bash

npm run build

Phase 3: Verify

Ensure launchd PATH includes Homebrew

The NanoClaw launchd service runs with a restricted PATH. whisper-cli and ffmpeg are in /opt/homebrew/bin/ (Apple Silicon) or /usr/local/bin/ (Intel), which may not be in the plist's PATH.

Check the current PATH:

bash

grep -A1 'PATH' ~/Library/LaunchAgents/com.nanoclaw.plist

If /opt/homebrew/bin (or /usr/local/bin on Intel) is missing, add it to the <string> value that follows the PATH key in the plist. Then reload:

bash

launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist
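The PATH check in this step can also be scripted so it tests the directories that actually contain whisper-cli and ffmpeg, rather than assuming a Homebrew prefix. A sketch, assuming the plist stores PATH under the EnvironmentVariables dictionary (the usual launchd layout, not confirmed by this guide); it reads the value with macOS's /usr/libexec/PlistBuddy. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Succeed if the colon-separated PATH string $1 contains directory $2.
path_has_dir() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the service's PATH (assumes it lives under EnvironmentVariables).
plist_path() {
  /usr/libexec/PlistBuddy -c "Print :EnvironmentVariables:PATH" \
    "${1:-$HOME/Library/LaunchAgents/com.nanoclaw.plist}"
}

# For each binary, check that its directory is on the service PATH.
check_service_path() {
  local p tool bin dir rc=0
  p=$(plist_path "$@") || return 1
  for tool in whisper-cli ffmpeg; do
    if ! bin=$(command -v "$tool"); then
      echo "$tool: not installed" >&2
      rc=1
      continue
    fi
    dir=$(dirname "$bin")
    if path_has_dir "$p" "$dir"; then
      echo "$tool: OK ($dir is on the service PATH)"
    else
      echo "$tool: add $dir to the plist PATH" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Run check_service_path before reloading; any line it prints to stderr names a directory to add.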

Build and restart

bash

npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw

Test

Send a voice note in any registered group. The agent should receive it as [Voice: <transcript>].

Check logs

bash

tail -f logs/nanoclaw.log | grep -i -E "voice|transcri|whisper"

Look for:

Transcribed voice message — successful transcription
whisper.cpp transcription failed — check model path, ffmpeg, or PATH
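To check results without watching tail -f, the two log messages above can be counted. A sketch with a hypothetical transcription_summary helper; it assumes both messages appear verbatim somewhere in each matching line of logs/nanoclaw.log.

```bash
#!/usr/bin/env bash
# Count successful and failed transcriptions in a NanoClaw log file.
# Prints "ok=<n> failed=<n>"; returns 1 if any failure was logged,
# 2 if the log file does not exist.
transcription_summary() {
  local log="${1:-logs/nanoclaw.log}" ok failed
  if [ ! -f "$log" ]; then
    echo "no log file at $log" >&2
    return 2
  fi
  # grep -c prints 0 (and exits 1) when nothing matches; || true keeps going
  ok=$(grep -cF 'Transcribed voice message' "$log" || true)
  failed=$(grep -cF 'whisper.cpp transcription failed' "$log" || true)
  echo "ok=$ok failed=$failed"
  [ "$failed" -eq 0 ]
}
```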

Configuration

Environment variables (optional, set in .env):

Variable	Default	Description
`WHISPER_BIN`	`whisper-cli`	Path to whisper.cpp binary
`WHISPER_MODEL`	`data/models/ggml-base.bin`	Path to GGML model file
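The table's defaults can be reproduced in the shell to see which binary and model would be used. This sketch assumes src/transcription.ts applies the defaults exactly as listed; whisper_config is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print the effective settings: environment values win, otherwise the
# documented defaults apply. Missing files only produce warnings.
whisper_config() {
  local bin="${WHISPER_BIN:-whisper-cli}"
  local model="${WHISPER_MODEL:-data/models/ggml-base.bin}"
  echo "WHISPER_BIN=$bin"
  echo "WHISPER_MODEL=$model"
  command -v "$bin" >/dev/null 2>&1 || echo "warning: $bin not found on PATH" >&2
  [ -f "$model" ] || echo "warning: model file $model does not exist" >&2
}
```

If .env contains only plain KEY=value lines, running set -a; . ./.env; set +a first exports them into the shell; that is an assumption about the file's format.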

Troubleshooting

"whisper.cpp transcription failed": Ensure both whisper-cli and ffmpeg are in PATH. The launchd service uses a restricted PATH — see Phase 3 above. Test manually:

bash

ffmpeg -y -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav /tmp/test.wav
whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps
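The manual test above can be wrapped so it honors WHISPER_BIN and WHISPER_MODEL instead of hard-coded paths and cleans up its temporary WAV file. A sketch; whisper_smoke_test is a hypothetical name, and it uses only the ffmpeg and whisper-cli flags shown above plus ffmpeg's standard -loglevel option.

```bash
#!/usr/bin/env bash
# Transcribe one second of generated silence with the configured binary
# and model. Returns non-zero and names the failing step on error.
whisper_smoke_test() {
  local bin="${WHISPER_BIN:-whisper-cli}"
  local model="${WHISPER_MODEL:-data/models/ggml-base.bin}"
  local dir rc=0
  if [ ! -f "$model" ]; then
    echo "model not found: $model" >&2
    return 1
  fi
  dir=$(mktemp -d) || return 1
  # 16 kHz mono silence, matching the manual test
  if ! ffmpeg -y -loglevel error -f lavfi -i anullsrc=r=16000:cl=mono \
      -t 1 -f wav "$dir/test.wav"; then
    echo "ffmpeg failed: is it installed and on PATH?" >&2
    rc=1
  elif ! "$bin" -m "$model" -f "$dir/test.wav" --no-timestamps; then
    echo "$bin failed: check WHISPER_BIN and WHISPER_MODEL" >&2
    rc=1
  else
    echo "whisper.cpp OK"
  fi
  rm -rf "$dir"
  return "$rc"
}
```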

Transcription works in dev but not as service: The launchd plist PATH likely doesn't include /opt/homebrew/bin. See "Ensure launchd PATH includes Homebrew" in Phase 3.

Slow transcription: The base model processes ~30 s of audio in under 1 s on M1 or later. If it is noticeably slower, check CPU usage; another process may be competing for the CPU.

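Wrong language: whisper-cli selects the spoken language with -l (its built-in default is en; -l auto enables auto-detection), so the result depends on the flags src/transcription.ts passes. To force a language, you can add a WHISPER_LANG variable and modify src/transcription.ts to pass -l $WHISPER_LANG.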
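In the shell, the argument list with an optional language flag looks like the sketch below. It only illustrates the flags; the real change belongs in the TypeScript of src/transcription.ts, and WHISPER_LANG is a hypothetical variable that the skill does not currently read.

```bash
#!/usr/bin/env bash
# Print whisper-cli arguments for one audio file, one per line, adding
# -l only when the hypothetical WHISPER_LANG variable is set.
whisper_args() {
  local file="$1"
  local -a args=(-m "${WHISPER_MODEL:-data/models/ggml-base.bin}" -f "$file" --no-timestamps)
  if [ -n "${WHISPER_LANG:-}" ]; then
    args+=(-l "$WHISPER_LANG")   # e.g. de, es, or auto to auto-detect
  fi
  printf '%s\n' "${args[@]}"
}
```

For example, WHISPER_LANG=de whisper_args voice.wav prints the flags for inspection before you change the module.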

Maintainer

qwibitai Core maintainer

Source details

Full Name: qwibitai/nanoclaw
Branch: main
Path in repo: .claude/skills/use-local-whisper
License: MIT License
Topics: claude-code ai-agents openclaw claude-skills ai-assistant
