Agent skill
add-voice-transcription
Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.
Install this agent skill to your Project
npx add-skill https://github.com/qwibitai/nanoclaw/tree/main/.claude/skills/add-voice-transcription
SKILL.md
Add Voice Transcription
This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as [Voice: <transcript>].
Phase 1: Pre-flight
Check if already applied
Check if src/transcription.ts exists. If it does, skip to Phase 3 (Configure). The code changes are already in place.
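The pre-flight check can be scripted. The sketch below is a hypothetical helper, not part of NanoClaw; the function name and the two status strings are assumptions chosen for illustration.

```shell
# Hypothetical helper (not part of NanoClaw): reports the next step based on
# whether src/transcription.ts exists under the given project root.
next_phase() {
  if [ -f "${1:-.}/src/transcription.ts" ]; then
    echo "skip-to-configure"      # code already merged, go to Phase 3
  else
    echo "apply-code-changes"     # continue with Phase 1 and Phase 2
  fi
}

# Check the current directory.
next_phase .
```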
Ask the user
Use AskUserQuestion to collect information:
AskUserQuestion: Do you have an OpenAI API key for Whisper transcription?
If yes, collect it now. If no, direct them to create one at https://platform.openai.com/api-keys.
Phase 2: Apply Code Changes
Prerequisite: WhatsApp must be installed first (skill/whatsapp merged). This skill modifies WhatsApp channel files.
Ensure WhatsApp fork remote
git remote -v
If whatsapp is missing, add it:
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
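The two steps above can be combined so the remote is added only when it is missing. This is a minimal sketch; the function name is hypothetical, and it relies on git remote get-url, which needs Git 2.7 or later.

```shell
# Hypothetical helper: adds the whatsapp remote only if it is not configured.
# Must be run from inside the NanoClaw git repository.
ensure_whatsapp_remote() {
  if git remote get-url whatsapp >/dev/null 2>&1; then
    echo "whatsapp remote already present"
  else
    git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git \
      && echo "whatsapp remote added"
  fi
}
```

Running ensure_whatsapp_remote twice is safe: the second call only reports that the remote already exists.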
Merge the skill branch
git fetch whatsapp skill/voice-transcription
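# If the merge stops on a conflict, take the incoming package-lock.json
# and continue; conflicts in other files still need manual resolution.
git merge whatsapp/skill/voice-transcription || {
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}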
This merges in:
- src/transcription.ts (voice transcription module using OpenAI Whisper)
- Voice handling in src/channels/whatsapp.ts (isVoiceMessage check, transcribeAudioMessage call)
- Transcription tests in src/channels/whatsapp.test.ts
- openai npm dependency in package.json
- OPENAI_API_KEY in .env.example
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
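To see which files still need attention during a conflicted merge, Git can list the unmerged paths. The wrapper name below is hypothetical; the underlying git diff flags are standard.

```shell
# Hypothetical helper: lists files that still contain unresolved merge
# conflicts. --diff-filter=U keeps only unmerged paths.
list_conflicts() {
  git diff --name-only --diff-filter=U
}
```

An empty result from list_conflicts means every conflict has been resolved and staged, so git merge --continue can proceed.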
Validate code changes
npm install --legacy-peer-deps
npm run build
npx vitest run src/channels/whatsapp.test.ts
All tests must pass and the build must be clean before proceeding.
Phase 3: Configure
Get OpenAI API key (if needed)
If the user doesn't have an API key:
I need you to create an OpenAI API key:
- Go to https://platform.openai.com/api-keys
- Click "Create new secret key"
- Give it a name (e.g., "NanoClaw Transcription")
- Copy the key (starts with sk-)
Cost: $0.006 per minute of audio ($0.003 per typical 30-second voice note)
Wait for the user to provide the key.
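The quoted pricing scales linearly with audio length, so a rough estimate is easy to compute. This sketch assumes the $0.006 per minute rate stated above still applies; the function name is hypothetical.

```shell
# Hypothetical helper: estimated cost in USD for an audio duration given in
# seconds, at the $0.006 per minute rate quoted in this skill.
whisper_cost_usd() {
  awk -v s="$1" 'BEGIN { printf "%.4f\n", s / 60 * 0.006 }'
}

whisper_cost_usd 30    # a typical 30-second voice note: 0.0030
whisper_cost_usd 600   # ten minutes of audio: 0.0600
```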
Add to environment
Add to .env:
OPENAI_API_KEY=<their-key>
Sync to container environment:
mkdir -p data/env && cp .env data/env/env
The container reads environment from data/env/env, not .env directly.
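Because the key has to exist in both files, a quick check that does not print the secret can catch a forgotten sync. This is a sketch with a hypothetical function name; it only tests that a non-empty OPENAI_API_KEY line is present, not that the key is valid.

```shell
# Hypothetical helper: reports whether a file has a non-empty OPENAI_API_KEY
# entry, without echoing the key itself.
key_status() {
  if grep -Eq '^OPENAI_API_KEY=.+' "$1" 2>/dev/null; then
    echo "present"
  else
    echo "missing"
  fi
}

# Check both the host file and the copy the container reads.
for f in .env data/env/env; do
  echo "$f: $(key_status "$f")"
done
```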
Build and restart
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS
# Linux: systemctl --user restart nanoclaw
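If the restart step needs to work on both platforms, the right command can be selected from uname. This is a sketch with a hypothetical function name; it prints the command instead of running it, and the optional argument exists only to make the choice easy to inspect.

```shell
# Hypothetical helper: prints the restart command for the given platform
# (defaults to the current one, as reported by uname -s).
restart_command() {
  case "${1:-$(uname -s)}" in
    Darwin) echo "launchctl kickstart -k gui/$(id -u)/com.nanoclaw" ;;
    Linux)  echo "systemctl --user restart nanoclaw" ;;
    *)      echo "unsupported platform" ;;
  esac
}

restart_command
```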
Phase 4: Verify
Test with a voice note
Tell the user:
Send a voice note in any registered WhatsApp chat. The agent should receive it as [Voice: <transcript>] and respond to its content.
Check logs if needed
tail -f logs/nanoclaw.log | grep -i voice
Look for:
- Transcribed voice message: successful transcription with character count
- OPENAI_API_KEY not set: key missing from .env
- OpenAI transcription failed: API error (check key validity, billing)
- Failed to download audio message: media download issue
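For a quick overview instead of a live tail, the four messages above can be counted in one pass. This sketch assumes each message appears verbatim somewhere in its log line; the function name is hypothetical.

```shell
# Hypothetical helper: counts how often each known voice-transcription
# message appears in the given log file (fixed-string match).
voice_log_summary() {
  for pattern in \
    "Transcribed voice message" \
    "OPENAI_API_KEY not set" \
    "OpenAI transcription failed" \
    "Failed to download audio message"; do
    printf '%s: %s\n' "$pattern" "$(grep -cF -- "$pattern" "$1")"
  done
}
```

Run it as voice_log_summary logs/nanoclaw.log. A non-zero count on any of the last three lines points to the matching cause in the list above.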
Troubleshooting
Voice notes show "[Voice Message - transcription unavailable]"
- Check OPENAI_API_KEY is set in .env AND synced to data/env/env
- Verify key works:
  curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200
- Check OpenAI billing: Whisper requires a funded account
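Reading the HTTP status code is often clearer than inspecting the first 200 bytes of the response. The sketch below maps the common codes to a next step. The function name and wording are hypothetical, and the mapping covers only the statuses most likely to come up here.

```shell
# Hypothetical helper: translates an HTTP status from the OpenAI API into a
# suggested next step.
explain_openai_status() {
  case "$1" in
    200) echo "key accepted" ;;
    401) echo "key rejected: regenerate it" ;;
    429) echo "rate limit or quota exceeded: check billing or retry later" ;;
    *)   echo "unexpected status $1: check the logs" ;;
  esac
}

# Usage (needs network access and OPENAI_API_KEY in the environment):
#   status=$(curl -s -o /dev/null -w '%{http_code}' \
#     https://api.openai.com/v1/models \
#     -H "Authorization: Bearer $OPENAI_API_KEY")
#   explain_openai_status "$status"
```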
Voice notes show "[Voice Message - transcription failed]"
Check logs for the specific error. Common causes:
- Network timeout — transient, will work on next message
- Invalid API key — regenerate at https://platform.openai.com/api-keys
- Rate limiting — wait and retry
Agent doesn't respond to voice notes
Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.