add-image-vision
Add image vision to NanoClaw agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks.
Install this agent skill to your Project
npx add-skill https://github.com/qwibitai/nanoclaw/tree/main/.claude/skills/add-image-vision
SKILL.md
Image Vision Skill
Adds the ability for NanoClaw agents to see and understand images sent via WhatsApp. Images are downloaded, resized with sharp, saved to the group workspace, and passed to the agent as base64-encoded multimodal content blocks.
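To make the last step concrete, here is a minimal sketch of what a base64-encoded image content block can look like. The skill does this work in TypeScript inside src/image.ts; the shell helper below is only an illustration. The function name image_block and the image/jpeg default are assumptions, and the JSON shape follows the image block format of Claude's Messages API.

```bash
# image_block FILE [MEDIA_TYPE]: print a base64 image content block as JSON.
# Hypothetical helper for illustration; not part of the skill itself.
image_block() {
  local file="$1"
  local media_type="${2:-image/jpeg}"   # assumed default media type
  local data
  # Strip newlines so the payload is a single base64 string on any platform.
  data=$(base64 < "$file" | tr -d '\n')
  printf '{"type":"image","source":{"type":"base64","media_type":"%s","data":"%s"}}\n' \
    "$media_type" "$data"
}
```

Calling image_block photo.jpg prints one JSON object that could sit next to a text block in the message content array.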
Phase 1: Pre-flight
- Check if src/image.ts exists; skip to Phase 3 if already applied
- Confirm sharp is installable (native bindings require build tools)
Prerequisite: WhatsApp must be installed first (skill/whatsapp merged). This skill modifies WhatsApp channel files.
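The file checks above could be scripted roughly as follows. This is a sketch, not part of the skill: the function name preflight_status and its three status strings are made up for illustration, and it assumes that the presence of src/channels/whatsapp.ts is a reasonable proxy for the WhatsApp skill being merged. It does not check whether sharp is installable.

```bash
# preflight_status [ROOT]: report which phase to continue with.
# Hypothetical helper; the status strings are illustrative only.
preflight_status() {
  local root="${1:-.}"
  if [ ! -f "$root/src/channels/whatsapp.ts" ]; then
    echo "missing-whatsapp"    # prerequisite skill does not appear to be merged
  elif [ -f "$root/src/image.ts" ]; then
    echo "already-applied"     # skip to Phase 3
  else
    echo "needs-apply"         # continue with Phase 2
  fi
}
```

Run it from the repository root as preflight_status, or pass the checkout path as the first argument.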
Phase 2: Apply Code Changes
Ensure WhatsApp fork remote
git remote -v
If whatsapp is missing, add it:
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
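The check-then-add step can also be made idempotent, so it is safe to rerun. A minimal sketch, assuming a git version that supports git remote get-url (2.7 or later); the function name ensure_remote is hypothetical.

```bash
# ensure_remote NAME URL: add the remote only if it is not already configured.
# Hypothetical helper for illustration.
ensure_remote() {
  local name="$1" url="$2"
  if git remote get-url "$name" >/dev/null 2>&1; then
    echo "remote '$name' already configured"
  else
    git remote add "$name" "$url"
    echo "remote '$name' added"
  fi
}
```

For this skill the call would be ensure_remote whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git. Note that the sketch does not verify that an existing remote points at the expected URL.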
Merge the skill branch
git fetch whatsapp skill/image-vision
git merge whatsapp/skill/image-vision || {
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}
This merges in:
- src/image.ts (image download, resize via sharp, base64 encoding)
- src/image.test.ts (8 unit tests)
- Image attachment handling in src/channels/whatsapp.ts
- Image passing to agent in src/index.ts and src/container-runner.ts
- Image content block support in container/agent-runner/src/index.ts
- sharp npm dependency in package.json
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
Validate code changes
npm install
npm run build
npx vitest run src/image.test.ts
All tests must pass and the build must be clean before proceeding.
Phase 3: Configure
- Rebuild the container (agent-runner changes need a rebuild):
./container/build.sh
- Sync agent-runner source to group caches:
for dir in data/sessions/*/agent-runner-src/; do
  cp container/agent-runner/src/*.ts "$dir"
done
- Restart the service:
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
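One caveat with the sync loop: if no session directory exists yet, the unmatched glob is passed to cp literally and the copy fails. A slightly more defensive variant might look like the sketch below. The function name sync_agent_runner and its summary line are assumptions; the paths are taken from the steps above.

```bash
# sync_agent_runner SRC_DIR SESSIONS_ROOT: copy *.ts into every session cache.
# Hypothetical helper for illustration.
sync_agent_runner() {
  local src="$1" sessions="$2" count=0 dir
  for dir in "$sessions"/*/agent-runner-src/; do
    [ -d "$dir" ] || continue          # glob matched nothing; skip the literal pattern
    cp "$src"/*.ts "$dir" || return 1  # stop on the first failed copy
    count=$((count + 1))
  done
  echo "synced $count cache(s)"
}
```

From the repository root this would be called as sync_agent_runner container/agent-runner/src data/sessions. A result of 0 caches suggests that no group has started a session yet.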
Phase 4: Verify
- Send an image in a registered WhatsApp group
- Check the agent responds with understanding of the image content
- Check logs for "Processed image attachment":
tail -50 groups/*/logs/container-*.log
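Rather than reading the tail output by eye, the log check can be narrowed to the files that actually contain the marker. A small sketch, assuming the marker appears verbatim in the log lines; the function name logs_with_marker is hypothetical.

```bash
# logs_with_marker MARKER FILE...: print the log files that contain MARKER.
# Hypothetical helper; exits non-zero when no file matches.
logs_with_marker() {
  local marker="$1"
  shift
  # -l lists matching files only, -F treats the marker as a fixed string.
  grep -lF -- "$marker" "$@" 2>/dev/null
}
```

For example, logs_with_marker "Processed image attachment" groups/*/logs/container-*.log lists every container log that recorded a processed image.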
Troubleshooting
- "Image - download failed": Check WhatsApp connection stability. The download may time out on slow connections.
- "Image - processing failed": Sharp may not be installed correctly. Run npm ls sharp to verify.
- Agent doesn't mention image content: Check container logs for "Loaded image" messages. If missing, ensure agent-runner source was synced to group caches.
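For the last case, one way to find out whether a group cache is out of date is to compare it byte for byte with the agent-runner source. The sketch below assumes the cache layout used in Phase 3; the function name stale_caches is hypothetical.

```bash
# stale_caches SRC_DIR SESSIONS_ROOT: print cache dirs whose *.ts differ from SRC_DIR.
# Hypothetical helper for illustration.
stale_caches() {
  local src="$1" sessions="$2" dir file
  for dir in "$sessions"/*/agent-runner-src/; do
    [ -d "$dir" ] || continue
    for file in "$src"/*.ts; do
      [ -f "$file" ] || continue
      # cmp -s is silent and fails when the cached copy is missing or different.
      if ! cmp -s "$file" "$dir$(basename "$file")"; then
        echo "$dir"
        break
      fi
    done
  done
}
```

Running stale_caches container/agent-runner/src data/sessions prints nothing when every cache matches the source. Any directory it prints needs the sync step from Phase 3 again.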