
add-image-vision

Add image vision to NanoClaw agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks.


Install this agent skill in your project:

npx add-skill https://github.com/qwibitai/nanoclaw/tree/main/.claude/skills/add-image-vision


Image Vision Skill

Adds the ability for NanoClaw agents to see and understand images sent via WhatsApp. Images are downloaded, resized with sharp, saved to the group workspace, and passed to the agent as base64-encoded multimodal content blocks.
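
To make "base64-encoded multimodal content block" concrete, here is a minimal sketch of the JSON shape involved. It assumes the image block format of the Anthropic Messages API and a JPEG payload. The skill itself builds this in TypeScript (src/image.ts), so the shell function name `image_content_block` and the fixed `image/jpeg` media type are illustrative assumptions only.

```bash
# Hypothetical helper, not part of the skill: emit an image content block
# for the file given as $1. The media type is assumed to be image/jpeg.
image_content_block() {
  # Encode the file and strip the line wrapping that some base64 tools add.
  data="$(base64 < "$1" | tr -d '\n')"
  printf '{"type":"image","source":{"type":"base64","media_type":"image/jpeg","data":"%s"}}\n' "$data"
}
```

Calling `image_content_block photo.jpg` prints one JSON object that could sit next to a text block in a message's content array.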

Phase 1: Pre-flight

Check whether src/image.ts exists. If it does, the skill is already applied; skip to Phase 3.
Confirm sharp is installable (its native bindings require build tools).
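
The first check above can be scripted. The following is a sketch under assumptions, not part of the skill: the function name `image_vision_status` is hypothetical, and it only tests for the file named in the checklist.

```bash
# Hypothetical helper: report whether the skill already appears to be applied.
# $1 is the repository root (defaults to the current directory).
image_vision_status() {
  root="${1:-.}"
  if [ -f "$root/src/image.ts" ]; then
    echo "applied"   # src/image.ts is present: skip to Phase 3
  else
    echo "missing"   # continue with Phase 2
  fi
}
```

For the second check, `npm install --dry-run sharp` should report what npm would install without changing node_modules. It likely does not exercise the native build, so treat a clean dry run as a hint rather than proof.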

Prerequisite: WhatsApp must be installed first (skill/whatsapp merged). This skill modifies WhatsApp channel files.

Phase 2: Apply Code Changes

Ensure WhatsApp fork remote

bash

git remote -v

If whatsapp is missing, add it:

bash

git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
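
The check-then-add step can be combined so it is safe to run repeatedly. This is a sketch only; the function name `ensure_whatsapp_remote` is hypothetical, and it relies on `git remote get-url`, which exits non-zero when the named remote does not exist.

```bash
# Hypothetical helper: add the whatsapp remote only if it is not configured yet.
ensure_whatsapp_remote() {
  if git remote get-url whatsapp >/dev/null 2>&1; then
    echo "whatsapp remote already present"
  else
    git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
    echo "whatsapp remote added"
  fi
}
```

Run `ensure_whatsapp_remote` from the repository root before fetching the skill branch.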

Merge the skill branch

bash

git fetch whatsapp skill/image-vision
# If the merge stops on a conflict, take the incoming package-lock.json and continue.
git merge whatsapp/skill/image-vision || {
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}

This merges in:

src/image.ts (image download, resize via sharp, base64 encoding)
src/image.test.ts (8 unit tests)
Image attachment handling in src/channels/whatsapp.ts
Image passing to agent in src/index.ts and src/container-runner.ts
Image content block support in container/agent-runner/src/index.ts
sharp npm dependency in package.json
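
After the merge, a quick existence check over the paths listed above can catch an incomplete merge early. This is a sketch; the function name `missing_image_vision_files` is hypothetical, and the check only confirms that the files exist, not that their contents are correct.

```bash
# Hypothetical helper: print every expected path that is absent under a repo root.
# Prints nothing when all listed files are present.
missing_image_vision_files() {
  root="${1:-.}"
  for path in \
    src/image.ts \
    src/image.test.ts \
    src/channels/whatsapp.ts \
    src/index.ts \
    src/container-runner.ts \
    container/agent-runner/src/index.ts \
    package.json
  do
    [ -f "$root/$path" ] || echo "$path"
  done
}
```

An empty result from `missing_image_vision_files .` means every file is in place. `grep -q '"sharp"' package.json` can then confirm that the dependency entry was merged.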

If the merge reports conflicts in other files, resolve them manually: read each conflicted file, work out the intent of both sides, stage the result with git add, and finish with git merge --continue.

Validate code changes

bash

npm install
npm run build
npx vitest run src/image.test.ts

All tests must pass and the build must be clean before proceeding.

Phase 3: Configure

Rebuild the container (agent-runner changes need a rebuild):

```bash
./container/build.sh
```

Sync agent-runner source to group caches:

bash

for dir in data/sessions/*/agent-runner-src/; do
  [ -d "$dir" ] || continue  # skip the unexpanded pattern when no group caches exist
  cp container/agent-runner/src/*.ts "$dir"
done

Restart the service (launchctl is the macOS launchd tool):

bash

launchctl kickstart -k gui/$(id -u)/com.nanoclaw

Phase 4: Verify

Send an image in a registered WhatsApp group
Check the agent responds with understanding of the image content
Check logs for "Processed image attachment":

```bash
tail -50 groups/*/logs/container-*.log
```
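
Rather than scanning the tail output by eye, the success message can be counted directly. This is a sketch; the function name `count_processed_images` is hypothetical, and the message text is the one quoted in the step above.

```bash
# Hypothetical helper: count log lines that contain the success message.
# Arguments are the log files to scan.
count_processed_images() {
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the exit status clean.
  cat -- "$@" 2>/dev/null | grep -c "Processed image attachment" || true
}
```

For example, `count_processed_images groups/*/logs/container-*.log` should print a number greater than zero after a test image has been handled.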

Troubleshooting

"Image - download failed": Check WhatsApp connection stability. The download may time out on slow connections.
"Image - processing failed": sharp may not be installed correctly. Run npm ls sharp to verify.
Agent doesn't mention image content: Check container logs for "Loaded image" messages. If missing, ensure agent-runner source was synced to group caches.
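
To find out whether a group cache is out of date, compare its files with the agent-runner source. This is a sketch; the function name `stale_agent_runner_caches` is hypothetical, and it assumes the caches hold plain copies of the .ts files, as the sync step in Phase 3 suggests.

```bash
# Hypothetical helper: print each cache directory whose .ts files differ from,
# or are missing relative to, the agent-runner source directory.
# $1 is the source directory; the remaining arguments are cache directories.
stale_agent_runner_caches() {
  src_dir="$1"
  shift
  for cache in "$@"; do
    for f in "$src_dir"/*.ts; do
      # cmp -s is silent and exits non-zero when files differ or one is missing.
      if ! cmp -s "$f" "$cache/$(basename "$f")"; then
        echo "$cache"
        break
      fi
    done
  done
}
```

For example, run `stale_agent_runner_caches container/agent-runner/src data/sessions/*/agent-runner-src`. Any directory it prints needs the sync step from Phase 3 again.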

Maintainer

qwibitai Core maintainer

Source details

Full Name: qwibitai/nanoclaw
Branch: main
Path in repo: .claude/skills/add-image-vision
License: MIT License
Topics: claude-code ai-agents openclaw claude-skills ai-assistant


