Gemini Vision Skill

Generate AI images and videos by invoking Gemini CLI's vision extension. This skill provides access to:

- Nano Banana (`gemini-2.5-flash-image`): image generation and transformation
- Veo 3 (`veo-3.0-generate-001`): video generation from images or text prompts
- Webcam capture: live frame capture for AI processing

Prerequisites

- Gemini CLI: must be installed and configured.
- Vision extension: install it with:
  ```bash
  gemini extensions install vision
  ```
- API key: set the `GEMINI_API_KEY` environment variable.
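A quick preflight check can catch the two most common setup problems before the extension is invoked. The following is a minimal sketch. `check_gemini_vision_prereqs` is a hypothetical helper name. It only checks that the `gemini` binary is on `PATH` and that `GEMINI_API_KEY` is non-empty. It does not verify that the vision extension itself is installed.

```bash
# Hypothetical helper: check the prerequisites listed above.
# It does NOT check whether the vision extension is installed.
check_gemini_vision_prereqs() {
  # The Gemini CLI binary must be on PATH.
  if ! command -v gemini >/dev/null 2>&1; then
    echo "error: gemini CLI not found on PATH" >&2
    return 1
  fi
  # The API key must be set and non-empty.
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "error: GEMINI_API_KEY is not set" >&2
    return 1
  fi
  return 0
}
```

Typical use is to chain it in front of a real command, e.g. `check_gemini_vision_prereqs && gemini -p "/vision:devices"`.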

When to Use This Skill

Use this skill when the user asks to:

- Generate images from text prompts
- Transform or reimagine existing images
- Create AI-generated videos from images
- Capture webcam frames for AI processing
- Create "nano banana" style images
- Generate Veo videos

Available Operations

1. Image Generation (Nano Banana)

Generate images from text prompts or transform existing images.

Command Pattern:

```bash
gemini -p "/vision:banana prompt=\"Your creative prompt here\" n=1 out_dir=./output"
```

Parameters:

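| Parameter | Default | Description |
|---|---|---|
| `prompt` | Required | Creative description of the desired image |
| `n` | `1` | Number of images to generate |
| `out_dir` | `.` | Output directory for images |
| `model` | `gemini-2.5-flash-image` | Image generation model |
| `input_paths` | Optional | Images to transform, given as a list such as `['/path/to/image.jpg']` |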

Models Available:

- `gemini-2.5-flash-image` (default, recommended)
- `gemini-3-pro-image-preview` (newer, experimental)
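The prompt is embedded in double quotes inside the `-p` argument, so a prompt that contains a double quote can break the command. Below is a minimal sketch of an argument builder that escapes such quotes. It assumes the extension accepts backslash-escaped quotes inside `prompt="..."`, which this document does not confirm. `escape_prompt` and `banana_arg` are hypothetical names.

```bash
# Hypothetical helpers. ASSUMPTION: the extension accepts \" inside prompt="...".
escape_prompt() {
  # Escape backslashes first, then double quotes.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

banana_arg() {
  # $1 = prompt, $2 = image count (default 1), $3 = output directory (default .)
  local prompt
  prompt=$(escape_prompt "$1")
  printf '/vision:banana prompt="%s" n=%s out_dir=%s' "$prompt" "${2:-1}" "${3:-.}"
}
```

For example: `gemini -p "$(banana_arg 'A red fox in fresh snow' 2 ./output)"`. The sketch does not quote `out_dir`, so avoid output directories that contain spaces.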

2. Video Generation (Veo 3)

Generate short videos from images or prompts.

Command Pattern:

```bash
gemini -p "/vision:veo prompt=\"Animate this scene\" aspect_ratio=16:9 out_dir=./output"
```

Parameters:

| Parameter | Default | Description |
|---|---|---|
| `prompt` | Required | Animation or motion description |
| `aspect_ratio` | `16:9` | Video aspect ratio (`16:9` or `9:16`) |
| `resolution` | auto | Video resolution (e.g., `1080p`) |
| `negative_prompt` | `""` | What to avoid in the video |
| `veo_model` | `veo-3.0-generate-001` | Video model |
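`aspect_ratio` only accepts `16:9` or `9:16`, so it can be worth rejecting other values before making a Veo request. Below is a minimal sketch using a hypothetical `veo_arg` helper. It assumes the extension accepts backslash-escaped double quotes inside `prompt="..."`.

```bash
# Hypothetical helper: validate aspect_ratio, then build the /vision:veo argument.
veo_arg() {
  # $1 = prompt, $2 = aspect ratio (default 16:9), $3 = output directory (default .)
  local ratio="${2:-16:9}"
  case "$ratio" in
    16:9|9:16) ;;
    *) echo "error: aspect_ratio must be 16:9 or 9:16, got '$ratio'" >&2; return 1 ;;
  esac
  local prompt
  # ASSUMPTION: backslash-escaped quotes are accepted inside prompt="...".
  prompt=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '/vision:veo prompt="%s" aspect_ratio=%s out_dir=%s' "$prompt" "$ratio" "${3:-.}"
}
```

For example: `gemini -p "$(veo_arg 'Waves rolling onto a beach' 9:16 ./output)"`.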

3. Webcam Capture + AI

Capture a frame from the webcam and process it with AI.

```bash
# Start the camera
gemini -p "/vision:start"

# Capture a frame and transform it
gemini -p "/vision:banana prompt=\"Transform into oil painting\""

# Stop the camera
gemini -p "/vision:stop"
```
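If a capture or transform fails, the camera may be left running. One safeguard is to send `/vision:stop` after the transform regardless of its outcome. Below is a minimal sketch using a hypothetical `webcam_transform` function. It assumes `gemini -p` exits non-zero when a command fails, which this document does not confirm. The prompt is inserted without escaping.

```bash
# Hypothetical wrapper: start the camera, transform one frame, always stop the camera.
# ASSUMPTION: gemini -p returns a non-zero exit status when a command fails.
webcam_transform() {
  # $1 = transformation prompt (not escaped; avoid embedded double quotes)
  gemini -p "/vision:start" || return 1
  gemini -p "/vision:banana prompt=\"$1\""
  local transform_rc=$?
  # Stop the camera even if the transform failed.
  gemini -p "/vision:stop"
  return "$transform_rc"
}
```

For example: `webcam_transform "Transform into a Renaissance oil painting"`. The function returns the transform's exit status rather than the stop command's.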

Instructions for Claude

When the user requests image or video generation:

1. Determine the operation type:
   - Text-to-image → use `/vision:banana`
   - Image transformation → use `/vision:banana` with `input_paths`
   - Image-to-video → use `/vision:veo`
   - Webcam capture → use `/vision:capture` or `/vision:banana`
2. Construct the Gemini CLI command:

   ```bash
   gemini -p "/vision:<command> prompt=\"<user prompt>\" <params>"
   ```

3. Execute it via the Bash tool:
   - Run the command.
   - Capture the output paths.
   - Report success and file locations to the user.
4. Handle the output:
   - Images are saved as `banana_*.png` or `banana_*.jpg`.
   - Videos are saved as `veo_*.mp4`.
   - Return the file paths to the user.
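The output files can be found by their naming pattern (`banana_*` and `veo_*`), which gives the paths to report to the user. Below is a minimal sketch using a hypothetical `list_vision_outputs` helper. It assumes outputs land directly in `out_dir` under exactly those names.

```bash
# Hypothetical helper: list generated files in an output directory.
# ASSUMPTION: outputs are named banana_*.png, banana_*.jpg, or veo_*.mp4.
list_vision_outputs() {
  local dir="${1:-.}"
  find "$dir" -maxdepth 1 -type f \
    \( -name 'banana_*.png' -o -name 'banana_*.jpg' -o -name 'veo_*.mp4' \) | sort
}
```

This lists every matching file in the directory, including files from earlier runs. Passing a fresh `out_dir` for each request keeps the report limited to new files.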

Example Workflows

Generate a Single Image

User: "Generate an image of a cyberpunk city at sunset"

Action:

```bash
gemini -p "/vision:banana prompt=\"A sprawling cyberpunk city at sunset, neon lights reflecting off wet streets, flying cars in the distance, highly detailed, cinematic\" n=1 out_dir=."
```

Transform an Image

User: "Make this photo look like a Studio Ghibli scene" (with image attached)

Action:

1. Save the attached image to a temporary location.
2. Run the following, replacing `/path/to/image.jpg` with the saved path:

```bash
gemini -p "/vision:banana prompt=\"Transform into Studio Ghibli animation style, soft colors, whimsical atmosphere\" input_paths=['/path/to/image.jpg']"
```
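The two steps of this workflow can be scripted. Below is a minimal sketch using hypothetical `stage_image` and `transform_arg` helpers. It assumes `input_paths` accepts a single-quoted list literal, as in the example. Paths that contain a single quote are not handled.

```bash
# Hypothetical helper: copy the attachment into a fresh temp directory; print the new path.
stage_image() {
  local dir
  dir=$(mktemp -d) || return 1
  cp "$1" "$dir/" && printf '%s/%s' "$dir" "$(basename "$1")"
}

# Hypothetical helper: build the /vision:banana argument with one input image.
# ASSUMPTION: input_paths takes a list literal like ['/path/to/image.jpg'].
transform_arg() {
  # $1 = path to the staged image, $2 = style prompt
  local prompt
  prompt=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf "/vision:banana prompt=\"%s\" input_paths=['%s']" "$prompt" "$1"
}
```

For example: `gemini -p "$(transform_arg "$(stage_image ./photo.jpg)" 'Transform into Studio Ghibli animation style')"`.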

Generate a Video

User: "Create a video of ocean waves"

Action:

```bash
gemini -p "/vision:veo prompt=\"Calm ocean waves gently rolling onto a sandy beach, golden hour lighting, peaceful atmosphere\" aspect_ratio=16:9"
```

Webcam to Art

User: "Take a photo of me and make it look like a Renaissance painting"

Action:

```bash
# Capture and transform in one step
gemini -p "/vision:banana prompt=\"Transform into a Renaissance oil painting, dramatic lighting, classical composition\""
```

Output Format

Always report results in this format:

```markdown
## Generated Content

**Type:** Image/Video
**Files:**
- `/path/to/banana_20251227_123456_000.png`

**Prompt Used:** [the prompt]
**Model:** [model ID, e.g. gemini-2.5-flash-image or veo-3.0-generate-001]

To view: open the file path above, or run `open /path/to/file` (macOS) or `xdg-open /path/to/file` (Linux)
```
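The template can be filled in from a run's results. Below is a minimal sketch using a hypothetical `print_report` helper. The caller supplies the type, prompt, model and file paths. The sketch omits the final viewing hint.

```bash
# Hypothetical helper: print the report template with real values filled in.
print_report() {
  # $1 = type (Image or Video), $2 = prompt, $3 = model, remaining args = file paths
  local type="$1" prompt="$2" model="$3" f
  shift 3
  printf '## Generated Content\n\n**Type:** %s\n**Files:**\n' "$type"
  for f in "$@"; do
    printf '%s\n' "- \`$f\`"
  done
  printf '\n**Prompt Used:** %s\n**Model:** %s\n' "$prompt" "$model"
}
```

For example: `print_report Image "A cyberpunk city at sunset" gemini-2.5-flash-image ./banana_*.png`.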

Error Handling

Common issues and solutions:

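| Error | Solution |
|---|---|
| "Camera not found" | Run `gemini -p "/vision:devices"` to list available cameras |
| "GEMINI_API_KEY not set" | Export the key in your environment, e.g. `export GEMINI_API_KEY="your-key"` |
| "Model not available" | Check the model ID spelling against the models listed above |
| "Generation failed" | Try a simpler prompt or a different model |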

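The advice to try a different model can be automated. Below is a minimal sketch using a hypothetical `banana_with_fallback` function. It tries the two image models listed under Models Available, in order. It assumes `gemini -p` exits non-zero when generation fails, which this document does not confirm.

```bash
# Hypothetical wrapper: retry image generation with the alternate model.
# ASSUMPTION: gemini -p returns a non-zero exit status on failure.
banana_with_fallback() {
  # $1 = prompt (not escaped; avoid embedded double quotes)
  local model
  for model in gemini-2.5-flash-image gemini-3-pro-image-preview; do
    if gemini -p "/vision:banana prompt=\"$1\" model=$model"; then
      return 0
    fi
    echo "warning: generation with $model failed" >&2
  done
  return 1
}
```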
Script Usage (Alternative)

For programmatic access, use the helper script:

```bash
python ~/.claude/skills/gemini-vision/scripts/gemini_vision.py \
  --operation banana \
  --prompt "Your prompt here" \
  --output-dir ./output \
  --count 1
```

Options:

- `--operation`: `banana`, `veo`, `capture`, or `devices`
- `--prompt`: the generation prompt
- `--output-dir`: where to save files
- `--count`: number of images (for `banana`)
- `--aspect-ratio`: video aspect ratio for `veo` (`16:9` or `9:16`)
- `--model`: override the default model
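A small wrapper can fail early when the helper script is missing. Below is a minimal sketch. `run_vision_script` and the `GEMINI_VISION_SCRIPT` override variable are hypothetical. The default path is the one shown in the script usage example.

```bash
# Hypothetical wrapper around the helper script; fails early if the script is missing.
run_vision_script() {
  # GEMINI_VISION_SCRIPT is a hypothetical override for the script location.
  local script="${GEMINI_VISION_SCRIPT:-$HOME/.claude/skills/gemini-vision/scripts/gemini_vision.py}"
  if [ ! -f "$script" ]; then
    echo "error: helper script not found at $script" >&2
    return 1
  fi
  python "$script" "$@"
}
```

For example, a vertical video: `run_vision_script --operation veo --prompt "Ocean waves at dawn" --aspect-ratio 9:16 --output-dir ./output`.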
