Agent skill
image-generation
Generate and edit images using Gemini Flash Image, and generate videos using Veo. Supports text-to-image, image editing, text-to-video, and image-to-video.
Install this agent skill into your project:
npx add-skill https://github.com/sonichi/sutando/tree/main/skills/image-generation
SKILL.md
Media Generation
Generate images and videos using Gemini APIs.
Image Generation (Gemini Flash Image)
- Text-to-image: Generate images from text descriptions
- Image editing: Modify existing images with natural language
- Background replacement: Change or enhance backgrounds
- Hero/banner creation: Create branded images with text overlays
- Style transfer: Apply artistic styles to photos
Video Generation (Veo)
- Text-to-video: Generate video clips from text prompts
- Image-to-video: Animate a reference image with a prompt
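Which of the four modes runs depends on two inputs: whether a video is requested (--video) and whether a reference image is supplied (--input). A minimal sketch of that dispatch follows. Mode and select_mode are hypothetical names, and this assumes the script branches on exactly these two flags; the real logic in generate.py is not documented here.

```python
from enum import Enum
from typing import Optional, Sequence


class Mode(Enum):
    TEXT_TO_IMAGE = "text-to-image"
    IMAGE_EDIT = "image editing"
    TEXT_TO_VIDEO = "text-to-video"
    IMAGE_TO_VIDEO = "image-to-video"


def select_mode(video: bool, inputs: Optional[Sequence[str]]) -> Mode:
    """Pick a generation mode from --video and any --input paths.

    Hypothetical helper: assumes the two flags alone decide the mode.
    """
    has_input = bool(inputs)
    if video:
        # Veo: animate the reference image if one was given.
        return Mode.IMAGE_TO_VIDEO if has_input else Mode.TEXT_TO_VIDEO
    # Gemini Flash Image: edit the input image if one was given.
    return Mode.IMAGE_EDIT if has_input else Mode.TEXT_TO_IMAGE
```

For example, `select_mode(video=True, inputs=["scene.jpg"])` selects image-to-video, which matches the last usage example below.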
When to Use
- "Generate a hero image for my project"
- "Create a short video of a sunset timelapse"
- "Edit this photo to remove the background"
- "Make a video from this image"
- "Generate a logo with a dark theme"
Usage
```bash
# Text-to-image
python3 "$SKILL_DIR/scripts/generate.py" --prompt "A futuristic city skyline at night"
# Edit an existing image
python3 "$SKILL_DIR/scripts/generate.py" --input photo.jpg --prompt "Add dramatic clouds"
# Specify output path
python3 "$SKILL_DIR/scripts/generate.py" --prompt "A cute robot mascot" --output mascot.png
# Text-to-video
python3 "$SKILL_DIR/scripts/generate.py" --video --prompt "A timelapse of a city at sunset"
# Video with portrait aspect ratio
python3 "$SKILL_DIR/scripts/generate.py" --video --prompt "Ocean waves" --aspect 9:16
# Image-to-video (animate a reference image)
python3 "$SKILL_DIR/scripts/generate.py" --video --input scene.jpg --prompt "Animate this scene with gentle wind"
# Specify output
python3 "$SKILL_DIR/scripts/generate.py" --video --prompt "Dancing robot" --output robot.mp4
```
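An agent that calls the script from Python rather than a shell can build the same command lines as argument lists and hand them to subprocess.run. The sketch below assumes SKILL_DIR points at the installed skill, as in the commands above. build_command and run_generate are hypothetical helpers, and they emit only flags documented under Options.

```python
import os
import subprocess
from typing import List, Optional


def build_command(
    prompt: str,
    input_path: Optional[str] = None,
    output: Optional[str] = None,
    video: bool = False,
    aspect: Optional[str] = None,
    skill_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for scripts/generate.py (hypothetical helper)."""
    skill_dir = skill_dir or os.environ.get("SKILL_DIR", ".")
    cmd = ["python3", os.path.join(skill_dir, "scripts", "generate.py")]
    if video:
        cmd.append("--video")
    if input_path:
        cmd += ["--input", input_path]
    cmd += ["--prompt", prompt]
    if aspect:
        cmd += ["--aspect", aspect]
    if output:
        cmd += ["--output", output]
    return cmd


def run_generate(cmd: List[str]) -> None:
    """Run the script. This needs GEMINI_API_KEY and network access."""
    # Passing a list (no shell) avoids quoting problems with prompts.
    subprocess.run(cmd, check=True)
```

For example, `run_generate(build_command("Dancing robot", video=True, output="robot.mp4"))` runs the last command above. Using an argument list instead of a shell string means prompts that contain quotes or `$` need no escaping.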
Options
| Flag | Description | Default |
|---|---|---|
| --prompt | Text prompt describing what to generate | (required) |
| --input | Input image path(s) for editing/reference | None |
| --output | Output file path | generated-{timestamp}.png or .mp4 |
| --model | Gemini model to use | gemini-2.5-flash-image / veo-3.1-generate-preview |
| --video | Generate video instead of image | false |
| --aspect | Video aspect ratio | 16:9 |
| --quality | JPEG quality (1-100, images only) | 90 |
Requirements
- google-genai Python package (pip3 install google-genai)
- GEMINI_API_KEY set in .env or in the environment
- Pillow (pip3 install Pillow), used for image generation/editing
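The API key can come from the environment or from a .env file. Below is a minimal lookup sketch. Two parts are assumptions: that the environment takes precedence over .env, and the simple KEY=VALUE parsing. load_api_key is a hypothetical helper, and the python-dotenv package handles many more .env edge cases.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_file: str = ".env") -> Optional[str]:
    """Return GEMINI_API_KEY from the environment, else from a .env file.

    Hypothetical helper with a minimal KEY=VALUE parser: blank lines and
    '#' comments are skipped, and surrounding quotes are stripped.
    """
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key
    path = Path(env_file)
    if not path.is_file():
        return None
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "GEMINI_API_KEY":
            return value.strip().strip("\"'")
    return None
```

The returned key can be passed explicitly as `genai.Client(api_key=key)` from the google-genai package.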
Notes
- Video generation takes 1-3 minutes (polling every 10s)
- Generated videos are stored on Google servers for 2 days
- Gemini may refuse some prompts (people's faces, copyrighted characters, etc.)
- For image editing, be explicit: "keep the subject unchanged, only modify the background"
- Image output format is inferred from the --output file extension (.jpg, .png, .webp)
- Maximum input image size: ~20MB
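Three of the notes above map onto small helpers: polling a video job every 10 s, inferring the image format from the output extension, and rejecting oversized inputs. The sketch below makes several assumptions. The 5-minute timeout is chosen to cover the quoted 1-3 minutes, 20 MiB stands in for "~20MB", and the extension map lists only the three documented extensions, mapped to Pillow format names. All function names are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")

POLL_INTERVAL_S = 10                # from the notes: poll every 10 s
DEFAULT_TIMEOUT_S = 300             # assumption: headroom over 1-3 minutes
MAX_INPUT_BYTES = 20 * 1024 * 1024  # assumption: "~20MB" read as 20 MiB
IMAGE_FORMATS = {".jpg": "JPEG", ".png": "PNG", ".webp": "WEBP"}


def wait_until_done(
    refresh: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout_s: float = DEFAULT_TIMEOUT_S,
    interval_s: float = POLL_INTERVAL_S,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Re-fetch a long-running job's state until is_done(state) or timeout."""
    deadline = clock() + timeout_s
    state = refresh()
    while not is_done(state):
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} s")
        sleep(interval_s)
        state = refresh()
    return state


def output_format(path: str) -> str:
    """Map the output file extension to a Pillow format name."""
    ext = Path(path).suffix.lower()
    if ext not in IMAGE_FORMATS:
        raise ValueError(f"unsupported image extension: {ext or '(none)'}")
    return IMAGE_FORMATS[ext]


def check_input_size(path: str, limit: int = MAX_INPUT_BYTES) -> None:
    """Reject input images larger than the approximate size limit."""
    size = Path(path).stat().st_size
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; limit is about {limit} bytes")
```

With the google-genai SDK, `refresh` would presumably wrap `client.operations.get(operation)`, the call used in Google's Veo examples, and `is_done` would check `operation.done`. The sleep and clock functions are injectable so that the loop can be tested without waiting.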