Agent skill
media-generation
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/media-generation
SKILL.md
Media Generation
Image Generation
uv run ~/.claude/skills/media-generation/scripts/generate_image.py \
--prompt "description or editing instructions" \
--filename "output.png" \
[--input-image "source.png"] \
[--resolution 1K|2K|4K]
Resolution
- 1K (default): also for "low res", "1080p"
- 2K: also for "medium", "2048"
- 4K: also for "high res", "hi-res", "ultra"
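A minimal sketch of how the aliases above could be mapped onto the `--resolution` values. The helper name and the exact-match rule are assumptions for illustration; they are not taken from `generate_image.py`.

```python
from typing import Optional

# Alias table copied from the Resolution list; matching is case-insensitive.
ALIASES = {
    "1K": ("1k", "low res", "1080p"),
    "2K": ("2k", "medium", "2048"),
    "4K": ("4k", "high res", "hi-res", "ultra"),
}


def normalize_resolution(text: Optional[str]) -> str:
    """Return "1K", "2K", or "4K" for a free-form request (hypothetical helper)."""
    if not text:
        return "1K"  # documented default
    needle = text.strip().lower()
    for resolution, names in ALIASES.items():
        if needle in names:
            return resolution
    return "1K"  # assumption: unknown phrases fall back to the default
```

For example, `normalize_resolution("hi-res")` gives the value to pass as `--resolution`.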
Video Generation
uv run ~/.claude/skills/media-generation/scripts/generate_video.py \
--prompt "video description" \
--filename "output.mp4" \
[--model veo-3.0-generate-001] \
[--negative "things to avoid"] \
[--input-image "first-frame.png"]
Models
- veo-3.0-generate-001 (default): stable, video only
- veo-3.0-fast-generate-001: faster, lower cost
- veo-3.1-generate-preview: supports video extend, audio sync
- veo-3.1-fast-generate-preview: fast with extend support
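One way to choose among the models listed above is a small lookup keyed on whether the extend feature is needed and whether speed matters more than cost. The function name and its two flags are hypothetical; only the model IDs come from the list.

```python
# (needs_extend, fast) -> model ID, taken from the Models list.
MODELS = {
    (False, False): "veo-3.0-generate-001",
    (False, True): "veo-3.0-fast-generate-001",
    (True, False): "veo-3.1-generate-preview",
    (True, True): "veo-3.1-fast-generate-preview",
}


def pick_model(needs_extend: bool = False, fast: bool = False) -> str:
    """Return a value for --model (hypothetical helper)."""
    return MODELS[(bool(needs_extend), bool(fast))]
```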
Prompting Tips
- Specify camera movements: "slow zoom in", "pan left", "close-up"
- Add "no talking, no dialogue" if character shouldn't speak
- Describe atmosphere: "rain outside", "purple mystical energy"
Note: Veo requires a paid tier. Pricing is roughly $0.40/sec for standard models and $0.15/sec for fast models.
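The prompting tips can be folded into a small prompt builder. This is only a sketch: the function and its parameters are hypothetical, and joining the parts with commas is an assumption about what reads well, not a Veo requirement.

```python
from typing import Optional


def build_prompt(subject: str, camera: Optional[str] = None,
                 atmosphere: Optional[str] = None, silent: bool = False) -> str:
    """Combine subject, camera movement, and atmosphere into one --prompt string."""
    parts = [subject]
    if camera:
        parts.append(camera)  # e.g. "slow zoom in"
    if atmosphere:
        parts.append(atmosphere)  # e.g. "rain outside"
    if silent:
        parts.append("no talking, no dialogue")  # keeps the character from speaking
    return ", ".join(parts)
```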
Music Video from Image + Audio
Overview
- Start with character image + audio track (e.g., from Suno)
- Transcribe audio to get timestamps
- Generate clip 1 from image (veo-3.1)
- Extend each subsequent clip from previous (maintains continuity)
- Stitch clips + overlay audio with ffmpeg
Step 1: Transcribe audio for timing
whisper-ctranslate2 "song.mp3" --model large-v3 --output_dir /tmp --output_format srt
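The SRT file from the command above can be read back into timed cues to decide which lyrics fall into which clip. A rough sketch, assuming standard SRT cue blocks and the ~8-second clip length used in the cost estimate; both function names are hypothetical.

```python
import re
from typing import List, Tuple

TIMESTAMP = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text: str) -> List[Tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, lyric) for each SRT cue."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMESTAMP.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timestamp line is the cue text.
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues


def lyrics_for_clip(cues, index: int, clip_seconds: float = 8.0) -> str:
    """Join the lyrics of every cue that overlaps clip number `index` (0-based)."""
    lo, hi = index * clip_seconds, (index + 1) * clip_seconds
    return " ".join(text for start, end, text in cues if start < hi and end > lo)
```

The per-clip lyrics can then inform each clip's scene description.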
Step 2: Generate first clip from image
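# Use veo-3.1 (required for extend feature)
# Assumes `client` (a google-genai client), `types`, and `img_data` are already set up
import time
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    image=types.Image(image_bytes=img_data, mime_type="image/jpeg"),
    prompt="character description, scene action, no talking",
)
# generate_videos returns a long-running operation; poll until it finishes
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)
video1 = operation.result.generated_videos[0]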
Step 3: Extend from previous clip
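operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    video=previous_video.video,  # Pass previous video object (video1 for the first extension)
    prompt="next scene description, continuous action, no talking",
)
# Wait for the long-running operation to finish before reading the result
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)
previous_video = operation.result.generated_videos[0]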
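Steps 2 and 3 can be combined into one loop that starts from the image and extends each clip from the one before it. This is a sketch under assumptions: `client` is taken to be a google-genai client, `first_image` an already-built `types.Image`, and the function names are hypothetical. It reuses only the calls shown in the steps, plus `client.operations.get` for polling.

```python
import time


def wait_for(client, operation, poll_seconds=10.0):
    """Poll a long-running operation until it reports done, then return it."""
    while not operation.done:
        time.sleep(poll_seconds)
        operation = client.operations.get(operation)
    return operation


def generate_chain(client, first_image, prompts,
                   model="veo-3.1-generate-preview", poll_seconds=10.0):
    """Generate one clip per prompt; clip 1 uses the image, later clips extend."""
    clips = []
    for prompt in prompts:
        if not clips:
            # First clip: image-to-video.
            operation = client.models.generate_videos(
                model=model, image=first_image, prompt=prompt)
        else:
            # Later clips: extend from the previous clip for continuity.
            operation = client.models.generate_videos(
                model=model, video=clips[-1].video, prompt=prompt)
        operation = wait_for(client, operation, poll_seconds)
        clips.append(operation.result.generated_videos[0])
    return clips
```

Each returned clip still has to be downloaded and saved (for example as `clip_01.mp4`) before stitching.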
Step 4: Stitch clips + add audio
# Create concat list
printf "file 'clip_01.mp4'\nfile 'clip_02.mp4'\n..." > concat.txt
# Stitch video clips
ffmpeg -f concat -safe 0 -i concat.txt -c copy combined.mp4
# Add audio track
ffmpeg -i combined.mp4 -i song.mp3 -c:v copy -c:a aac -map 0:v -map 1:a final.mp4
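When the number of clips varies, the concat list and the two ffmpeg invocations can be built from Python instead of typed by hand. A sketch only: the helper names are hypothetical, the ffmpeg arguments mirror the commands above, and clip names are assumed to contain no single quotes.

```python
from pathlib import Path


def write_concat_list(clips, list_path="concat.txt"):
    """Write one `file '...'` line per clip, in order, for the concat step."""
    lines = ["file '%s'" % clip for clip in clips]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return list_path


def stitch_commands(audio, list_path="concat.txt", output="final.mp4"):
    """Return the two ffmpeg argument lists from Step 4 without running them."""
    concat = ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path,
              "-c", "copy", "combined.mp4"]
    mux = ["ffmpeg", "-i", "combined.mp4", "-i", audio, "-c:v", "copy",
           "-c:a", "aac", "-map", "0:v", "-map", "1:a", output]
    return concat, mux
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order.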
Cost estimate
- ~8 sec per clip × $0.40/sec = $3.20/clip
- 4-min song ≈ 30 clips ≈ $96
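The same arithmetic as a small function, for songs of other lengths. The rates are the approximate per-second prices from the note in Video Generation; the function name and the rounding up to whole clips are assumptions.

```python
import math

# Approximate prices in USD per second of generated video.
RATE_PER_SECOND = {"standard": 0.40, "fast": 0.15}


def estimate_cost(song_seconds, clip_seconds=8.0, tier="standard"):
    """Return (number_of_clips, estimated_cost_usd) for a song."""
    clips = math.ceil(song_seconds / clip_seconds)  # a partial clip still costs a full clip
    cost = round(clips * clip_seconds * RATE_PER_SECOND[tier], 2)
    return clips, cost
```

A 240-second song gives 30 clips at about $96 on the standard tier, matching the estimate above.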
Audio Generation
- Music: Use Suno (external service)
- Speech: Gemini 2.5 TTS (Flash or Pro) - TBD script
API Key
Uses GEMINI_API_KEY env var, or pass --api-key KEY.
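A sketch of how a script might resolve the key. The precedence (flag over environment variable) is an assumption, since it is not stated here, and the function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_value: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the --api-key value if given, else GEMINI_API_KEY from the environment."""
    env = os.environ if env is None else env
    key = cli_value or env.get("GEMINI_API_KEY")
    if not key:
        raise ValueError("Set GEMINI_API_KEY or pass --api-key KEY")
    return key
```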