fal-ai-media
Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.
Install this agent skill to your Project
npx add-skill https://github.com/affaan-m/everything-claude-code/tree/main/.agents/skills/fal-ai-media
fal.ai Media Generation
Generate images, videos, and audio using fal.ai models via MCP.
When to Activate
- User wants to generate images from text prompts
- Creating videos from text or images
- Generating speech, music, or sound effects
- Any media generation task
- User says "generate image", "create video", "text to speech", "make a thumbnail", or similar
MCP Requirement
The fal.ai MCP server must be configured. Add this entry to the "mcpServers" object in ~/.claude.json:
  "fal-ai": {
    "command": "npx",
    "args": ["-y", "fal-ai-mcp-server"],
    "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
  }
Get an API key at fal.ai.
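If you prefer to script the configuration step, the following is a minimal sketch. It assumes the entry belongs under a top-level "mcpServers" key, and the helper names `add_fal_server` and `update_config_file` are hypothetical, not part of any fal.ai or Claude tooling.

```python
import json
from pathlib import Path


def add_fal_server(config: dict, fal_key: str) -> dict:
    """Return a copy of `config` with a "fal-ai" entry under "mcpServers".

    Assumption: MCP servers are registered under a top-level "mcpServers" key.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["fal-ai"] = {
        "command": "npx",
        "args": ["-y", "fal-ai-mcp-server"],
        "env": {"FAL_KEY": fal_key},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, fal_key: str) -> None:
    """Read the JSON config at `path` (if present), add the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_fal_server(config, fal_key), indent=2) + "\n")
```

Calling `update_config_file(Path.home() / ".claude.json", "YOUR_FAL_KEY_HERE")` would apply the change; existing servers and unrelated settings are left untouched.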
MCP Tools
The fal.ai MCP provides these tools:
- search: Find available models by keyword
- find: Get model details and parameters
- generate: Run a model with parameters
- result: Check async generation status
- status: Check job status
- cancel: Cancel a running job
- estimate_cost: Estimate generation cost
- models: List popular models
- upload: Upload files for use as inputs
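For long-running jobs, the status, result, and cancel tools fit together as a polling loop. The sketch below shows one way to structure that loop. Everything about the tool responses is an assumption: the job ID, the "COMPLETED" and "FAILED" status strings, and the three callables are hypothetical stand-ins for however your client invokes the MCP tools.

```python
import time
from typing import Any, Callable


def wait_for_result(
    job_id: str,
    get_status: Callable[[str], str],   # hypothetical wrapper around the status tool
    get_result: Callable[[str], Any],   # hypothetical wrapper around the result tool
    cancel: Callable[[str], None],      # hypothetical wrapper around the cancel tool
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll until the job finishes; cancel it and raise if the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_status(job_id)
        if state == "COMPLETED":  # assumed status string
            return get_result(job_id)
        if state == "FAILED":     # assumed status string
            raise RuntimeError(f"job {job_id} failed")
        sleep(poll_s)
    # Timed out: stop the job so it does not keep running unattended.
    cancel(job_id)
    raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
```

Passing the tool wrappers in as arguments keeps the loop independent of any particular client, and the injectable `sleep` makes it easy to exercise without waiting.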
Image Generation
Nano Banana 2 (Fast)
Best for: quick iterations, drafts, text-to-image, image editing.
generate(
  model_name: "fal-ai/nano-banana-2",
  input: {
    "prompt": "a futuristic cityscape at sunset, cyberpunk style",
    "image_size": "landscape_16_9",
    "num_images": 1,
    "seed": 42
  }
)
Nano Banana Pro (High Fidelity)
Best for: production images, realism, typography, detailed prompts.
generate(
  model_name: "fal-ai/nano-banana-pro",
  input: {
    "prompt": "professional product photo of wireless headphones on marble surface, studio lighting",
    "image_size": "square",
    "num_images": 1,
    "guidance_scale": 7.5
  }
)
Common Image Parameters
| Param | Type | Options | Notes |
|---|---|---|---|
| prompt | string | required | Describe what you want |
| image_size | string | square, portrait_4_3, landscape_16_9, portrait_16_9, landscape_4_3 | Aspect ratio |
| num_images | number | 1-4 | How many to generate |
| seed | number | any integer | Reproducibility |
| guidance_scale | number | 1-20 | How closely to follow the prompt (higher = more literal) |
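The limits in the table can be checked locally before a generation is submitted. The following sketch encodes only what the table states; `validate_image_input` is a hypothetical helper, and individual models may accept a different set of parameters.

```python
# Allowed values taken from the Common Image Parameters table.
IMAGE_SIZES = {"square", "portrait_4_3", "landscape_16_9", "portrait_16_9", "landscape_4_3"}


def validate_image_input(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the input matches the table."""
    problems = []
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required and must be a non-empty string")
    size = params.get("image_size")
    if size is not None and size not in IMAGE_SIZES:
        problems.append(f"image_size {size!r} is not one of {sorted(IMAGE_SIZES)}")
    count = params.get("num_images")
    if count is not None and not (isinstance(count, int) and 1 <= count <= 4):
        problems.append("num_images must be an integer from 1 to 4")
    seed = params.get("seed")
    if seed is not None and not isinstance(seed, int):
        problems.append("seed must be an integer")
    scale = params.get("guidance_scale")
    if scale is not None and not (isinstance(scale, (int, float)) and 1 <= scale <= 20):
        problems.append("guidance_scale must be a number from 1 to 20")
    return problems
```

Returning every problem at once, rather than raising on the first, lets an agent correct the whole input in a single pass.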
Image Editing
Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:
# First upload the source image
upload(file_path: "/path/to/image.png")

# Then generate with image input
generate(
  model_name: "fal-ai/nano-banana-2",
  input: {
    "prompt": "same scene but in watercolor style",
    "image_url": "<uploaded_url>",
    "image_size": "landscape_16_9"
  }
)
Video Generation
Seedance 1.0 Pro (ByteDance)
Best for: text-to-video, image-to-video with high motion quality.
generate(
  model_name: "fal-ai/seedance-1-0-pro",
  input: {
    "prompt": "a drone flyover of a mountain lake at golden hour, cinematic",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "seed": 42
  }
)
Kling Video v3 Pro
Best for: text/image-to-video with native audio generation.
generate(
  model_name: "fal-ai/kling-video/v3/pro",
  input: {
    "prompt": "ocean waves crashing on a rocky coast, dramatic clouds",
    "duration": "5s",
    "aspect_ratio": "16:9"
  }
)
Veo 3 (Google DeepMind)
Best for: video with generated sound, high visual quality.
generate(
  model_name: "fal-ai/veo-3",
  input: {
    "prompt": "a bustling Tokyo street market at night, neon signs, crowd noise",
    "aspect_ratio": "16:9"
  }
)
Image-to-Video
Start from an existing image:
generate(
  model_name: "fal-ai/seedance-1-0-pro",
  input: {
    "prompt": "camera slowly zooms out, gentle wind moves the trees",
    "image_url": "<uploaded_image_url>",
    "duration": "5s"
  }
)
Video Parameters
| Param | Type | Options | Notes |
|---|---|---|---|
| prompt | string | required | Describe the video |
| duration | string | "5s", "10s" | Video length |
| aspect_ratio | string | "16:9", "9:16", "1:1" | Frame ratio |
| seed | number | any integer | Reproducibility |
| image_url | string | URL | Source image for image-to-video |
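Since most video parameters are optional (the Veo 3 example, for instance, passes no duration), it can help to build the input from a small typed object that omits unset fields. This is a sketch only: `VideoRequest` is a hypothetical name, and the allowed values are copied from the table rather than from any model schema.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the Video Parameters table.
DURATIONS = {"5s", "10s"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}


@dataclass
class VideoRequest:
    prompt: str
    duration: Optional[str] = None
    aspect_ratio: Optional[str] = None
    seed: Optional[int] = None
    image_url: Optional[str] = None  # set this for image-to-video

    def to_input(self) -> dict:
        """Validate against the table and return only the fields that are set."""
        if not self.prompt.strip():
            raise ValueError("prompt is required")
        if self.duration is not None and self.duration not in DURATIONS:
            raise ValueError(f"duration must be one of {sorted(DURATIONS)}")
        if self.aspect_ratio is not None and self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
        fields = {
            "prompt": self.prompt,
            "duration": self.duration,
            "aspect_ratio": self.aspect_ratio,
            "seed": self.seed,
            "image_url": self.image_url,
        }
        return {key: value for key, value in fields.items() if value is not None}
```

The same object covers both modes: leave `image_url` unset for text-to-video, or set it to an uploaded image URL for image-to-video.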
Audio Generation
CSM-1B (Conversational Speech)
Text-to-speech with natural, conversational quality.
generate(
  model_name: "fal-ai/csm-1b",
  input: {
    "text": "Hello, welcome to the demo. Let me show you how this works.",
    "speaker_id": 0
  }
)
ThinkSound (Video-to-Audio)
Generate matching audio from video content.
generate(
  model_name: "fal-ai/thinksound",
  input: {
    "video_url": "<video_url>",
    "prompt": "ambient forest sounds with birds chirping"
  }
)
ElevenLabs (via API, no MCP)
For professional voice synthesis, use ElevenLabs directly:
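import os
import requests

# Replace <voice_id> with the ID of an ElevenLabs voice.
resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/<voice_id>",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "text": "Your text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
    timeout=60,
)
# Fail on HTTP errors instead of writing an error body to disk.
resp.raise_for_status()

# The response body is the encoded audio.
with open("output.mp3", "wb") as f:
    f.write(resp.content)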
VideoDB Generative Audio
If VideoDB is configured, use its generative audio:
# coll is an existing VideoDB collection object

# Voice generation
audio = coll.generate_voice(text="Your narration here", voice="alloy")

# Music generation
music = coll.generate_music(prompt="upbeat electronic background music", duration=30)

# Sound effects
sfx = coll.generate_sound_effect(prompt="thunder crack followed by rain")
Cost Estimation
Before generating, check estimated cost:
estimate_cost(model_name: "fal-ai/nano-banana-pro", input: {...})
Model Discovery
Find models for specific tasks:
search(query: "text to video")
find(model_name: "fal-ai/seedance-1-0-pro")
models()
Tips
- Use seed for reproducible results when iterating on prompts
- Start with lower-cost models (Nano Banana 2) for prompt iteration, then switch to Pro for finals
- For video, keep prompts descriptive but concise — focus on motion and scene
- Image-to-video produces more controlled results than pure text-to-video
- Check estimate_cost before running expensive video generations
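The draft-then-final tip can be sketched as a small helper that keeps the prompt and seed fixed and switches only the model. The function name `build_image_call` is hypothetical, and a shared seed is not expected to reproduce the same image across two different models; it only keeps each stage repeatable on its own.

```python
DRAFT_MODEL = "fal-ai/nano-banana-2"    # lower-cost model for prompt iteration
FINAL_MODEL = "fal-ai/nano-banana-pro"  # higher-fidelity model for finals


def build_image_call(prompt: str, seed: int, final: bool = False) -> dict:
    """Return the arguments for a generate call at the draft or final stage."""
    return {
        "model_name": FINAL_MODEL if final else DRAFT_MODEL,
        "input": {"prompt": prompt, "seed": seed, "num_images": 1},
    }


# Iterate on the prompt with the draft model, then rerun the settled prompt on Pro.
draft = build_image_call("a lighthouse in fog, film grain", seed=42)
final = build_image_call("a lighthouse in fog, film grain", seed=42, final=True)
```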
Related Skills
- videodb: Video processing, editing, and streaming
- video-editing: AI-powered video editing workflows
- content-engine: Content creation for social platforms