Agent skill
genmedia-audio-engineer
Expert in audio synthesis, music generation, and mixing. Use when creating podcasts, background scores, or multi-track audio layering using mcp-chirp3-go, mcp-lyria-go, mcp-gemini-go, mcp-nanobanana-go, and mcp-avtool-go.
Install this agent skill to your Project
npx add-skill https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio/tree/main/experiments/mcp-genmedia/skills/genmedia-audio-engineer
Metadata
Additional technical details for this skill
- lyria prompt guide
- https://deepmind.google/models/lyria/prompt-guide/
SKILL.md
GenMedia Audio Engineer Skill
You are a specialized audio engineer. Your expertise lies in high-fidelity speech synthesis, creative music generation, and professional-grade audio mixing.
Core Workflows
Podcast and Dialogue Generation
Note: Gemini TTS is the preferred tool for high-fidelity speech synthesis.
- Use list_gemini_voices to explore available personas.
- Use gemini_audio_tts for core synthesis. It supports granular stylistic control via the prompt parameter (e.g., "warm, upbeat narrator voice").
- If specific non-English or specialized Chirp voices are needed, fall back to list_chirp_voices and chirp_tts.
- For long scripts, synthesize in segments and concatenate them using ffmpeg_concatenate_media_files.
- If the output is WAV and a smaller file is requested, convert it to MP3 using ffmpeg_convert_audio_wav_to_mp3.
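The segmenting step for long scripts could be sketched as follows. This is a minimal illustration, not part of the skill: the helper name split_script and the 600-character limit are assumptions chosen for the example, and the actual limits of gemini_audio_tts are not stated here. Each returned segment would be synthesized separately and the resulting files concatenated.

```python
import re


def split_script(script: str, max_chars: int = 600) -> list[str]:
    """Split a script into segments, breaking only at sentence ends.

    Hypothetical helper: max_chars is an illustrative limit, not a
    documented constraint of any TTS tool. A single sentence longer
    than max_chars is kept whole as its own segment.
    """
    # Split after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the segment.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    for i, segment in enumerate(split_script("One. Two. Three.", max_chars=9)):
        print(i, segment)
```

Breaking at sentence boundaries keeps each synthesized segment prosodically complete, which tends to make the joins less audible after concatenation.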
Soundtrack and Bumper Creation
Use lyria_generate_music for high-quality atmospheric or thematic tracks. For Lyria 3, follow the Lyria 3 Prompt Guide for best results. Prompts should be highly descriptive:
- Genre & Era: Specify distinct styles or blends (e.g., "90s boom-bap hip-hop" or "K-pop with a 60s Motown edge").
- Tempo & Dynamics: Describe the energy and progression (e.g., "120 BPM driving techno" or "a quiet piano intro building into an explosive orchestral chorus").
- Instruments: List specific instruments to guide the arrangement (e.g., "distorted 80s synths", "clean Fender Stratocaster", or "soulful gravelly vocals").
- Vocals & Lyrics:
  - Use the "Lyrics:" prefix for custom lyrics.
  - Format backing vocals in round brackets: "Lyrics: Let's go (go)".
  - Define vocal texture: "breathy soprano", "soulful baritone", or "ethereal harmonies".
- Model Selection: Use lyria-3-clip-preview for short snippets and lyria-3-pro-preview for complex compositions.
Multi-track Mixing
When layering voiceover with background music:
- Increase the voiceover volume (e.g., +6dB to +10dB) using ffmpeg_adjust_volume.
- Lower the music volume (e.g., -10dB to -15dB).
- Use ffmpeg_layer_audio_files to mix the tracks.
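To make the decibel figures concrete, the sketch below converts a dB offset to a linear amplitude factor and builds an ffmpeg filter graph that applies the same gain staging with ffmpeg's volume and amix filters. How ffmpeg_adjust_volume and ffmpeg_layer_audio_files work internally is not documented here, so treat this as an equivalent plain-ffmpeg illustration. The function names and the +6 dB / -12 dB values are assumptions for the example.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel offset to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def mix_filtergraph(voice_db: float, music_db: float) -> str:
    """Build an ffmpeg -filter_complex string for voice over music.

    Assumes input 0 is the voiceover and input 1 is the music.
    duration=first ends the mix when the voiceover ends.
    """
    return (
        f"[0:a]volume={voice_db}dB[voice];"
        f"[1:a]volume={music_db}dB[music];"
        "[voice][music]amix=inputs=2:duration=first[mix]"
    )


if __name__ == "__main__":
    print(f"+6 dB  -> x{db_to_gain(6):.3f}")
    print(f"-12 dB -> x{db_to_gain(-12):.3f}")
    print(mix_filtergraph(6, -12))
```

A +6 dB boost roughly doubles the amplitude, while -12 dB cuts it to about a quarter, which is why the music sits clearly underneath the voice.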
Technical Tips
- Always use afade (via standard ffmpeg calls if necessary) to avoid harsh audio clips at start/end.
- Ensure all tracks share the same sample rate before layering to avoid pitch shifts.
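Both tips can be applied in a single plain ffmpeg call. The sketch below only builds the argument list. The helper name, the 0.5-second fade length, and the 44100 Hz target rate are assumptions for illustration, and the track duration must be supplied by the caller (for example, from a prior probe of the file).

```python
def fade_and_resample_args(
    src: str,
    dst: str,
    duration_s: float,
    fade_s: float = 0.5,
    sample_rate: int = 44100,
) -> list[str]:
    """Build ffmpeg arguments that fade a track in and out and resample it.

    Hypothetical helper: the fade length and sample rate are example values.
    """
    # The fade-out starts fade_s seconds before the end (never before 0).
    fade_out_start = max(duration_s - fade_s, 0.0)
    audio_filter = (
        f"afade=t=in:st=0:d={fade_s},"
        f"afade=t=out:st={fade_out_start}:d={fade_s}"
    )
    return ["ffmpeg", "-i", src, "-af", audio_filter, "-ar", str(sample_rate), dst]


if __name__ == "__main__":
    print(" ".join(fade_and_resample_args("voice.wav", "voice_faded.wav", 10.0)))
```

The returned list can be passed to subprocess.run. Running every track through the same call before layering leaves them all at one sample rate.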