Agent skill

google-tts

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".


Stars 184

Forks 18

Install this agent skill to your Project

npx add-skill https://github.com/sanjay3290/ai-skills/tree/main/skills/google-tts

SKILL.md

Google Cloud Text-to-Speech

Converts text and documents into audio using Google Cloud TTS API. Supports Neural2, WaveNet, Studio, and Standard voices across 40+ languages.

Setup

Provide the API key through the GOOGLE_TTS_API_KEY environment variable or in skills/google-tts/config.json as {"api_key": "..."}. ffmpeg is required for multi-chunk documents. Optional, for PDF/DOCX input: pip install PyPDF2 python-docx.
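The key lookup described above could look roughly like the sketch below. The function name resolve_api_key and the precedence (environment variable first, then config.json) are assumptions for illustration; the actual script may resolve the key differently.

```python
import json
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(config_path: str = "skills/google-tts/config.json") -> Optional[str]:
    """Hypothetical helper: return the API key, or None if none is configured."""
    # Assumed precedence: the environment variable wins over the config file.
    key = os.environ.get("GOOGLE_TTS_API_KEY")
    if key:
        return key
    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as fh:
            # The config file is expected to look like {"api_key": "..."}.
            return json.load(fh).get("api_key")
    return None
```

Returning None instead of raising lets the caller decide how to report a missing key.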

Commands

List Voices

bash

python skills/google-tts/scripts/google_tts.py voices --language en-US --type Neural2
python skills/google-tts/scripts/google_tts.py voices --json
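As a rough illustration of what the --language and --type filters might do, the sketch below filters a list of voice records shaped like the entries returned by the Cloud TTS voices.list endpoint (name, languageCodes, ssmlGender). The function filter_voices is hypothetical, and matching the voice type against the voice name is an assumption, not the script's confirmed behavior.

```python
from typing import Dict, List, Optional


def filter_voices(
    voices: List[Dict],
    language: Optional[str] = None,
    voice_type: Optional[str] = None,
) -> List[Dict]:
    """Hypothetical filter over voice records from a voices.list response."""
    selected = []
    for voice in voices:
        if language and language not in voice.get("languageCodes", []):
            continue
        # Assumption: the type (Neural2, Wavenet, Studio, Standard) is embedded
        # in the voice name, e.g. "en-US-Neural2-F". Compare case-insensitively.
        if voice_type and f"-{voice_type.lower()}-" not in voice.get("name", "").lower():
            continue
        selected.append(voice)
    return selected
```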

Text-to-Speech

bash

# From text or document (PDF, DOCX, MD, TXT)
python skills/google-tts/scripts/google_tts.py tts --text "Hello world" --output ~/Downloads/hello.mp3
python skills/google-tts/scripts/google_tts.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3

# With voice, rate, pitch, encoding options
python skills/google-tts/scripts/google_tts.py tts --file doc.md --voice en-US-Neural2-F --rate 0.9 --encoding MP3 --output ~/Downloads/out.mp3
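For orientation, the voice, rate, pitch, and encoding options map onto the JSON body of the Cloud TTS text:synthesize REST request. The sketch below builds such a body and checks the documented ranges. The helper build_synthesize_body is hypothetical, and deriving languageCode from the first two parts of the voice name is a simplifying assumption; the script's own request construction may differ.

```python
from typing import Dict

# Encodings listed in this skill, with the file extension each one implies.
ENCODING_EXTENSIONS = {"MP3": ".mp3", "LINEAR16": ".wav", "OGG_OPUS": ".ogg"}


def build_synthesize_body(
    text: str,
    voice: str = "en-US-Neural2-F",
    rate: float = 1.0,
    pitch: float = 0.0,
    encoding: str = "MP3",
) -> Dict:
    """Hypothetical helper: build a text:synthesize request body."""
    if not 0.25 <= rate <= 4.0:
        raise ValueError("speaking rate must be between 0.25 and 4.0")
    if not -20.0 <= pitch <= 20.0:
        raise ValueError("pitch must be between -20.0 and 20.0 semitones")
    if encoding not in ENCODING_EXTENSIONS:
        raise ValueError(f"unsupported encoding: {encoding}")
    # Assumption: the language code is the first two dash-separated parts
    # of the voice name ("en-US-Neural2-F" -> "en-US").
    language_code = "-".join(voice.split("-")[:2])
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice},
        "audioConfig": {
            "audioEncoding": encoding,
            "speakingRate": rate,
            "pitch": pitch,
        },
    }
```

The API returns the audio as a base64 string in the audioContent field of the response, which must be decoded before it is written to disk.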

Podcast Generation

Takes a JSON script with alternating speakers and synthesizes each speaker's turns with a different voice.

json

[
  {"speaker": "host1", "text": "Welcome to our podcast!"},
  {"speaker": "host2", "text": "Thanks for having me..."}
]

bash

python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --output ~/Downloads/podcast.mp3
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --voice1 en-US-Neural2-J --voice2 en-US-Neural2-H --rate 0.9 --output ~/Downloads/podcast.mp3
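One way to picture the speaker-to-voice mapping is the sketch below, which assigns voices to speakers in order of first appearance. The function assign_voices is hypothetical, and the two voice names are simply the ones from the example command above, not the script's confirmed defaults.

```python
from typing import Dict, List, Sequence, Tuple


def assign_voices(
    script: List[Dict[str, str]],
    voices: Sequence[str] = ("en-US-Neural2-J", "en-US-Neural2-H"),
) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair each turn's text with a voice name."""
    mapping: Dict[str, str] = {}
    turns = []
    for turn in script:
        speaker = turn["speaker"]
        if speaker not in mapping:
            # Speakers get voices in order of first appearance.
            if len(mapping) >= len(voices):
                raise ValueError(f"no voice available for speaker {speaker!r}")
            mapping[speaker] = voices[len(mapping)]
        turns.append((mapping[speaker], turn["text"]))
    return turns
```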

Workflow

Single-Voice Narration

1. If the user provides a file path, use --file. For generated content, write clean prose to /tmp/tts_input.md first.
2. Default voice: en-US-Neural2-D (male) or en-US-Neural2-F (female). Use Neural2 for the best balance of quality and cost.
3. Generate: python skills/google-tts/scripts/google_tts.py tts --file /tmp/tts_input.md --output ~/Downloads/recording.mp3
4. Report the file location and size. Default output to ~/Downloads/.
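The final reporting step can be done with a few lines of Python. This is a minimal sketch; the helper names format_size and describe_output are hypothetical and not part of the skill's scripts.

```python
from pathlib import Path


def format_size(num_bytes: int) -> str:
    """Render a byte count in a human-readable unit."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def describe_output(path: str) -> str:
    """Return a one-line report with the resolved path and file size."""
    resolved = Path(path).expanduser().resolve()
    return f"{resolved} ({format_size(resolved.stat().st_size)})"
```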

Podcast from Document

1. Extract text: python skills/google-tts/scripts/extract.py /path/to/document.pdf
2. Generate a two-host conversation script as JSON:
   - Natural discussion, not verbatim reading. Host 1 leads, Host 2 reacts/analyzes.
   - Include intro and outro. Vary turn lengths. Keep turns under 4000 chars.
3. Write the script to /tmp/podcast_script.json
4. Generate: python skills/google-tts/scripts/google_tts.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3
5. Clean up temp files.
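Before writing the script file, it can help to check each turn against the constraints above. The sketch below is one hypothetical way to do that; write_podcast_script is not part of the skill, and the checks reflect only the speaker/text shape and the 4000-character guideline stated here.

```python
import json
from pathlib import Path
from typing import Dict, List

MAX_TURN_CHARS = 4000  # turns should stay under this length


def write_podcast_script(
    turns: List[Dict[str, str]],
    path: str = "/tmp/podcast_script.json",
) -> Path:
    """Hypothetical helper: validate the turns, then write them as JSON."""
    for index, turn in enumerate(turns):
        if not isinstance(turn, dict) or not {"speaker", "text"} <= turn.keys():
            raise ValueError(f"turn {index} needs 'speaker' and 'text' keys")
        if len(turn["text"]) >= MAX_TURN_CHARS:
            raise ValueError(f"turn {index} is {len(turn['text'])} chars; keep it under {MAX_TURN_CHARS}")
    out = Path(path)
    out.write_text(json.dumps(turns, indent=2, ensure_ascii=False), encoding="utf-8")
    return out
```

Once the podcast has been generated, the temporary file can be removed with Path.unlink().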

Reference

Recommended voice type: Neural2 (~$4/1M chars, high quality)
Speaking rate: 0.25-4.0 (0.85-0.95 good for technical content)
Pitch: -20.0 to 20.0 semitones
Encodings: MP3 (default), LINEAR16 (.wav), OGG_OPUS (.ogg)
API limit: 5000 bytes/request. Script auto-chunks at sentence boundaries.
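The chunking behavior can be pictured with the sketch below, which packs whole sentences into chunks that stay within the byte limit. It is an illustration under assumptions, not the script's actual implementation: the function chunk_text, the regex used to detect sentence ends, and the word-level fallback for oversized sentences are all hypothetical.

```python
import re
from typing import List

MAX_BYTES = 5000  # per-request input limit noted above


def _size(text: str) -> int:
    """Size of the text in UTF-8 bytes (the limit is in bytes, not characters)."""
    return len(text.encode("utf-8"))


def chunk_text(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Hypothetical chunker: split at sentence boundaries within max_bytes."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Fall back to word-level pieces only when one sentence is too large.
        # A single word longer than max_bytes would still exceed the limit.
        pieces = [sentence] if _size(sentence) <= max_bytes else sentence.split()
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if _size(candidate) <= max_bytes:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Measuring UTF-8 bytes rather than characters matters for non-ASCII text, where one character can occupy several bytes.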

Maintainer

sanjay3290 Core maintainer

Source details

Full Name: sanjay3290/ai-skills
Branch: main
Path in repo: skills/google-tts
License: Apache License 2.0
Topics: claude-code agent-skills mcp claude-skills postgresql ai-skills notebooklm atlassian azure-devops confluence deep-research elevenlabs gmail google-calendar google-drive google-workspace imagen jira mysql text-to-speech


