Agent skill
elevenlabs
Convert documents and text to audio using ElevenLabs text-to-speech. Use this skill when the user wants to create a podcast, narrate a document, read aloud text, generate audio from a file, or convert text to speech.
Install this agent skill to your Project
npx add-skill https://github.com/sanjay3290/ai-skills/tree/main/skills/elevenlabs
Metadata
Additional technical details for this skill
- author: sanjay3290
- version: 1.0
SKILL.md
ElevenLabs - Text-to-Speech & Podcast Skill
Overview
This skill converts text and documents into high-quality audio using the ElevenLabs TTS API. It supports two modes: single-voice narration and two-host conversational podcast generation.
When to Use This Skill
Activate when the user mentions:
- "create podcast", "generate podcast", "podcast from document"
- "narrate document", "narrate this file", "read aloud"
- "text to speech", "TTS", "convert to audio"
- "audio from document", "audio version of"
Setup
Config at skills/elevenlabs/config.json:
{
"api_key": "your-elevenlabs-api-key",
"default_voice": "JBFqnCBsd6RMkjVDRZzb",
"default_model": "eleven_multilingual_v2",
"podcast_voice1": "JBFqnCBsd6RMkjVDRZzb",
"podcast_voice2": "EXAVITQu4vr4xnSDxMaL"
}
Only api_key is required. Alternatively, set the ELEVENLABS_API_KEY environment variable instead of using the config file.
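The two ways of supplying the key can be illustrated with a minimal sketch. It assumes the environment variable takes precedence over config.json, which the skill does not state, and `resolve_api_key` is a hypothetical helper rather than a function from the skill's script.

```python
import json
import os
from pathlib import Path


def resolve_api_key(config_path="skills/elevenlabs/config.json", env=None):
    """Return an API key from the environment or the config file (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: the environment variable wins over the config file.
    key = env.get("ELEVENLABS_API_KEY")
    if key:
        return key
    path = Path(config_path)
    if path.is_file():
        config = json.loads(path.read_text(encoding="utf-8"))
        key = config.get("api_key")
        if key:
            return key
    raise RuntimeError(
        "No API key found: set ELEVENLABS_API_KEY or add api_key to config.json"
    )
```

Passing `env` explicitly keeps the lookup easy to test without touching the real environment.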
Dependencies: pip install PyPDF2 python-docx (only needed for PDF/DOCX files).
Requires ffmpeg for multi-chunk narration and podcasts.
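Before a long run it may help to check these prerequisites up front. The sketch below is an assumption about how such a check could look, not part of the skill; `preflight` is a hypothetical name. It relies on PyPDF2 importing as `PyPDF2` and python-docx importing as `docx`.

```python
import importlib.util
import shutil


def preflight(needs_pdf=False, needs_docx=False, needs_ffmpeg=False):
    """Return a list of missing prerequisites for the requested features."""
    missing = []
    # find_spec returns None when a top-level module is not installed.
    if needs_pdf and importlib.util.find_spec("PyPDF2") is None:
        missing.append("PyPDF2 (pip install PyPDF2)")
    if needs_docx and importlib.util.find_spec("docx") is None:
        missing.append("python-docx (pip install python-docx)")
    # shutil.which returns None when the executable is not on PATH.
    if needs_ffmpeg and shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing
```

For a podcast built from a PDF, `preflight(needs_pdf=True, needs_ffmpeg=True)` returns an empty list when everything is available.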
Commands
List Voices
python skills/elevenlabs/scripts/elevenlabs.py voices
python skills/elevenlabs/scripts/elevenlabs.py voices --json
Use this to find voice IDs for the user.
Single-Voice TTS
# From text
python skills/elevenlabs/scripts/elevenlabs.py tts --text "Hello world" --output ~/Downloads/hello.mp3
# From document
python skills/elevenlabs/scripts/elevenlabs.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3
# With specific voice
python skills/elevenlabs/scripts/elevenlabs.py tts --file doc.md --voice VOICE_ID --output out.mp3
The script handles text extraction, chunking at sentence boundaries (~4000 chars), TTS per chunk with voice continuity, and ffmpeg concatenation automatically.
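The chunking step can be sketched as follows. This is an illustration of splitting at sentence boundaries under a character limit, not the script's actual code; `chunk_text` is a hypothetical name, and the regular expression is a simplified notion of a sentence boundary.

```python
import re


def chunk_text(text, max_chars=4000):
    """Split text into chunks of at most max_chars, breaking after sentences."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the TTS API separately and the resulting audio files concatenated with ffmpeg.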
Podcast Generation
Podcast mode requires a JSON script file with conversation segments:
[
{"speaker": "host1", "text": "Welcome to our podcast! Today we're diving into..."},
{"speaker": "host2", "text": "That's right! I found the section on..."},
{"speaker": "host1", "text": "Let's break that down..."}
]
python skills/elevenlabs/scripts/elevenlabs.py podcast --script /tmp/script.json --voice1 ID1 --voice2 ID2 --output ~/Downloads/podcast.mp3
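A script file can be checked before it is passed to the podcast command. The sketch below is a hypothetical validator, not part of the skill. It assumes that only "host1" and "host2" are accepted speaker labels, and it borrows the 3000-character turn limit from the workflow guidelines in this document.

```python
import json

# Assumption: the script format allows exactly these two speaker labels.
VALID_SPEAKERS = {"host1", "host2"}


def validate_script(segments, max_turn_chars=3000):
    """Return a list of problems found in a podcast script (empty if none)."""
    if not isinstance(segments, list) or not segments:
        return ["script must be a non-empty JSON array"]
    problems = []
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict):
            problems.append(f"segment {i}: not an object")
            continue
        if seg.get("speaker") not in VALID_SPEAKERS:
            problems.append(f"segment {i}: speaker must be host1 or host2")
        text = seg.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"segment {i}: text is missing or empty")
        elif len(text) > max_turn_chars:
            problems.append(f"segment {i}: text exceeds {max_turn_chars} characters")
    return problems


example = json.loads('[{"speaker": "host1", "text": "Let\'s break that down..."}]')
issues = validate_script(example)
```

An empty `issues` list means the script has the expected shape.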
Podcast Workflow (for Claude)
When the user asks to create a podcast from a document:
1. Extract the document text:
python skills/elevenlabs/scripts/extract.py /path/to/document.pdf
2. Generate a two-host conversation script from the extracted text. Follow these guidelines:
- Write as a natural, engaging discussion between two hosts
- Host 1 typically leads/introduces topics, Host 2 adds analysis and reactions
- Start with a brief intro welcoming listeners and stating the topic
- End with a summary/outro
- Keep each turn under 3000 characters
- Vary turn lengths - mix short reactions with longer explanations
- Use conversational language: "That's a great point", "What I found interesting was..."
- Reference specific details from the source document
- Avoid reading the document verbatim - discuss and interpret it
3. Write the script as a JSON array to a temp file:
# Write to /tmp/podcast_script.json
[
{"speaker": "host1", "text": "Welcome to today's episode..."},
{"speaker": "host2", "text": "Thanks for having me..."},
...
]
4. Generate the podcast:
python skills/elevenlabs/scripts/elevenlabs.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3
5. Clean up the temp script file.
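The write, generate, and clean-up steps of this workflow can be sketched in Python. The helper names `write_script` and `podcast_command` are hypothetical, and the sketch uses a uniquely named temp file rather than the fixed /tmp/podcast_script.json path. The actual call to the skill's script is left commented out so the example runs without an API key.

```python
import json
import os
import tempfile


def write_script(segments):
    """Write segments to a temporary JSON file and return its path."""
    fd, path = tempfile.mkstemp(prefix="podcast_script_", suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(segments, handle, ensure_ascii=False, indent=2)
    return path


def podcast_command(script_path, output_path):
    """Build the argument list for the podcast subcommand."""
    return [
        "python", "skills/elevenlabs/scripts/elevenlabs.py", "podcast",
        "--script", script_path,
        "--output", os.path.expanduser(output_path),
    ]


segments = [
    {"speaker": "host1", "text": "Welcome to today's episode..."},
    {"speaker": "host2", "text": "Thanks for having me..."},
]
script_path = write_script(segments)
try:
    command = podcast_command(script_path, "~/Downloads/podcast.mp3")
    # To generate audio for real, run the command, for example:
    # subprocess.run(command, check=True)
finally:
    # Clean up the temp script file even if generation fails.
    os.remove(script_path)
```

Wrapping the run in try/finally ensures the temp file is removed whether or not the generation step succeeds.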
Tips
- Run voices first to let the user pick voices they like
- For podcasts, suggest voice pairs with contrasting qualities (e.g., one deep, one bright)
- Default output to ~/Downloads/ unless the user specifies otherwise
- For large documents, warn the user about character usage on their ElevenLabs plan
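The character-usage warning in the last tip could be estimated before any audio is generated. The sketch below is hypothetical: it assumes usage is counted per character of input text, that the caller supplies the remaining quota, and that half of the remaining quota is a reasonable threshold for warning.

```python
def usage_warning(text, remaining_characters, warn_fraction=0.5):
    """Return a warning string if text uses a large share of the quota, else None."""
    needed = len(text)
    if needed > remaining_characters:
        remaining = max(remaining_characters, 0)
        return f"Narration needs {needed} characters but only {remaining} remain."
    # Warn when the job would consume at least warn_fraction of what is left.
    if remaining_characters > 0 and needed / remaining_characters >= warn_fraction:
        return f"Narration will use {needed} of {remaining_characters} remaining characters."
    return None
```

A return value of None means the document is small relative to the remaining quota and no warning is needed.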