doubao-tts
Generate high-quality speech audio using Doubao (豆包/Volcengine) TTS API. Use this skill when the user asks to generate audio, podcasts, voiceovers, or text-to-speech output.
Install this skill into your project:

```bash
npx add-skill https://github.com/xvirobotics/metabot/tree/main/.claude/skills/doubao-tts
```
Doubao TTS (豆包语音合成, Doubao speech synthesis)
Generate high-quality speech audio from text using Volcengine's Doubao TTS API. Supports short-form (real-time) and long-form (async, up to 100K characters) synthesis.
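The two modes differ mainly in how much text they accept: the real-time API takes fewer than 300 characters, and the async API takes up to 100K. As a minimal sketch, assuming the limits are counted in Unicode characters (Python's `len()`), routing a piece of text to the right endpoint could look like this. `choose_endpoint` is a hypothetical helper, not part of the CLI, and it is not confirmed that the CLI uses exactly these thresholds.

```python
# Endpoints and limits as documented in this skill.
SHORT_FORM_URL = "https://openspeech.bytedance.com/api/v3/tts/unidirectional"
LONG_FORM_SUBMIT_URL = "https://openspeech.bytedance.com/api/v1/tts_async/submit"
SHORT_FORM_MAX = 300       # short-form accepts fewer than 300 characters
LONG_FORM_MAX = 100_000    # long-form accepts up to 100K characters


def choose_endpoint(text: str) -> str:
    """Hypothetical helper: pick the synthesis endpoint for `text` by length.

    Assumption: the API counts characters the same way len() does.
    """
    n = len(text)
    if n == 0:
        raise ValueError("text is empty")
    if n < SHORT_FORM_MAX:
        return SHORT_FORM_URL
    if n <= LONG_FORM_MAX:
        return LONG_FORM_SUBMIT_URL
    raise ValueError(f"text has {n} characters; the long-form limit is {LONG_FORM_MAX}")


if __name__ == "__main__":
    print(choose_endpoint("你好世界"))
```

Text longer than 100K characters has to be split before submission; this sketch simply refuses it rather than guessing a split strategy.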
When to Use
- User asks to generate audio, podcasts, voiceovers, or narration
- User wants text-to-speech for any content
- User asks to "read this aloud" or "make an audio version"
Quick Usage
Use the doubao-tts CLI tool (installed at bin/doubao-tts):
```bash
# Short text (real-time, < 300 chars)
bin/doubao-tts "你好世界" -o output.mp3

# Long text from file (async mode, up to 100K chars)
bin/doubao-tts -f article.txt -o podcast.mp3

# Pipe content
echo "Hello world" | bin/doubao-tts -o hello.mp3

# Choose voice
bin/doubao-tts "你好" -v zh_male_aojiaobazong_moon_bigtts -o output.mp3

# Adjust speed/volume/pitch
bin/doubao-tts "你好" --speed 1.2 --volume 1.5 -o output.mp3
```
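When another program drives the CLI, building the argument list explicitly avoids shell-quoting problems with Chinese text or punctuation. A minimal sketch, assuming the CLI path `bin/doubao-tts` and using only the flags shown above (`-f`, `-v`, `-o`, `--speed`, `--volume`); `build_tts_command` and `run_tts` are hypothetical helper names.

```python
import shlex
import subprocess
from typing import List, Optional

# Path taken from the usage examples above; adjust if the CLI lives elsewhere.
CLI = "bin/doubao-tts"


def build_tts_command(
    output: str,
    text: Optional[str] = None,
    file: Optional[str] = None,
    voice: Optional[str] = None,
    speed: Optional[float] = None,
    volume: Optional[float] = None,
) -> List[str]:
    """Hypothetical helper: build an argv list from the documented flags only."""
    if (text is None) == (file is None):
        raise ValueError("pass exactly one of text or file")
    argv = [CLI]
    if text is not None:
        argv.append(text)      # positional text argument
    else:
        argv += ["-f", file]   # read text from a file
    if voice:
        argv += ["-v", voice]
    if speed is not None:
        argv += ["--speed", str(speed)]
    if volume is not None:
        argv += ["--volume", str(volume)]
    argv += ["-o", output]
    return argv


def run_tts(argv: List[str]) -> None:
    # check=True raises CalledProcessError if the CLI exits non-zero
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmd = build_tts_command("output.mp3", text="你好", speed=1.2, volume=1.5)
    print(shlex.join(cmd))  # shell-safe rendering, for logging only
```

Passing a list to `subprocess.run` (no `shell=True`) hands each argument to the CLI verbatim, so quotes or spaces in the text need no escaping.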
Available Voices (verified working)
Chinese Female
| Voice ID | Description |
|---|---|
| zh_female_sajiaonvyou_moon_bigtts | 撒娇女友 (coquettish girlfriend), default |
| zh_female_gaolengyujie_moon_bigtts | 高冷御姐 (aloof mature woman) |
| zh_female_tianmeixiaoyuan_moon_bigtts | 甜美校园 (sweet campus girl) |
| zh_female_yuanqinvyou_moon_bigtts | 元气女友 (energetic girlfriend) |
| zh_female_wanwanxiaohe_moon_bigtts | 弯弯小何 (Wanwan Xiaohe) |
| zh_female_linjianvhai_moon_bigtts | 邻家女孩 (girl next door) |
Chinese Male
| Voice ID | Description |
|---|---|
| zh_male_aojiaobazong_moon_bigtts | 傲娇霸总 (tsundere domineering CEO) |
| zh_male_jingqiangkanye_moon_bigtts | 京腔侃爷 (Beijing-accent raconteur) |
| zh_male_wennuanahu_moon_bigtts | 温暖阿虎 (warm Ahu) |
| zh_male_yangguangqingnian_moon_bigtts | 阳光青年 (sunny young man) |
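Note: Other voices (the BV series and voices with the `mars` suffix) require a different resource ID. To use more voices, enable the corresponding resources in the Volcengine console.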
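Because only the voices listed above are verified with the default resource ID, it can help to check a requested voice before calling the API. A minimal sketch: the naming heuristic (BV-series IDs start with `BV`; mars voices have `mars` as one underscore-separated part) is an assumption based on the note above, and `resolve_voice` / `needs_other_resource` are hypothetical helpers.

```python
import warnings
from typing import Optional

# Voice IDs from the tables above (verified with the default resource ID).
VERIFIED_VOICES = frozenset({
    "zh_female_sajiaonvyou_moon_bigtts",
    "zh_female_gaolengyujie_moon_bigtts",
    "zh_female_tianmeixiaoyuan_moon_bigtts",
    "zh_female_yuanqinvyou_moon_bigtts",
    "zh_female_wanwanxiaohe_moon_bigtts",
    "zh_female_linjianvhai_moon_bigtts",
    "zh_male_aojiaobazong_moon_bigtts",
    "zh_male_jingqiangkanye_moon_bigtts",
    "zh_male_wennuanahu_moon_bigtts",
    "zh_male_yangguangqingnian_moon_bigtts",
})
DEFAULT_VOICE = "zh_female_sajiaonvyou_moon_bigtts"


def needs_other_resource(voice_id: str) -> bool:
    """Hypothetical heuristic: BV-series and mars voices need another resource ID.

    Assumption: BV-series IDs start with "BV", and mars voices contain
    "mars" as one of their underscore-separated parts.
    """
    return voice_id.startswith("BV") or "mars" in voice_id.split("_")


def resolve_voice(voice_id: Optional[str] = None) -> str:
    """Return a usable voice ID, falling back to the documented default."""
    if not voice_id:
        return DEFAULT_VOICE
    if needs_other_resource(voice_id):
        raise ValueError(
            f"{voice_id} needs a different resource ID; enable it in the Volcengine console first"
        )
    if voice_id not in VERIFIED_VOICES:
        # Not necessarily broken, just not in the verified list.
        warnings.warn(f"{voice_id} has not been verified with this skill")
    return voice_id
```

Unverified voices only produce a warning rather than an error, since the tables list what has been tested, not everything that might work.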
API Details
Environment Variables (already configured in MetaBot .env)
```bash
VOLCENGINE_TTS_APPID=<app_id>
VOLCENGINE_TTS_ACCESS_KEY=<access_key>
VOLCENGINE_TTS_RESOURCE_ID=volc.service_type.10029  # optional
```
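A script that calls the API directly needs the same three values. A minimal sketch of loading them, assuming `volc.service_type.10029` (the value shown above and used in the raw curl example) is the fallback when `VOLCENGINE_TTS_RESOURCE_ID` is unset; `TTSConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed fallback: the value shown above for the optional variable.
DEFAULT_RESOURCE_ID = "volc.service_type.10029"


@dataclass(frozen=True)
class TTSConfig:
    app_id: str
    access_key: str
    resource_id: str


def load_config(env: Optional[Mapping[str, str]] = None) -> TTSConfig:
    """Hypothetical loader: read credentials from the environment (or a given mapping)."""
    env = os.environ if env is None else env
    required = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return TTSConfig(
        app_id=env["VOLCENGINE_TTS_APPID"],
        access_key=env["VOLCENGINE_TTS_ACCESS_KEY"],
        # Empty or unset resource ID falls back to the default
        resource_id=env.get("VOLCENGINE_TTS_RESOURCE_ID") or DEFAULT_RESOURCE_ID,
    )
```

Accepting an explicit mapping keeps the loader testable without touching the real environment.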
Short-form API (real-time, < 300 chars)
- Endpoint: `https://openspeech.bytedance.com/api/v3/tts/unidirectional`
- Response: chunked JSON, with base64-encoded audio in the `data` field
- Latency: < 1 second
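The short-form call can also be made from Python with only the standard library. This is a minimal sketch that mirrors the raw curl example in this skill (same headers, same body fields, same "collect every base64 `data` field" decoding); it assumes nothing about other response fields, and `build_request`, `decode_audio`, and `synthesize` are hypothetical names.

```python
import base64
import json
import os
import urllib.request
import uuid
from typing import Iterable

URL = "https://openspeech.bytedance.com/api/v3/tts/unidirectional"


def build_request(text: str, speaker: str = "zh_female_sajiaonvyou_moon_bigtts") -> urllib.request.Request:
    """Hypothetical helper: same headers and body as the raw curl example."""
    body = {
        "req_params": {
            "text": text,
            "speaker": speaker,
            "audio_params": {"format": "mp3", "sample_rate": 24000},
        }
    }
    headers = {
        "Content-Type": "application/json",
        "X-Api-App-Id": os.environ["VOLCENGINE_TTS_APPID"],
        "X-Api-Access-Key": os.environ["VOLCENGINE_TTS_ACCESS_KEY"],
        "X-Api-Resource-Id": os.environ.get("VOLCENGINE_TTS_RESOURCE_ID") or "volc.service_type.10029",
        "X-Api-Request-Id": str(uuid.uuid4()),  # unique ID per request, like $(uuidgen)
    }
    return urllib.request.Request(
        URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def decode_audio(lines: Iterable[bytes]) -> bytes:
    """Join the base64 `data` fields from newline-delimited JSON chunks."""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not complete JSON objects
        if isinstance(msg, dict) and msg.get("data"):
            chunks.append(base64.b64decode(msg["data"]))
    return b"".join(chunks)


def synthesize(text: str, out_path: str) -> None:
    """Send the request and write the decoded MP3 to out_path."""
    with urllib.request.urlopen(build_request(text)) as resp:
        audio = decode_audio(resp)  # iterating the response yields lines
    with open(out_path, "wb") as f:
        f.write(audio)
```

Keeping decoding separate from the network call means the parsing logic can be checked against saved responses without credentials.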
Long-form API (async, up to 100K chars)
- Submit: `POST https://openspeech.bytedance.com/api/v1/tts_async/submit`
- Query: `GET https://openspeech.bytedance.com/api/v1/tts_async/query?appid=X&task_id=Y`
- Response: `audio_url` (valid for 1 hour)
- Latency: seconds to minutes, depending on text length
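Because long-form synthesis is asynchronous, a client submits once and then polls the query endpoint until an `audio_url` appears. A minimal sketch of that loop: the query URL uses the documented `appid` and `task_id` parameters, while `query` is a hypothetical caller-supplied function that performs the authenticated GET and returns the parsed JSON. Authentication headers and status or failure fields for the async API are not described here, so this sketch only waits for `audio_url` and gives up after a timeout.

```python
import time
from typing import Any, Callable, Mapping
from urllib.parse import urlencode

QUERY_URL = "https://openspeech.bytedance.com/api/v1/tts_async/query"


def build_query_url(app_id: str, task_id: str) -> str:
    # Same query parameters as the documented GET endpoint
    return QUERY_URL + "?" + urlencode({"appid": app_id, "task_id": task_id})


def wait_for_audio_url(
    query: Callable[[str], Mapping[str, Any]],
    url: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll query(url) until the response contains audio_url, or time out.

    `query` is hypothetical: it must send the authenticated GET and
    return the decoded JSON body as a mapping.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = query(url)
        audio_url = body.get("audio_url")
        if audio_url:
            return audio_url  # valid for about 1 hour, so download it promptly
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no audio_url after {timeout} seconds")
        sleep(interval)
```

Injecting `sleep` and `query` keeps the loop testable; in real use the timeout should be generous, since long texts can take minutes.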
Workflow for Podcasts
1. Write the script: create the podcast script as Markdown or plain text.
2. Generate audio: run `bin/doubao-tts -f script.txt -v zh_male_aojiaobazong_moon_bigtts -o podcast.mp3`.
3. Copy to outputs: run `cp podcast.mp3 /tmp/metabot-outputs/<chatId>/` to send the file to the user.
4. For multi-voice podcasts, generate each speaker's segments separately, then concatenate them with `ffmpeg`.
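For a dialogue, "each speaker's segments" usually means one audio file per turn, kept in script order so the turns can be concatenated back into a conversation. A minimal sketch, assuming a hypothetical script format of one `Speaker: text` line per turn (ASCII or full-width colon) and a caller-supplied speaker-to-voice mapping; `plan_turns`, `turn_commands`, and the `seg_NNN.mp3` naming are illustrative only.

```python
import re
from typing import Dict, List, NamedTuple

# Hypothetical script format: one turn per line, "Speaker: text".
# Both the ASCII colon and the full-width colon (：) are accepted.
LINE_RE = re.compile(r"^([^:：]+)[:：]\s*(.+)$")


class Turn(NamedTuple):
    voice: str   # voice ID passed to -v
    text: str    # text for this turn
    output: str  # per-turn MP3 file, numbered in script order


def plan_turns(script: str, voices: Dict[str, str], prefix: str = "seg") -> List[Turn]:
    """Split a dialogue script into ordered, per-turn TTS jobs."""
    turns: List[Turn] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        m = LINE_RE.match(line)
        speaker = m.group(1).strip() if m else None
        if speaker not in voices:
            raise ValueError(f"unrecognized line: {line!r}")
        turns.append(Turn(voices[speaker], m.group(2).strip(), f"{prefix}_{len(turns):03d}.mp3"))
    return turns


def turn_commands(turns: List[Turn]) -> List[List[str]]:
    """One CLI invocation per turn, using only the documented flags."""
    return [["bin/doubao-tts", t.text, "-v", t.voice, "-o", t.output] for t in turns]
```

Numbering outputs with zero padding keeps them in script order when listed alphabetically, which makes the later concatenation step straightforward.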
Multi-Voice Podcast Example
```bash
# Generate segments for different speakers
bin/doubao-tts -f host_lines.txt -v zh_male_aojiaobazong_moon_bigtts -o host.mp3
bin/doubao-tts -f guest_lines.txt -v zh_female_gaolengyujie_moon_bigtts -o guest.mp3

# Concatenate (requires ffmpeg)
echo "file 'host.mp3'" > list.txt
echo "file 'guest.mp3'" >> list.txt
ffmpeg -f concat -safe 0 -i list.txt -c copy podcast.mp3
```
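With many segments, writing `list.txt` by hand gets error-prone, especially if a file name contains a single quote. A minimal sketch that generates the list in the ffmpeg concat demuxer's `file '...'` syntax, where a literal single quote inside a quoted path is written as `'\''`; `write_concat_list` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable


def concat_list_line(path: str) -> str:
    # concat demuxer syntax: file 'path'; a literal ' becomes '\''
    return "file '" + path.replace("'", "'\\''") + "'"


def write_concat_list(paths: Iterable[str], list_file: str = "list.txt") -> str:
    """Hypothetical helper: write one `file '...'` line per segment, in order."""
    lines = [concat_list_line(p) for p in paths]
    Path(list_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_file


if __name__ == "__main__":
    write_concat_list(["host.mp3", "guest.mp3"])
    # then: ffmpeg -f concat -safe 0 -i list.txt -c copy podcast.mp3
```

`-c copy` joins the files without re-encoding, which should work when every segment was produced with the same audio settings (as with the same TTS format and sample rate).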
Raw curl (if CLI not available)
```bash
# Short-form
curl -sS -X POST "https://openspeech.bytedance.com/api/v3/tts/unidirectional" \
  -H "Content-Type: application/json" \
  -H "X-Api-App-Id: $VOLCENGINE_TTS_APPID" \
  -H "X-Api-Access-Key: $VOLCENGINE_TTS_ACCESS_KEY" \
  -H "X-Api-Resource-Id: ${VOLCENGINE_TTS_RESOURCE_ID:-volc.service_type.10029}" \
  -H "X-Api-Request-Id: $(uuidgen)" \
  -d '{
    "req_params": {
      "text": "你好世界",
      "speaker": "zh_female_sajiaonvyou_moon_bigtts",
      "audio_params": {"format": "mp3", "sample_rate": 24000}
    }
  }' | python3 -c "
import sys, json, base64
# Collect the base64 audio from each JSON line of the chunked response
chunks = []
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        d = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip lines that are not complete JSON objects
    if isinstance(d, dict) and d.get('data'):
        chunks.append(base64.b64decode(d['data']))
sys.stdout.buffer.write(b''.join(chunks))
" > output.mp3
```