Agent skill
voice-reply
Generate voice replies using OpenAI TTS API and send audio responses.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/voice-reply-leovigna-jarvis-template
SKILL.md
Voice Reply Skill
Generate spoken audio responses using OpenAI's Text-to-Speech API.
When to Use
- When the user asks to "reply by voice", "voice reply", "speak this", or makes a similar voice-related request.
- Command trigger: when the user sends /voice_note, resend the last message as a voice note.
- Remove all emojis from the text before converting to speech.
- Rephrase if needed to make it sound natural when spoken (e.g., convert bullet points to flowing sentences).
How to Use
1. Prepare the Response
Write your response without emojis — they don't translate well to speech.
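As a rough illustration of the emoji-removal step, the sketch below strips every non-ASCII byte from the text. The helper name `strip_emojis` is hypothetical, and the approach is deliberately crude: it assumes the reply is English-only, because accented letters and other non-ASCII characters are removed along with the emojis.

```shell
# Hypothetical helper: keep only tab, newline, carriage return and printable
# ASCII, then squeeze the repeated spaces left behind by removed characters.
# Assumption: dropping ALL non-ASCII bytes is acceptable for this reply.
strip_emojis() {
  printf '%s' "$1" | LC_ALL=C tr -cd '\11\12\15\40-\176' | tr -s ' '
}

strip_emojis "Build passed 🎉 and deploy is done"
```

For multilingual replies a Unicode-aware filter would be needed instead; this version only shows the idea.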
2. Generate Audio
Important: Use opus format for Telegram voice notes (shows waveform bubble).
```bash
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "<your text here>",
    "voice": "echo",
    "speed": 1.2,
    "response_format": "opus"
  }' -s --output /tmp/voice_reply.ogg
```
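The request body is hand-written JSON, so reply text containing double quotes or backslashes would break it. One possible way to handle that is sketched below. Both `json_escape` and `build_tts_payload` are hypothetical helpers, not part of the skill, and the escaping only covers backslashes and double quotes in single-line text.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Assumption: single-line input; newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print a request body with the same fields as the curl
# call in this step, with the reply text safely embedded.
build_tts_payload() {
  printf '{"model":"gpt-4o-mini-tts","input":"%s","voice":"echo","speed":1.2,"response_format":"opus"}' \
    "$(json_escape "$1")"
}
```

The result could then be passed to curl as `-d "$(build_tts_payload "$text")"` in place of the literal JSON.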
3. Send the Audio
Copy to outbound folder and send via message tool:
```bash
mkdir -p /home/exedev/.clawdbot/media/outbound
cp /tmp/voice_reply.ogg /home/exedev/.clawdbot/media/outbound/voice_reply.ogg
```
Then use the message tool with asVoice: true for proper voice message format:
```json
{
  "action": "send",
  "channel": "telegram",
  "to": "<user_id>",
  "media": "/home/exedev/.clawdbot/media/outbound/voice_reply.ogg",
  "asVoice": true
}
```
Important:
- Use .ogg (opus) format, which is required for Telegram voice notes
- asVoice: true sends the audio as a voice bubble with waveform
- The message caption is optional for voice notes
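When a request fails, curl without a fail flag writes the API's JSON error body to the output path, so the .ogg file may not contain audio at all. A minimal sanity check before sending, assuming it is enough to look for the "OggS" marker that every Ogg file starts with, might look like this. The helper names `is_ogg` and `stage_voice_reply` are hypothetical.

```shell
# Hypothetical helper: succeed only if the file starts with the Ogg magic bytes.
is_ogg() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "OggS" ]
}

# Hypothetical helper: copy the file into the outbound folder only if it
# looks like Ogg audio; otherwise report the problem and return non-zero.
stage_voice_reply() {
  src=$1
  outbound=$2
  if ! is_ogg "$src"; then
    echo "not an Ogg file (possibly an API error body): $src" >&2
    return 1
  fi
  mkdir -p "$outbound" && cp "$src" "$outbound/"
}
```

For example, `stage_voice_reply /tmp/voice_reply.ogg /home/exedev/.clawdbot/media/outbound` would replace the plain mkdir and cp pair.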
Configuration Options
Voice Options
| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, conversational (default) |
| fable | British, expressive |
| onyx | Deep, authoritative |
| nova | Friendly, upbeat |
| shimmer | Soft, calm |
Speed
- Range: 0.25 to 4.0
- Default: 1.2 (slightly faster than normal)
Model
- gpt-4o-mini-tts: Fast, cost-effective
- tts-1: Standard quality
- tts-1-hd: High definition
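If the options are ever taken from user input, a small guard could reject values outside the lists above before the API is called. The validators below are hypothetical and only know the voices, speed range, and models named in this document; the API itself may accept others.

```shell
# Hypothetical validators covering only the options listed in this document.
valid_voice() {
  case "$1" in
    alloy|echo|fable|onyx|nova|shimmer) return 0 ;;
    *) return 1 ;;
  esac
}

valid_speed() {
  # Accept plain decimals in the range 0.25 to 4.0.
  awk -v s="$1" 'BEGIN { exit !(s ~ /^[0-9]*\.?[0-9]+$/ && s + 0 >= 0.25 && s + 0 <= 4.0) }'
}

valid_model() {
  case "$1" in
    gpt-4o-mini-tts|tts-1|tts-1-hd) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might fall back to the defaults (echo, 1.2, gpt-4o-mini-tts) whenever one of these checks fails.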
Example Workflow
```bash
# 1. Generate audio (opus format for Telegram voice notes)
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Hey Oscar! Your main task today is Create Task Skill. Let me know if you need help!",
    "voice": "echo",
    "speed": 1.2,
    "response_format": "opus"
  }' -s --output /tmp/reply.ogg
# 2. Copy to outbound (create the folder first if it does not exist)
mkdir -p ~/.clawdbot/media/outbound
cp /tmp/reply.ogg ~/.clawdbot/media/outbound/reply.ogg
# 3. Send via message tool with asVoice: true
```
Tips
- Keep responses concise for voice — long text becomes tiring to listen to
- Avoid special characters, URLs, and code blocks
- Use natural language, as if speaking to someone
- Numbers and dates should be written out naturally
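The tip about URLs can be partly automated. As a sketch, the hypothetical helper `strip_urls` below removes http and https links from the text and squeezes the leftover spaces. It assumes URLs are delimited by whitespace and does not rewrite the sentence around them, so the result may still need rephrasing.

```shell
# Hypothetical helper: drop http(s) URLs, then collapse repeated spaces.
# Assumption: a URL runs until the next whitespace character.
strip_urls() {
  printf '%s' "$1" | sed -E 's#https?://[^[:space:]]+##g' | tr -s ' '
}

strip_urls "The docs are at https://example.com/guide if you need them"
```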