Voice Reply Skill

Generate spoken audio responses using OpenAI's Text-to-Speech API.

When to Use

Use this skill when the user asks to "reply by voice", "voice reply", "speak this", or makes a similar voice-related request.
Command trigger: when the user sends /voice_note, resend the last message as a voice note. Before converting it to speech:
- Remove all emojis from the text
- Rephrase where needed so it sounds natural when spoken (e.g., convert bullet points to flowing sentences)

How to Use

1. Prepare the Response

Write your response without emojis — they don't translate well to speech.

2. Generate Audio

Important: Use opus format for Telegram voice notes (shows waveform bubble).

bash

curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "<your text here>",
    "voice": "echo",
    "speed": 1.2,
    "response_format": "opus"
  }' -s --output /tmp/voice_reply.ogg
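
Pasting the reply straight into the single-quoted JSON above breaks as soon as the text contains a double quote or a backslash. One way to guard against that is to escape the text before building the request body. The following is a sketch only: json_escape, build_payload, and generate_voice are hypothetical helper names, and the escaping covers just quotes, backslashes, and line breaks (a dedicated JSON tool such as jq would be more thorough).

```shell
# Hypothetical helper: make a string safe to embed in a JSON string literal.
# Newlines, carriage returns, and tabs become spaces, since they carry no meaning in speech.
json_escape() {
  printf '%s' "$1" | tr '\n\r\t' '   ' | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: build the request body with the same settings as the command above.
build_payload() {
  printf '{"model":"gpt-4o-mini-tts","input":"%s","voice":"echo","speed":1.2,"response_format":"opus"}' \
    "$(json_escape "$1")"
}

# Hypothetical wrapper: --fail makes curl exit non-zero on an HTTP error
# instead of saving the error body as if it were audio.
generate_voice() {
  curl https://api.openai.com/v1/audio/speech \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1")" \
    -sS --fail --output "${2:-/tmp/voice_reply.ogg}"
}
```

With these defined, `generate_voice 'She said "hello" to everyone'` would write the audio to /tmp/voice_reply.ogg, or to the path given as an optional second argument.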

3. Send the Audio

Copy to outbound folder and send via message tool:

bash

mkdir -p /home/exedev/.clawdbot/media/outbound
cp /tmp/voice_reply.ogg /home/exedev/.clawdbot/media/outbound/voice_reply.ogg

Then use the message tool with asVoice: true for proper voice message format:

json

{
  "action": "send",
  "channel": "telegram",
  "to": "<user_id>",
  "media": "/home/exedev/.clawdbot/media/outbound/voice_reply.ogg",
  "asVoice": true
}

Important:

- Use .ogg (Opus) format, which is required for Telegram voice notes
- asVoice: true sends the audio as a voice bubble with a waveform
- A message caption is optional for voice notes
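
Because the voice bubble depends on the file really being Ogg audio, a quick check before sending can catch the case where the API returned an error body instead of audio. This is a minimal sketch, assuming the API delivers Opus in an Ogg container, as the .ogg extension used in this skill implies. Ogg files begin with the four bytes "OggS". The helper name is_ogg is hypothetical.

```shell
# Hypothetical helper: succeed only if the file exists and starts with the Ogg magic bytes "OggS".
is_ogg() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "OggS" ]
}
```

For example, `is_ogg /tmp/voice_reply.ogg || echo "not an Ogg file; check the API response" >&2` would flag a bad download before it reaches the message tool.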

Configuration Options

Voice Options

Voice	Description
`alloy`	Neutral, balanced
`echo`	Warm, conversational (default)
`fable`	British, expressive
`onyx`	Deep, authoritative
`nova`	Friendly, upbeat
`shimmer`	Soft, calm

Speed

Range: 0.25 to 4.0
Default: 1.2 in this skill (slightly faster than normal; the API's own default is 1.0)
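
If the speed is ever taken from user input, it may be worth checking it against the range above before calling the API. A minimal sketch, assuming awk is available; valid_speed is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if $1 is a plain decimal number between 0.25 and 4.0 inclusive.
valid_speed() {
  awk -v s="$1" 'BEGIN {
    if (s !~ /^[0-9]+(\.[0-9]+)?$/) exit 1   # reject non-numeric input
    exit !(s + 0 >= 0.25 && s + 0 <= 4.0)    # exit status 0 means in range
  }'
}
```

A caller could fall back to the skill default when the check fails, for example `valid_speed "$speed" || speed=1.2`.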

Model

gpt-4o-mini-tts — Fast, cost-effective
tts-1 — Standard quality
tts-1-hd — High definition

Example Workflow

bash

# 1. Generate audio (opus format for Telegram voice notes)
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Hey Oscar! Your main task today is Create Task Skill. Let me know if you need help!",
    "voice": "echo",
    "speed": 1.2,
    "response_format": "opus"
  }' -s --output /tmp/reply.ogg

# 2. Copy to outbound (create the folder first if it does not exist)
mkdir -p ~/.clawdbot/media/outbound
cp /tmp/reply.ogg ~/.clawdbot/media/outbound/reply.ogg

# 3. Send via message tool with asVoice: true

Tips

- Keep responses concise for voice; long text becomes tiring to listen to
- Avoid special characters, URLs, and code blocks
- Use natural language, as if speaking to someone
- Write out numbers and dates the way they would be spoken
