text-to-speech-optimization
Advanced text-to-speech optimization with expressive tone guides, natural speech patterns, prosody control, and human-like conversation techniques for creating authentic-sounding AI voices. Use when generating natural speech, creating engaging content, or producing professional voiceovers.
Install this agent skill into your project:
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/text-to-speech-optimization
SKILL.md
Text-to-Speech Optimization
A guide to creating natural, human-like AI-generated speech that listeners find hard to tell apart from a real voice.
Naturalness Techniques (Making AI Sound Human)
1. Conversational Speech Patterns
Strategic Imperfections (Key to Naturalness):
Real humans don't speak in perfect sentences. Add these elements sparingly:
# Self-corrections (very effective)
"We should meet on Tuesday... actually, make that Wednesday."
"The price is around fifty... wait, no, closer to sixty dollars."
"I was thinking we could... hmm, let me rephrase that."
# Thoughtful hesitations
"Well... I think we should consider..."
"You know, that's actually... that's a really good point."
"So, um, basically what happened was..."
# Trailing thoughts
"Maybe we could try... yeah, that might work."
"I guess we'll see how it goes..."
"It's just that... never mind, it's not important."
# Incomplete sentences (conversational)
"The thing is—"
"But what about—oh, right, that makes sense."
"I was gonna say—wait, what was I saying?"
Filler Words (Use Sparingly - 1-2 per paragraph max):
# Thoughtful fillers
"So, like, here's what I'm thinking..."
"It's, um, kind of complicated, you know?"
"Well, basically, it works like this..."
"I mean, you could say that, but..."
# Natural transitions
"Anyway, moving on to..."
"Right, so here's the deal..."
"Okay, so let me explain..."
"Now, here's where it gets interesting..."
IMPORTANT: Overuse kills naturalness. Aim for roughly 80% clean speech and 20% natural imperfections, and keep fillers to 1-2 per paragraph.
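The filler budget is easy to check automatically before a script goes to the engine. The sketch below is a minimal lint pass, assuming paragraphs are separated by blank lines; the filler list, the default limit of 2, and the names `countFillers` and `checkFillerDensity` are illustrative assumptions, not part of any tool.

```javascript
// Hypothetical filler lint (not part of any library).
// The filler list and the default budget of 2 per paragraph are assumptions.
const FILLERS = ["um", "uh", "like", "you know", "i mean", "basically"];

function countFillers(paragraph) {
  const text = paragraph.toLowerCase();
  let total = 0;
  for (const filler of FILLERS) {
    // Word boundaries stop "like" from matching inside "likely";
    // multi-word fillers allow any whitespace between their words.
    const pattern = new RegExp("\\b" + filler.replace(" ", "\\s+") + "\\b", "g");
    const matches = text.match(pattern);
    total += matches ? matches.length : 0;
  }
  return total;
}

function checkFillerDensity(script, maxPerParagraph = 2) {
  // Paragraphs are assumed to be separated by blank lines.
  return script
    .split(/\n\s*\n/)
    .map((paragraph, index) => ({ index, fillers: countFillers(paragraph) }))
    .filter((result) => result.fillers > maxPerParagraph);
}
```

Because it matches words rather than meaning, it also counts "like" used as a verb, so treat flagged paragraphs as prompts for a listen-through rather than automatic failures.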
2. Prosody & Emotional Expression
Emotional Tone Markers:
# EXCITEMENT / HIGH ENERGY
"OH MY GOSH! [excited, fast] This is INCREDIBLE!
I can't believe we actually did it! [breathless]
This is going to change EVERYTHING!"
Characteristics:
- Higher pitch
- Faster pace
- More emphasis (caps)
- Shorter sentences
- Exclamation points
# CALM / SOOTHING
"Take a deep breath... [soft, slow]
Just relax. [gentle]
Everything... is going to be okay. [reassuring]
There's no need to rush."
Characteristics:
- Lower, softer tone
- Slower pace
- Longer pauses
- Gentle inflection
- Ellipses for pauses
# PROFESSIONAL / AUTHORITATIVE
"According to our research, [confident]
the data clearly demonstrates
that this approach [measured pause]
yields superior results."
Characteristics:
- Clear, measured pace
- No hesitations
- Precise language
- Confident delivery
- Professional vocabulary
# CONVERSATIONAL / FRIENDLY
"Hey there! [warm]
So glad you could join us.
You know what? [friendly]
I think you're really gonna love this."
Characteristics:
- Warm, approachable
- Casual language
- Questions for engagement
- Personal pronouns (you, we)
- Relatable examples
# MYSTERIOUS / INTRIGUING
"But then... [pause, lower voice]
something unexpected happened.
Nobody... [whispered] could have predicted it.
The truth was far stranger... [dramatic pause]
than anyone imagined."
Characteristics:
- Varied pacing
- Strategic pauses
- Lower volume at times
- Building tension
- Dramatic reveals
# URGENT / INTENSE
"Listen! [sharp, fast]
We need to act NOW!
There's no time to waste!
Quick! [intense] Before it's too late!"
Characteristics:
- Fast pace
- Short sentences
- High energy
- Imperative verbs
- Sense of immediacy
# SAD / SOMBER
"I'm so sorry... [soft, slow]
This is... really difficult to say.
We tried everything we could, but...
[long pause]
I wish things had turned out differently."
Characteristics:
- Slower pace
- Lower energy
- Longer pauses
- Softer delivery
- Empathetic tone
# CONFUSED / UNCERTAIN
"Wait... I don't quite understand.
So you're saying that... [hesitant]
Hmm. [thinking]
I'm not sure I follow...
Could you explain that again?"
Characteristics:
- Hesitations
- Questions
- Thinking sounds (hmm, uh)
- Uncertain phrasing
- Seeking clarification
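The bracketed cues in these examples are directions, not words to be spoken. Some engines interpret certain bracketed tags (ElevenLabs' v3 model, for example, supports a set of bracketed audio tags), while others may simply read the brackets aloud, so it helps to separate cues from speakable text before deciding what to send. A minimal sketch, assuming cues are written in square brackets and combined with commas or "+", as in this guide; `parseCues` is a hypothetical helper.

```javascript
// Hypothetical helper: split a script line into speakable text and cues.
// Assumes cues look like "[soft, slow]" or "[pause + soft]", as in this guide.
function parseCues(line) {
  const cues = [];
  const text = line
    .replace(/\[([^\]]+)\]/g, (_match, inner) => {
      inner
        .split(/[,+]/) // combined cues: "a, b" or "a + b"
        .map((cue) => cue.trim().toLowerCase())
        .filter((cue) => cue.length > 0)
        .forEach((cue) => cues.push(cue));
      return " "; // keep the words on either side of the cue apart
    })
    .replace(/\s+/g, " ") // collapse gaps left by removed cues
    .trim();
  return { text, cues };
}
```

The returned `cues` can then be dropped, mapped to engine-specific controls, or kept only where your engine documents support for them.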
3. Pacing & Rhythm Variation
Dynamic Pacing (Critical for Naturalness):
# SLOW for Emphasis / Clarity
"Now... listen... very... carefully.
This is the most important part.
Each. Word. Matters."
# FAST for Excitement / Lists
"And then boom it happened so fast I could barely see it and wow just incredible absolutely amazing you should have been there!"
# MIXED for Natural Flow (Most Common)
"So here's what we're going to do. [normal]
First, [pause] we'll start with the basics... [slower]
get a good foundation going.
Then! [faster, excited] We'll ramp up the pace,
add in the advanced features,
and before you know it... [return to normal]
we'll have something really special."
Strategic Pauses:
# Before Important Points
"[pause] Now here's the key thing to understand..."
"And the winner is... [dramatic pause] ...Sarah!"
# After Complex Information
"The quantum entanglement phenomenon occurs when...
[brief pause for processing]
...paired particles maintain correlation."
# For Emphasis
"This is not... [pause] ...optional."
"We need to be clear about one thing. [pause] This changes everything."
# Natural Breath Points
"This is a really comprehensive explanation about
how the entire system works from start to finish...
[natural breath pause]
...and why it matters for your business."
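Where an engine accepts SSML, the W3C `<break time="..."/>` element is the standard way to request a silence of a specific length, and ElevenLabs documents `<break>` support for some of its models. This sketch rewrites this guide's pause markers into `<break>` tags. The millisecond values in `PAUSE_MS` are assumptions to tune by ear, any other cues in the same bracket are discarded, and `pausesToSsml` is a hypothetical helper.

```javascript
// Assumed pause lengths in milliseconds; tune by ear for each voice.
const PAUSE_MS = {
  "brief pause": 300,
  "pause": 500,
  "natural breath pause": 400,
  "long pause": 1200,
  "dramatic pause": 1500,
};

// Escape characters that are special in SSML (XML) text content.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Pick the longest known pause phrase contained in a cue, so that
// "dramatic pause" wins over plain "pause".
function pauseMsFor(cue) {
  let best = null;
  Object.keys(PAUSE_MS).forEach((phrase) => {
    if (cue.indexOf(phrase) !== -1 && (best === null || phrase.length > best.length)) {
      best = phrase;
    }
  });
  return best === null ? undefined : PAUSE_MS[best];
}

// Hypothetical helper: replace bracketed pause markers with SSML <break> tags.
// Only the first cue in a bracket is checked; if it is a pause, the whole
// bracket is replaced. Non-pause brackets such as "[warm]" are kept as-is.
function pausesToSsml(line) {
  return escapeXml(line).replace(/\[([^\]]+)\]/g, (match, inner) => {
    const ms = pauseMsFor(inner.split(",")[0].trim().toLowerCase());
    return ms === undefined ? match : `<break time="${ms}ms"/>`;
  });
}
```

If your engine expects a full SSML document, wrap the result in a `<speak>` root element.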
4. Inflection & Pitch Variation
Question Patterns:
# Rising Inflection (Yes/No Questions)
"Don't you think?" [up]
"Right?" [up]
"You know what I mean?" [up]
"Isn't that amazing?" [up]
# Falling Inflection (Wh-Questions)
"What happened next?" [down]
"Where did you go?" [down]
"How does it work?" [down]
# Mixed for Engagement
"You see what I'm saying? [up]
That's pretty cool, right? [up]
And you know what the best part is? [up, then down on answer]
It just works."
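For automated script preparation, the yes/no versus wh-question rule can be approximated by looking at the first word of each question. This is a rough heuristic assumed for illustration, not a linguistic parser; `questionContour` is a hypothetical name, and its result is only a suggestion for which cue to add.

```javascript
// Words that usually open wh-questions (falling contour in neutral speech).
const WH_WORDS = ["what", "where", "when", "why", "who", "whom", "whose", "which", "how"];

// Hypothetical heuristic: wh-questions fall; yes/no and tag questions rise.
function questionContour(sentence) {
  const trimmed = sentence.trim().replace(/["\u201d]+$/, ""); // drop closing quotes
  if (trimmed.charAt(trimmed.length - 1) !== "?") return "statement";
  const firstWord = trimmed
    .replace(/^[^A-Za-z]+/, "") // drop opening quotes or punctuation
    .split(/[^A-Za-z']+/)[0]
    .toLowerCase();
  return WH_WORDS.indexOf(firstWord) !== -1 ? "falling" : "rising";
}
```

Wh-questions can still rise when they express surprise or ask someone to repeat themselves, so review the suggestions rather than applying them blindly.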
List Intonation:
"We need three things:
coffee, [slight up]
sugar, [slight up]
milk, [slight up]
and cream." [down, final]
"First, [up, continuing]
second, [up, continuing]
third, [up, continuing]
and finally." [down, ending]
5. Emphasis & Stress Patterns
Word Emphasis:
# Capitalize for Strong Emphasis
"This is REALLY important."
"I LOVE this approach!"
"NEVER do that again."
# Italics for Moderate Emphasis (in scripts)
"I *think* we should try it."
"That's *exactly* what I mean."
# Repetition for Impact
"Listen. Listen carefully. This matters."
"No. No, that's not right."
"Yes! Yes, absolutely yes!"
Sentence Stress:
# Stress Different Words = Different Meaning
"I didn't say she stole the money." [stress on "I": someone else said it]
"I didn't SAY she stole the money." [I implied it]
"I didn't say SHE stole the money." [someone else stole it]
"I didn't say she STOLE the money." [she acquired it another way]
"I didn't say she stole the MONEY." [she stole something else]
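When the target engine supports SSML, the caps and asterisk conventions can be mapped to the standard `<emphasis>` element, whose `level` attribute accepts `strong`, `moderate`, `none`, or `reduced` in the W3C specification. Engine support for `<emphasis>` varies, so check your engine's SSML documentation first. This sketch assumes the input is already XML-safe and contains no other asterisks; `emphasisToSsml` and its default acronym list are hypothetical.

```javascript
// Hypothetical helper: map this guide's emphasis conventions to SSML.
// *word* -> moderate emphasis; ALL-CAPS words (2+ letters) -> strong emphasis.
// Assumes the text is already XML-safe and uses no other asterisks.
function emphasisToSsml(text, acronyms = ["AI", "TTS", "SSML"]) {
  return text
    .replace(/\*([^*]+)\*/g, '<emphasis level="moderate">$1</emphasis>')
    .replace(/\b[A-Z][A-Z']*[A-Z]\b/g, (word) =>
      acronyms.indexOf(word) !== -1
        ? word // leave known acronyms alone so they are still spelled out
        : // lowercased in case the engine would spell out all-caps words
          `<emphasis level="strong">${word.toLowerCase()}</emphasis>`
    );
}
```

Moving the tag from word to word reproduces the meaning shifts in the "I didn't say she stole the money" examples. Any all-caps word of two or more letters is treated as emphasis, which is why acronyms need to be listed explicitly.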
6. Breath Sounds & Natural Breaks
Breath Markers:
# After Physical Exertion
"Whew! [quick breath] That was intense!"
"I just ran up those stairs and [breath] wow, I'm out of shape!"
# Before Important Statements
"[deep breath] Okay, here's what we're going to do..."
"[breath] This is the moment we've been waiting for."
# After Long Explanations
"So that's how the entire process works from beginning to end...
[natural breath]
...and that's why it's so effective."
# Emotional Breaths
"[shaky breath] I can't believe this is happening..."
"[sigh] Well, I guess that's that."
Timing Naturalness:
# Varied Pause Lengths
Short pause: , (comma)
Medium pause: ... (ellipsis)
Long pause: . [pause] (period + marker)
Dramatic pause: — (em dash)
Example:
"Well, I was thinking... [medium]
maybe we could try something different. [long]
You know? [short] Like—[dramatic] a complete redesign."
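The four-tier pause scale can also be used to audit a script: counting which tiers a line relies on shows whether it leans too heavily on, say, ellipses. A minimal sketch following the scale above; `pauseProfile` and its rule table are illustrative assumptions. Plain periods are not counted, because the scale only defines a long pause as a period plus a marker, and the engine still decides how long each pause actually lasts.

```javascript
// Pause tiers from this guide, checked in order. Ellipses go first so that
// "... [pause]" is counted once (as medium) rather than also as a long pause.
const PAUSE_RULES = [
  { tier: "medium", pattern: /\.\.\.|\u2026/g }, // ellipsis
  { tier: "long", pattern: /\.\s*\[pause\]/g }, // period + [pause] marker
  { tier: "dramatic", pattern: /\u2014/g }, // em dash
  { tier: "short", pattern: /,/g }, // comma
];

// Hypothetical audit helper: count how often each pause tier appears in a line.
function pauseProfile(line) {
  const counts = { short: 0, medium: 0, long: 0, dramatic: 0 };
  let rest = line;
  PAUSE_RULES.forEach(({ tier, pattern }) => {
    rest = rest.replace(pattern, () => {
      counts[tier] += 1;
      return " "; // remove each match so later rules cannot recount it
    });
  });
  return counts;
}
```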
Parameter Optimization
Stability Settings (Expressiveness vs Consistency)
HIGHLY EXPRESSIVE (0.2-0.4):
- Conversational content
- Character voices
- Emotional storytelling
- Podcasts, vlogs
BALANCED (0.5-0.6):
- General narration
- Educational content
- Product descriptions
- Most use cases
CONSISTENT (0.7-0.9):
- Professional narration
- Audiobooks
- Corporate content
- Technical documentation
Similarity Boost (Voice Matching)
LOW (0.5-0.65):
- More variation
- Creative interpretation
- Less strict matching
SWEET SPOT (0.70-0.85):
- Natural but consistent
- Recommended for most uses
- Good balance
HIGH (0.85-1.0):
- Very close to original
- Voice cloning
- Brand consistency critical
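The stability and similarity ranges above can be captured as a lookup table so that every script of a given type starts from the same settings. The values below are picked from inside this guide's ranges and are assumptions, not vendor recommendations; the field names `stability` and `similarity_boost` follow ElevenLabs' `voice_settings` object, and `CONTENT_PRESETS` and `presetFor` are hypothetical names.

```javascript
// Hypothetical presets: values picked from inside this guide's ranges
// (assumptions to tune per voice, not vendor recommendations).
const CONTENT_PRESETS = {
  podcast: { stability: 0.3, similarity_boost: 0.75 }, // highly expressive
  storytelling: { stability: 0.3, similarity_boost: 0.75 },
  narration: { stability: 0.55, similarity_boost: 0.75 }, // balanced
  educational: { stability: 0.55, similarity_boost: 0.75 },
  audiobook: { stability: 0.8, similarity_boost: 0.8 }, // consistent
  corporate: { stability: 0.8, similarity_boost: 0.9 }, // brand-critical matching
};

function presetFor(contentType) {
  // Unknown content types fall back to the balanced "narration" preset.
  const key = Object.prototype.hasOwnProperty.call(CONTENT_PRESETS, contentType)
    ? contentType
    : "narration";
  return { ...CONTENT_PRESETS[key] }; // copy so callers can tweak it safely
}
```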
Speed / Pace
SLOW (0.70-0.90):
- Technical explanations
- Meditation, calm content
- Elderly audience
- Complex information
NATURAL (0.95-1.05):
- General content
- Conversational
- Most versatile
FAST (1.05-1.20):
- Energetic content
- Urgent information
- Young audience
- Exciting moments
Style Exaggeration (Use Sparingly)
NONE (0):
- Most natural
- Recommended default
- No extra processing
MODERATE (0.3-0.6):
- Slight style amplification
- Character emphasis
- Costs more compute
STRONG (0.7-1.0):
- Very exaggerated
- Special use cases
- Higher latency/cost
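Out-of-range values are an easy mistake when settings come from a config file, so it can help to clamp them before building a request. This sketch assumes a 0 to 1 scale for stability, similarity boost, and style, plus this guide's 0.70 to 1.20 speed range, which matches the range ElevenLabs documents for its `speed` setting at the time of writing (verify against the current API reference). The defaults (style 0, speed 1.0) follow the "natural" recommendations above; `buildVoiceSettings` is a hypothetical helper.

```javascript
// Clamp a value into [min, max]; anything that is not a finite number
// falls back to the default.
function clamp(value, min, max, fallback) {
  if (typeof value !== "number" || !isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: build a voice-settings object using this guide's ranges.
// Defaults: balanced stability, "sweet spot" similarity, no style, natural speed.
function buildVoiceSettings(input = {}) {
  return {
    stability: clamp(input.stability, 0, 1, 0.5),
    similarity_boost: clamp(input.similarity_boost, 0, 1, 0.75),
    style: clamp(input.style, 0, 1, 0), // non-zero style may add latency/cost
    speed: clamp(input.speed, 0.7, 1.2, 1.0),
  };
}
```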
Script Writing for Natural Speech
Write for the Ear, Not the Eye
BAD (Written Language):
"Our comprehensive solution provides organizations with the capability to
leverage advanced analytics in order to optimize operational efficiency and
maximize return on investment through data-driven decision-making processes."
GOOD (Spoken Language):
"So here's what we do. [conversational]
We help companies use their data... [pause] better.
Basically, we give you the tools to make smarter decisions,
save money, and... you know... [natural] just run things more smoothly.
It's pretty straightforward, actually."
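A quick way to catch written-style sentences before they reach the voice is to flag any sentence too long to follow by ear. In this sketch the 20-word threshold is an arbitrary assumption, bracketed cues are ignored, sentence splitting is deliberately rough (an ellipsis also ends a "sentence"), and `earCheck` is a hypothetical name.

```javascript
// Hypothetical helper: flag sentences that are probably too long to follow by ear.
// maxWords = 20 is an illustrative threshold, not a standard.
function earCheck(script, maxWords = 20) {
  return script
    .replace(/\[[^\]]*\]/g, " ") // ignore bracketed delivery cues
    .split(/[.!?]+(?:\s+|$)/) // rough sentence split
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0)
    .map((sentence) => ({ sentence, words: sentence.split(/\s+/).length }))
    .filter((result) => result.words > maxWords);
}
```

Run on the BAD example above, it flags the single 27-word sentence; the GOOD version passes.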
Punctuation for Prosody
Comprehensive Punctuation Guide:
. (Period)
Use: Full stop, complete pause
"First sentence. Second sentence."
, (Comma)
Use: Brief pause, list items
"Well, I think, perhaps we should try it."
... (Ellipsis)
Use: Thoughtful pause, trailing off
"Hmm... that's interesting..."
"Maybe we could... yeah, that works."
— (Em Dash)
Use: Interruption, sudden change
"I was thinking—wait, that's not right."
"The results were—oh my goodness—incredible!"
! (Exclamation)
Use: Energy, excitement, surprise
"This is amazing!"
"Stop!"
"Wow!"
? (Question Mark)
Use: Rising inflection, inquiry
"Don't you think?"
"What happened next?"
: (Colon)
Use: Introduction, emphasis coming
"Here's the thing: we need to act now."
"Remember this: timing is everything."
; (Semicolon)
Use: Related thoughts, slight pause
"We tried our best; it wasn't enough."
" " (Quotes)
Use: Direct speech, emphasis
He said, "This changes everything."
The so-called "solution" didn't work.
( ) (Parentheses)
Use: Aside, lower volume
"The price (believe it or not) is free."
"This method (which I invented) works great."
*italics* (In scripts)
Use: Moderate emphasis
"That's *exactly* what I mean."
CAPS (In scripts)
Use: Strong emphasis, volume
"This is REALLY important!"
Dialogue vs Narration
Dialogue (More Natural Imperfections):
"Hey, so, like... [conversational]
I was thinking, you know,
maybe we could grab coffee or something?
I mean, if you're free, of course.
No pressure or anything."
Narration (Cleaner, But Still Natural):
"The morning sun rose slowly over the mountains,
casting long shadows across the valley below.
Sarah stood at the window... watching, waiting.
She knew... [thoughtful]
everything was about to change."
Advanced Naturalness Techniques
Vocal Variety Layering
Combine multiple techniques:
"So, listen... [pause + soft]
I'm gonna tell you something [conversational + friendly]
that's gonna absolutely BLOW your mind! [excited + loud]
Ready? [question + anticipation]
[dramatic pause]
It's... [whispered + mysterious]
actually really simple. [normal + relieved]"
Techniques used:
✓ Pacing variation (slow → fast → slow)
✓ Volume changes (normal → loud → whisper → normal)
✓ Emotional shifts (calm → excited → mysterious → relieved)
✓ Strategic pauses (dramatic, thoughtful)
✓ Tone markers (conversational, friendly, excited)
Contextual Adaptation
Match Tone to Content:
# Bad News (Somber, slower, softer)
"I'm sorry to have to tell you this, but...
[gentle]
the test results came back, and...
[pause, empathetic]
they're not what we hoped for."
# Good News (Upbeat, faster, energetic)
"Guess what?! [excited]
You're NOT gonna believe this!
We got the contract! [triumphant]
All that hard work... [grateful] it paid off!"
# Technical Explanation (Clear, measured)
"The algorithm works by first analyzing
the input data, [pause for clarity]
then applying a series of transformations,
and finally [deliberate]
outputting the optimized result."
# Storytelling (Varied, engaging)
"It was a dark and stormy night... [mysterious, slow]
The wind howled through the trees. [atmospheric]
And then! [sudden, fast] Without warning!
A figure appeared in the doorway. [dramatic pause]
Nobody moved. [whispered, tense]
Nobody... [breath] dared."
Multi-Take Strategy
// Generate variations, choose best
const variations = {
  expressive: {
    stability: 0.3,
    similarity_boost: 0.7,
    note: "Most natural, conversational"
  },
  balanced: {
    stability: 0.5,
    similarity_boost: 0.8,
    note: "Professional, versatile"
  },
  consistent: {
    stability: 0.7,
    similarity_boost: 0.85,
    note: "Polished, reliable"
  }
};
// Test all three
// A/B test with audience
// Choose best for context
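To run the takes, one request per entry in the `variations` object can be generated programmatically. This sketch only builds request descriptions and sends nothing. It assumes ElevenLabs' REST text-to-speech endpoint (`POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}` with an `xi-api-key` header and a JSON body of `text`, `model_id`, and `voice_settings`); verify the path and field names against the current API reference. `buildTakeRequests` is hypothetical, and the voice ID, model ID, and key are placeholders you supply.

```javascript
// Hypothetical helper: build one request description per take.
// Nothing is sent here; pass each result to fetch(url, { method, headers, body })
// once the endpoint and fields have been checked against the current API docs.
function buildTakeRequests(text, variations, { voiceId, modelId, apiKey }) {
  const url = `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`;
  return Object.keys(variations).map((name) => {
    const { stability, similarity_boost } = variations[name];
    return {
      name, // keep the take label so results can be compared later
      url,
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({
        text,
        model_id: modelId,
        voice_settings: { stability, similarity_boost }, // "note" stays local
      }),
    };
  });
}
```

Called with the three-entry `variations` object, it returns three request objects labeled `expressive`, `balanced`, and `consistent`; keeping the label with each response makes the later A/B comparison straightforward.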
Common Mistakes & Fixes
Mistake 1: Too Robotic
Problem:
"Hello. Welcome to our presentation. Today we will discuss important topics."
Solution:
"Hey there! Welcome. [warm]
So... [conversational] today we're gonna talk about
some really important stuff.
You're gonna love this."
Why it works:
- Contractions and casual forms (we're, gonna)
- Conversational markers (hey, so)
- Enthusiasm
- Direct address (you)
Mistake 2: Too Many Fillers
Problem:
"So, um, like, you know, I was, um, thinking, you know, that maybe, like, we could, um..."
Solution:
"So, I was thinking... [slight pause]
maybe we could try a different approach?
What do you think?"
Why it works:
- Limit fillers to 1-2 per paragraph
- Replace some with pauses
- Clear main message
Mistake 3: Monotone Delivery
Problem:
All sentences same pace, same emphasis, same inflection.
Solution:
"First point: [slower, emphasis] really important.
Second! [faster, excited] This is cool!
And third... [thoughtful pause]
well, you'll see."
Why it works:
- Varied pacing
- Different emotional tones
- Strategic emphasis
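Monotony often starts in the script: when every sentence is about the same length, the rhythm tends to flatten whatever the voice settings. One rough proxy, assumed here rather than taken from any standard, is the coefficient of variation of sentence lengths (standard deviation divided by mean). `lengthVariation` is a hypothetical helper, and the cut-off you treat as "too flat" is a judgment call.

```javascript
// Hypothetical helper: coefficient of variation of sentence lengths in words.
// Near 0 means every sentence is about the same length; higher means more varied.
function lengthVariation(script) {
  const lengths = script
    .replace(/\[[^\]]*\]/g, " ") // ignore bracketed delivery cues
    .split(/[.!?]+(?:\s+|$)/) // rough sentence split
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0)
    .map((sentence) => sentence.split(/\s+/).length);
  if (lengths.length < 2) return 0; // not meaningful for a single sentence
  const mean = lengths.reduce((sum, n) => sum + n, 0) / lengths.length;
  const variance = lengths.reduce((sum, n) => sum + (n - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance) / mean; // population standard deviation / mean
}
```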
Testing & Quality Control
Naturalness Checklist
- Varied pacing (not monotone)
- Appropriate emotional tone
- Strategic pauses
- Natural inflection patterns
- Some imperfections (if conversational)
- Emphasis on key words
- Contextually appropriate energy
- No obvious synthesis artifacts (glitches, mispronunciations, odd breaths)
- Sounds like a real person
- Engaging and believable
A/B Testing
Generate two versions:
Version A: Conservative (fewer techniques)
Version B: Natural (full techniques)
Test with audience
Track engagement metrics
Refine based on feedback
Document what works
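To judge whether Version B's engagement is genuinely better rather than noise, a two-proportion z-test on a yes/no metric (for example, whether a listener reached the end) is a reasonable first check. This sketch uses the pooled standard error and the usual normal approximation, which needs reasonably large samples; |z| > 1.96 corresponds to roughly 5% two-sided significance. The choice of metric and the name `compareVersions` are assumptions.

```javascript
// Hypothetical helper: two-proportion z-test on a binary engagement metric
// (e.g. "listened to the end"). A = conservative version, B = natural version.
function compareVersions(successesA, trialsA, successesB, trialsB) {
  const rateA = successesA / trialsA;
  const rateB = successesB / trialsB;
  // Pooled proportion under the null hypothesis that both versions perform equally.
  const pooled = (successesA + successesB) / (trialsA + trialsB);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / trialsA + 1 / trialsB));
  const z = standardError === 0 ? 0 : (rateB - rateA) / standardError;
  return { rateA, rateB, z, significantAt5Percent: Math.abs(z) > 1.96 };
}
```

Feed it the counts your analytics platform reports for each version; it only indicates whether the observed difference is likely to be real, not why.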
Resources
- ElevenLabs Voice Lab
- Speech Pattern Analysis Tools
- Professional Voice Actor Guides
- Conversational AI Best Practices