Agent skill

voice-system-expert

Use when working with Quetrex's voice interface, OpenAI Realtime API, WebRTC, or echo cancellation. Knows Quetrex's specific voice architecture decisions and patterns. CRITICAL - prevents breaking working voice system.


Install this agent skill to your Project

npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/barnhardt-enterprises-inc/voice-system-expert

SKILL.md

Quetrex Voice System Expert

CRITICAL: Read This First

Quetrex's voice system architecture is extensively documented and battle-tested. Before making ANY changes to voice-related code, you MUST read:

docs/decisions/ADR-001-VOICE-ECHO-CANCELLATION.md (definitive architectural decision)
docs/architecture/VOICE-SYSTEM.md (technical implementation)
docs/features/voice-interface.md (user-facing features)

Implementation location: src/lib/openai-realtime.ts

Core Architecture Decision

ALWAYS-ON MICROPHONE + BROWSER AEC

This is Decision 4 from VOICE-SYSTEM.md and the definitive approach documented in ADR-001.

typescript

// ✅ CORRECT: Always-on microphone
const mediaStream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: true,  // CRITICAL - browser handles echo cancellation
    noiseSuppression: true,
    autoGainControl: true,
  },
})

// Microphone track stays ENABLED throughout conversation
// Browser's native AEC prevents feedback loops
// Server-side VAD (Voice Activity Detection) handles turn detection
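
Requesting `echoCancellation: true` is a constraint, and the browser reports what it actually applied through `MediaStreamTrack.getSettings()`. The sketch below checks the two microphone invariants described above. `checkMicInvariants`, `TrackLike`, and `MicInvariantReport` are hypothetical names for illustration and are not taken from src/lib/openai-realtime.ts.

```typescript
// Hypothetical helper: not part of the Quetrex codebase.
// TrackLike is a structural stand-in for the parts of MediaStreamTrack used here,
// so a real track from mediaStream.getAudioTracks() can be passed directly.
interface TrackLike {
  kind: string;
  enabled: boolean;
  getSettings(): { echoCancellation?: unknown };
}

interface MicInvariantReport {
  ok: boolean;
  problems: string[];
}

function checkMicInvariants(track: TrackLike): MicInvariantReport {
  const problems: string[] = [];
  if (track.kind !== "audio") {
    problems.push(`expected an audio track, got "${track.kind}"`);
  }
  if (!track.enabled) {
    problems.push("microphone track is disabled; it should stay enabled");
  }
  // getSettings() reflects what the browser applied, not what was requested.
  if (track.getSettings().echoCancellation !== true) {
    problems.push("echoCancellation is not active on the track");
  }
  return { ok: problems.length === 0, problems };
}
```

In a browser this could be called as `checkMicInvariants(mediaStream.getAudioTracks()[0])` right after `getUserMedia` resolves, logging `problems` during development.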

How It Works

Audio Pipeline

User speaks
    ↓
Microphone (always enabled, echoCancellation: true)
    ↓
WebRTC → OpenAI Realtime API
    ↓
Server-side VAD detects speech
    ↓
OpenAI processes and responds
    ↓
Audio response via WebRTC
    ↓
HTMLAudioElement playback (stays in browser pipeline)
    ↓
Browser AEC compares mic input + speaker output
    ↓
Echo automatically canceled (no feedback loop)

Why This Works

Browser Echo Cancellation Requirements:

Both microphone input AND speaker output must be in browser's audio graph
Audio must flow through browser's WebRTC stack
No manual intervention needed (browser handles it)
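
As a rough illustration of the first two requirements, the remote stream delivered by WebRTC can be handed to a media element so playback stays inside the browser. This is a minimal sketch: `attachRemoteAudio` and `AudioSink` are hypothetical names, while `srcObject`, `autoplay`, and the `streams` array on the `track` event are standard browser APIs.

```typescript
// Hypothetical helper: not part of the Quetrex codebase.
// AudioSink is a structural stand-in for HTMLAudioElement.
interface AudioSink {
  srcObject: unknown;
  autoplay: boolean;
}

// Attaches the first remote stream to the sink. Returns false when the
// track event carried no stream, in which case nothing is changed.
function attachRemoteAudio(sink: AudioSink, streams: readonly unknown[]): boolean {
  const [firstStream] = streams;
  if (firstStream === undefined) {
    return false;
  }
  sink.autoplay = true;
  sink.srcObject = firstStream;
  return true;
}

// Browser usage (assumed wiring, shown as a comment so the block runs anywhere):
//   const audioEl = document.createElement("audio");
//   pc.ontrack = (event) => attachRemoteAudio(audioEl, event.streams);
```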

This is the industry standard:

ChatGPT voice mode
Google Meet
Zoom
Discord
Microsoft Teams

They ALL use always-on microphone + browser AEC.

DO NOT DO THESE THINGS

❌ DON'T: Toggle Microphone Track

typescript

// ❌ WRONG - This breaks echo cancellation
async function pauseRecording() {
  microphone.enabled = false // DON'T DO THIS
}

async function resumeRecording() {
  microphone.enabled = true // DON'T DO THIS
}

Why this fails:

Not industry standard
Causes audio glitches
Can break echo cancellation
Adds artificial delays
More complex state management
No real benefit

❌ DON'T: Route Audio Outside Browser

typescript

// ❌ WRONG - AudioWorklet bypass to native audio
const workletNode = new AudioWorkletNode(audioContext, 'bypass-processor')
workletNode.port.postMessage({ cmd: 'route-to-native' })

Why this fails:

Breaks browser AEC (audio leaves browser pipeline)
Causes echo/feedback loops
Requires complex native audio handling
Platform-specific implementations
Already tried and failed (see abandoned-approaches/)

❌ DON'T: Implement Custom Echo Cancellation

typescript

// ❌ WRONG - Custom AEC implementation
class CustomEchoCanceller {
  cancelEcho(input: AudioBuffer, output: AudioBuffer) {
    // Complex DSP code...
  }
}

Why this fails:

Reinventing the wheel
Browser AEC is battle-tested by billions of users
Extremely complex to implement correctly
Requires deep DSP knowledge
Performance issues
Platform-specific tuning needed

❌ DON'T: Add Artificial Delays

typescript

// ❌ WRONG - Delays for echo prevention
await new Promise(resolve => setTimeout(resolve, 500))
await playAudio()
await new Promise(resolve => setTimeout(resolve, 500))
resumeRecording()

Why this fails:

Not necessary (browser AEC handles it)
Degrades user experience
Adds latency
Still doesn't prevent echo if implementation is wrong

What You CAN Change

✅ DO: Adjust Audio Settings

typescript

// ✅ Can tweak these settings
const constraints = {
  audio: {
    echoCancellation: true,      // MUST be true
    noiseSuppression: true,       // Can adjust
    autoGainControl: true,        // Can adjust
    sampleRate: 24000,            // Can change for quality
    channelCount: 1,              // Mono is fine for voice
  },
}
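
One way to keep tuning flexible while making `echoCancellation` impossible to switch off is to build the constraints through a single function. This is a sketch only: `buildAudioConstraints`, `AudioTuning`, and `DEFAULT_TUNING` are hypothetical names, and the defaults simply mirror the values shown above.

```typescript
// Hypothetical helper: not part of the Quetrex codebase.
interface AudioTuning {
  noiseSuppression?: boolean;
  autoGainControl?: boolean;
  sampleRate?: number;
  channelCount?: number;
}

interface VoiceAudioConstraints extends Required<AudioTuning> {
  echoCancellation: true;
}

// Defaults copied from the settings block above.
const DEFAULT_TUNING: Required<AudioTuning> = {
  noiseSuppression: true,
  autoGainControl: true,
  sampleRate: 24000,
  channelCount: 1,
};

function buildAudioConstraints(tuning: AudioTuning = {}): { audio: VoiceAudioConstraints } {
  // echoCancellation is written last so no caller-supplied value can override it.
  return { audio: { ...DEFAULT_TUNING, ...tuning, echoCancellation: true } };
}
```

The result has the shape `getUserMedia` expects, for example `navigator.mediaDevices.getUserMedia(buildAudioConstraints({ sampleRate: 48000 }))`.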

✅ DO: Handle Connection States

typescript

// ✅ Can manage connection lifecycle
async function connectVoice() {
  // Setup WebRTC connection
  // Start streaming
  // Handle connection events
}

async function disconnectVoice() {
  // Clean up WebRTC connection
  // Stop streaming
  // Release microphone
}
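
For the "handle connection events" step, a small pure mapping from the peer connection state to a UI status keeps lifecycle handling separate from audio code. This is a sketch under assumptions: `toVoiceStatus` and `VoiceStatus` are hypothetical names, while the six input values are the standard `RTCPeerConnection.connectionState` strings.

```typescript
// Hypothetical helper: not part of the Quetrex codebase.
// PeerState lists the standard RTCPeerConnection.connectionState values.
type PeerState = "new" | "connecting" | "connected" | "disconnected" | "failed" | "closed";
type VoiceStatus = "idle" | "connecting" | "live" | "reconnecting" | "error";

function toVoiceStatus(state: PeerState): VoiceStatus {
  switch (state) {
    case "new":
    case "closed":
      return "idle";
    case "connecting":
      return "connecting";
    case "connected":
      return "live";
    case "disconnected":
      // "disconnected" can recover on its own, so it is not treated as an error.
      return "reconnecting";
    case "failed":
      return "error";
  }
}

// Browser usage (assumed wiring, shown as a comment so the block runs anywhere):
//   pc.onconnectionstatechange = () => render(toVoiceStatus(pc.connectionState));
```

Nothing here touches the microphone track, so status handling cannot interfere with echo cancellation.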

✅ DO: Add UI Feedback

typescript

// ✅ Can add visual indicators
function onVoiceActivity(active: boolean) {
  if (active) {
    // Show "listening" indicator
    // Animate microphone icon
  } else {
    // Show "idle" indicator
  }
}
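
Because turn detection is server-side, the "listening" indicator can be driven by events from the server rather than by local audio analysis. The sketch below assumes events arrive as JSON strings (for example over the WebRTC data channel) and that the session emits `input_audio_buffer.speech_started` and `input_audio_buffer.speech_stopped`; verify both names against the current OpenAI Realtime API reference. `routeVadEvent` is a hypothetical name.

```typescript
// Hypothetical helper: not part of the Quetrex codebase.
type VoiceActivityHandler = (active: boolean) => void;

// Returns true when the message was a VAD event and the handler was called.
function routeVadEvent(rawMessage: string, onVoiceActivity: VoiceActivityHandler): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawMessage);
  } catch {
    return false; // Not JSON: ignore rather than break the UI.
  }
  if (typeof parsed !== "object" || parsed === null) {
    return false;
  }
  const eventType = (parsed as { type?: unknown }).type;
  if (eventType === "input_audio_buffer.speech_started") {
    onVoiceActivity(true);
    return true;
  }
  if (eventType === "input_audio_buffer.speech_stopped") {
    onVoiceActivity(false);
    return true;
  }
  return false; // Some other event type: leave it to other handlers.
}
```

The handler passed in can be the `onVoiceActivity` function shown above, so the indicator changes without reading or modifying the microphone track.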

✅ DO: Handle Errors

typescript

// ✅ Can improve error handling
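try {
  // Keep echoCancellation: true on every getUserMedia call, including this one
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { echoCancellation: true },
  })
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow before reading `.name`
  const name = error instanceof DOMException ? error.name : ''
  if (name === 'NotAllowedError') {
    // Show permission request UI
  } else if (name === 'NotFoundError') {
    // Show "no microphone found" error
  }
}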
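
The branches above can be collected into one function that turns a rejected `getUserMedia` call into a user-facing message. This is a sketch: `describeMicError` and the message wording are hypothetical, while `NotAllowedError`, `NotFoundError`, and `NotReadableError` are standard error names for `getUserMedia`.

```typescript
// Hypothetical helper: not part of the Quetrex codebase.
function describeMicError(error: unknown): string {
  // Read `.name` defensively because a caught value can be anything.
  const errorName =
    typeof error === "object" && error !== null && "name" in error
      ? String((error as { name: unknown }).name)
      : "";
  switch (errorName) {
    case "NotAllowedError":
      return "Microphone access was denied. Allow it in the browser's site settings and try again.";
    case "NotFoundError":
      return "No microphone was found. Connect one and try again.";
    case "NotReadableError":
      return "The microphone could not be started. Another application may be using it.";
    default:
      return "Could not start the microphone.";
  }
}
```

Calling it from the `catch` block keeps the message logic testable without a browser.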

Implementation Location

Primary file: src/lib/openai-realtime.ts

Key functions:

setupMediaStream() - Initializes microphone with correct constraints
connectToOpenAI() - Establishes WebRTC connection
handleAudioResponse() - Plays AI responses via HTMLAudioElement

DO NOT modify:

Microphone enable/disable logic (should stay always-on)
Echo cancellation settings (must be true)
Audio routing (must stay in browser pipeline)

CAN modify:

UI feedback and visual indicators
Error handling and user messages
Connection retry logic
Audio quality settings (sample rate, etc.)
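
Connection retry logic is one of the modifiable areas, and a capped exponential backoff is a common shape for it. The sketch below is an assumption about how retries might be structured, not the existing implementation. `retryDelayMs` and `connectWithRetry` are hypothetical names, and the 500 ms base, 8000 ms cap, and 4 attempts are illustrative values. These waits apply only between failed connection attempts, never around audio playback.

```typescript
// Hypothetical helpers: not part of the Quetrex codebase.

// Delay before the next attempt: base * 2^attempt, capped at capMs.
function retryDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(capMs, baseMs * 2 ** safeAttempt);
}

// Calls `connect` until it resolves or maxAttempts is reached.
// `sleep` is injectable so tests can skip real waiting.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      // No wait after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(retryDelayMs(attempt));
      }
    }
  }
  throw lastError ?? new Error("maxAttempts must be at least 1");
}
```

A caller could wrap the existing connection function, for example `connectWithRetry(() => connectToOpenAI())`, assuming that function returns a promise that rejects on failure.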

Why This Architecture Was Chosen

From ADR-001:

Option A: Trust Industry Pattern (CHOSEN)

✅ Used by ChatGPT, Google Meet, Zoom, Discord
✅ Browser vendors optimize for this pattern
✅ Works across all platforms (macOS, Windows, Linux, mobile)
✅ No custom implementation needed
✅ Proven reliability

Options Rejected:

❌ Manual microphone toggling (not industry standard)
❌ AudioWorklet native bypass (breaks AEC)
❌ Custom echo cancellation (reinventing wheel)
❌ Native Rust WebRTC (2-4 weeks work for no benefit)

Platform Context: Web Application

IMPORTANT: Quetrex is now a pure web application, not a Tauri desktop app.

Why this matters for voice:

Browser echo cancellation works perfectly in all browsers
No WKWebView limitations (that was the Tauri problem)
Universal compatibility (Chrome, Safari, Firefox, Edge)
No platform-specific audio handling needed
Just works™️

The WKWebView Problem (Historical): Quetrex was originally a Tauri desktop app. On macOS, Tauri uses WKWebView, which has a bug: WebRTC audio playback doesn't work. This forced us to try workarounds (AudioWorklet bypass, native audio routing), which all broke echo cancellation.

Solution: Convert to web application. Now echo cancellation works perfectly everywhere.

Testing Voice Changes

If you must modify voice code:

1. Test on real browsers (not just dev tools)
2. Test the echo scenario:
   - Turn up speaker volume
   - Start voice conversation
   - Verify no feedback loop/echo
3. Test across platforms:
   - Chrome (most common)
   - Safari (macOS, iOS)
   - Firefox
   - Edge
4. Test edge cases:
   - Poor network connection
   - Microphone permission denied
   - Mid-conversation disconnect

When to Consult Documentation

Before any voice changes, read:

docs/decisions/ADR-001-VOICE-ECHO-CANCELLATION.md
docs/architecture/VOICE-SYSTEM.md
docs/development/abandoned-approaches/

If you see mentions of:

Tauri → Ignore (Quetrex is web app now)
WKWebView → Ignore (not relevant anymore)
AudioWorklet bypass → Don't implement (already tried, failed)
Manual mic toggling → Don't implement (not industry standard)

Summary

The Golden Rule: Trust browser echo cancellation. Always-on microphone + echoCancellation: true + audio in browser pipeline = Perfect echo cancellation.

DO:

Keep microphone enabled throughout conversation
Use echoCancellation: true
Keep audio in browser (HTMLAudioElement)
Let OpenAI Realtime API handle VAD
Trust the industry pattern

DON'T:

Toggle microphone track
Route audio outside browser
Implement custom echo cancellation
Add artificial delays
Try to "improve" what already works

If in doubt: Read ADR-001 and VOICE-SYSTEM.md. The architecture is thoroughly documented for a reason.

Maintainer

aiskillstore Core maintainer

Source details

Full Name: aiskillstore/marketplace
Branch: main
Path in repo: skills/barnhardt-enterprises-inc/voice-system-expert
Topics: claude-code claude codex-skills skills codex claude-skills ai-skills
