Agent skill
voice-system-expert
Use when working with Quetrex's voice interface, OpenAI Realtime API, WebRTC, or echo cancellation. Knows Quetrex's specific voice architecture decisions and patterns. CRITICAL - prevents breaking working voice system.
Install this agent skill to your Project
npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/barnhardt-enterprises-inc/voice-system-expert
SKILL.md
Quetrex Voice System Expert
CRITICAL: Read This First
Quetrex's voice system architecture is extensively documented and battle-tested. Before making ANY changes to voice-related code, you MUST read:
- ADR-001-VOICE-ECHO-CANCELLATION.md (definitive architectural decision)
- docs/architecture/VOICE-SYSTEM.md (technical implementation)
- docs/features/voice-interface.md (user-facing features)
Location: src/lib/openai-realtime.ts
Core Architecture Decision
ALWAYS-ON MICROPHONE + BROWSER AEC
This is Decision 4 from VOICE-SYSTEM.md and the definitive approach documented in ADR-001.
// ✅ CORRECT: Always-on microphone
const mediaStream = await navigator.mediaDevices.getUserMedia({
audio: {
echoCancellation: true, // CRITICAL - browser handles echo cancellation
noiseSuppression: true,
autoGainControl: true,
},
})
// Microphone track stays ENABLED throughout conversation
// Browser's native AEC prevents feedback loops
// Server-side VAD (Voice Activity Detection) handles turn detection
How It Works
Audio Pipeline
User speaks
↓
Microphone (always enabled, echoCancellation: true)
↓
WebRTC → OpenAI Realtime API
↓
Server-side VAD detects speech
↓
OpenAI processes and responds
↓
Audio response via WebRTC
↓
HTMLAudioElement playback (stays in browser pipeline)
↓
Browser AEC compares mic input + speaker output
↓
Echo automatically canceled (no feedback loop)
Why This Works
Browser Echo Cancellation Requirements:
- Both microphone input AND speaker output must be in browser's audio graph
- Audio must flow through browser's WebRTC stack
- No manual intervention needed (browser handles it)
This is the industry standard:
- ChatGPT voice mode
- Google Meet
- Zoom
- Discord
- Microsoft Teams
They ALL use always-on microphone + browser AEC.
DO NOT DO THESE THINGS
❌ DON'T: Toggle Microphone Track
// ❌ WRONG - This breaks echo cancellation
async function pauseRecording() {
microphone.enabled = false // DON'T DO THIS
}
async function resumeRecording() {
microphone.enabled = true // DON'T DO THIS
}
Why this fails:
- Not industry standard
- Causes audio glitches
- Can break echo cancellation
- Adds artificial delays
- More complex state management
- No real benefit
❌ DON'T: Route Audio Outside Browser
// ❌ WRONG - AudioWorklet bypass to native audio
const workletNode = new AudioWorkletNode(audioContext, 'bypass-processor')
workletNode.port.postMessage({ cmd: 'route-to-native' })
Why this fails:
- Breaks browser AEC (audio leaves browser pipeline)
- Causes echo/feedback loops
- Requires complex native audio handling
- Platform-specific implementations
- Already tried and failed (see abandoned-approaches/)
❌ DON'T: Implement Custom Echo Cancellation
// ❌ WRONG - Custom AEC implementation
class CustomEchoCanceller {
cancelEcho(input: AudioBuffer, output: AudioBuffer) {
// Complex DSP code...
}
}
Why this fails:
- Reinventing the wheel
- Browser AEC is battle-tested by billions of users
- Extremely complex to implement correctly
- Requires deep DSP knowledge
- Performance issues
- Platform-specific tuning needed
❌ DON'T: Add Artificial Delays
// ❌ WRONG - Delays for echo prevention
await new Promise(resolve => setTimeout(resolve, 500))
await playAudio()
await new Promise(resolve => setTimeout(resolve, 500))
resumeRecording()
Why this fails:
- Not necessary (browser AEC handles it)
- Degrades user experience
- Adds latency
- Still doesn't prevent echo if implementation is wrong
What You CAN Change
✅ DO: Adjust Audio Settings
// ✅ Can tweak these settings
const constraints = {
audio: {
echoCancellation: true, // MUST be true
noiseSuppression: true, // Can adjust
autoGainControl: true, // Can adjust
sampleRate: 24000, // Can change for quality
channelCount: 1, // Mono is fine for voice
},
}
✅ DO: Handle Connection States
// ✅ Can manage connection lifecycle
async function connectVoice() {
// Setup WebRTC connection
// Start streaming
// Handle connection events
}
async function disconnectVoice() {
// Clean up WebRTC connection
// Stop streaming
// Release microphone
}
✅ DO: Add UI Feedback
// ✅ Can add visual indicators
function onVoiceActivity(active: boolean) {
if (active) {
// Show "listening" indicator
// Animate microphone icon
} else {
// Show "idle" indicator
}
}
✅ DO: Handle Errors
// ✅ Can improve error handling
try {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
} catch (error) {
if (error.name === 'NotAllowedError') {
// Show permission request UI
} else if (error.name === 'NotFoundError') {
// Show "no microphone found" error
}
}
Implementation Location
Primary file: src/lib/openai-realtime.ts
Key functions:
setupMediaStream()- Initializes microphone with correct constraintsconnectToOpenAI()- Establishes WebRTC connectionhandleAudioResponse()- Plays AI responses via HTMLAudioElement
DO NOT modify:
- Microphone enable/disable logic (should stay always-on)
- Echo cancellation settings (must be true)
- Audio routing (must stay in browser pipeline)
CAN modify:
- UI feedback and visual indicators
- Error handling and user messages
- Connection retry logic
- Audio quality settings (sample rate, etc.)
Why This Architecture Was Chosen
From ADR-001:
Option A: Trust Industry Pattern (CHOSEN)
- ✅ Used by ChatGPT, Google Meet, Zoom, Discord
- ✅ Browser vendors optimize for this pattern
- ✅ Works across all platforms (macOS, Windows, Linux, mobile)
- ✅ No custom implementation needed
- ✅ Proven reliability
Options Rejected:
- ❌ Manual microphone toggling (not industry standard)
- ❌ AudioWorklet native bypass (breaks AEC)
- ❌ Custom echo cancellation (reinventing wheel)
- ❌ Native Rust WebRTC (2-4 weeks work for no benefit)
Platform Context: Web Application
IMPORTANT: Quetrex is now a pure web application, not a Tauri desktop app.
Why this matters for voice:
- Browser echo cancellation works perfectly in all browsers
- No WKWebView limitations (that was the Tauri problem)
- Universal compatibility (Chrome, Safari, Firefox, Edge)
- No platform-specific audio handling needed
- Just works™️
The WKWebView Problem (Historical): Quetrex was originally a Tauri desktop app. On macOS, Tauri uses WKWebView, which has a bug: WebRTC audio playback doesn't work. This forced us to try workarounds (AudioWorklet bypass, native audio routing), which all broke echo cancellation.
Solution: Convert to web application. Now echo cancellation works perfectly everywhere.
Testing Voice Changes
If you must modify voice code:
- Test on real browsers (not just dev tools)
- Test the echo scenario:
- Turn up speaker volume
- Start voice conversation
- Verify no feedback loop/echo
- Test across platforms:
- Chrome (most common)
- Safari (macOS, iOS)
- Firefox
- Edge
- Test edge cases:
- Poor network connection
- Microphone permission denied
- Mid-conversation disconnect
When to Consult Documentation
Before any voice changes, read:
docs/decisions/ADR-001-VOICE-ECHO-CANCELLATION.mddocs/architecture/VOICE-SYSTEM.mddocs/development/abandoned-approaches/
If you see mentions of:
- Tauri → Ignore (Quetrex is web app now)
- WKWebView → Ignore (not relevant anymore)
- AudioWorklet bypass → Don't implement (already tried, failed)
- Manual mic toggling → Don't implement (not industry standard)
Summary
The Golden Rule: Trust browser echo cancellation. Always-on microphone + echoCancellation: true + audio in browser pipeline = Perfect echo cancellation.
DO:
- Keep microphone enabled throughout conversation
- Use
echoCancellation: true - Keep audio in browser (HTMLAudioElement)
- Let OpenAI Realtime API handle VAD
- Trust the industry pattern
DON'T:
- Toggle microphone track
- Route audio outside browser
- Implement custom echo cancellation
- Add artificial delays
- Try to "improve" what already works
If in doubt: Read ADR-001 and VOICE-SYSTEM.md. The architecture is thoroughly documented for a reason.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
perigon-backend
Perigon ASP.NET Core + EF Core + Aspire conventions
perigon-agent
Pointers for Copilot/agents to apply Perigon conventions
perigon-angular
Angular 21+ standalone/Material/signal conventions for Perigon WebApp
fastapi-mastery
Comprehensive FastAPI development skill covering REST API creation, routing, request/response handling, validation, authentication, database integration, middleware, and deployment. Use when working with FastAPI projects, building APIs, implementing CRUD operations, setting up authentication/authorization, integrating databases (SQL/NoSQL), adding middleware, handling WebSockets, or deploying FastAPI applications. Triggered by requests involving .py files with FastAPI code, API endpoint creation, Pydantic models, or FastAPI-specific features.
context7-efficient
Token-efficient library documentation fetcher using Context7 MCP with 86.8% token savings through intelligent shell pipeline filtering. Fetches code examples, API references, and best practices for JavaScript, Python, Go, Rust, and other libraries. Use when users ask about library documentation, need code examples, want API usage patterns, are learning a new framework, need syntax reference, or troubleshooting with library-specific information. Triggers include questions like "Show me React hooks", "How do I use Prisma", "What's the Next.js routing syntax", or any request for library/framework documentation.
browser-use
Browser automation using Playwright MCP. Navigate websites, fill forms, click elements, take screenshots, and extract data. Use when tasks require web browsing, form submission, web scraping, UI testing, or any browser interaction.
Didn't find tool you were looking for?