Agent skill
wavecap-silence
Tune WaveCap silence detection and voice activity settings. Use when the user wants to adjust chunk boundaries, reduce clipping, tune VAD sensitivity, or optimize latency vs accuracy trade-offs.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/wavecap-silence
SKILL.md
WaveCap Silence Detection & Chunking Skill
Use this skill to tune how WaveCap detects speech boundaries and groups audio into chunks for transcription.
Configuration Location
Silence/chunking settings are in the whisper: section:
- User config:
/Users/thw/Projects/WaveCap/state/config.yaml - Default config:
/Users/thw/Projects/WaveCap/backend/default-config.yaml
Core Parameters
Silence Threshold (amplitude floor for speech detection)
whisper:
silenceThreshold: 0.015 # RMS amplitude (0.0-1.0)
| Value | Effect |
|---|---|
| 0.005-0.01 | Very sensitive, detects quiet speech, may trigger on noise |
| 0.015-0.02 | Balanced for most radio sources |
| 0.025-0.04 | Less sensitive, ignores background noise, may miss soft speech |
Symptoms & Fixes:
- Speech cut off at start → Lower threshold (e.g., 0.015 → 0.01)
- False triggers on noise → Raise threshold (e.g., 0.015 → 0.025)
Silence Hold Seconds (wait before cutting)
whisper:
silenceHoldSeconds: 1.8 # Seconds of silence before flush
| Value | Effect |
|---|---|
| 0.5-0.8 | Fast updates, may clip trailing words |
| 1.0-1.5 | Balanced latency vs completeness |
| 1.8-2.5 | Safer for complete sentences, higher latency |
Symptoms & Fixes:
- Words cut off at end → Increase (e.g., 1.2 → 1.8)
- Updates feel slow → Decrease (e.g., 1.8 → 1.2)
Silence Lookback Seconds (evaluation window)
whisper:
silenceLookbackSeconds: 3.0 # Window for silence detection
| Value | Effect |
|---|---|
| 1.0-2.0 | Responsive but may trigger on brief pauses |
| 3.0-4.0 | More stable, requires sustained silence |
| 5.0+ | Very stable, may delay chunk boundaries |
Active Samples Percentage (VAD sensitivity)
whisper:
activeSamplesInLookbackPct: 0.10 # 10% of samples must be active
| Value | Effect |
|---|---|
| 0.05 (5%) | Very sensitive, keeps chunks open longer |
| 0.10 (10%) | Balanced default |
| 0.20 (20%) | Requires more sustained speech to keep chunk open |
Chunk Length (maximum chunk duration)
whisper:
chunkLength: 20 # Maximum seconds before forced flush
| Value | Effect |
|---|---|
| 10-15 | Low latency, may break mid-sentence |
| 20-30 | Balanced for real-time monitoring |
| 45-60 | Better sentence structure, higher latency |
Minimum Chunk Duration
whisper:
minChunkDurationSeconds: 12 # Minimum before silence triggers flush
| Value | Effect |
|---|---|
| 4-8 | Frequent small updates |
| 10-15 | Balanced, avoids tiny fragments |
| 15-20 | Fewer but larger chunks |
Context Overlap (between chunks)
whisper:
contextSeconds: 1.5 # Overlap for Whisper context
| Value | Effect |
|---|---|
| 0.5 | Low latency, may lose speech at boundaries |
| 1.0-2.0 | Balanced overlap for boundary recovery |
| 3.0-5.0 | Maximum safety, more redundant processing |
View Current Settings
grep -E "(silence|chunk|context|active)" /Users/thw/Projects/WaveCap/state/config.yaml
Diagnose Clipping Issues
Check for premature clipping (speech cut at start)
curl -s http://localhost:8000/api/transcriptions/export | \
jq '[.[] | select(.text | startswith("..."))] | length' | \
xargs -I{} echo "Transcriptions starting with '...': {}"
Check for late clipping (speech cut at end)
curl -s http://localhost:8000/api/transcriptions/export | \
jq '[.[] | select(.text | test("[a-z]$"))] | length' | \
xargs -I{} echo "Transcriptions ending mid-word: {}"
Analyze audio amplitude at boundaries
cd /Users/thw/Projects/WaveCap/backend && source .venv/bin/activate && python3 << 'EOF'
import numpy as np
from pathlib import Path
RECORDINGS = Path("/Users/thw/Projects/WaveCap/state/recordings")
def analyze_boundaries(filepath):
with open(filepath, 'rb') as f:
f.read(44)
data = f.read()
samples = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0
if len(samples) < 1600:
return None
first_50ms = np.sqrt(np.mean(samples[:800]**2))
last_50ms = np.sqrt(np.mean(samples[-800:]**2))
return first_50ms, last_50ms
print("Boundary analysis (high RMS at start/end = potential clipping):")
print("-" * 70)
files = sorted(RECORDINGS.glob("*.wav"), key=lambda x: x.stat().st_mtime, reverse=True)[:15]
clip_start = clip_end = 0
for f in files:
result = analyze_boundaries(f)
if result:
first, last = result
issues = []
if first > 0.06: issues.append("HIGH_START"); clip_start += 1
if last > 0.06: issues.append("HIGH_END"); clip_end += 1
status = ", ".join(issues) if issues else "OK"
print(f"{f.name[-45:]}: start={first:.4f} end={last:.4f} [{status}]")
print("-" * 70)
print(f"Summary: {clip_start} potential start clips, {clip_end} potential end clips")
EOF
Tuning Scenarios
Reduce Clipping (speech being cut off)
whisper:
silenceThreshold: 0.012 # More sensitive
silenceHoldSeconds: 2.0 # Wait longer before cutting
silenceLookbackSeconds: 3.5 # Larger evaluation window
contextSeconds: 2.0 # More overlap
activeSamplesInLookbackPct: 0.08 # Keep chunks open easier
Lower Latency (faster updates)
whisper:
silenceThreshold: 0.025 # Less sensitive
silenceHoldSeconds: 0.8 # Cut quickly on silence
silenceLookbackSeconds: 1.5 # Smaller window
chunkLength: 15 # Shorter max chunks
minChunkDurationSeconds: 6 # Allow smaller chunks
contextSeconds: 0.5 # Minimal overlap
Noisy Environment (false triggers)
whisper:
silenceThreshold: 0.035 # Ignore low-level noise
silenceHoldSeconds: 1.0 # Quick silence detection
activeSamplesInLookbackPct: 0.20 # Require 20% active
Quiet Environment (soft speakers)
whisper:
silenceThreshold: 0.008 # Very sensitive
silenceHoldSeconds: 2.0 # Patient with pauses
activeSamplesInLookbackPct: 0.05 # 5% is enough
Apply Changes
launchctl stop com.wavecap.server && sleep 2 && launchctl start com.wavecap.server
Latency vs Accuracy Trade-offs
| Priority | chunkLength | minChunk | silenceHold | context |
|---|---|---|---|---|
| Low latency | 15 | 6 | 0.8 | 0.5 |
| Balanced | 20 | 12 | 1.5 | 1.5 |
| High accuracy | 45 | 20 | 2.5 | 3.0 |
Tips
- Changes take effect on service restart
- Test with live audio after changes
- Monitor clipping rates over 50+ transcriptions before final tuning
silenceThresholdandsilenceHoldSecondshave the biggest impact- Higher
contextSecondshelps but adds processing overhead
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
Didn't find tool you were looking for?