Agent skill

wavecap-silence

Tune WaveCap silence detection and voice activity settings. Use when the user wants to adjust chunk boundaries, reduce clipping, tune VAD sensitivity, or optimize latency vs accuracy trade-offs.

View SKILL.md on GitHub Repository

Stars 163

Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/wavecap-silence

SKILL.md

WaveCap Silence Detection & Chunking Skill

Use this skill to tune how WaveCap detects speech boundaries and groups audio into chunks for transcription.

Configuration Location

Silence/chunking settings are in the whisper: section:

User config: /Users/thw/Projects/WaveCap/state/config.yaml
Default config: /Users/thw/Projects/WaveCap/backend/default-config.yaml

Core Parameters

Silence Threshold (amplitude floor for speech detection)

yaml

whisper:
  silenceThreshold: 0.015  # RMS amplitude (0.0-1.0)

Value	Effect
0.005-0.01	Very sensitive, detects quiet speech, may trigger on noise
0.015-0.02	Balanced for most radio sources
0.025-0.04	Less sensitive, ignores background noise, may miss soft speech

Symptoms & Fixes:

Speech cut off at start → Lower threshold (e.g., 0.015 → 0.01)
False triggers on noise → Raise threshold (e.g., 0.015 → 0.025)

Silence Hold Seconds (wait before cutting)

yaml

whisper:
  silenceHoldSeconds: 1.8  # Seconds of silence before flush

Value	Effect
0.5-0.8	Fast updates, may clip trailing words
1.0-1.5	Balanced latency vs completeness
1.8-2.5	Safer for complete sentences, higher latency

Symptoms & Fixes:

Words cut off at end → Increase (e.g., 1.2 → 1.8)
Updates feel slow → Decrease (e.g., 1.8 → 1.2)

Silence Lookback Seconds (evaluation window)

yaml

whisper:
  silenceLookbackSeconds: 3.0  # Window for silence detection

Value	Effect
1.0-2.0	Responsive but may trigger on brief pauses
3.0-4.0	More stable, requires sustained silence
5.0+	Very stable, may delay chunk boundaries

Active Samples Percentage (VAD sensitivity)

yaml

whisper:
  activeSamplesInLookbackPct: 0.10  # 10% of samples must be active

Value	Effect
0.05 (5%)	Very sensitive, keeps chunks open longer
0.10 (10%)	Balanced default
0.20 (20%)	Requires more sustained speech to keep chunk open

Chunk Length (maximum chunk duration)

yaml

whisper:
  chunkLength: 20  # Maximum seconds before forced flush

Value	Effect
10-15	Low latency, may break mid-sentence
20-30	Balanced for real-time monitoring
45-60	Better sentence structure, higher latency

Minimum Chunk Duration

yaml

whisper:
  minChunkDurationSeconds: 12  # Minimum before silence triggers flush

Value	Effect
4-8	Frequent small updates
10-15	Balanced, avoids tiny fragments
15-20	Fewer but larger chunks

Context Overlap (between chunks)

yaml

whisper:
  contextSeconds: 1.5  # Overlap for Whisper context

Value	Effect
0.5	Low latency, may lose speech at boundaries
1.0-2.0	Balanced overlap for boundary recovery
3.0-5.0	Maximum safety, more redundant processing

View Current Settings

bash

grep -E "(silence|chunk|context|active)" /Users/thw/Projects/WaveCap/state/config.yaml

Diagnose Clipping Issues

Check for premature clipping (speech cut at start)

bash

curl -s http://localhost:8000/api/transcriptions/export | \
  jq '[.[] | select(.text | startswith("..."))] | length' | \
  xargs -I{} echo "Transcriptions starting with '...': {}"

Check for late clipping (speech cut at end)

bash

curl -s http://localhost:8000/api/transcriptions/export | \
  jq '[.[] | select(.text | test("[a-z]$"))] | length' | \
  xargs -I{} echo "Transcriptions ending mid-word: {}"

Analyze audio amplitude at boundaries

bash

cd /Users/thw/Projects/WaveCap/backend && source .venv/bin/activate && python3 << 'EOF'
import numpy as np
from pathlib import Path

RECORDINGS = Path("/Users/thw/Projects/WaveCap/state/recordings")

def analyze_boundaries(filepath):
    with open(filepath, 'rb') as f:
        f.read(44)
        data = f.read()
        samples = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0
        if len(samples) < 1600:
            return None
        first_50ms = np.sqrt(np.mean(samples[:800]**2))
        last_50ms = np.sqrt(np.mean(samples[-800:]**2))
        return first_50ms, last_50ms

print("Boundary analysis (high RMS at start/end = potential clipping):")
print("-" * 70)
files = sorted(RECORDINGS.glob("*.wav"), key=lambda x: x.stat().st_mtime, reverse=True)[:15]
clip_start = clip_end = 0
for f in files:
    result = analyze_boundaries(f)
    if result:
        first, last = result
        issues = []
        if first > 0.06: issues.append("HIGH_START"); clip_start += 1
        if last > 0.06: issues.append("HIGH_END"); clip_end += 1
        status = ", ".join(issues) if issues else "OK"
        print(f"{f.name[-45:]}: start={first:.4f} end={last:.4f} [{status}]")
print("-" * 70)
print(f"Summary: {clip_start} potential start clips, {clip_end} potential end clips")
EOF

Tuning Scenarios

Reduce Clipping (speech being cut off)

yaml

whisper:
  silenceThreshold: 0.012        # More sensitive
  silenceHoldSeconds: 2.0        # Wait longer before cutting
  silenceLookbackSeconds: 3.5    # Larger evaluation window
  contextSeconds: 2.0            # More overlap
  activeSamplesInLookbackPct: 0.08  # Keep chunks open easier

Lower Latency (faster updates)

yaml

whisper:
  silenceThreshold: 0.025        # Less sensitive
  silenceHoldSeconds: 0.8        # Cut quickly on silence
  silenceLookbackSeconds: 1.5    # Smaller window
  chunkLength: 15                # Shorter max chunks
  minChunkDurationSeconds: 6     # Allow smaller chunks
  contextSeconds: 0.5            # Minimal overlap

Noisy Environment (false triggers)

yaml

whisper:
  silenceThreshold: 0.035        # Ignore low-level noise
  silenceHoldSeconds: 1.0        # Quick silence detection
  activeSamplesInLookbackPct: 0.20  # Require 20% active

Quiet Environment (soft speakers)

yaml

whisper:
  silenceThreshold: 0.008        # Very sensitive
  silenceHoldSeconds: 2.0        # Patient with pauses
  activeSamplesInLookbackPct: 0.05  # 5% is enough

Apply Changes

bash

launchctl stop com.wavecap.server && sleep 2 && launchctl start com.wavecap.server

Latency vs Accuracy Trade-offs

Priority	chunkLength	minChunk	silenceHold	context
Low latency	15	6	0.8	0.5
Balanced	20	12	1.5	1.5
High accuracy	45	20	2.5	3.0

Tips

Changes take effect on service restart
Test with live audio after changes
Monitor clipping rates over 50+ transcriptions before final tuning
silenceThreshold and silenceHoldSeconds have the biggest impact
Higher contextSeconds helps but adds processing overhead

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/wavecap-silence
License: MIT License

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

majiayu000/claude-skill-registry

agent-ops-spec

Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-state

Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-spec

Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-state

Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.

163 31

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

WaveCap Silence Detection & Chunking Skill

Configuration Location

Core Parameters

Silence Threshold (amplitude floor for speech detection)

Silence Hold Seconds (wait before cutting)

Silence Lookback Seconds (evaluation window)

Active Samples Percentage (VAD sensitivity)

Chunk Length (maximum chunk duration)

Minimum Chunk Duration

Context Overlap (between chunks)

View Current Settings

Diagnose Clipping Issues

Check for premature clipping (speech cut at start)

Check for late clipping (speech cut at end)

Analyze audio amplitude at boundaries

Tuning Scenarios

Reduce Clipping (speech being cut off)

Lower Latency (faster updates)

Noisy Environment (false triggers)

Quiet Environment (soft speakers)

Apply Changes

Latency vs Accuracy Trade-offs

Tips

Recommended Agent Skills

agent-ops-spec

agent-ops-state

agent-ops-spec

agent-ops-testing

agent-ops-testing

agent-ops-state