Agent skills
wake-word-detection

Agent skill

wake-word-detection

Expert skill for implementing wake word detection with openWakeWord. Covers audio monitoring, keyword spotting, privacy protection, and efficient always-listening systems for JARVIS voice assistant.

View SKILL.md on GitHub Repository

Stars 32

Forks 3

Install this agent skill to your Project

npx add-skill https://github.com/martinholovsky/claude-skills-generator/tree/main/skills/wake-word-detection

SKILL.md

Wake Word Detection Skill

1. Overview

Risk Level: MEDIUM - Continuous audio monitoring, privacy implications, resource constraints

You are an expert in wake word detection with deep expertise in openWakeWord, keyword spotting, and always-listening systems.

Primary Use Cases:

JARVIS activation phrase detection ("Hey JARVIS")
Always-listening with minimal resource usage
Offline wake word detection (no cloud dependency)

2. Core Principles

TDD First - Write tests before implementation code
Performance Aware - Optimize for CPU, memory, and latency
Privacy Preserving - Never store audio, minimize buffers
Accuracy Focused - Minimize false positives/negatives
Resource Efficient - Target <5% CPU, <100MB memory

3. Core Responsibilities

3.1 Privacy-First Monitoring

Process locally - Never send audio to external services
Buffer minimally - Only keep audio needed for detection
Discard non-wake - Immediately discard non-wake audio
User control - Easy disable/pause functionality

3.2 Efficiency Requirements

Minimal CPU usage (<5% average)
Low memory footprint (<100MB)
Low latency detection (<500ms)
Low false positive rate (<1 per hour)

4. Technical Foundation

python

# requirements.txt
openwakeword>=0.6.0
numpy>=1.24.0
sounddevice>=0.4.6
onnxruntime>=1.16.0

5. Implementation Workflow (TDD)

Step 1: Write Failing Test First

python

# tests/test_wake_word.py
import pytest
import numpy as np
from unittest.mock import Mock, patch

class TestWakeWordDetector:
    """TDD tests for wake word detection."""

    def test_detection_accuracy_threshold(self):
        """Test that detector respects confidence threshold."""
        from wake_word import SecureWakeWordDetector

        detector = SecureWakeWordDetector(threshold=0.7)
        callback = Mock()
        test_audio = np.random.randn(16000).astype(np.float32)

        with patch.object(detector.model, 'predict') as mock_predict:
            # Below threshold - should not trigger
            mock_predict.return_value = {"hey_jarvis": np.array([0.5])}
            detector._test_process(test_audio, callback)
            callback.assert_not_called()

            # Above threshold - should trigger
            mock_predict.return_value = {"hey_jarvis": np.array([0.8])}
            detector._test_process(test_audio, callback)
            callback.assert_called_once()

    def test_buffer_cleared_after_detection(self):
        """Test privacy: buffer cleared immediately after detection."""
        from wake_word import SecureWakeWordDetector

        detector = SecureWakeWordDetector()
        detector.audio_buffer.extend(np.zeros(16000))

        with patch.object(detector.model, 'predict') as mock_predict:
            mock_predict.return_value = {"hey_jarvis": np.array([0.9])}
            detector._process_audio()

        assert len(detector.audio_buffer) == 0, "Buffer must be cleared"

    def test_cpu_usage_under_threshold(self):
        """Test CPU usage stays under 5%."""
        import psutil
        import time
        from wake_word import SecureWakeWordDetector

        detector = SecureWakeWordDetector()
        process = psutil.Process()
        start_time = time.time()

        while time.time() - start_time < 10:
            audio = np.random.randn(1600).astype(np.float32)
            detector.audio_buffer.extend(audio)
            if len(detector.audio_buffer) >= 16000:
                detector._process_audio()

        avg_cpu = process.cpu_percent() / psutil.cpu_count()
        assert avg_cpu < 5, f"CPU usage too high: {avg_cpu}%"

    def test_memory_footprint(self):
        """Test memory usage stays under 100MB."""
        import tracemalloc
        from wake_word import SecureWakeWordDetector

        tracemalloc.start()
        detector = SecureWakeWordDetector()

        for _ in range(600):
            audio = np.random.randn(1600).astype(np.float32)
            detector.audio_buffer.extend(audio)

        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()

        peak_mb = peak / 1024 / 1024
        assert peak_mb < 100, f"Memory too high: {peak_mb}MB"

Step 2: Implement Minimum to Pass

python

class SecureWakeWordDetector:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.model = Model(wakeword_models=["hey_jarvis"])
        self.audio_buffer = deque(maxlen=24000)

    def _test_process(self, audio, callback):
        predictions = self.model.predict(audio)
        for model_name, scores in predictions.items():
            if np.max(scores) > self.threshold:
                self.audio_buffer.clear()
                callback(model_name, np.max(scores))
                break

Step 3: Run Full Verification

bash

pytest tests/test_wake_word.py -v
pytest --cov=wake_word --cov-report=term-missing

6. Implementation Patterns

Pattern 1: Secure Wake Word Detector

python

from openwakeword.model import Model
import numpy as np
import sounddevice as sd
from collections import deque
import structlog

logger = structlog.get_logger()

class SecureWakeWordDetector:
    """Privacy-preserving wake word detection."""

    def __init__(self, model_path: str = None, threshold: float = 0.5, sample_rate: int = 16000):
        if model_path:
            self.model = Model(wakeword_models=[model_path])
        else:
            self.model = Model(wakeword_models=["hey_jarvis"])

        self.threshold = threshold
        self.sample_rate = sample_rate
        self.buffer_size = int(sample_rate * 1.5)
        self.audio_buffer = deque(maxlen=self.buffer_size)
        self.is_listening = False
        self.on_wake = None

    def start(self, callback):
        """Start listening for wake word."""
        self.on_wake = callback
        self.is_listening = True

        def audio_callback(indata, frames, time, status):
            if not self.is_listening:
                return
            audio = indata[:, 0] if len(indata.shape) > 1 else indata
            self.audio_buffer.extend(audio)
            if len(self.audio_buffer) >= self.sample_rate:
                self._process_audio()

        self.stream = sd.InputStream(
            samplerate=self.sample_rate, channels=1, dtype=np.float32,
            callback=audio_callback, blocksize=int(self.sample_rate * 0.1)
        )
        self.stream.start()

    def _process_audio(self):
        """Process audio buffer for wake word."""
        audio = np.array(list(self.audio_buffer))
        predictions = self.model.predict(audio)

        for model_name, scores in predictions.items():
            if np.max(scores) > self.threshold:
                self.audio_buffer.clear()  # Privacy: clear immediately
                if self.on_wake:
                    self.on_wake(model_name, np.max(scores))
                break

    def stop(self):
        """Stop listening."""
        self.is_listening = False
        if hasattr(self, 'stream'):
            self.stream.stop()
            self.stream.close()
        self.audio_buffer.clear()

Pattern 2: False Positive Reduction

python

class RobustDetector:
    """Reduce false positives with confirmation."""

    def __init__(self, detector: SecureWakeWordDetector):
        self.detector = detector
        self.detection_history = []
        self.confirmation_window = 2.0
        self.min_confirmations = 2

    def on_potential_wake(self, model: str, confidence: float):
        now = time.time()
        self.detection_history.append({"time": now, "confidence": confidence})
        self.detection_history = [d for d in self.detection_history if now - d["time"] < self.confirmation_window]

        if len(self.detection_history) >= self.min_confirmations:
            avg_confidence = np.mean([d["confidence"] for d in self.detection_history])
            if avg_confidence > 0.6:
                self.detection_history.clear()
                return True
        return False

7. Performance Patterns

Pattern 1: Model Quantization

python

# Good - Use quantized ONNX model
import onnxruntime as ort

class QuantizedDetector:
    def __init__(self, model_path: str):
        sess_options = ort.SessionOptions()
        sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
        self.session = ort.InferenceSession(model_path, sess_options, providers=['CPUExecutionProvider'])

# Bad - Full precision model
class SlowDetector:
    def __init__(self, model_path: str):
        self.session = ort.InferenceSession(model_path)  # No optimization

Pattern 2: Efficient Audio Buffering

python

# Good - Pre-allocated numpy buffer with circular indexing
class EfficientBuffer:
    def __init__(self, size: int):
        self.buffer = np.zeros(size, dtype=np.float32)
        self.write_idx = 0
        self.size = size

    def append(self, audio: np.ndarray):
        n = len(audio)
        end_idx = (self.write_idx + n) % self.size
        if end_idx > self.write_idx:
            self.buffer[self.write_idx:end_idx] = audio
        else:
            self.buffer[self.write_idx:] = audio[:self.size - self.write_idx]
            self.buffer[:end_idx] = audio[self.size - self.write_idx:]
        self.write_idx = end_idx

# Bad - Individual appends
class SlowBuffer:
    def append(self, audio: np.ndarray):
        for sample in audio:  # Slow!
            self.buffer.append(sample)

Pattern 3: VAD Preprocessing

python

# Good - Skip inference on silence
import webrtcvad

class VADOptimizedDetector:
    def __init__(self):
        self.vad = webrtcvad.Vad(2)
        self.detector = SecureWakeWordDetector()

    def process(self, audio: np.ndarray):
        audio_int16 = (audio * 32767).astype(np.int16)
        if not self.vad.is_speech(audio_int16.tobytes(), 16000):
            return None  # Skip expensive inference
        return self.detector._process_audio()

# Bad - Always run inference
class WastefulDetector:
    def process(self, audio: np.ndarray):
        return self.detector._process_audio()  # Even on silence

Pattern 4: Batch Inference

python

# Good - Process multiple windows in single inference
class BatchDetector:
    def __init__(self, batch_size: int = 4):
        self.batch_size = batch_size
        self.pending_windows = []

    def add_window(self, audio: np.ndarray):
        self.pending_windows.append(audio)
        if len(self.pending_windows) >= self.batch_size:
            batch = np.stack(self.pending_windows)
            results = self.model.predict_batch(batch)
            self.pending_windows.clear()
            return results
        return None

Pattern 5: Memory-Mapped Models

python

# Good - Memory-map large model files
import mmap

class MmapModelLoader:
    def __init__(self, model_path: str):
        self.file = open(model_path, 'rb')
        self.mmap = mmap.mmap(self.file.fileno(), 0, access=mmap.ACCESS_READ)

# Bad - Load entire model into memory
class EagerModelLoader:
    def __init__(self, model_path: str):
        with open(model_path, 'rb') as f:
            self.model_data = f.read()  # Entire model in RAM

8. Security Standards

python

class PrivacyController:
    """Ensure privacy in always-listening system."""

    def __init__(self):
        self.is_enabled = True
        self.last_activity = time.time()

    def check_privacy_mode(self) -> bool:
        if self._is_dnd_enabled():
            return False
        if time.time() - self.last_activity > 3600:
            return False
        return self.is_enabled

# Data minimization
MAX_BUFFER_SECONDS = 2.0
def on_wake_detected():
    audio_buffer.clear()  # Delete immediately

9. Common Mistakes

python

# BAD - Stores all audio
def on_audio(chunk):
    with open("audio.raw", "ab") as f:
        f.write(chunk)

# GOOD - Discard after processing
def on_audio(chunk):
    buffer.extend(chunk)
    process_buffer()

# BAD - Large buffer
buffer = deque(maxlen=sample_rate * 60)  # 1 minute!

# GOOD - Minimal buffer
buffer = deque(maxlen=sample_rate * 1.5)  # 1.5 seconds

10. Pre-Implementation Checklist

Phase 1: Before Writing Code

Read TDD workflow section completely
Set up test file with detection accuracy tests
Define threshold and performance targets
Identify which performance patterns apply
Review privacy requirements

Phase 2: During Implementation

Write failing test for each feature first
Implement minimal code to pass test
Apply performance patterns (VAD, quantization)
Buffer size minimal (<2 seconds)
Audio cleared after detection

Phase 3: Before Committing

All tests pass: pytest tests/test_wake_word.py -v
Coverage >80%: pytest --cov=wake_word
False positive rate <1/hour tested
CPU usage <5% measured
Memory usage <100MB verified
Audio never stored to disk

11. Summary

Your goal is to create wake word detection that is:

Private: Audio processed locally, minimal retention
Efficient: Low CPU (<5%), low memory (<100MB)
Accurate: Low false positive rate (<1/hour)
Test-Driven: All features have tests first

Critical Reminders:

Write tests before implementation
Never store audio to disk
Keep buffer minimal (<2 seconds)
Apply performance patterns (VAD, quantization)

Maintainer

martinholovsky Core maintainer

Source details

Full Name: martinholovsky/claude-skills-generator
Branch: main
Path in repo: skills/wake-word-detection
License: The Unlicense

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

martinholovsky/claude-skills-generator

prompt-engineering

Expert skill for prompt engineering and task routing/orchestration. Covers secure prompt construction, injection prevention, multi-step task orchestration, and LLM output validation for JARVIS AI assistant.

32 3

Explore

martinholovsky/claude-skills-generator

windows-ui-automation

Expert in Windows UI Automation (UIA) and Win32 APIs for desktop automation. Specializes in accessible, secure automation of Windows applications including element discovery, input simulation, and process interaction. HIGH-RISK skill requiring strict security controls for system access.

32 3

Explore

martinholovsky/claude-skills-generator

accessibility-wcag

32 3

Explore

martinholovsky/claude-skills-generator

devsecops-expert

Expert DevSecOps engineer specializing in secure CI/CD pipelines, shift-left security, security automation, and compliance as code. Use when implementing security gates, container security, infrastructure scanning, secrets management, or building secure supply chains.

32 3

Explore

martinholovsky/claude-skills-generator

kanidm-expert

Expert in Kanidm modern identity management system specializing in user/group management, OAuth2/OIDC, LDAP, RADIUS, SSH key management, WebAuthn, and MFA. Deep expertise in secure authentication flows, credential policies, access control, and platform integrations. Use when implementing identity management, SSO, authentication systems, or securing access to infrastructure.

32 3

Explore

martinholovsky/claude-skills-generator

motion-design

32 3

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Wake Word Detection Skill

1. Overview

2. Core Principles

3. Core Responsibilities

3.1 Privacy-First Monitoring

3.2 Efficiency Requirements

4. Technical Foundation

5. Implementation Workflow (TDD)

Step 1: Write Failing Test First

Step 2: Implement Minimum to Pass

Step 3: Run Full Verification

6. Implementation Patterns

Pattern 1: Secure Wake Word Detector

Pattern 2: False Positive Reduction

7. Performance Patterns

Pattern 1: Model Quantization

Pattern 2: Efficient Audio Buffering

Pattern 3: VAD Preprocessing

Pattern 4: Batch Inference

Pattern 5: Memory-Mapped Models

8. Security Standards

9. Common Mistakes

10. Pre-Implementation Checklist

Phase 1: Before Writing Code

Phase 2: During Implementation

Phase 3: Before Committing

11. Summary

Recommended Agent Skills

prompt-engineering

windows-ui-automation

accessibility-wcag

devsecops-expert

kanidm-expert

motion-design