Agent skill
wavecap-llm
Configure WaveCap LLM-based transcription correction. Use when the user wants to enable/disable LLM correction, change models, tune prompts, or optimize correction quality on Apple Silicon.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/wavecap-llm
WaveCap LLM Correction Tuning Skill
Use this skill to configure the optional LLM-based post-processing that corrects Whisper transcription errors using local models on Apple Silicon.
Requirements
- Apple Silicon Mac (M1/M2/M3/M4)
- mlx-lm package installed: pip install mlx-lm
- Sufficient RAM for the model (1B ≈ 2GB, 3B ≈ 6GB, 8B ≈ 16GB)
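The requirements above can be checked before enabling correction. The following is a minimal sketch, not part of WaveCap: the helper names `is_apple_silicon` and `has_mlx_lm` are hypothetical, and it assumes Python is running natively (under Rosetta the machine type is reported as x86_64).

```python
import importlib.util
import platform


def is_apple_silicon(system: str, machine: str) -> bool:
    """Return True for macOS running on an arm64 (M-series) CPU."""
    return system == "Darwin" and machine == "arm64"


def has_mlx_lm() -> bool:
    """Return True if the mlx_lm module (from `pip install mlx-lm`) is importable."""
    return importlib.util.find_spec("mlx_lm") is not None


# Report both requirement checks for the current machine.
print("Apple Silicon:", is_apple_silicon(platform.system(), platform.machine()))
print("mlx-lm installed:", has_mlx_lm())
```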
Configuration Location
LLM settings are in the llm: section of the config file:
- User config: /Users/thw/Projects/WaveCap/state/config.yaml
Basic Configuration
Enable/Disable LLM Correction
llm:
  enabled: true # false to disable
Model Selection
llm:
  model: llama-3.2-3b # Model name or HuggingFace path
| Model | Size | RAM | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| llama-3.2-1b | 1B | ~2GB | Fastest | Good | Low RAM, quick fixes |
| qwen-2.5-1.5b | 1.5B | ~3GB | Very fast | Good | Balanced small model |
| llama-3.2-3b | 3B | ~6GB | Fast | Very good | Recommended |
| qwen-2.5-3b | 3B | ~6GB | Fast | Very good | Alternative to Llama |
| llama-3.1-8b | 8B | ~16GB | Moderate | Excellent | High quality |
| llama-3.2-8b | 8B | ~16GB | Moderate | Excellent | Latest 8B |
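As a rough illustration of reading the table, the sketch below picks the largest model whose approximate RAM figure fits a given budget. The function `pick_model` is hypothetical, the RAM values are copied from the table, and only one model per size tier is listed; real memory use also depends on what else is running.

```python
from typing import Optional

# Approximate RAM needs in GB, copied from the table above (one model per tier).
MODEL_RAM_GB = {
    "llama-3.2-1b": 2,
    "qwen-2.5-1.5b": 3,
    "llama-3.2-3b": 6,
    "llama-3.1-8b": 16,
}


def pick_model(free_ram_gb: float) -> Optional[str]:
    """Return the largest listed model that fits, or None if none does."""
    fitting = [(ram, name) for name, ram in MODEL_RAM_GB.items() if ram <= free_ram_gb]
    return max(fitting)[1] if fitting else None
```

For example, `pick_model(8)` selects `llama-3.2-3b`, the recommended default, while a budget below 2GB returns `None`, in which case disabling correction is the safer choice.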
Generation Parameters
llm:
  temperature: 0.1 # 0.0-1.0, lower = more deterministic
  maxTokens: 256 # Max output tokens
  minTextLength: 10 # Skip texts shorter than this
| Parameter | Default | Effect |
|---|---|---|
| temperature | 0.1 | Lower = consistent, higher = creative |
| maxTokens | 256 | Enough for single-sentence corrections |
| minTextLength | 10 | Skip very short texts |
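Before editing the config, it can help to sanity-check the values. The sketch below is a hypothetical helper, not WaveCap's own validation: it assumes the 0.0-1.0 temperature range noted above and treats the other two settings as integers, which the source implies but does not state.

```python
def validate_llm_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the values look sane."""
    problems = []

    # Missing keys fall back to the defaults listed in the table above.
    temperature = settings.get("temperature", 0.1)
    if not 0.0 <= temperature <= 1.0:
        problems.append(f"temperature {temperature} is outside 0.0-1.0")

    max_tokens = settings.get("maxTokens", 256)
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        problems.append(f"maxTokens {max_tokens!r} should be a positive integer")

    min_len = settings.get("minTextLength", 10)
    if not isinstance(min_len, int) or min_len < 0:
        problems.append(f"minTextLength {min_len!r} should be a non-negative integer")

    return problems
```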
Domain Terms
Add domain-specific vocabulary to preserve:
llm:
  domainTerms:
    - SITREP
    - SAPOL
    - SES
    - CFS
    - Adelaide
    - Noarlunga
    - Para Hills
These terms are injected into the correction prompt to prevent the LLM from "fixing" correct jargon.
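To make the idea of prompt injection concrete, here is a minimal sketch of how a term list might be woven into a correction prompt. The actual WaveCap prompt wording is not shown in this document, so the function `build_correction_prompt` and every instruction string in it are assumptions for illustration only.

```python
def build_correction_prompt(text: str, domain_terms: list[str]) -> str:
    """Assemble a correction prompt that asks the model to leave listed terms alone."""
    lines = [
        # Hypothetical instruction text; WaveCap's real prompt may differ.
        "Fix transcription errors in the text below without changing its meaning.",
    ]
    if domain_terms:
        # Listing the jargon explicitly discourages the model from "fixing" it.
        lines.append("Keep these terms exactly as written: " + ", ".join(domain_terms))
    lines.append("Text: " + text)
    return "\n".join(lines)


prompt = build_correction_prompt("SES crew en route to Noarlunga", ["SES", "Noarlunga"])
print(prompt)
```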
View Current Settings
grep -A20 "^llm:" /Users/thw/Projects/WaveCap/state/config.yaml
Check LLM Status
curl -s http://localhost:8000/api/health | jq
Full Configuration Example
llm:
  enabled: true
  model: llama-3.2-3b
  temperature: 0.1
  maxTokens: 256
  minTextLength: 10
  domainTerms:
    - SITREP
    - SAPOL
    - SES
    - CFS
    - MFS
    - Adelaide
    - Noarlunga
    - Aldinga
    - Para Hills
    - Gawler
    - Sturt
    - Metro
    - Roger
    - Wilco
    - Over
    - Out
Tuning Scenarios
Maximum Quality (16GB+ RAM)
llm:
  enabled: true
  model: llama-3.1-8b
  temperature: 0.05
  maxTokens: 300
Balanced (6GB RAM)
llm:
  enabled: true
  model: llama-3.2-3b
  temperature: 0.1
  maxTokens: 256
Low Memory (4GB RAM)
llm:
  enabled: true
  model: llama-3.2-1b
  temperature: 0.1
  maxTokens: 200
Disabled (Whisper only)
llm:
  enabled: false
Apply Changes
launchctl stop com.wavecap.server && sleep 2 && launchctl start com.wavecap.server
Compare Before/After Correction
The LLM corrector stores both original and corrected text. Check improvements:
curl -s http://localhost:8000/api/transcriptions/export | \
  jq '[.[] | select(.llmCorrectedText != null and .llmCorrectedText != .text)] |
    .[:5] | .[] | {
      original: .text,
      corrected: .llmCorrectedText
    }'
Monitor LLM Performance
Check correction rate
curl -s http://localhost:8000/api/transcriptions/export | \
  jq '{
    total: length,
    with_llm_correction: [.[] | select(.llmCorrectedText != null)] | length,
    actually_changed: [.[] | select(.llmCorrectedText != null and .llmCorrectedText != .text)] | length
  }'
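The same counts can be computed in Python when a percentage or further processing is wanted. This is a sketch that assumes the export is a JSON array of objects with the `text` and `llmCorrectedText` fields used in the jq command; `correction_stats` and the `changed_pct` key are hypothetical names.

```python
def correction_stats(records: list[dict]) -> dict:
    """Count records with an LLM correction and those where the text changed."""
    corrected = [r for r in records if r.get("llmCorrectedText") is not None]
    changed = [r for r in corrected if r["llmCorrectedText"] != r.get("text")]
    return {
        "total": len(records),
        "with_llm_correction": len(corrected),
        "actually_changed": len(changed),
        # Share of all transcriptions that the LLM actually altered.
        "changed_pct": round(100 * len(changed) / len(records), 1) if records else 0.0,
    }
```

To use it, save the export to a file with curl, load it with `json.load`, and pass the resulting list to `correction_stats`.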
Check for over-correction
Look for cases where the LLM made unwanted changes:
curl -s http://localhost:8000/api/transcriptions/export | \
  jq '[.[] | select(.llmCorrectedText != null and .llmCorrectedText != .text)] |
    .[-10:] | .[] | {
      original: (.text | .[0:60]),
      corrected: (.llmCorrectedText | .[0:60])
    }'
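Reading diffs by eye does not scale, so one possible heuristic is to flag pairs whose corrected text differs heavily from the original. The sketch below uses the standard library's `difflib.SequenceMatcher`; the function name and the 0.6 similarity threshold are assumptions to tune against your own data, not WaveCap settings.

```python
from difflib import SequenceMatcher


def looks_over_corrected(original: str, corrected: str, min_similarity: float = 0.6) -> bool:
    """Flag a correction whose character-level similarity falls below the threshold."""
    similarity = SequenceMatcher(None, original, corrected).ratio()
    return similarity < min_similarity
```

A small punctuation fix scores close to 1.0 and passes, while a rewrite that replaces most of the sentence is flagged for manual review.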
Troubleshooting
Model Download Issues
Models are downloaded from HuggingFace on first use. Check downloads:
ls -la ~/.cache/huggingface/hub/ | grep mlx
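For a cleaner listing than grep, the sketch below converts cache directory names back into repo IDs. It relies on the Hugging Face hub cache naming scheme (`models--<org>--<name>`) and, unlike the grep above, only reports repos under `mlx-community`; `cached_mlx_models` is a hypothetical helper.

```python
from pathlib import Path


def cached_mlx_models(hub_dir: Path) -> list[str]:
    """List mlx-community repos found in a Hugging Face hub cache directory."""
    prefix = "models--mlx-community--"
    if not hub_dir.is_dir():
        return []
    return sorted(
        "mlx-community/" + entry.name[len(prefix):]
        for entry in hub_dir.iterdir()
        if entry.is_dir() and entry.name.startswith(prefix)
    )


# Default cache location, matching the ls command above.
print(cached_mlx_models(Path.home() / ".cache" / "huggingface" / "hub"))
```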
Memory Issues
If you see memory errors, try a smaller model:
llm:
  model: llama-3.2-1b # Smallest option
Slow Corrections
- Use a smaller model
- Reduce maxTokens
- Check Activity Monitor for GPU usage
LLM Not Correcting
Check that correction is enabled and the model has loaded:
curl -s http://localhost:8000/api/health | jq
tail -50 /Users/thw/Projects/WaveCap/state/logs/backend.log | grep -i llm
Available Models (MLX Hub)
The following models are pre-configured:
| Alias | HuggingFace Repo |
|---|---|
| llama-3.2-1b | mlx-community/Llama-3.2-1B-Instruct-4bit |
| llama-3.2-3b | mlx-community/Llama-3.2-3B-Instruct-4bit |
| llama-3.1-8b | mlx-community/Meta-Llama-3.1-8B-Instruct-4bit |
| llama-3.2-8b | mlx-community/Llama-3.2-8B-Instruct-4bit |
| qwen-2.5-1.5b | mlx-community/Qwen2.5-1.5B-Instruct-4bit |
| qwen-2.5-3b | mlx-community/Qwen2.5-3B-Instruct-4bit |
You can also use any MLX-compatible model path directly:
llm:
  model: mlx-community/Mistral-7B-Instruct-v0.3-4bit
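The alias-or-path behaviour described above can be pictured as a dictionary lookup with a fallback. This is only a sketch of the idea: `resolve_model` is a hypothetical name, the mapping lists a subset of the table rows, and WaveCap's real resolution logic may differ.

```python
# A subset of the alias table above.
MODEL_ALIASES = {
    "llama-3.2-1b": "mlx-community/Llama-3.2-1B-Instruct-4bit",
    "llama-3.2-3b": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "llama-3.1-8b": "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
    "qwen-2.5-1.5b": "mlx-community/Qwen2.5-1.5B-Instruct-4bit",
    "qwen-2.5-3b": "mlx-community/Qwen2.5-3B-Instruct-4bit",
}


def resolve_model(name: str) -> str:
    """Map a known alias to its repo; pass any other value through unchanged."""
    return MODEL_ALIASES.get(name, name)
```

With this shape, `resolve_model("llama-3.2-3b")` yields the 4-bit Llama repo, and a full path such as the Mistral example is returned as given.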
Tips
- Start with llama-3.2-3b for the best balance of speed and quality
- Keep temperature low (0.05-0.15) for consistent corrections
- Add all domain-specific terms to prevent unwanted "corrections"
- Monitor the correction diff to ensure quality
- LLM correction adds ~100-500ms latency per transcription
- Disable LLM correction when running on an Intel Mac or with limited RAM