livekit-stt-selfhosted

Build self-hosted speech-to-text APIs using Hugging Face models (Whisper, Wav2Vec2) and create LiveKit voice agent plugins. Use when building STT infrastructure, creating custom LiveKit plugins, deploying self-hosted transcription services, or integrating Whisper/HF models with LiveKit agents. Includes FastAPI server templates, LiveKit plugin implementation, model selection guides, and production deployment patterns.

Install this agent skill to your Project

npx add-skill https://github.com/Okeysir198/P20251122-claude-skills/tree/main/.claude/skills/livekit-stt-selfhosted

SKILL.md

LiveKit Self-Hosted STT Plugin

Build self-hosted speech-to-text APIs and LiveKit voice agent plugins using Hugging Face models.

Overview

This skill provides templates and guidance for:

Building a self-hosted STT API server using FastAPI + Whisper/HF models
Creating a LiveKit plugin that connects to your self-hosted API
Deploying and scaling in production

Quick Start

Option 1: Build Both (API + Plugin)

When the user wants a complete setup:

Create API Server:

bash

python scripts/setup_api_server.py my-stt-server --model openai/whisper-medium
cd my-stt-server
pip install -r requirements.txt
python main.py

Create Plugin:

bash

python scripts/setup_plugin.py custom-stt
cd livekit-plugins-custom-stt
pip install -e .

Use in LiveKit Agent:

python

from livekit.plugins import custom_stt

stt = custom_stt.STT(api_url="ws://localhost:8000/ws/transcribe")

Option 2: API Server Only

When the user only needs the API server:

Use scripts/setup_api_server.py with the desired model
See references/api_server_guide.md for implementation details
Start from the template in assets/api-server/

Option 3: Plugin Only

When the user has an existing API and needs a LiveKit plugin:

Use scripts/setup_plugin.py with the plugin name
See references/plugin_implementation.md for details
Start from the template in assets/plugin-template/

Model Selection

Help the user choose the right model:

| Use Case | Recommended Model | Rationale |
|---|---|---|
| Best accuracy | `openai/whisper-large-v3` | SOTA quality, requires GPU |
| Production balance | `openai/whisper-medium` | Good quality, reasonable speed |
| Real-time/fast | `openai/whisper-small` | Fast, acceptable quality |
| CPU-only | `openai/whisper-tiny` | Can run without GPU |
| English-only | `facebook/wav2vec2-large-960h` | Optimized for English |

For detailed comparison and optimization tips, see references/models_comparison.md.
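
As a rough illustration, the table above can be turned into a small lookup helper. This is only a sketch: the use-case keys, the `pick_model` name, and the rule of falling back to the CPU-only model when no GPU is available are assumptions made for this example, not part of the skill's scripts.

```python
# Model IDs are copied from the table above; the dictionary keys are
# hypothetical labels chosen for this example.
RECOMMENDED_MODELS = {
    "best_accuracy": "openai/whisper-large-v3",
    "production": "openai/whisper-medium",
    "realtime": "openai/whisper-small",
    "cpu_only": "openai/whisper-tiny",
    "english_only": "facebook/wav2vec2-large-960h",
}


def pick_model(use_case: str, has_gpu: bool = True) -> str:
    """Return a model ID for a use case.

    Assumption: without a GPU we always fall back to the CPU-only row.
    """
    if not has_gpu:
        return RECOMMENDED_MODELS["cpu_only"]
    try:
        return RECOMMENDED_MODELS[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


if __name__ == "__main__":
    print(pick_model("production"))
    print(pick_model("best_accuracy", has_gpu=False))
```

The returned ID is the kind of value that would go into MODEL_ID.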

Implementation Workflow

Building the API Server

Use the template: Start with assets/api-server/main.py
Key components:
- FastAPI app with WebSocket endpoint
- Model loading at startup (kept in memory)
- Audio buffer management
- WebSocket protocol for streaming
Customization points:
- Model selection (change MODEL_ID in .env)
- Audio processing parameters
- Batch size and optimization
- Error handling

For the complete implementation guide, see references/api_server_guide.md.
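
The "audio buffer management" component can be sketched roughly as follows. This is not the code in assets/api-server/main.py. It is a minimal, standard-library-only illustration that assumes incoming binary frames are 16-bit, 16 kHz mono PCM and that the server transcribes fixed 1-second chunks. The `AudioBuffer` class and its method names are hypothetical.

```python
SAMPLE_RATE = 16_000   # Hz, as used by the WebSocket protocol
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHUNK_SECONDS = 1.0    # assumed chunk length for this sketch


class AudioBuffer:
    """Accumulates raw PCM frames and hands back fixed-size chunks."""

    def __init__(self, chunk_seconds: float = CHUNK_SECONDS) -> None:
        self._chunk_bytes = int(SAMPLE_RATE * chunk_seconds) * BYTES_PER_SAMPLE
        self._data = bytearray()

    def push(self, frame: bytes) -> list[bytes]:
        """Append a binary frame; return every complete chunk now available."""
        self._data.extend(frame)
        chunks = []
        while len(self._data) >= self._chunk_bytes:
            chunks.append(bytes(self._data[: self._chunk_bytes]))
            del self._data[: self._chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return the remaining partial chunk (e.g. on "end") and reset."""
        rest = bytes(self._data)
        self._data.clear()
        return rest


if __name__ == "__main__":
    buf = AudioBuffer()
    for _ in range(2):
        for chunk in buf.push(b"\x00" * 20_000):
            print("chunk of", len(chunk), "bytes ready for transcription")
    print("leftover bytes:", len(buf.flush()))
```

In a WebSocket handler, each binary message would be passed to `push()`, each returned chunk handed to the model, and `flush()` called when the client signals the end of the stream.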

Building the LiveKit Plugin

Use the template: Start with assets/plugin-template/
Required implementations:
- _recognize_impl() - Non-streaming recognition
- stream() - Return SpeechStream instance
- SpeechStream class - Handle streaming
Key considerations:
- Audio format conversion (16kHz, mono, 16-bit PCM)
- WebSocket connection management
- Event emission (interim/final transcripts)
- Error handling and cleanup

For the complete implementation guide, see references/plugin_implementation.md.
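
Part of the audio format conversion listed under "Key considerations" can be sketched with the standard library alone. The helpers below are hypothetical and are not taken from the plugin template. They assume little-endian samples and cover only downmixing to mono and converting floats to 16-bit PCM. Resampling to 16 kHz is deliberately left out.

```python
import struct


def stereo_to_mono(pcm: bytes) -> bytes:
    """Average interleaved left/right int16 samples into one mono channel."""
    if len(pcm) % 4:
        raise ValueError("stereo 16-bit PCM length must be a multiple of 4 bytes")
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm)
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, count, 2)]
    return struct.pack(f"<{len(mono)}h", *mono)


def float_to_pcm16(samples: list[float]) -> bytes:
    """Clip floats to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


if __name__ == "__main__":
    stereo = struct.pack("<4h", 100, 300, -100, -300)
    print(struct.unpack("<2h", stereo_to_mono(stereo)))
    print(len(float_to_pcm16([0.0, 0.5, -0.5])), "bytes")
```

Pure-Python loops like these are fine for illustration. A real plugin would more likely rely on a vectorized or native implementation for throughput.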

Deployment

Development

bash

# API Server
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

# Test WebSocket: connect a WebSocket client to
# ws://localhost:8000/ws/transcribe

Production

Docker (Recommended):

bash

docker-compose up

Kubernetes: Use the manifests in the deployment guide

Cloud Platforms: AWS ECS, GCP Cloud Run, Azure Container Instances

For the complete deployment guide, including scaling, monitoring, and security, see references/deployment.md.

WebSocket Protocol

Client → Server

Audio: Binary frames (16-bit PCM, 16 kHz, mono)
Config: {"type": "config", "language": "en"}
End: {"type": "end"}

Server → Client

Interim: {"type": "interim", "text": "..."}
Final: {"type": "final", "text": "...", "language": "en"}
Error: {"type": "error", "message": "..."}
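
The JSON messages above can be built and parsed with a few small helpers. This is a minimal sketch assuming the message shapes are exactly as listed. The `Transcript` dataclass and the function names are hypothetical and do not come from the plugin template.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    text: str
    is_final: bool
    language: Optional[str] = None


def config_message(language: str) -> str:
    """Client -> server: select the transcription language."""
    return json.dumps({"type": "config", "language": language})


def end_message() -> str:
    """Client -> server: signal that no more audio will follow."""
    return json.dumps({"type": "end"})


def parse_server_message(raw: str) -> Transcript:
    """Server -> client: turn an interim/final message into a Transcript."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "error":
        raise RuntimeError(msg.get("message", "unknown server error"))
    if kind in ("interim", "final"):
        return Transcript(
            text=msg.get("text", ""),
            is_final=(kind == "final"),
            language=msg.get("language"),
        )
    raise ValueError(f"unexpected message type: {kind!r}")


if __name__ == "__main__":
    print(config_message("en"))
    print(parse_server_message('{"type": "final", "text": "hello", "language": "en"}'))
```

A plugin's stream class could map `is_final` onto its interim and final transcript events, and surface the raised `RuntimeError` through its own error handling.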

Common Tasks

Change Model

Edit .env:

bash

MODEL_ID=openai/whisper-small  # Faster model

Add Language Support

In plugin usage:

python

stt = custom_stt.STT(language="es")  # Spanish
stt = custom_stt.STT(detect_language=True)  # Auto-detect the language

Enable GPU

In API server:

bash

DEVICE=cuda:0  # Use GPU
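
One way a server could read the MODEL_ID and DEVICE values is sketched below. It assumes the variables are already present in the process environment (a .env file is not loaded automatically by Python, so a loader or an `export` step is needed). The `Settings` class, the `load_settings` function, and both default values are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    model_id: str
    device: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read model and device settings, with assumed fallback defaults."""
    return Settings(
        model_id=env.get("MODEL_ID", "openai/whisper-medium"),  # assumed default
        device=env.get("DEVICE", "cpu"),                        # assumed default
    )


if __name__ == "__main__":
    print(load_settings())
    print(load_settings({"MODEL_ID": "openai/whisper-small", "DEVICE": "cuda:0"}))
```

Passing the mapping in explicitly keeps the function easy to test without touching the real environment.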

Scale Horizontally

Deploy multiple API server instances behind a load balancer. See references/deployment.md for Nginx configuration.

Troubleshooting

Out of Memory

Use a smaller model (whisper-small or whisper-tiny)
Reduce batch_size in the pipeline
Enable low_cpu_mem_usage=True when loading the model

Slow Transcription

Ensure GPU is enabled (DEVICE=cuda:0)
Use FP16 precision (automatic on GPU)
Increase batch_size
Use a smaller model

Connection Issues

Verify WebSocket support in the load balancer
Check firewall rules
Increase timeout settings

Scripts

scripts/setup_api_server.py - Generate API server from template
scripts/setup_plugin.py - Generate LiveKit plugin from template

References

Load these as needed for detailed information:

references/api_server_guide.md - Complete API implementation guide
references/plugin_implementation.md - LiveKit plugin development
references/models_comparison.md - Model selection and optimization
references/deployment.md - Production deployment best practices

Assets

Ready-to-use templates:

assets/api-server/ - Complete FastAPI server with Whisper
assets/plugin-template/ - LiveKit STT plugin structure

Best Practices

Keep models in memory - Load once at startup, not per request
Use appropriate model size - Balance quality vs. speed for your use case
Process audio in chunks - 1-second chunks work well for streaming
Implement proper cleanup - Close WebSocket connections gracefully
Monitor metrics - Track latency, throughput, GPU utilization
Use Docker - Ensures consistent deployments
Enable authentication - Secure production APIs
Scale horizontally - Use load balancer for high availability

Maintainer

Okeysir198 (core maintainer)

Source details

Full Name: Okeysir198/P20251122-claude-skills
Branch: main
Path in repo: .claude/skills/livekit-stt-selfhosted
