Agent skill
assemblyai-streaming
This skill should be used when working with AssemblyAI’s Speech-to-Text and LLM Gateway APIs, especially for streaming/live transcription, meeting notetakers, and voice agents that need low-latency transcripts and audio analysis.
Install this agent skill to your Project
npx add-skill https://github.com/ratacat/claude-skills/tree/main/skills/assembly-ai-streaming
Metadata
Additional technical details for this skill
- focus
- streaming-stt, meeting-notetaker, voice-agent, llm-gateway
- skill version
- 1.0.0
- upstream docs
- https://www.assemblyai.com/docs
SKILL.md
AssemblyAI Streaming & Live Transcription Skill
Overview
Use this skill to build and maintain code that talks to AssemblyAI’s:
- Streaming Speech-to-Text (STT) via WebSockets (
wss://streaming.assemblyai.com/v3/ws) - Async / pre-recorded STT via REST (
https://api.assemblyai.com/v2/transcript) - LLM Gateway for applying Claude/GPT/Gemini-style models to transcripts (
https://llm-gateway.assemblyai.com)
The emphasis is on streaming/live transcription, meeting notetakers, and voice agents, while still covering async workflows and post-processing.
This skill assumes a Claude Code environment with access to Python (preferred) and Bash.
When to Use
Use this skill when:
- Implementing real-time transcription from a microphone, telephony stream, or audio file.
- Building a live meeting notetaker (Zoom/Teams/Meet), especially with summaries, action items, and highlights.
- Implementing a voice agent where latency and natural turn-taking matter.
- Migrating from other STT providers (OpenAI/Deepgram/Google/AWS/etc.) to AssemblyAI.
- Applying LLMs to audio via LLM Gateway for summaries, Q&A, topic tagging, or custom prompts.
Do not use this skill when:
- The task is generic HTTP client usage with no AssemblyAI-specific logic.
- The request clearly targets a different STT vendor.
- The environment cannot safely store or use an API key.
AssemblyAI Mental Model
1. Products to care about
-
Pre-recorded Speech-to-Text (Async)
- REST API:
POST /v2/transcript→GET /v2/transcript/{id} - Designed for files from URLs, uploads, S3, etc.
- Supports extra models: summarization, topic detection, sentiment, PII redaction, chapters, etc.
- REST API:
-
Streaming Speech-to-Text
- WebSocket:
wss://streaming.assemblyai.com/v3/ws - Low-latency, immutable transcripts (~300ms).
- Turn detection built in; fits voice agents and live captioning.
- WebSocket:
-
LLM Gateway
- REST API:
POST /v1/chat/completionsathttps://llm-gateway.assemblyai.com - Unified access to multiple LLMs (Claude, GPT, Gemini, etc.).
- Designed for “LLM over transcripts” workflows.
- REST API:
2. Key model knobs (Async)
speech_models:["slam-1", "universal"]etc.- Slam-1: best English accuracy + keyterms_prompt, good for medical/technical conversations.
- Universal: multilingual coverage; good default if language is unknown.
language_codevslanguage_detection:- Use
language_codewhen the language is known. - Use
language_detection: truewhen unknown; optionally setlanguage_confidence_threshold.
- Use
keyterms_prompt:- Domain words/phrases to boost (med terms, product names, etc.).
- Extra intelligence:
summarization,iab_categories,content_safety,entity_detection,auto_chapters,sentiment_analysis,speaker_labels,auto_highlights,redact_pii, etc.
3. Key model knobs (Streaming)
Connection URL:
- US:
wss://streaming.assemblyai.com/v3/ws - EU:
wss://streaming.eu.assemblyai.com/v3/ws
Important query parameters:
sample_rate(required): e.g.16000format_turns(bool): return formatted final transcripts; avoid for low-latency voice agents.speech_model:universal-streaming-english(default) oruniversal-streaming-multi.- `keyterms_p
rompt: JSON-encoded list of terms, e.g. ["AssemblyAI", "Slam-1", "Keanu Reeves"]`.
- Turn detection:
end_of_turn_confidence_threshold(0.0–1.0, default ~0.4)min_end_of_turn_silence_when_confident(ms, default ~400)max_turn_silence(ms, default ~1280)
Headers:
- Use either
Authorization: <API_KEY>or a short-livedtokenquery parameter issued by your backend.
Messages:
- Client sends:
- Binary audio chunks (50–1000ms each).
- Optional JSON messages:
{"type": "UpdateConfig", ...},{"type": "Terminate"},{"type": "ForceEndpoint"}.
- Server sends:
Beginevent withid,expires_at.Turnevents with:transcript(immutable partials/finals),utterance(complete semantic chunk),end_of_turn(bool),turn_is_formatted(bool),wordsarray with timestamps/confidences.
Terminationevent with summary stats.
4. Regions and data residency
- Async:
- US:
https://api.assemblyai.com - EU:
https://api.eu.assemblyai.com
- US:
- Streaming:
- US:
wss://streaming.assemblyai.com/v3/ws - EU:
wss://streaming.eu.assemblyai.com/v3/ws
- US:
Always keep base URLs consistent per project; don’t mix US/EU endpoints for the same data.
Security & API Keys
- Always require an AssemblyAI API key and keep it out of source in Claude Code output:
- Use environment variables:
ASSEMBLYAI_API_KEY. - Or placeholders (
"<YOUR_API_KEY>") in snippets.
- Use environment variables:
- For browser/client code:
- Do not embed the API key.
- Instruct the user to generate temporary streaming tokens on their backend and pass only the token into the WebSocket connection.
- Never print real keys in logs or comments.
High-Level Workflow Patterns
Decision tree
-
Is the audio live?
- Yes → Use Streaming STT.
- No → Use Async STT.
-
Is latency critical (<1s) for responses?
- Yes → Streaming with
format_turns=falseand careful turn detection. - No → Async, then Summarization/Chapters/etc.
- Yes → Streaming with
-
Do transcripts leave the backend?
- Yes → Consider
redact_pii(and optionallyredact_pii_audio) before sharing. - No → Use raw transcripts as needed.
- Yes → Consider
-
Need LLM-based processing (Q&A, structured summaries)?
- Yes → Pipe transcripts into LLM Gateway via
chat/completions.
- Yes → Pipe transcripts into LLM Gateway via
How Claude Should Work with This Skill
General principles
- Prefer official AssemblyAI SDKs (Python/JS) when available; fall back to
requests/websocket-clientonly if SDK cannot be installed. - Always:
- Validate HTTP responses and WebSocket status.
- Surface useful error messages (
status,errorfields in transcript JSON). - Respect documented min/max chunk sizes (50–1000ms of audio per binary message).
- For voice-agent code, optimize for:
- Immutable partials (
transcript) andutterancefield. - Minimal latency, avoid extra formatting passes.
- Immutable partials (
Recipe 1 – Minimal Streaming from Microphone (Python SDK)
Goal: Stream mic audio to AssemblyAI and print transcripts in real time.
Use this when the environment has Python and assemblyai + pyaudio installed, and the user wants a quick streaming demo.
import assemblyai as aai
from assemblyai.streaming import v3 as aai_stream
import pyaudio
API_KEY = "<YOUR_API_KEY>"
aai.settings.api_key = API_KEY
SAMPLE_RATE = 16000
CHUNK_MS = 50
FRAMES_PER_BUFFER = int(SAMPLE_RATE * (CHUNK_MS / 1000.0))
def main():
client = aai_stream.StreamingClient(
aai_stream.StreamingClientOptions(
api_key=API_KEY,
api_host="streaming.assemblyai.com", # or "streaming.eu.assemblyai.com"
)
)
def on_begin(_client, event: aai_stream.BeginEvent):
print(f"Session started: {event.id}, expires at {event.expires_at}")
def on_turn(_client, event: aai_stream.TurnEvent):
# Use immutable transcript text
text = (event.transcript or "").strip()
if not text:
return
# Use formatted finals only for display; keep unformatted for LLMs
if event.turn_is_formatted:
print(f"[FINAL] {text}")
else:
print(f"[PARTIAL] {text}", end="\r")
def on_terminated(_client, event: aai_stream.TerminationEvent):
print(f"\nTerminated. Audio duration={event.audio_duration_seconds}s")
def on_error(_client, error: aai_stream.StreamingError):
print(f"\nStreaming error: {error}")
client.on(aai_stream.StreamingEvents.Begin, on_begin)
client.on(aai_stream.StreamingEvents.Turn, on_turn)
client.on(aai_stream.StreamingEvents.Termination, on_terminated)
client.on(aai_stream.StreamingEvents.Error, on_error)
client.connect(
aai_stream.StreamingParameters(
sample_rate=SAMPLE_RATE,
format_turns=False, # better latency for voice agents
)
)
pa = pyaudio.PyAudio()
stream = pa.open(
format=pyaudio.paInt16,
channels=1,
rate=SAMPLE_RATE,
input=True,
frames_per_buffer=FRAMES_PER_BUFFER,
)
try:
print("Speak into your microphone (Ctrl+C to stop)...")
def audio_gen():
while True:
yield stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
client.stream(audio_gen())
except KeyboardInterrupt:
pass
finally:
client.disconnect(terminate=True)
stream.stop_stream()
stream.close()
pa.terminate()
if __name__ == "__main__":
main()
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
brave-search
Use when user asks to search the web, look something up online, find current/recent/latest information, or needs cited answers. Triggers on "search", "look up", "find out about", "what is the current/latest", image searches, news lookups. NOT for searching code/files—only for web/internet searches.
bug-reproduction-validator
Use this agent when you receive a bug report or issue description and need to verify whether the reported behavior is actually a bug. This agent will attempt to reproduce the issue systematically, validate the steps to reproduce, and confirm whether the behavior deviates from expected functionality. <example>\nContext: The user has reported a potential bug in the application.\nuser: "Users are reporting that the email processing fails when there are special characters in the subject line"\nassistant: "I'll use the bug-reproduction-validator agent to verify if this is an actual bug by attempting to reproduce it"\n<commentary>\nSince there's a bug report about email processing with special characters, use the bug-reproduction-validator agent to systematically reproduce and validate the issue.\n</commentary>\n</example>\n<example>\nContext: An issue has been raised about unexpected behavior.\nuser: "There's a report that the brief summary isn't including all emails from today"\nassistant: "Let me launch the b...
agent-native-audit
Run comprehensive agent-native architecture review with scored principles
brainstorming
This skill should be used before implementing features, building components, or making changes. It guides exploring user intent, approaches, and design decisions before planning. Triggers on "let's brainstorm", "help me think through", "what should we build", "explore approaches", ambiguous feature requests, or when the user's request has multiple valid interpretations that need clarification.
performance-oracle
Use this agent when you need to analyze code for performance issues, optimize algorithms, identify bottlenecks, or ensure scalability. This includes reviewing database queries, memory usage, caching strategies, and overall system performance. The agent should be invoked after implementing features or when performance concerns arise.\n\n<example>\nContext: The user has just implemented a new feature that processes user data.\nuser: "I've implemented the user analytics feature. Can you check if it will scale?"\nassistant: "I'll use the performance-oracle agent to analyze the scalability and performance characteristics of your implementation."\n<commentary>\nSince the user is concerned about scalability, use the Task tool to launch the performance-oracle agent to analyze the code for performance issues.\n</commentary>\n</example>\n\n<example>\nContext: The user is experiencing slow API responses.\nuser: "The API endpoint for fetching reports is taking over 2 seconds to respond"\nassistant: "Let me invoke the...
triage
Triage and categorize findings for the CLI todo system
Didn't find tool you were looking for?