Agent skill
podcast-generation
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.
Install this agent skill to your Project
npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/sickn33/podcast-generation
SKILL.md
Podcast Generation with GPT Realtime Mini
Generate real audio narratives from text content using Azure OpenAI's Realtime API.
Quick Start
- Configure environment variables for Realtime API
- Connect via WebSocket to Azure OpenAI Realtime endpoint
- Send text prompt, collect PCM audio chunks + transcript
- Convert PCM to WAV format
- Return base64-encoded audio to frontend for playback
Environment Configuration
AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini
Note: Endpoint should NOT include /openai/v1/ - just the base URL.
Core Workflow
Backend Audio Generation
from openai import AsyncOpenAI
import base64
# Convert HTTPS endpoint to WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"
client = AsyncOpenAI(
websocket_base_url=ws_url,
api_key=api_key
)
audio_chunks = []
transcript_parts = []
async with client.realtime.connect(model="gpt-realtime-mini") as conn:
# Configure for audio-only output
await conn.session.update(session={
"output_modalities": ["audio"],
"instructions": "You are a narrator. Speak naturally."
})
# Send text to narrate
await conn.conversation.item.create(item={
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": prompt}]
})
await conn.response.create()
# Collect streaming events
async for event in conn:
if event.type == "response.output_audio.delta":
audio_chunks.append(base64.b64decode(event.delta))
elif event.type == "response.output_audio_transcript.delta":
transcript_parts.append(event.delta)
elif event.type == "response.done":
break
# Convert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b''.join(audio_chunks)
wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
Frontend Audio Playback
// Convert base64 WAV to playable blob
const base64ToBlob = (base64, mimeType) => {
const bytes = atob(base64);
const arr = new Uint8Array(bytes.length);
for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
return new Blob([arr], { type: mimeType });
};
const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();
Voice Options
| Voice | Character |
|---|---|
| alloy | Neutral |
| echo | Warm |
| fable | Expressive |
| onyx | Deep |
| nova | Friendly |
| shimmer | Clear |
Realtime API Events
response.output_audio.delta- Base64 audio chunkresponse.output_audio_transcript.delta- Transcript textresponse.done- Generation completeerror- Handle withevent.error.message
Audio Format
- Input: Text prompt
- Output: PCM audio (24kHz, 16-bit, mono)
- Storage: Base64-encoded WAV
References
- Full architecture: See references/architecture.md for complete stack design
- Code examples: See references/code-examples.md for production patterns
- PCM conversion: Use scripts/pcm_to_wav.py for audio format conversion
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
perigon-backend
Perigon ASP.NET Core + EF Core + Aspire conventions
perigon-agent
Pointers for Copilot/agents to apply Perigon conventions
perigon-angular
Angular 21+ standalone/Material/signal conventions for Perigon WebApp
fastapi-mastery
Comprehensive FastAPI development skill covering REST API creation, routing, request/response handling, validation, authentication, database integration, middleware, and deployment. Use when working with FastAPI projects, building APIs, implementing CRUD operations, setting up authentication/authorization, integrating databases (SQL/NoSQL), adding middleware, handling WebSockets, or deploying FastAPI applications. Triggered by requests involving .py files with FastAPI code, API endpoint creation, Pydantic models, or FastAPI-specific features.
context7-efficient
Token-efficient library documentation fetcher using Context7 MCP with 86.8% token savings through intelligent shell pipeline filtering. Fetches code examples, API references, and best practices for JavaScript, Python, Go, Rust, and other libraries. Use when users ask about library documentation, need code examples, want API usage patterns, are learning a new framework, need syntax reference, or troubleshooting with library-specific information. Triggers include questions like "Show me React hooks", "How do I use Prisma", "What's the Next.js routing syntax", or any request for library/framework documentation.
browser-use
Browser automation using Playwright MCP. Navigate websites, fill forms, click elements, take screenshots, and extract data. Use when tasks require web browsing, form submission, web scraping, UI testing, or any browser interaction.
Didn't find tool you were looking for?