Agent skill
ai-avatar-video
Create AI avatar and talking head videos with OmniHuman, Fabric, PixVerse via inference.sh CLI. Models: OmniHuman 1.5, OmniHuman 1.0, Fabric 1.0, PixVerse Lipsync. Capabilities: audio-driven avatars, lipsync videos, talking head generation, virtual presenters. Use for: AI presenters, explainer videos, virtual influencers, dubbing, marketing videos. Triggers: ai avatar, talking head, lipsync, avatar video, virtual presenter, ai spokesperson, audio driven video, heygen alternative, synthesia alternative, talking avatar, lip sync, video avatar, ai presenter, digital human
Install this agent skill to your Project
npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/inference-sh/ai-avatar-video
SKILL.md
AI Avatar & Talking Head Videos
Create AI avatars and talking head videos via inference.sh CLI.
Quick Start
curl -fsSL https://cli.inference.sh | sh && infsh login
# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{
"image_url": "https://portrait.jpg",
"audio_url": "https://speech.mp3"
}'
Available Models
| Model | App ID | Best For |
|---|---|---|
| OmniHuman 1.5 | bytedance/omnihuman-1-5 |
Multi-character, best quality |
| OmniHuman 1.0 | bytedance/omnihuman-1-0 |
Single character |
| Fabric 1.0 | falai/fabric-1-0 |
Image talks with lipsync |
| PixVerse Lipsync | falai/pixverse-lipsync |
Highly realistic |
Search Avatar Apps
infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"
Examples
OmniHuman 1.5 (Multi-Character)
infsh app run bytedance/omnihuman-1-5 --input '{
"image_url": "https://portrait.jpg",
"audio_url": "https://speech.mp3"
}'
Supports specifying which character to drive in multi-person images.
Fabric 1.0 (Image Talks)
infsh app run falai/fabric-1-0 --input '{
"image_url": "https://face.jpg",
"audio_url": "https://audio.mp3"
}'
PixVerse Lipsync
infsh app run falai/pixverse-lipsync --input '{
"image_url": "https://portrait.jpg",
"audio_url": "https://speech.mp3"
}'
Generates highly realistic lipsync from any audio.
Full Workflow: TTS + Avatar
# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{
"text": "Welcome to our product demo. Today I will show you..."
}' > speech.json
# 2. Create avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{
"image_url": "https://presenter-photo.jpg",
"audio_url": "<audio-url-from-step-1>"
}'
Full Workflow: Dub Video in Another Language
# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json
# 2. Translate text (manually or with an LLM)
# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json
# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{
"video_url": "https://original-video.mp4",
"audio_url": "<new-audio-url>"
}'
Use Cases
- Marketing: Product demos with AI presenter
- Education: Course videos, explainers
- Localization: Dub content in multiple languages
- Social Media: Consistent virtual influencer
- Corporate: Training videos, announcements
Tips
- Use high-quality portrait photos (front-facing, good lighting)
- Audio should be clear with minimal background noise
- OmniHuman 1.5 supports multiple people in one image
- LatentSync is best for syncing existing videos to new audio
Related Skills
# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh
# Text-to-speech (generate audio for avatars)
npx skills add inference-sh/skills@text-to-speech
# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text
# Video generation
npx skills add inference-sh/skills@ai-video-generation
# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation
Browse all video apps: infsh app list --category video
Documentation
- Running Apps - How to run apps via CLI
- Content Pipeline Example - Building media workflows
- Streaming Results - Real-time progress updates
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
perigon-backend
Perigon ASP.NET Core + EF Core + Aspire conventions
perigon-agent
Pointers for Copilot/agents to apply Perigon conventions
perigon-angular
Angular 21+ standalone/Material/signal conventions for Perigon WebApp
fastapi-mastery
Comprehensive FastAPI development skill covering REST API creation, routing, request/response handling, validation, authentication, database integration, middleware, and deployment. Use when working with FastAPI projects, building APIs, implementing CRUD operations, setting up authentication/authorization, integrating databases (SQL/NoSQL), adding middleware, handling WebSockets, or deploying FastAPI applications. Triggered by requests involving .py files with FastAPI code, API endpoint creation, Pydantic models, or FastAPI-specific features.
context7-efficient
Token-efficient library documentation fetcher using Context7 MCP with 86.8% token savings through intelligent shell pipeline filtering. Fetches code examples, API references, and best practices for JavaScript, Python, Go, Rust, and other libraries. Use when users ask about library documentation, need code examples, want API usage patterns, are learning a new framework, need syntax reference, or troubleshooting with library-specific information. Triggers include questions like "Show me React hooks", "How do I use Prisma", "What's the Next.js routing syntax", or any request for library/framework documentation.
browser-use
Browser automation using Playwright MCP. Navigate websites, fill forms, click elements, take screenshots, and extract data. Use when tasks require web browsing, form submission, web scraping, UI testing, or any browser interaction.
Didn't find tool you were looking for?