Agent skill

video-transcriber

Transcribe audio from videos using Whisper (local) or the Gemini API (gemini-flash-lite-latest). Use when you need to convert video/audio to text for further processing, subtitle generation, or content analysis. Supports multiple languages and timestamped transcription. Speaker diarization, emotion detection, and viral segment analysis are available through Gemini only.

Stars 163

Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/video-transcriber

Metadata

Additional technical details for this skill

models: whisper, gemini-flash-lite-latest
version: 1.0

SKILL.md

Video Transcriber

This skill enables AI agents to transcribe audio from video files using either Whisper (local processing) or Gemini API (cloud processing with advanced features).

When to Use

User wants to transcribe a video or audio file
User needs subtitles/captions for a video
User wants to analyze video content through transcription
User needs to identify viral-worthy segments
User wants speaker diarization or emotion detection

Model Selection

Whisper (Local)

Pros:

Free to use
100% privacy (no cloud upload)
Good for sensitive content
Lower cost for high volume

Cons:

Requires local processing power
No built-in speaker diarization
No emotion detection
Limited to 99 languages

Models:

tiny - Fastest, lower accuracy (~39M parameters)
base - Fast, good accuracy (~74M parameters)
small - Balanced speed/accuracy (~244M parameters)
medium - Good accuracy, slower (~769M parameters)
large-v3 - Highest accuracy, slowest (~1550M parameters)

Gemini API (Cloud)

Pros:

High accuracy with gemini-flash-lite-latest
Built-in speaker diarization
Emotion detection from speech
Context understanding
Can identify viral segments
125+ language support
Faster processing (cloud-based)

Cons:

Requires API key
Cloud upload (privacy consideration)
Cost per usage
Internet required

Available Scripts

`scripts/transcribe.py`

Transcribe audio from video file.

Usage:

bash

python skills/video-transcriber/scripts/transcribe.py <video_path> [options]

Options:

--model, -m: Model to use (auto, whisper, gemini) - default: auto
--whisper-model: Whisper model size (tiny, base, small, medium, large-v3) - default: medium
--use-faster: Use faster-whisper for speed - default: True
--output, -o: Output file path - default: <video_path>.srt
--format: Output format (srt, vtt, json) - default: srt
--language: Language code (e.g., en, id) - default: auto
--speaker-diarization: Enable speaker labels (Gemini only)
--emotion-detection: Enable emotion detection (Gemini only)
--device: Device for Whisper (auto, cpu, cuda) - default: auto
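The options above can be mirrored in a small `argparse` parser. This is a hypothetical sketch, not the code in `scripts/transcribe.py`: the function name `parse_args`, the `--no-use-faster` negation, the rule that the default output extension follows `--format`, and the rejection of Gemini-only flags under `--model whisper` are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    p = argparse.ArgumentParser(prog="transcribe.py")
    p.add_argument("video_path")
    p.add_argument("--model", "-m", choices=["auto", "whisper", "gemini"], default="auto")
    p.add_argument(
        "--whisper-model",
        choices=["tiny", "base", "small", "medium", "large-v3"],
        default="medium",
    )
    # BooleanOptionalAction (Python 3.9+) also generates --no-use-faster.
    p.add_argument("--use-faster", action=argparse.BooleanOptionalAction, default=True)
    p.add_argument("--output", "-o", default=None)
    p.add_argument("--format", choices=["srt", "vtt", "json"], default="srt")
    p.add_argument("--language", default="auto")
    p.add_argument("--speaker-diarization", action="store_true")
    p.add_argument("--emotion-detection", action="store_true")
    p.add_argument("--device", choices=["auto", "cpu", "cuda"], default="auto")
    return p


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: the default extension follows --format; the docs only
    # state <video_path>.srt for the default format.
    if args.output is None:
        args.output = f"{args.video_path}.{args.format}"
    # Assumption: Gemini-only flags are rejected when Whisper is forced.
    if args.model == "whisper" and (args.speaker_diarization or args.emotion_detection):
        parser.error("--speaker-diarization and --emotion-detection require Gemini")
    return args


if __name__ == "__main__":
    print(parse_args(["video.mp4", "--model", "gemini", "--format", "json"]))
```

Rejecting the Gemini-only flags early is one possible design; the real script may instead ignore them or switch backends.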

Examples:

Transcribe with default settings (--model auto):

bash

python skills/video-transcriber/scripts/transcribe.py video.mp4

Transcribe with Gemini API:

bash

python skills/video-transcriber/scripts/transcribe.py video.mp4 --model gemini

Transcribe with speaker diarization and emotion detection (Gemini):

bash

python skills/video-transcriber/scripts/transcribe.py video.mp4 --model gemini --speaker-diarization --emotion-detection

Transcribe with large Whisper model:

bash

python skills/video-transcriber/scripts/transcribe.py video.mp4 --whisper-model large-v3

Output to JSON:

bash

python skills/video-transcriber/scripts/transcribe.py video.mp4 --format json

`scripts/analyze.py`

Analyze audio content using the Gemini API: viral segments, summary, emotions, or questions.

Usage:

bash

python skills/video-transcriber/scripts/analyze.py <video_path> [options]

Options:

--analysis-type: Type of analysis (viral, summary, emotions, questions) - default: viral
--num-segments: Number of segments to identify (for viral analysis) - default: 5
--model: Model to use - default: gemini

Examples:

Detect viral segments:

bash

python skills/video-transcriber/scripts/analyze.py video.mp4 --analysis-type viral

Get summary:

bash

python skills/video-transcriber/scripts/analyze.py video.mp4 --analysis-type summary

Analyze emotions:

bash

python skills/video-transcriber/scripts/analyze.py video.mp4 --analysis-type emotions

Output Format

SRT Format

srt

1
00:00:00,000 --> 00:00:05,000
This is the first subtitle.

2
00:00:05,500 --> 00:00:10,000
This is the second subtitle.
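As a rough illustration of how timed segments map onto this layout, the sketch below renders a list of segments as SRT text. The helper names `srt_timestamp` and `to_srt` are hypothetical, and the segment dictionaries are assumed to carry `start` and `end` in seconds plus a `text` field, as in the JSON output.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 5.0, "text": "This is the first subtitle."},
        {"start": 5.5, "end": 10.0, "text": "This is the second subtitle."},
    ]
    print(to_srt(demo))
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids rounding a value such as 59.9996 into an invalid `60,000` field.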

JSON Format

json

[
  {
    "index": 1,
    "start": 0.0,
    "end": 5.0,
    "text": "This is the first subtitle.",
    "speaker": "Speaker A",
    "emotion": "neutral"
  }
]
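A short sketch of consuming this JSON programmatically: it totals speaking time per speaker. The second sample segment and the function name `speaking_time` are made up for illustration, and the fallback to "unknown" assumes that `speaker` may be absent when diarization is not enabled, which the source does not state.

```python
import json
from collections import defaultdict

# Sample shaped like the JSON output above; the second entry is invented.
SAMPLE = """
[
  {"index": 1, "start": 0.0, "end": 5.0,
   "text": "This is the first subtitle.", "speaker": "Speaker A", "emotion": "neutral"},
  {"index": 2, "start": 5.5, "end": 10.0,
   "text": "This is the second subtitle.", "speaker": "Speaker B", "emotion": "neutral"}
]
"""


def speaking_time(segments):
    """Return total seconds of speech per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        # Assumption: segments without diarization carry no "speaker" field.
        speaker = seg.get("speaker") or "unknown"
        totals[speaker] += seg["end"] - seg["start"]
    return dict(totals)


if __name__ == "__main__":
    print(speaking_time(json.loads(SAMPLE)))
```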

Auto Selection Logic

When --model is set to auto, the script picks a backend using these rules:

Privacy priority: Always use Whisper
Quality needed: Use gemini for highest quality
Content length: Use faster-whisper for long content (> 1 hour)
Feature requirements: Use gemini if speaker diarization or emotion detection needed
Default: Use gemini-flash-lite-latest
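The rules above could be sketched as a small decision function. The source does not say how conflicts between rules are resolved, so this sketch simply assumes they are checked in the order listed; the `Request` dataclass, its field names, and `select_backend` are hypothetical and not taken from the script.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical description of a transcription job."""
    duration_s: float
    privacy_required: bool = False
    highest_quality: bool = False
    speaker_diarization: bool = False
    emotion_detection: bool = False


def select_backend(req: Request) -> str:
    # Assumption: rules are evaluated top to bottom, first match wins.
    if req.privacy_required:
        return "whisper"            # Privacy priority: always local
    if req.highest_quality:
        return "gemini"             # Quality needed
    if req.duration_s > 3600:
        return "faster-whisper"     # Long content (> 1 hour)
    if req.speaker_diarization or req.emotion_detection:
        return "gemini"             # Gemini-only features
    return "gemini"                 # Default: gemini-flash-lite-latest


if __name__ == "__main__":
    print(select_backend(Request(duration_s=120.0)))
    print(select_backend(Request(duration_s=7200.0, privacy_required=True)))
```

Under this ordering, a two-hour file with diarization requested resolves to faster-whisper, which has no diarization; a real implementation might check feature requirements before content length.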

Environment Variables

bash

# For Gemini API
export GEMINI_API_KEY="your-api-key"

# Optional: For Vertex AI
export GOOGLE_PROJECT_ID="your-project-id"
export GOOGLE_LOCATION="us-central1"
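A script that depends on these variables might check them up front and fail with a clear message. The sketch below is an assumption about how that could look, not the skill's actual code; `gemini_config` is a hypothetical helper, and it treats the two Vertex AI variables as optional with no default value.

```python
import os


def gemini_config(env=None):
    """Collect Gemini settings from the environment, failing fast if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it before using --model gemini")
    return {
        "api_key": api_key,
        # Optional Vertex AI settings; None when not exported.
        "project_id": env.get("GOOGLE_PROJECT_ID"),
        "location": env.get("GOOGLE_LOCATION"),
    }


if __name__ == "__main__":
    try:
        config = gemini_config()
        print("Gemini key found; Vertex project:", config["project_id"])
    except RuntimeError as exc:
        print(exc)
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.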

Integration with Other Skills

After transcription, you can use these skills:

highlight-scanner: Analyze transcript for viral moments
subtitle-overlay: Add captions to video
autocut-shorts: Full workflow for creating short clips

Common Workflow

User provides video file or URL
Download if needed (youtube-downloader)
Transcribe using this skill
Analyze transcript for highlights (highlight-scanner)
Create short clips (autocut-shorts)

Tips

Keep --use-faster enabled (the default) so Whisper runs through faster-whisper
Use Gemini when you need speaker diarization
Use --format json for programmatic processing
For long videos, consider splitting into segments
Use --analysis-type viral to identify best segments for short-form content
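For the tip about splitting long videos, one possible approach is to cut the file into overlapping windows, transcribe each window separately, and shift the resulting timestamps back onto the original timeline. The sketch below covers only the arithmetic; the helper names, the 10-minute window, and the 5-second overlap are illustrative assumptions, and the actual cutting (for example with ffmpeg) and the removal of text duplicated in the overlaps are left out.

```python
def chunk_ranges(duration_s, chunk_s=600.0, overlap_s=5.0):
    """Return (start, end) windows in seconds covering [0, duration_s]."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    ranges = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        ranges.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so words cut at a boundary appear whole
        # in the next window.
        start = end - overlap_s
    return ranges


def shift_segments(segments, offset_s):
    """Move chunk-relative timestamps onto the original timeline."""
    return [
        {**seg, "start": seg["start"] + offset_s, "end": seg["end"] + offset_s}
        for seg in segments
    ]


if __name__ == "__main__":
    for start, end in chunk_ranges(1500.0):
        print(f"{start:.1f} -> {end:.1f}")
```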

References

Whisper documentation: https://github.com/openai/whisper
Gemini API: https://ai.google.dev/gemini-api/docs/audio
Language codes: ISO 639-1 codes (en, id, es, etc.)

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/video-transcriber
License: MIT License
