Agent skill
multimodal-router
Install this agent skill to your Project
npx add-skill https://github.com/pyramidheadshark/claude-scaffold/tree/main/.claude/skills/multimodal-router
SKILL.md
Multimodal Router
When to Load This Skill
Load when working with: PDF files, Word documents, Excel spreadsheets, images, audio, video files, or any document exceeding 400k tokens that cannot fit in Claude's standard context.
Model
- Model:
google/gemini-3-flash-preview - Provider: OpenRouter API
- Context window: 1M tokens
- Capabilities: text, images, audio, video, PDF — all natively
- Thinking levels: minimal / low / medium / high (configurable per task)
Gemini 3 Flash Preview is a thinking model with near-Pro reasoning at Flash latency.
Use thinking_level: "low" for document extraction, "medium" or "high" for complex analysis.
When to Use This Skill (Decision Rules)
Use Gemini 3 Flash via this skill when:
- Input is a PDF, image, audio file, or video
- Input document exceeds ~400k tokens (rough estimate: 300+ pages of text)
- Task requires visual understanding (screenshots, diagrams, scanned docs)
- Client sent
.docx,.pdf,.xlsx,.mp4,.wavfor initial project analysis
Do NOT use for: writing code, architecture decisions, tests. Those stay with Claude Code.
OpenRouter Client Pattern
import httpx
from src.project_name.core.config import settings
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
MULTIMODAL_MODEL = "google/gemini-3-flash-preview"
async def call_gemini_flash(
prompt: str,
base64_content: str | None = None,
media_type: str | None = None,
thinking_level: str = "low",
) -> str:
messages: list[dict] = []
if base64_content and media_type:
messages.append({
"role": "user",
"content": [
{
"type": "image_url" if media_type.startswith("image") else "file",
"image_url": {"url": f"data:{media_type};base64,{base64_content}"},
},
{"type": "text", "text": prompt},
],
})
else:
messages.append({"role": "user", "content": prompt})
payload = {
"model": MULTIMODAL_MODEL,
"messages": messages,
"reasoning": {"effort": thinking_level},
"max_tokens": 4096,
}
async with httpx.AsyncClient(timeout=120.0) as client:
response = await client.post(
f"{OPENROUTER_BASE_URL}/chat/completions",
headers={
"Authorization": f"Bearer {settings.openrouter_api_key}",
"HTTP-Referer": "https://github.com/your-org/project",
"X-Title": "ML Engineering Platform",
},
json=payload,
)
response.raise_for_status()
data = response.json()
return data["choices"][0]["message"]["content"]
PDF Analysis Pattern
import base64
from pathlib import Path
async def analyze_pdf(pdf_path: Path, analysis_prompt: str) -> str:
pdf_bytes = pdf_path.read_bytes()
b64 = base64.b64encode(pdf_bytes).decode("utf-8")
return await call_gemini_flash(
prompt=analysis_prompt,
base64_content=b64,
media_type="application/pdf",
thinking_level="medium",
)
Standard Analysis Prompts
For intake phase (analyzing client documents):
INTAKE_SYSTEM_PROMPT = """
You are analyzing a client document to extract structured requirements.
Return a JSON object with these fields:
- business_goal: str
- key_stakeholders: list[str]
- data_sources: list[dict with name, format, volume]
- use_cases: list[str]
- constraints: list[str]
- open_questions: list[str]
Be thorough. Every ambiguity should appear in open_questions.
Return ONLY valid JSON, no markdown fences.
"""
.env Keys Required
OPENROUTER_API_KEY=sk-or-...
Cost Awareness
Gemini 3 Flash Preview pricing on OpenRouter: ~$0.0005/1k input tokens, ~$0.003/1k output. A 300-page PDF (≈150k tokens) costs approximately $0.075 to analyze. Always reasonable.
For documents that need Pro-level reasoning (very complex technical analysis):
use google/gemini-3-flash-preview with thinking_level: "high" before escalating to Pro.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
critical-analysis
prompt-engineering
claude-api-patterns
htmx-frontend
github-actions
python-project-standards
Didn't find tool you were looking for?