Agent skill
oma-pdf
Convert PDF files to Markdown using opendataloader-pdf. Extracts text, tables, headings, lists, and images with correct reading order. Use for PDF parsing, PDF to Markdown conversion, document extraction, and AI-ready data preparation.
Install this agent skill to your Project
npx add-skill https://github.com/first-fluke/oh-my-agent/tree/main/.agents/skills/oma-pdf
SKILL.md
PDF Skill - PDF to Markdown Conversion
When to use
- Converting PDF documents to Markdown for LLM context or RAG
- Extracting structured content (tables, headings, lists) from PDFs
- Preparing PDF data for AI consumption
- User says "convert this PDF", "parse PDF", "PDF to markdown", "read this PDF"
When NOT to use
- Generating or creating PDFs -> use appropriate document tools
- Editing existing PDFs -> out of scope
- Simple file reading of already-text files -> use Read tool directly
Core Rules
- Use uvx opendataloader-pdf to run; no installation required
- Default output format is Markdown
- If no output directory specified, output to the same directory as the input PDF
- Preserve document structure: headings, tables, lists, images
- For scanned PDFs, use hybrid mode with OCR
- Always run uvx mdformat on the output to normalize Markdown formatting
- Validate that the output Markdown is readable and well-structured
- Report any conversion issues (missing tables, garbled text) to the user
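The rules above can be sketched as a small shell helper that converts one PDF and then normalizes the result. It assumes the converter writes a file named after the input, with an .md extension, into the output directory. That naming and the helper names expected_md_path and convert_pdf are illustrative assumptions, not documented behavior, so check them against a real run.

```shell
#!/bin/sh
# Sketch of the convert-then-normalize flow from the Core Rules.
# ASSUMPTION: output is written as <input basename>.md in the output directory.

# Print the Markdown path expected for a PDF and an optional output directory.
# With no output directory, fall back to the directory of the input PDF.
expected_md_path() {
  pdf=$1
  out_dir=${2:-$(dirname "$pdf")}
  printf '%s/%s.md\n' "${out_dir%/}" "$(basename "$pdf" .pdf)"
}

# Convert one PDF, then normalize the resulting Markdown with mdformat.
# Reports a conversion issue if the expected file is missing or empty.
convert_pdf() {
  pdf=$1
  out_dir=${2:-$(dirname "$pdf")}
  uvx opendataloader-pdf "$pdf" --output-dir "$out_dir" || return 1
  md=$(expected_md_path "$pdf" "$out_dir")
  if [ -s "$md" ]; then
    uvx mdformat "$md"
  else
    echo "conversion issue: $md is missing or empty" >&2
    return 1
  fi
}
```

For example, convert_pdf docs/report.pdf would run the converter with docs as the output directory and then format docs/report.md, if the naming assumption holds.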
How to Execute
Follow resources/execution-protocol.md step by step.
Quick Reference
Basic conversion (single file)
uvx opendataloader-pdf input.pdf
Specify output directory
uvx opendataloader-pdf input.pdf --output-dir ./output/
Multiple files or folder
uvx opendataloader-pdf file1.pdf file2.pdf folder/
With OCR (scanned PDFs)
Requires hybrid mode server:
uvx opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "ko,en"
uvx opendataloader-pdf --hybrid docling-fast input.pdf
With image extraction (embedded base64)
uvx opendataloader-pdf input.pdf --image-output embedded --image-format png
With Tagged PDF structure
uvx opendataloader-pdf input.pdf --use-struct-tree
Output Formats
| Format | Flag | Use case |
|---|---|---|
| Markdown | --format markdown | Default. Clean text for LLM/RAG |
| JSON | --format json | Structured data with bounding boxes |
| HTML | --format html | Web display |
| Text | --format text | Plain text extraction |
| Combined | --format markdown,json | Multiple formats at once |
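When the format list comes from user input, it may help to check it against the table before calling the converter. The sketch below does that. The helper names valid_format and format_flag are hypothetical, and the accepted values are only those listed in the table above.

```shell
#!/bin/sh
# Accept only the format names listed in the Output Formats table.
valid_format() {
  case $1 in
    markdown|json|html|text) return 0 ;;
    *) return 1 ;;
  esac
}

# Validate a comma-separated format list and print the matching --format argument.
# Prints nothing and returns non-zero if the list is empty or has an unknown name.
format_flag() {
  [ -n "$1" ] || return 1
  old_ifs=$IFS
  IFS=,
  for f in $1; do
    if ! valid_format "$f"; then
      IFS=$old_ifs
      echo "unknown format: $f" >&2
      return 1
    fi
  done
  IFS=$old_ifs
  printf '%s %s\n' --format "$1"
}
```

A call such as uvx opendataloader-pdf input.pdf $(format_flag markdown,json) --output-dir ./output/ would then request both formats in a single run.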
Configuration
Project-specific settings: config/pdf-config.yaml
Troubleshooting
| Issue | Solution |
|---|---|
| Garbled text in output | Try --use-struct-tree for Tagged PDFs |
| Scanned PDF (no text layer) | Use hybrid mode with --force-ocr |
| Tables not extracted properly | Use hybrid mode for complex/borderless tables |
| Non-English PDF | Add --ocr-lang with appropriate language codes |
| Large PDF (100+ pages) | Process in page ranges or use batch mode |
| Formula not extracted | Use hybrid mode with --enrich-formula |
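The scanned-PDF row can be turned into a rough automatic check: if the converted Markdown is nearly empty, the PDF probably has no text layer and is a candidate for hybrid mode. The 20-character threshold and the helper names looks_empty and retry_with_ocr are assumptions made for illustration. The retry also assumes the hybrid server from the OCR section is already running with --force-ocr.

```shell
#!/bin/sh
# Heuristic (an assumption, not part of the tool): output with fewer than
# 20 non-whitespace characters suggests the PDF has no text layer.
looks_empty() {
  chars=$(($(tr -d '[:space:]' < "$1" | wc -c)))
  [ "$chars" -lt 20 ]
}

# Re-run the conversion through the hybrid backend.
# Assumes the hybrid server is already running with --force-ocr.
retry_with_ocr() {
  uvx opendataloader-pdf --hybrid docling-fast "$1"
}
```

A typical use would be: if looks_empty scan.md; then retry_with_ocr scan.pdf; fi. Report the fallback to the user in either case, since a near-empty result can also mean the PDF itself has little text.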
References
- Execution steps: resources/execution-protocol.md
- Configuration: config/pdf-config.yaml
- Context loading: ../_shared/core/context-loading.md
- Quality principles: ../_shared/core/quality-principles.md