Agent skill

pdf-to-markdown

Convert PDF files to Markdown. Use when extracting text from PDFs, creating editable documentation from PDF reports, or converting PDF content to version-controlled markdown files.

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/pdf-to-markdown

SKILL.md

pdf-to-markdown

Convert PDF files to Markdown format.

Installation Required

```bash
cd .claude/skills/pdf-to-markdown
npm install
```

Dependencies: pdf-parse

Quick Start

```bash
# Basic conversion
node .claude/skills/pdf-to-markdown/scripts/convert.cjs \
  --file ./document.pdf

# Custom output path
node .claude/skills/pdf-to-markdown/scripts/convert.cjs \
  --file ./doc.pdf \
  --output ./output/doc.md
```

CLI Options

| Option | Required | Description |
| --- | --- | --- |
| `--file <path>` | Yes | Input PDF file |
| `--output <path>` | No | Output Markdown path (default: input name + .md) |

Output Format (JSON)

```json
{
  "success": true,
  "input": "/path/to/input.pdf",
  "output": "/path/to/output.md",
  "wordCount": 1523,
  "warnings": ["Tables may not be accurately converted"]
}
```
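A calling script may want to act on this JSON instead of only printing it. The sketch below is a hypothetical helper, not part of the skill: it assumes the converter writes the JSON object shown above to stdout, and `summarizeResult` and its return shape are illustrative names. The shape of a failed result is not documented here, so the error branch is a guess.

```javascript
// Hypothetical helper: interpret the JSON result shown above.
// Field names (success, input, output, wordCount, warnings) come from the sample;
// the failure handling is an assumption, since a failed result is not documented.
function summarizeResult(stdout) {
  const result = JSON.parse(stdout);
  if (!result.success) {
    throw new Error(`Conversion failed for ${result.input ?? "unknown input"}`);
  }
  const warnings = Array.isArray(result.warnings) ? result.warnings : [];
  return {
    outputPath: result.output,
    wordCount: result.wordCount,
    warnings,
    // Any warning (for example about tables) suggests a manual check of the output.
    needsReview: warnings.length > 0,
  };
}

// In practice the string would come from running the script, for example with
// child_process.execFileSync("node", [scriptPath, "--file", pdfPath], { encoding: "utf8" }).
const sample = JSON.stringify({
  success: true,
  input: "/path/to/input.pdf",
  output: "/path/to/output.md",
  wordCount: 1523,
  warnings: ["Tables may not be accurately converted"],
});

console.log(summarizeResult(sample));
```

Treating any warning as "needs review" is a conservative choice; a caller could instead filter for specific warning strings once the real set of warnings is known.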

Supported Elements

Text extraction from digital PDFs
Headings (detected by font size heuristics)
Paragraphs
Basic lists
Links (when embedded in PDF)
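To illustrate what "font size heuristics" for headings can mean, here is a minimal sketch. It is not the skill's actual implementation: the input shape (`{ text, fontSize }` runs) and the function name `classifyBySize` are assumptions made for the example. The idea is to treat the font size that covers the most characters as body text and to map larger sizes to heading levels.

```javascript
// Illustrative heuristic only; the real converter may work differently.
// `runs` is a hypothetical list of text runs with their font size in points.
function classifyBySize(runs) {
  // Count characters per font size; the most common size is taken as body text.
  const charsBySize = new Map();
  for (const { text, fontSize } of runs) {
    charsBySize.set(fontSize, (charsBySize.get(fontSize) ?? 0) + text.length);
  }
  let bodySize = 0;
  let mostChars = -1;
  for (const [size, chars] of charsBySize) {
    if (chars > mostChars) {
      mostChars = chars;
      bodySize = size;
    }
  }

  // Sizes larger than body text, largest first, become heading levels 1, 2, 3.
  const headingSizes = [...charsBySize.keys()]
    .filter((size) => size > bodySize)
    .sort((a, b) => b - a);

  return runs.map(({ text, fontSize }) => {
    const rank = headingSizes.indexOf(fontSize);
    if (rank === -1) return text; // body text or smaller
    const level = Math.min(rank + 1, 3); // cap at level 3
    return `${"#".repeat(level)} ${text}`;
  });
}

const lines = classifyBySize([
  { text: "Annual Report", fontSize: 24 },
  { text: "Summary", fontSize: 16 },
  { text: "Revenue grew in every region this year.", fontSize: 11 },
]);
console.log(lines.join("\n"));
```

A heuristic like this misfires on documents that use large type for pull quotes or captions, which is one reason heading detection is best treated as approximate.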

Known Limitations

Tables: Very limited support; may not render correctly
Multi-column layouts: Text may interleave between columns
Scanned PDFs: NOT supported (requires OCR - see alternatives below)
Images: NOT extracted (PDF images are not included in output)
Complex formatting: May be simplified or lost
Password-protected PDFs: NOT supported
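Because scanned PDFs yield little or no text, a caller can run a rough check before trusting the output. The sketch below is an assumption-laden example, not a feature of the skill: `looksScanned` is a hypothetical helper, the threshold of 20 words per page is arbitrary, and the extracted text and page count are assumed to come from whatever parser is in use.

```javascript
// Hypothetical check: very few words per page suggests an image-only (scanned) PDF.
// The default threshold is an arbitrary starting point, not a documented value.
function looksScanned(text, pageCount, minWordsPerPage = 20) {
  if (pageCount <= 0) return true; // nothing to extract from
  const words = text.trim().split(/\s+/).filter(Boolean);
  return words.length / pageCount < minWordsPerPage;
}

console.log(looksScanned("", 3));
console.log(looksScanned("word ".repeat(500), 2));
```

When the check returns true, the OCR alternatives listed in the next section are the likelier path than retrying the conversion.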

Alternatives for Unsupported Cases

For scanned PDFs (OCR needed):

Use scribe.js-ocr library (AGPL license)
Commercial OCR services (Google Cloud Vision, AWS Textract)

For complex tables:

Consider AI-based extraction (LLM post-processing)
Manual review and correction

For image extraction:

Use unpdf library with sharp for image extraction
Process images separately and reference in markdown

Troubleshooting

Dependencies not found: Run npm install in the skill directory
Empty output: PDF may be scanned/image-based (requires OCR)
Garbled text: PDF may use embedded fonts not supported by the parser
Memory issues: Large PDFs may require the Node.js --max-old-space-size=4096 flag
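For the memory case, note that `--max-old-space-size` is a Node.js runtime option, so it goes before the script path, while `--file` and `--output` go after it. The sketch below shows one way to assemble the argument list; `buildNodeArgs` and its option names are hypothetical, and only the script path and the two CLI options come from this document.

```javascript
// Hypothetical helper: build the argument list for `node`.
// Runtime flags precede the script path; script options follow it.
function buildNodeArgs({ file, output, maxOldSpaceMb }) {
  const args = [];
  if (maxOldSpaceMb) {
    args.push(`--max-old-space-size=${maxOldSpaceMb}`);
  }
  args.push(".claude/skills/pdf-to-markdown/scripts/convert.cjs", "--file", file);
  if (output) {
    args.push("--output", output);
  }
  return args;
}

const command = ["node", ...buildNodeArgs({ file: "./large.pdf", maxOldSpaceMb: 4096 })].join(" ");
console.log(command);
// prints: node --max-old-space-size=4096 .claude/skills/pdf-to-markdown/scripts/convert.cjs --file ./large.pdf
```

The resulting array can be passed as the argument list to `child_process.spawn("node", args)` or a similar call.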

IMPORTANT Task Planning Notes

Always plan the work and break it into many small todo tasks
Always add a final review todo task at the end to check the work done and find any fixes or enhancements needed

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/pdf-to-markdown
License: MIT License
