Agent skill
extract-text-pdf
Extract text from PDF files using PyMuPDF. Use this skill when you need to read the contents of a PDF file, such as a resume, report, or manual, into plain text for analysis or processing.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/extract-text-pdf
SKILL.md
Extract Text from PDF
Overview
This skill provides a reliable way to extract text from PDF files using the pymupdf library (also known as fitz). It handles document structure and text encoding better than many basic tools.
Prerequisites
This skill requires the pymupdf Python library.
pip install pymupdf
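To confirm the prerequisite is met before running the script, a small check like the one below may help. It is only a sketch: the helper name find_pymupdf_module is hypothetical, and it assumes that the library is importable as pymupdf in recent releases and as fitz in older ones.

```python
import importlib.util


def find_pymupdf_module():
    """Return the importable module name for PyMuPDF, or None if it is absent.

    Assumption: recent releases expose "pymupdf", older ones only "fitz".
    """
    for name in ("pymupdf", "fitz"):
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


module_name = find_pymupdf_module()
if module_name is None:
    print("PyMuPDF is not installed; run: pip install pymupdf")
else:
    print(f"PyMuPDF is available as module '{module_name}'")
```

The check only looks the module up and does not import it, so it stays fast and has no side effects.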
Usage
Extract Text Script
The skill includes a Python script scripts/extract_pdf_text.py that extracts text from a PDF file.
Syntax:
python3 .agent/skills/extract-text-pdf/scripts/extract_pdf_text.py <path_to_pdf> [--layout]
Arguments:
- path_to_pdf: The absolute path to the PDF file you want to read.
- --layout: (Optional) Preserve the page layout as precisely as possible. By default, the script extracts text in natural reading order.
Example:
# Extract text from a resume
python3 .agent/skills/extract-text-pdf/scripts/extract_pdf_text.py /Users/user/documents/resume.pdf
# Capture output to a file
python3 .agent/skills/extract-text-pdf/scripts/extract_pdf_text.py /path/to/doc.pdf > extracted_text.txt
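The contents of scripts/extract_pdf_text.py are not shown here, so the following is only a rough sketch of how a script with the documented interface could be written with PyMuPDF. The function names build_parser, extract_text, and main are hypothetical, and mapping --layout to the sort option of get_text is an assumption; the real script may behave differently.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface mirroring the documented syntax."""
    parser = argparse.ArgumentParser(description="Extract text from a PDF file.")
    parser.add_argument("path_to_pdf", help="absolute path to the PDF file")
    parser.add_argument(
        "--layout",
        action="store_true",
        help="order text by its position on the page (assumed meaning)",
    )
    return parser


def extract_text(path: str, layout: bool = False) -> str:
    """Return the text of every page, joined with form-feed characters."""
    # Imported lazily so that --help still works when PyMuPDF is missing.
    import pymupdf  # on older releases: import fitz as pymupdf

    pages = []
    with pymupdf.open(path) as doc:
        for page in doc:
            # sort=True orders the text top-to-bottom, then left-to-right.
            pages.append(page.get_text("text", sort=layout))
    return "\f".join(pages)


def main(argv=None) -> int:
    """Parse arguments, print the extracted text, and return an exit code."""
    args = build_parser().parse_args(argv)
    try:
        text = extract_text(args.path_to_pdf, layout=args.layout)
    except Exception as exc:  # e.g. missing library, missing file, damaged PDF
        print(f"error: {exc}", file=sys.stderr)
        return 1
    sys.stdout.write(text)
    return 0
```

Calling main(["/path/to/doc.pdf", "--layout"]) from Python mirrors the command-line usage above; ending the file with sys.exit(main()) would turn the sketch into a standalone script.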
When to Use
Use this skill when:
- You need to read the content of a PDF file.
- You want to analyze text data from a PDF (e.g., parsing a resume).
- Simple checks (cat, grep) won't work because the file is in binary PDF format.
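When it is unclear whether a file really is a PDF, the header bytes can be inspected before reaching for this skill. The sketch below relies on the fact that PDF files begin with the marker %PDF-; the helper name looks_like_pdf is hypothetical, and the check is deliberately strict (files with leading junk before the marker are reported as non-PDF).

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path) -> bool:
    """Return True if the file starts with the PDF header marker."""
    # Read in binary mode: the file is not plain text and must not be decoded.
    with open(path, "rb") as fh:
        head = fh.read(len(PDF_MAGIC))
    return head == PDF_MAGIC
```

A file that passes this check is a candidate for the extraction script; one that fails may be plain text that cat or grep can read directly.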