Agent skill

PDF Processing

Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.

View SKILL.md on GitHub Repository

Stars 152

Forks 20

Install this agent skill to your Project

npx add-skill https://github.com/Microck/ordinary-claude-skills/tree/main/skills_categorized/content-creation/pdf-processing

SKILL.md

PDF Processing

Quick start

Use pdfplumber to extract text from PDFs:

python

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)

Extracting tables

Extract tables from PDFs with automatic detection:

python

import pdfplumber

with pdfplumber.open("report.pdf") as pdf:
    page = pdf.pages[0]
    tables = page.extract_tables()

    for table in tables:
        for row in table:
            print(row)

Extracting all pages

Process multi-page documents efficiently:

python

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    full_text = ""
    for page in pdf.pages:
        full_text += page.extract_text() + "\n\n"

    print(full_text)

Form filling

For PDF form filling, see FORMS.md for the complete guide including field analysis and validation.

Merging PDFs

Combine multiple PDF files:

python

from pypdf import PdfMerger

merger = PdfMerger()

for pdf in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    merger.append(pdf)

merger.write("merged.pdf")
merger.close()

Splitting PDFs

Extract specific pages or ranges:

python

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

# Extract pages 2-5
for page_num in range(1, 5):
    writer.add_page(reader.pages[page_num])

with open("output.pdf", "wb") as output:
    writer.write(output)

Available packages

pdfplumber - Text and table extraction (recommended)
pypdf - PDF manipulation, merging, splitting
pdf2image - Convert PDFs to images (requires poppler)
pytesseract - OCR for scanned PDFs (requires tesseract)

Common patterns

Extract and save text:

python

import pdfplumber

with pdfplumber.open("input.pdf") as pdf:
    text = "\n\n".join(page.extract_text() for page in pdf.pages)

with open("output.txt", "w") as f:
    f.write(text)

Extract tables to CSV:

python

import pdfplumber
import csv

with pdfplumber.open("tables.pdf") as pdf:
    tables = pdf.pages[0].extract_tables()

    with open("output.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for table in tables:
            writer.writerows(table)

Error handling

Handle common PDF issues:

python

import pdfplumber

try:
    with pdfplumber.open("document.pdf") as pdf:
        if len(pdf.pages) == 0:
            print("PDF has no pages")
        else:
            text = pdf.pages[0].extract_text()
            if text is None or text.strip() == "":
                print("Page contains no extractable text (might be scanned)")
            else:
                print(text)
except Exception as e:
    print(f"Error processing PDF: {e}")

Performance tips

Process pages in batches for large PDFs
Use multiprocessing for multiple files
Extract only needed pages rather than entire document
Close PDF objects after use

Maintainer

Microck Core maintainer

Source details

Full Name: Microck/ordinary-claude-skills
Branch: main
Path in repo: skills_categorized/content-creation/pdf-processing
License: Other
Topics: claude-code claude claude-skills collection list

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

Microck/ordinary-claude-skills

nondominium-holochain-dna-dev

Specialized skill for nondominium Holochain DNA development, focusing on zome creation, entry patterns, integrity/coordinator architecture, ValueFlows compliance, and WASM optimization. Use when creating new zomes, implementing entry types, or modifying Holochain DNA code.

152 20

Explore

Microck/ordinary-claude-skills

fluidsim

Framework for computational fluid dynamics simulations using Python. Use when running fluid dynamics simulations including Navier-Stokes equations (2D/3D), shallow water equations, stratified flows, or when analyzing turbulence, vortex dynamics, or geophysical flows. Provides pseudospectral methods with FFT, HPC support, and comprehensive output analysis.

152 20

Explore

Microck/ordinary-claude-skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

152 20

Explore

Microck/ordinary-claude-skills

run-tests

Validate code changes by intelligently selecting and running the appropriate test suites. Use this when editing code to verify changes work correctly, run tests, validate functionality, or check for regressions. Automatically discovers affected test suites, selects the minimal set of venvs needed for validation, and handles test execution with Docker services as needed.

152 20

Explore

Microck/ordinary-claude-skills

skill-navigator

The 100th skill! Your intelligent guide to all 99 other skills. Recommends the perfect skill for any task, creates skill combinations, and helps you discover capabilities you didn't know you had.

152 20

Explore

Microck/ordinary-claude-skills

AgentDB Advanced Features

Master advanced AgentDB features including QUIC synchronization, multi-database management, custom distance metrics, hybrid search, and distributed systems integration. Use when building distributed AI systems, multi-agent coordination, or advanced vector search applications.

152 20

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

PDF Processing

Quick start

Extracting tables

Extracting all pages

Form filling

Merging PDFs

Splitting PDFs

Available packages

Common patterns

Error handling

Performance tips

Recommended Agent Skills

nondominium-holochain-dna-dev

fluidsim

metabolomics-workbench-database

run-tests

skill-navigator

AgentDB Advanced Features