pdf-processing-pro

Production-ready PDF processing with forms, tables, OCR, validation, and batch operations. Use when working with complex PDF workflows in production environments, processing large volumes of PDFs, or requiring robust error handling and validation. Do NOT use for simple text extraction - use pdf-extract for quick reads.

Install this agent skill into your project:

```bash
npx add-skill https://github.com/henkisdabro/wookstar-claude-plugins/tree/main/plugins/documents/skills/pdf-processing-pro
```

SKILL.md

PDF Processing Pro

Production-ready PDF processing toolkit with pre-built scripts, comprehensive error handling, and support for complex workflows.

Quick start

Extract text from PDF

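```python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    # First page only; extract_text() may return "" (or None in older pdfplumber
    # releases) when the page has no text layer, e.g. a scanned image
    text = pdf.pages[0].extract_text() or ""
    print(text)
```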
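The quick start reads only the first page. To cover the whole document, a small helper can label each page and flag pages that yield no text, which usually means a scanned page that needs OCR. This is a minimal sketch; `join_pages` is a hypothetical helper, not one of the included scripts.

```python
from typing import Iterable, Optional


def join_pages(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text into one string, labelling each page.

    Pages whose text is None or blank are flagged, since that usually
    means the page has no text layer (for example, a scanned image).
    """
    parts = []
    for number, text in enumerate(page_texts, start=1):
        body = text.strip() if text and text.strip() else "[no text layer: try OCR]"
        parts.append(f"--- page {number} ---\n{body}")
    return "\n\n".join(parts)
```

With pdfplumber, pass a generator over all pages inside the same `with pdfplumber.open(...)` block: `print(join_pages(page.extract_text() for page in pdf.pages))`.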

Analyse PDF form (using included script)

```bash
python scripts/analyze_form.py input.pdf --output fields.json
# Returns: JSON with all form fields, types, and positions
```
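If you need field information without the script, pypdf's `PdfReader.get_fields()` returns a mapping from field name to a dictionary of raw PDF keys such as `/FT` (field type) and `/V` (current value), or `None` when the file has no form. The sketch below is a hypothetical helper, not the logic of `analyze_form.py`, that turns that mapping into JSON-friendly records. It does not report positions; those live in each widget annotation's `/Rect` entry.

```python
import json
from typing import Any, Mapping, Optional

# Field type codes defined by the PDF specification for AcroForm fields
FIELD_TYPES = {"/Tx": "text", "/Btn": "button", "/Ch": "choice", "/Sig": "signature"}


def summarise_fields(fields: Optional[Mapping[str, Mapping[str, Any]]]) -> list[dict]:
    """Convert a get_fields()-style mapping into name/type/value records sorted by name."""
    records = []
    for name, field in sorted((fields or {}).items()):
        raw_type = field.get("/FT")
        value = field.get("/V")
        records.append({
            "name": name,
            "type": FIELD_TYPES.get(raw_type, "unknown"),
            # pypdf's text and name objects subclass str; str() yields a plain string
            "value": None if value is None else str(value),
        })
    return records


def fields_to_json(fields: Optional[Mapping[str, Mapping[str, Any]]]) -> str:
    """Serialise the summary as indented JSON."""
    return json.dumps(summarise_fields(fields), indent=2)
```

Usage: `fields_to_json(PdfReader("input.pdf").get_fields())`, with `PdfReader` imported from `pypdf`.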

Fill PDF form with validation

```bash
python scripts/fill_form.py input.pdf data.json output.pdf
# Validates all fields before filling, includes error reporting
```
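The core of validating before filling is comparing the data keys against the form's actual field names, so a typo fails loudly instead of being silently ignored. A minimal sketch, assuming `data.json` is a flat object mapping field name to value (the real script's format may differ); `check_fill_data` is a hypothetical name.

```python
from typing import Any, Iterable, Mapping


def check_fill_data(data: Mapping[str, Any], form_fields: Iterable[str],
                    required: Iterable[str] = ()) -> list[str]:
    """Return human-readable problems; an empty list means the data can be filled."""
    known = set(form_fields)
    problems = []
    # Keys that do not exist in the PDF would otherwise be dropped without warning
    for key in sorted(set(data) - known):
        problems.append(f"unknown field: {key!r}")
    # Required fields must be present and non-empty
    for key in sorted(required):
        value = data.get(key)
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing required field: {key!r}")
    # Only simple scalars are accepted; anything else is likely a data error
    for key, value in sorted(data.items()):
        if not isinstance(value, (str, int, float, bool)):
            problems.append(f"unsupported value type for {key!r}: {type(value).__name__}")
    return problems
```

Only when the list is empty would you go on to write the values, for example with pypdf's `PdfWriter.update_page_form_field_values`.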

Extract tables from PDF

```bash
python scripts/extract_tables.py report.pdf --output tables.csv
# Extracts all tables with automatic column detection
```
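pdfplumber's `Page.extract_tables()` returns each table as a list of rows, with `None` for empty cells. A sketch of flattening several such tables into one CSV, prefixing every row with its page and table number so they stay distinguishable; `write_tables_csv` and this column layout are assumptions, not the script's actual output format.

```python
import csv
from typing import Iterable, Optional, Sequence, TextIO

Table = Sequence[Sequence[Optional[str]]]


def write_tables_csv(tables: Iterable[tuple[int, Table]], out: TextIO) -> int:
    """Write (page_number, table) pairs as CSV rows; return the number of rows written."""
    writer = csv.writer(out)
    rows_written = 0
    for table_number, (page_number, table) in enumerate(tables, start=1):
        for row in table:
            # Empty cells come back as None; write them as empty strings
            cells = ["" if cell is None else cell.strip() for cell in row]
            writer.writerow([page_number, table_number, *cells])
            rows_written += 1
    return rows_written
```

For example, inside a `pdfplumber.open(...)` block: `tables = [(page.page_number, t) for page in pdf.pages for t in page.extract_tables()]`, then `with open("tables.csv", "w", newline="") as f: write_tables_csv(tables, f)`.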

Features

Production-ready scripts

- Error handling with detailed messages and proper exit codes
- Input validation, type checking, and configurable logging
- Full type annotations and CLI interface (--help on all scripts)
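A minimal sketch of the script conventions listed above: an argparse CLI (which provides `--help` automatically), configurable logging, up-front input validation, and distinct exit codes. The codes used here (0 success, 1 processing error, 2 usage or input error) are an assumed convention, not taken from the included scripts.

```python
import argparse
import logging
from pathlib import Path
from typing import Optional, Sequence

# Assumed exit-code convention for this sketch
EXIT_OK, EXIT_FAILED, EXIT_BAD_INPUT = 0, 1, 2
log = logging.getLogger("pdf_tool")


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse arguments, validate input, run the work, and return an exit code."""
    parser = argparse.ArgumentParser(description="Skeleton for a PDF processing script")
    parser.add_argument("input", type=Path, help="path to the input PDF")
    parser.add_argument("-v", "--verbose", action="store_true", help="enable debug logging")
    args = parser.parse_args(argv)

    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(levelname)s: %(message)s",
    )

    # Validate input up front so failures are reported before any work starts
    if not args.input.is_file():
        log.error("input not found: %s", args.input)
        return EXIT_BAD_INPUT
    if args.input.suffix.lower() != ".pdf":
        log.error("expected a .pdf file: %s", args.input)
        return EXIT_BAD_INPUT

    try:
        log.debug("processing %s", args.input)
        # Placeholder for the real work (extraction, filling, and so on)
    except Exception:
        # log.exception records the traceback alongside the message
        log.exception("processing failed: %s", args.input)
        return EXIT_FAILED
    return EXIT_OK
```

Wire it up with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard. argparse itself exits with status 2 on bad arguments, which matches the assumed convention.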

Comprehensive workflows

- PDF forms, table extraction, OCR processing
- Batch operations, pre/post-processing validation
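For batch operations, a common pattern is to keep going when one file fails and report every failure at the end, rather than aborting the whole run. A minimal sketch; `run_batch` is hypothetical and accepts any per-file function.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
log = logging.getLogger("batch")


def run_batch(paths: Iterable[Path],
              process: Callable[[Path], T]) -> tuple[dict[Path, T], dict[Path, str]]:
    """Apply `process` to every path, collecting results and errors instead of stopping."""
    results: dict[Path, T] = {}
    errors: dict[Path, str] = {}
    for path in paths:
        try:
            results[path] = process(path)
        except Exception as exc:  # one bad file should not abort the whole batch
            errors[path] = f"{type(exc).__name__}: {exc}"
            log.warning("skipping %s (%s)", path, errors[path])
    return results, errors
```

A typical call is `run_batch(sorted(Path("incoming").glob("*.pdf")), some_function)`, exiting with a non-zero code if the returned error dictionary is not empty.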

Advanced topics

PDF form processing

Complete form workflows including field analysis, dynamic filling, validation rules, multi-page forms, and checkbox/radio handling. See references/forms.md.
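Checkbox and radio handling is a frequent stumbling block because these fields take PDF name values rather than booleans: the off state is always `/Off`, while the on state is whatever name the form author gave in the widget's appearance dictionary (`/AP`), often but not always `/Yes`. A hedged sketch of the conversion; `checkbox_value`, the accepted truthy strings, and the `/Yes` default are assumptions, and the real on-state name should be read from the form.

```python
from typing import Any

# Strings treated as "checked" when they come from JSON or CSV input (an assumption)
TRUTHY = {"true", "yes", "on", "1", "x", "checked"}


def checkbox_value(value: Any, on_state: str = "/Yes") -> str:
    """Map a Python/JSON value to a PDF checkbox state name.

    The off state is always /Off; the on state varies per form, so pass
    the name read from the widget's appearance dictionary when known.
    """
    if isinstance(value, str):
        checked = value.strip().lower() in TRUTHY
    else:
        checked = bool(value)
    if not on_state.startswith("/"):
        on_state = "/" + on_state
    return on_state if checked else "/Off"
```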

Table extraction

Complex table extraction including multi-page tables, merged cells, nested tables, custom detection, and CSV/Excel export. See references/tables.md.
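Multi-page tables often come back as one fragment per page, each repeating the header row. A minimal sketch of stitching such fragments together, assuming all fragments belong to the same table and every continuation repeats the header exactly (real documents may need fuzzier matching); `merge_continued_tables` is hypothetical.

```python
from typing import List, Optional, Sequence

Row = List[Optional[str]]


def merge_continued_tables(fragments: Sequence[Sequence[Sequence[Optional[str]]]]) -> List[Row]:
    """Concatenate table fragments, dropping header rows repeated on later pages."""
    if not fragments:
        return []
    header = list(fragments[0][0]) if fragments[0] else None
    merged: List[Row] = [list(row) for row in fragments[0]]
    for fragment in fragments[1:]:
        rows = [list(row) for row in fragment]
        # Skip the first row of a continuation if it repeats the header
        if rows and header is not None and rows[0] == header:
            rows = rows[1:]
        merged.extend(rows)
    return merged
```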

OCR processing

Scanned PDFs and image-based documents including Tesseract integration, language support, image preprocessing, and confidence scoring. See references/ocr.md.
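For confidence scoring, `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)` returns parallel lists including `text` and `conf`, with a confidence of -1 on rows that describe layout (page, block, paragraph, line) rather than words. A sketch that keeps only words above a threshold and reports the mean confidence; `confident_words` is hypothetical and the threshold of 60 is an arbitrary assumption to tune per document set.

```python
from typing import Any, Mapping, Sequence


def confident_words(data: Mapping[str, Sequence[Any]],
                    min_conf: float = 60.0) -> tuple[list[str], float]:
    """Filter an image_to_data-style dict to words at or above min_conf.

    Returns the kept words and the mean confidence of all recognised words
    (0.0 if there are none). Confidence may arrive as numbers or numeric
    strings depending on the pytesseract/Tesseract version, hence float().
    """
    kept: list[str] = []
    scores: list[float] = []
    for text, conf in zip(data["text"], data["conf"]):
        score = float(conf)
        if score < 0 or not str(text).strip():
            continue  # -1 marks layout rows; blank text carries no word
        scores.append(score)
        if score >= min_conf:
            kept.append(str(text).strip())
    mean = sum(scores) / len(scores) if scores else 0.0
    return kept, mean
```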

Included scripts

| Script | Purpose | Usage |
|---|---|---|
| analyze_form.py | Extract form field info | `python scripts/analyze_form.py input.pdf [--output fields.json] [--verbose]` |
| fill_form.py | Fill PDF forms with data | `python scripts/fill_form.py input.pdf data.json output.pdf [--validate]` |
| validate_form.py | Validate form data before filling | `python scripts/validate_form.py data.json schema.json` |
| extract_tables.py | Extract tables to CSV/Excel | `python scripts/extract_tables.py input.pdf [--output tables.csv] [--format csv\|excel]` |
| extract_text.py | Extract text with formatting | `python scripts/extract_text.py input.pdf [--output text.txt] [--preserve-formatting]` |
| merge_pdfs.py | Merge multiple PDFs | `python scripts/merge_pdfs.py file1.pdf file2.pdf --output merged.pdf` |
| split_pdf.py | Split PDF into pages | `python scripts/split_pdf.py input.pdf --output-dir pages/` |
| validate_pdf.py | Validate PDF integrity | `python scripts/validate_pdf.py input.pdf` |
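As a rough idea of what a cheap integrity pre-check can look for before a full parse: a PDF begins with a `%PDF-` header and ends with an `%%EOF` marker, and truncated downloads usually lose the latter. This heuristic sketch is not the logic of `validate_pdf.py`, and passing it does not prove the file parses, so a real check should still open the file with a PDF library; `check_head_tail` and `quick_pdf_check` are hypothetical names.

```python
from pathlib import Path

WINDOW = 1024  # bytes to inspect at each end of the file


def check_head_tail(head: bytes, tail: bytes) -> list[str]:
    """Return problems found in the first and last bytes of a file (empty = looks OK)."""
    if not head:
        return ["file is empty"]
    problems = []
    # The header normally starts at byte 0; readers commonly tolerate it within the first 1 KiB
    if b"%PDF-" not in head:
        problems.append("missing %PDF- header")
    # A missing end-of-file marker usually means the file was truncated
    if b"%%EOF" not in tail:
        problems.append("missing %%EOF marker near end of file")
    return problems


def quick_pdf_check(path: Path) -> list[str]:
    """Read only the first and last WINDOW bytes, so large files stay cheap to check."""
    size = path.stat().st_size
    with path.open("rb") as fh:
        head = fh.read(WINDOW)
        fh.seek(max(size - WINDOW, 0))
        tail = fh.read(WINDOW)
    return check_head_tail(head, tail)
```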

Dependencies

All scripts require:

```bash
pip install pdfplumber pypdf pillow pytesseract pandas
```
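Scripts can fail early with a clear message when a dependency is missing. Note that some import names differ from the pip package names: pillow is imported as `PIL`. A minimal sketch using only the standard library; `missing_packages` is a hypothetical helper.

```python
import importlib.util
from typing import Mapping

# pip package name -> top-level import name for the dependencies listed above
REQUIRED = {
    "pdfplumber": "pdfplumber",
    "pypdf": "pypdf",
    "pillow": "PIL",
    "pytesseract": "pytesseract",
    "pandas": "pandas",
}


def missing_packages(required: Mapping[str, str] = REQUIRED) -> list[str]:
    """Return the pip package names whose import name cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]
```

For example: `if missing := missing_packages(): sys.exit("pip install " + " ".join(missing))`, which prints the suggested command and exits with status 1.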

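Optional, for OCR: pytesseract only wraps the Tesseract engine, so the `tesseract` binary must also be installed and either on your PATH or configured via `pytesseract.pytesseract.tesseract_cmd`:

```bash
# macOS: brew install tesseract
# Ubuntu: apt-get install tesseract-ocr
# Windows: download an installer from GitHub releases
```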
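Because pytesseract shells out to the `tesseract` executable, installing the Python package alone is not enough. A sketch of checking for the binary before starting an OCR run, using the standard library's `shutil.which`; `find_tesseract` is a hypothetical helper.

```python
import shutil
from typing import Optional


def find_tesseract(explicit: Optional[str] = None) -> Optional[str]:
    """Return a usable tesseract executable path, or None if none is found.

    `explicit` lets callers pass a known install location, which helps on
    Windows, where the install directory may not be on PATH.
    """
    candidate = explicit or "tesseract"
    return shutil.which(candidate)
```

If the binary is found outside PATH, point pytesseract at it with `pytesseract.pytesseract.tesseract_cmd = path`.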

References

| File | Contents |
|---|---|
| references/forms.md | Complete form processing guide |
| references/tables.md | Advanced table extraction |
| references/ocr.md | Scanned PDF processing |
| references/workflows.md | Common workflows, error handling, performance tips, best practices |
| references/troubleshooting.md | Troubleshooting common issues and getting help |

Maintainer

henkisdabro (core maintainer)

Source details

Full Name: henkisdabro/wookstar-claude-plugins
Branch: main
Path in repo: plugins/documents/skills/pdf-processing-pro
Topics: claude-code
