Agent skill

create-table

Create PDF tables using ReportLab Table class for proper structure detection. Creates tables that Marker and other extractors can parse. Use when you need detectable tables in PDFs.

Stars 163
Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/create-table

Metadata

Additional technical details for this skill

short description
Create detectable PDF tables (ReportLab)

SKILL.md

create-table

Create PDF tables using ReportLab's Table class that extractors (Marker, Camelot) can detect.

Why This Exists

PyMuPDF raw drawing commands (draw_line, insert_text) create visual tables but not proper PDF table structures. Marker and other extractors cannot detect these as tables.

ReportLab's Table class creates proper PDF table structures with cell boundaries that extraction tools recognize.

Quick Start

bash
cd .pi/skills/create-table

# Generate from preset (simple, medium, complex)
uv run generate.py preset simple
uv run generate.py preset complex --output my_table.pdf

# Generate with custom data
uv run generate.py generate --output custom.pdf \
  --columns "ID,Name,Status" \
  --rows '[["1","Alice","Active"],["2","Bob","Pending"]]'

# Borderless table
uv run generate.py generate --no-border --output borderless.pdf

# Thick borders (3pt)
uv run generate.py generate --line-width 3 --output thick.pdf

# Render as image (for scanned document testing)
uv run generate.py preset medium --as-image --dpi 200

# Generate all examples
uv run generate.py example

Commands

preset - Generate from preset complexity

bash
uv run generate.py preset <name> [options]
Preset Description
simple 3x3 basic table
medium 8-row requirements matrix with 6 columns
complex Multi-index headers with column spans

generate - Create custom table

bash
uv run generate.py generate [options]

Options:

Option Short Description Default
--output -o Output PDF/PNG path table_fixture.pdf
--columns -c Comma-separated column headers ID,Name,Value
--rows -r JSON array of row data 3 sample rows
--spec -s JSON spec file path -
--title -t Table title text -
--style grid, plain, colored grid
--border/--no-border Enable or disable borders --border
--line-width -w Border thickness in points 1.0
--multi-index -m JSON multi-index headers -
--as-image -i Render as PNG image False
--dpi Image resolution 150

example - Generate all examples

bash
uv run generate.py example

Creates examples/ directory with all presets, styles, and border variations.

Multi-Index Tables

For hierarchical headers that span columns:

bash
uv run generate.py generate \
  --columns "ID,Q1,Q2,Q3,Q4,Total" \
  --rows '[["A","10","20","30","40","100"]]' \
  --multi-index '[[{"text":"","span":1},{"text":"2024 Quarters","span":4},{"text":"","span":1}]]'

Scanned Document Simulation

Generate table as a rasterized PNG image (for testing OCR/VLM extraction):

bash
# 150 DPI (default)
uv run generate.py preset medium --as-image

# Higher quality (300 DPI)
uv run generate.py preset complex --as-image --dpi 300 --output scanned_table.png

Verification

Test that Camelot detects the table:

bash
uv run generate.py preset simple --output test.pdf

python -c "
import camelot
tables = camelot.read_pdf('test.pdf', pages='1', flavor='lattice')
print(f'Tables found: {len(tables)}')
if tables:
    print(tables[0].df)
"

Dependencies

Self-contained via uv run - no pre-installation needed.

toml
[project]
dependencies = [
    "reportlab>=4.0.0",
    "typer>=0.9.0",
    "pymupdf>=1.23.0",
]

Didn't find tool you were looking for?

Be as detailed as possible for better results