Agent skill

pdf

Kit de ferramentas abrangente para manipulação de PDFs, incluindo extração de texto e tabelas, criação de novos PDFs, mesclagem/divisão de documentos e manipulação de formulários. Quando Claude precisa preencher um formulário PDF ou processar, gerar ou analisar documentos PDF programaticamente em escala.

View SKILL.md on GitHub Repository

Stars 0

Forks 0

Install this agent skill to your Project

npx add-skill https://github.com/diegofornalha/document-skills/tree/main/pdf

SKILL.md

Guia de Processamento de PDF

Visão Geral

Este guia cobre operações essenciais de processamento de PDF usando bibliotecas Python e ferramentas de linha de comando. Para recursos avançados, bibliotecas JavaScript e exemplos detalhados, consulte reference.md. Se você precisa preencher um formulário PDF, leia forms.md e siga suas instruções.

Início Rápido

python

from pypdf import PdfReader, PdfWriter

# Ler um PDF
reader = PdfReader("document.pdf")
print(f"Páginas: {len(reader.pages)}")

# Extrair texto
text = ""
for page in reader.pages:
    text += page.extract_text()

Bibliotecas Python

pypdf - Operações Básicas

Mesclar PDFs

python

from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)

Dividir PDF

python

reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output:
        writer.write(output)

Extrair Metadados

python

reader = PdfReader("document.pdf")
meta = reader.metadata
print(f"Título: {meta.title}")
print(f"Autor: {meta.author}")
print(f"Assunto: {meta.subject}")
print(f"Criador: {meta.creator}")

Rotacionar Páginas

python

reader = PdfReader("input.pdf")
writer = PdfWriter()

page = reader.pages[0]
page.rotate(90)  # Rotacionar 90 graus no sentido horário
writer.add_page(page)

with open("rotated.pdf", "wb") as output:
    writer.write(output)

pdfplumber - Extração de Texto e Tabelas

Extrair Texto com Layout

python

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        print(text)

Extrair Tabelas

python

with pdfplumber.open("document.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        tables = page.extract_tables()
        for j, table in enumerate(tables):
            print(f"Tabela {j+1} na página {i+1}:")
            for row in table:
                print(row)

Extração Avançada de Tabelas

python

import pandas as pd

with pdfplumber.open("document.pdf") as pdf:
    all_tables = []
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table:  # Verificar se a tabela não está vazia
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)

# Combinar todas as tabelas
if all_tables:
    combined_df = pd.concat(all_tables, ignore_index=True)
    combined_df.to_excel("extracted_tables.xlsx", index=False)

reportlab - Criar PDFs

Criação Básica de PDF

python

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter

# Adicionar texto
c.drawString(100, height - 100, "Olá Mundo!")
c.drawString(100, height - 120, "Este é um PDF criado com reportlab")

# Adicionar uma linha
c.line(100, height - 140, 400, height - 140)

# Salvar
c.save()

Criar PDF com Múltiplas Páginas

python

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet

doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []

# Adicionar conteúdo
title = Paragraph("Título do Relatório", styles['Title'])
story.append(title)
story.append(Spacer(1, 12))

body = Paragraph("Este é o corpo do relatório. " * 20, styles['Normal'])
story.append(body)
story.append(PageBreak())

# Página 2
story.append(Paragraph("Página 2", styles['Heading1']))
story.append(Paragraph("Conteúdo da página 2", styles['Normal']))

# Construir PDF
doc.build(story)

Ferramentas de Linha de Comando

pdftotext (poppler-utils)

bash

# Extrair texto
pdftotext input.pdf output.txt

# Extrair texto preservando layout
pdftotext -layout input.pdf output.txt

# Extrair páginas específicas
pdftotext -f 1 -l 5 input.pdf output.txt  # Páginas 1-5

qpdf

bash

# Mesclar PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

# Dividir páginas
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
qpdf input.pdf --pages . 6-10 -- pages6-10.pdf

# Rotacionar páginas
qpdf input.pdf output.pdf --rotate=+90:1  # Rotacionar página 1 em 90 graus

# Remover senha
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf

pdftk (se disponível)

bash

# Mesclar
pdftk file1.pdf file2.pdf cat output merged.pdf

# Dividir
pdftk input.pdf burst

# Rotacionar
pdftk input.pdf rotate 1east output rotated.pdf

Tarefas Comuns

Extrair Texto de PDFs Digitalizados

python

# Requer: pip install pytesseract pdf2image
import pytesseract
from pdf2image import convert_from_path

# Converter PDF para imagens
images = convert_from_path('scanned.pdf')

# OCR de cada página
text = ""
for i, image in enumerate(images):
    text += f"Página {i+1}:\n"
    text += pytesseract.image_to_string(image)
    text += "\n\n"

print(text)

Adicionar Marca d'Água

python

from pypdf import PdfReader, PdfWriter

# Criar marca d'água (ou carregar existente)
watermark = PdfReader("watermark.pdf").pages[0]

# Aplicar a todas as páginas
reader = PdfReader("document.pdf")
writer = PdfWriter()

for page in reader.pages:
    page.merge_page(watermark)
    writer.add_page(page)

with open("watermarked.pdf", "wb") as output:
    writer.write(output)

Extrair Imagens

bash

# Usando pdfimages (poppler-utils)
pdfimages -j input.pdf output_prefix

# Isto extrai todas as imagens como output_prefix-000.jpg, output_prefix-001.jpg, etc.

Proteção por Senha

python

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)

# Adicionar senha
writer.encrypt("userpassword", "ownerpassword")

with open("encrypted.pdf", "wb") as output:
    writer.write(output)

Referência Rápida

Tarefa	Melhor Ferramenta	Comando/Código
Mesclar PDFs	pypdf	`writer.add_page(page)`
Dividir PDFs	pypdf	Uma página por arquivo
Extrair texto	pdfplumber	`page.extract_text()`
Extrair tabelas	pdfplumber	`page.extract_tables()`
Criar PDFs	reportlab	Canvas ou Platypus
Mesclar via linha de comando	qpdf	`qpdf --empty --pages ...`
OCR em PDFs digitalizados	pytesseract	Converter para imagem primeiro
Preencher formulários PDF	pdf-lib ou pypdf (veja forms.md)	Veja forms.md

Próximos Passos

Para uso avançado do pypdfium2, consulte reference.md
Para bibliotecas JavaScript (pdf-lib), consulte reference.md
Se você precisa preencher um formulário PDF, siga as instruções em forms.md
Para guias de solução de problemas, consulte reference.md

Maintainer

diegofornalha Core maintainer

Source details

Full Name: diegofornalha/document-skills
Branch: main
Path in repo: pdf

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

diegofornalha/document-skills

docx

Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When Claude needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks

0 0

Explore

diegofornalha/document-skills

xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

0 0

Explore

diegofornalha/document-skills

pptx

Presentation creation, editing, and analysis. When Claude needs to work with presentations (.pptx files) for: (1) Creating new presentations, (2) Modifying or editing content, (3) Working with layouts, (4) Adding comments or speaker notes, or any other presentation tasks

0 0

Explore

diegofornalha/plugin-rag-to-quiz

gerar-quiz-rag

Gerador e executor de quiz sobre conhecimento RAG. Comandos: '/gerar-quiz-rag gerar' para gerar perguntas do banco RAG e salvar em JSON, '/gerar-quiz-rag iniciar' para executar quiz de 10 perguntas, '/gerar-quiz-rag status' para ver progresso. Claude consulta o banco SQLite RAG e gera perguntas diretamente.

0 0

Explore

diegofornalha/skills-futuro

setup-linting-ci

Configura linting automático com pre-commit hooks e CI/CD para projetos Python

0 0

Explore

diegofornalha/skills-futuro

tech-debt-resolver

Loop automatizado de melhoria contínua que usa o Chat RAG para identificar débitos técnicos, implementa correções, reingere a base de conhecimento e valida até eliminar 100% dos débitos.

0 0

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Guia de Processamento de PDF

Visão Geral

Início Rápido

Bibliotecas Python

pypdf - Operações Básicas

Mesclar PDFs

Dividir PDF

Extrair Metadados

Rotacionar Páginas

pdfplumber - Extração de Texto e Tabelas

Extrair Texto com Layout

Extrair Tabelas

Extração Avançada de Tabelas

reportlab - Criar PDFs

Criação Básica de PDF

Criar PDF com Múltiplas Páginas

Ferramentas de Linha de Comando

pdftotext (poppler-utils)

qpdf

pdftk (se disponível)

Tarefas Comuns

Extrair Texto de PDFs Digitalizados

Adicionar Marca d'Água

Extrair Imagens

Proteção por Senha

Referência Rápida

Próximos Passos

Recommended Agent Skills

docx

xlsx

pptx

gerar-quiz-rag

setup-linting-ci

tech-debt-resolver