Agent skill
policy-document-parser
Extracts and structures company policy information from PDF documents, particularly travel expense policies with destination-specific caps. Reads policy PDFs, extracts structured data about expense limits by location and employee level, and converts policy tables into machine-readable formats for automated validation.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/policy-document-parser
SKILL.md
Policy Document Parser
When to Use
Use this skill when you need to:
- Analyze company policy documents (PDF format)
- Extract structured expense limits from travel policy tables
- Parse destination-specific caps by employee level
- Convert policy tables into machine-readable formats (JSON/CSV)
- Prepare policy data for automated expense validation systems
Core Workflow
1. Initial Setup & Document Discovery
- Scan the workspace for policy documents (typically PDF files)
- Identify the main policy document (look for files like policy_en.pdf, travel_policy.pdf, etc.)
- Use filesystem-list_directory to explore the workspace structure
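As a rough local equivalent of the discovery step, the sketch below walks a workspace with Python's standard library and lists PDF files, putting likely policy documents first. The function name find_policy_pdfs and the POLICY_HINTS keywords are assumptions made for illustration; the skill itself relies on the filesystem-list_directory tool.

```python
from pathlib import Path

# Hypothetical filename hints, based only on the example names
# policy_en.pdf and travel_policy.pdf mentioned in this skill.
POLICY_HINTS = ("policy", "travel")


def find_policy_pdfs(workspace: str) -> list[Path]:
    """Return all PDFs under `workspace`, likely policy documents first."""
    pdfs = [
        p for p in Path(workspace).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    ]

    def rank(path: Path) -> tuple[int, str]:
        name = path.name.lower()
        hinted = any(hint in name for hint in POLICY_HINTS)
        # Hinted names sort before other PDFs; ties break alphabetically.
        return (0 if hinted else 1, name)

    return sorted(pdfs, key=rank)
```

Calling find_policy_pdfs(".") returns a list of paths; the first entry is a reasonable candidate for the main policy document, but it is worth confirming by reading the file.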
2. Policy Document Analysis
- Read the PDF document using pdf-tools-read_pdf_pages
- Extract the full text content for analysis
- Identify the document structure and locate policy tables
3. Table Extraction & Parsing
- Focus on sections containing destination-specific expense caps
- Look for tables with the following structure:
- Destination cities/countries
- Employee levels (L1, L2, L3, L4, etc.)
- Category caps (Accommodation, Meals, Transportation, Communication, Miscellaneous)
- Per-day or per-trip limits
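To make the table structure above concrete, here is a minimal parsing sketch. It assumes that each table row survives text extraction as a single line of the form "city level cap cap cap cap cap", with the five caps in the category order listed above. Real PDFs may extract differently, so the row layout, the regular expression, and the name parse_cap_row are all assumptions.

```python
import re

# Category order is assumed to match the column order in the policy table.
CATEGORIES = ["Accommodation", "Meals", "Transportation",
              "Communication", "Miscellaneous"]

# Assumed row layout: "<city> <level> <five numeric caps>".
ROW_RE = re.compile(
    r"^(?P<city>[A-Za-z][A-Za-z .'-]*?)\s+"
    r"(?P<level>L\d+)\s+"
    r"(?P<caps>\d+(?:\.\d+)?(?:\s+\d+(?:\.\d+)?){4})\s*$"
)


def parse_cap_row(line: str):
    """Parse one extracted table row; return None if it does not match."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    caps = [float(value) for value in match.group("caps").split()]
    return {
        "city": match.group("city"),
        "level": match.group("level"),
        "caps": dict(zip(CATEGORIES, caps)),
    }
```

Lines that are headers or prose return None, which lets the caller skip them or log them for manual review.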
4. Data Structuring
- Convert extracted tables into structured JSON format
- Organize data by: city → employee_level → category → limit
- Include global rules (receipt thresholds, airfare policies, etc.) as separate metadata
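The nesting described above can be sketched as a plain dictionary build. The key names global_rules and destination_caps, the function build_policy, and every value in the sample data are placeholders chosen for illustration; they are not taken from any real policy.

```python
import json


def build_policy(rows, global_rules):
    """Nest parsed rows as city -> employee_level -> category -> limit."""
    caps = {}
    for row in rows:
        levels = caps.setdefault(row["city"], {})
        levels.setdefault(row["level"], {}).update(row["caps"])
    # Global rules are kept apart from the per-destination caps.
    return {"global_rules": global_rules, "destination_caps": caps}


# Placeholder rows and rules, for illustration only.
sample_rows = [
    {"city": "Tokyo", "level": "L1", "caps": {"Accommodation": 200, "Meals": 60}},
    {"city": "Tokyo", "level": "L2", "caps": {"Accommodation": 250, "Meals": 70}},
]
sample_rules = {"receipt_threshold": 25, "currency": "USD"}

policy = build_policy(sample_rows, sample_rules)
print(json.dumps(policy, indent=2))
```

Keeping global rules under their own key means a validator can look up a cap with policy["destination_caps"][city][level][category] without filtering out metadata first.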
5. Output Generation
- Create machine-readable policy files (JSON preferred)
- Generate summary reports of extracted policy limits
- Prepare data for integration with expense validation systems
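One possible way to produce these outputs is sketched below: it writes the nested policy as JSON and a flattened CSV with one row per cap. The file names policy.json and policy_caps.csv, the function names, and the destination_caps key are assumptions, not part of the skill.

```python
import csv
import json
from pathlib import Path

FIELDS = ["city", "employee_level", "category", "limit"]


def flatten_caps(destination_caps):
    """Turn city -> level -> category -> limit into one flat row per cap."""
    rows = []
    for city, levels in sorted(destination_caps.items()):
        for level, categories in sorted(levels.items()):
            for category, limit in categories.items():
                rows.append({"city": city, "employee_level": level,
                             "category": category, "limit": limit})
    return rows


def write_outputs(policy, out_dir):
    """Write the policy as JSON plus a CSV summary; return both paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    json_path = out / "policy.json"        # hypothetical file name
    csv_path = out / "policy_caps.csv"     # hypothetical file name
    json_path.write_text(
        json.dumps(policy, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(flatten_caps(policy["destination_caps"]))
    return json_path, csv_path
```

The CSV doubles as a quick summary report, since it can be opened in a spreadsheet and compared against the source table.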
Key Patterns from Trajectory
Document Structure Recognition
The policy PDF typically contains:
- Header with policy name, effective date, currency
- Global rules section (airfare, receipt thresholds, exceptions)
- Destination-specific tables with caps by employee level
- Multiple pages with consistent table formatting
Data Extraction Strategy
- Extract all pages first to understand full document scope
- Look for patterns like "Destination-Specific Caps" headers
- Parse tables with city names followed by level-based caps
- Capture per-day vs per-trip limits (Transportation vs Communication/Miscellaneous)
Error Handling
- Handle missing or malformed tables gracefully
- Validate extracted data against expected structure
- Log parsing issues for manual review
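The checks above could look roughly like the following sketch, which validates the nested caps and logs each problem rather than raising. The expected category set comes from the list earlier in this skill; the logger name policy_parser and the function validate_caps are assumptions.

```python
import logging

logger = logging.getLogger("policy_parser")  # hypothetical logger name

EXPECTED_CATEGORIES = {"Accommodation", "Meals", "Transportation",
                       "Communication", "Miscellaneous"}


def validate_caps(destination_caps):
    """Return a list of human-readable issues; an empty list means OK."""
    issues = []
    for city, levels in destination_caps.items():
        if not levels:
            issues.append(f"{city}: no employee levels found")
        for level, categories in levels.items():
            missing = EXPECTED_CATEGORIES - set(categories)
            if missing:
                issues.append(
                    f"{city}/{level}: missing categories {sorted(missing)}"
                )
            for category, limit in categories.items():
                is_number = (isinstance(limit, (int, float))
                             and not isinstance(limit, bool))
                if not is_number or limit < 0:
                    issues.append(
                        f"{city}/{level}/{category}: invalid limit {limit!r}"
                    )
    # Log every issue so it can be reviewed manually later.
    for issue in issues:
        logger.warning(issue)
    return issues
```

Returning the issues instead of raising lets the parser keep the usable parts of a partly malformed table while still flagging the rest.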
Output Formats
Primary Output: Structured Policy JSON