Agent skill

docx-template-filling

Fill DOCX template forms preserving 100% original structure - logos, footers, styles, metadata. Zero-artifact insertion. Output indistinguishable from manual entry.


Install this agent skill to your Project

npx add-skill https://github.com/belumume/claude-skills/tree/main/web-desktop-exports/docx-template-filling

SKILL.md

DOCX Template Filling - Forensic Preservation

Fill template forms programmatically with zero detectable artifacts. The filled document must be indistinguishable from manual typing in the original template.

When to Use This Skill

Invoke when:

  • Filling standardized forms and templates
  • Completing application forms
  • Responding to questionnaires and surveys
  • Processing template-based documents
  • Any scenario where the recipient must not detect programmatic manipulation

Critical requirement: Template integrity must be 100% preserved (logos, footers, headers, styles, metadata, element structure).

Core Philosophy: Preservation Over Recreation

WRONG approach: Extract content from template, generate new document

  • Loses metadata
  • Changes element IDs
  • Alters styles subtly
  • Creates detectable artifacts

RIGHT approach: Load template, insert content at anchor points using XML API

  • Preserves all original elements
  • Maintains metadata
  • Zero structural changes
  • Indistinguishable from manual entry

Critical Anti-Patterns

❌ NEVER: Use pandoc with --reference-doc

bash
# This SEEMS correct but ONLY copies styles, NOT structure
pandoc content.md -o output.docx --reference-doc=template.docx

What happens:

  • Template's tables disappear
  • Logos and all other body content lost
  • Only style definitions and page-level properties (margins, page size, headers, footers) carried over
  • Looks completely different

Why it fails: --reference-doc means "copy the style definitions," NOT "preserve the document structure"

❌ NEVER: Append content at the end

python
# This destroys template structure
from docx import Document

template = Document('template.docx')

# Remove content after markers
# ... (deletion logic)

# Append all new content at end
for para in new_content:
    template.add_paragraph(para.text)  # WRONG!

What happens:

  • Template questions appear unanswered
  • All answers grouped at end
  • Structure broken
  • Obviously programmatic

❌ NEVER: Recreate tables

python
# DON'T copy table structure and rebuild
new_table = template.add_table(rows=3, cols=2)
# Even if copying all properties, it's not the original!

What happens:

  • Loses original element IDs
  • Style inheritance breaks
  • Metadata changes
  • Detectable as modified

Essential Workflow

Step 1: Inspect Template Structure FIRST

Always inspect before modifying. Never assume structure.

Use the provided inspection script:

bash
python scripts/inspect_template.py template.docx

This prints:

  • All tables with identities
  • Potential anchor points (paragraphs ending with ":", "Answer:", etc.)
  • Headers and footers
  • Document element counts

Why critical: Prevents modifying wrong tables, missing anchors, breaking structure.
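
As a rough illustration of what such an inspection can report, here is a minimal sketch that reads the DOCX package with the standard library only. It is not the bundled scripts/inspect_template.py; the function names `inspect_docx` and `paragraph_text` are hypothetical, and treating every body paragraph ending in ":" as a candidate anchor is an assumption.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def paragraph_text(p):
    """Concatenate the text of every w:t node inside a w:p element."""
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))

def inspect_docx(source):
    """Summarise a DOCX: table count, candidate anchors, header/footer parts."""
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        body = ET.fromstring(zf.read("word/document.xml")).find("w:body", NS)
    # Direct children of w:body only, so paragraphs inside tables are skipped.
    paragraphs = body.findall("w:p", NS)
    anchors = []
    for i, p in enumerate(paragraphs):
        text = paragraph_text(p).strip()
        if text.endswith(":"):  # assumption: anchors end with a colon
            anchors.append((i, text))
    return {
        "tables": len(body.findall("w:tbl", NS)),
        "paragraphs": len(paragraphs),
        "anchors": anchors,
        "headers_footers": sorted(
            n for n in names if re.fullmatch(r"word/(header|footer)\d*\.xml", n)
        ),
    }
```

Calling `inspect_docx('template.docx')` returns a dictionary that can be printed before any modification is attempted. In simple documents the anchor indices should line up with positions in `template.paragraphs`, but that is worth confirming on the actual template.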

Step 2: Selective Table Filling

Modify cells in place. Never recreate tables.

python
from docx import Document

template = Document('template.docx')

# Fill specific cells in existing table
info_table = template.tables[0]
info_table.rows[0].cells[1].text = "Jane Smith"
info_table.rows[1].cells[1].text = "S12345"

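# Table structure, styles, borders all preserved.
# Note: assigning cell.text replaces the cell's existing paragraphs and runs,
# so character formatting already present inside the cell is reset.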

Principle: Modify existing cells. Never remove and recreate.

Step 3: Anchor-Based Content Insertion

Insert content at specific positions using XML API.

python
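from copy import deepcopy

# Find anchor paragraphs (body-level paragraphs only; table cells are not included)
anchor_positions = []
for i, para in enumerate(template.paragraphs):
    if para.text.strip() == "Answer:":
        anchor_positions.append(i)

# Insert content after anchor using XML API
def insert_after(doc, anchor_idx, content_paras):
    """Insert copies of content_paras directly after paragraph anchor_idx."""
    anchor_elem = doc.paragraphs[anchor_idx]._element
    parent = anchor_elem.getparent()
    insert_at = parent.index(anchor_elem) + 1  # position just after the anchor

    for offset, para in enumerate(content_paras):
        # deepcopy leaves the source document untouched; inserting
        # para._element directly would move it out of its own document
        parent.insert(insert_at + offset, deepcopy(para._element))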

# Load content to insert
content_doc = Document('my_content.docx')
section_paragraphs = content_doc.paragraphs[5:64]

# Insert at anchor
insert_after(template, anchor_positions[0], section_paragraphs)

# Save
template.save('completed.docx')

Why XML API:

  • doc.add_paragraph() appends at end → wrong position
  • para.insert_paragraph_before() has stale reference issues
  • XML API: direct element manipulation → correct position, zero artifacts

Step 4: Multi-Anchor Insertion (Reverse Order)

When inserting at multiple positions, insert from bottom to top to preserve earlier indices.

python
# Template has anchors at paragraphs 18, 27, 37

# Insert in REVERSE order
insert_after(template, 37, section3_content)  # Last anchor first
insert_after(template, 27, section2_content)  # Middle still at 27
insert_after(template, 18, section1_content)  # First still at 18

Why reverse: Inserting content shifts later paragraph indices but not earlier ones.
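
The index-shifting argument can be checked on a plain Python list standing in for the paragraph sequence. This is only an illustration: `insert_after_index` and `fill_anchors` are hypothetical names, and the strings are placeholder data.

```python
def insert_after_index(items, anchor_idx, new_items):
    """Insert new_items immediately after items[anchor_idx], in place."""
    items[anchor_idx + 1:anchor_idx + 1] = new_items

def fill_anchors(items, jobs):
    """Apply {anchor_index: new_items}, working from the highest index down."""
    for anchor_idx in sorted(jobs, reverse=True):
        insert_after_index(items, anchor_idx, jobs[anchor_idx])
    return items

template = ["Q1", "Answer:", "Q2", "Answer:"]
jobs = {1: ["a1-line1", "a1-line2"], 3: ["a2"]}

filled = fill_anchors(list(template), jobs)
print(filled)
# ['Q1', 'Answer:', 'a1-line1', 'a1-line2', 'Q2', 'Answer:', 'a2']
```

Processing index 1 first would push the second "Answer:" from index 3 to index 5, so the second insertion would land in the middle of the first answer.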

Advanced Patterns

For detailed implementations, see references/patterns.md:

  • Content range extraction - Extract multi-section content between markers
  • Table identity detection - Identify tables when no IDs exist
  • Robust anchor matching - exact/partial/smart modes
  • Table repositioning - Move tables without recreating
  • Verification - Ensure zero artifacts after filling
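
To give a feel for the anchor-matching modes, here is a small sketch that works on paragraph text only. It is not the code from references/patterns.md: the function names are hypothetical, and the meaning given to "smart" (case-insensitive, whitespace-collapsed, trailing colon ignored) is an assumption.

```python
import re

def normalize(text):
    """Lowercase, collapse whitespace, and drop a trailing colon."""
    collapsed = re.sub(r"\s+", " ", text).strip()
    return collapsed.rstrip(":").strip().lower()

def find_anchors(texts, label, mode="exact"):
    """Return indices of the paragraph texts that match label in the given mode."""
    target = normalize(label)
    matchers = {
        "exact": lambda t: t.strip() == label,
        "partial": lambda t: label in t,
        # Assumed interpretation of "smart": compare normalized forms.
        "smart": lambda t: normalize(t) == target,
    }
    if mode not in matchers:
        raise ValueError(f"unknown mode: {mode!r}")
    return [i for i, t in enumerate(texts) if matchers[mode](t)]
```

With python-docx the input would typically be `[p.text for p in template.paragraphs]`, so the returned indices can be passed on to an insertion helper.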

Common Scenarios

Scenario 1: Form with Info Table + Q&A

python
from docx import Document

# insert_after() is the helper defined in Step 3
template = Document('form_template.docx')

# Fill info table
info_table = template.tables[0]
info_table.rows[0].cells[1].text = "Applicant Name"

# Find "Answer:" anchors
anchors = [i for i, p in enumerate(template.paragraphs)
           if p.text.strip() == "Answer:"]

# Insert responses
responses = Document('my_responses.docx')
response_content = responses.paragraphs[5:30]

insert_after(template, anchors[0], response_content)

template.save('form_completed.docx')

Scenario 2: Report with Table Repositioning

python
from docx import Document

template = Document('report_template.docx')

# Fill team table
team_table = template.tables[0]
team_table.rows[0].cells[1].text = "Team 5"

# Insert section content at anchors
# ... (insertion code)

# Move summary table to correct position
summary_heading_idx = next(i for i, p in enumerate(template.paragraphs)
                           if "Summary Table:" in p.text)

# Move table from end to after summary heading
# See references/patterns.md for move_table_to_position()

template.save('report_completed.docx')

Bundled Resources

Scripts

  • scripts/inspect_template.py - Inspect template structure before modification
    • Usage: python scripts/inspect_template.py <template.docx>
    • Prevents destructive mistakes by showing all tables, anchors, headers/footers

References

  • references/patterns.md - Detailed technical patterns
    • Content range extraction
    • Table identity detection strategies
    • XML-level insertion patterns
    • Multi-anchor workflows
    • Verification procedures
    • Complete code examples

Load patterns.md when implementing specific operations beyond basic workflow.

Verification Checklist

Template filling is successful if:

  • Filled document indistinguishable from manual entry
  • All template tables preserved (count unchanged unless expected)
  • Headers/footers unchanged
  • Logo(s) intact
  • Scoring/grading tables empty (if they should be)
  • Styles identical to original
  • Content inserted at correct anchor points (not at end)
  • Template owner cannot detect programmatic manipulation
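
Part of this checklist can be automated. The sketch below compares the header, footer, style, and media parts of the template and the filled file using only the standard library. The choice of "protected" parts and the names `is_protected`, `fingerprint`, and `changed_parts` are assumptions; XML parts are canonicalized before hashing so that harmless re-serialization on save is not reported as a change.

```python
import hashlib
import zipfile
import xml.etree.ElementTree as ET

def is_protected(name):
    """Parts assumed to stay unchanged: headers, footers, styles, media."""
    return (
        name.startswith(("word/header", "word/footer", "word/media/"))
        or name == "word/styles.xml"
    )

def fingerprint(name, data):
    """Hash a part; XML is canonicalized first (quoting, attribute order)."""
    if name.endswith(".xml"):
        data = ET.canonicalize(data.decode("utf-8")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def part_hashes(source):
    """Map each protected part name in the package to its fingerprint."""
    with zipfile.ZipFile(source) as zf:
        return {
            n: fingerprint(n, zf.read(n))
            for n in zf.namelist() if is_protected(n)
        }

def changed_parts(template, filled):
    """Return protected parts that differ, disappeared, or were added."""
    before, after = part_hashes(template), part_hashes(filled)
    return sorted(
        n for n in before.keys() | after.keys()
        if before.get(n) != after.get(n)
    )
```

An empty list from `changed_parts('template.docx', 'completed.docx')` is evidence that headers, footers, styles, and images survived. It does not replace opening the file and checking the result visually.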

Key Lessons

This skill documents patterns for templates where:

  • Info tables must be filled while evaluation/scoring tables stay empty
  • Anchor points such as "Answer:", "Response:", or "Solution:" mark where content is inserted
  • Tables may need repositioning to the correct sections
  • Document structure must remain intact (headers, footers, logos, branding)
  • Zero artifacts are required (the recipient cannot detect automation)

Use cases: Forms, questionnaires, standardized documents, applications, reports.

Core principle: Preservation over recreation. Never rebuild - always modify in place.
