document-scanning

Document discovery, inventory building, and metadata extraction for accessibility audits. Use when scanning folders for Office documents (.docx, .xlsx, .pptx) and PDFs, building file inventories, detecting changes via git diff, or extracting document properties like title, author, and language.

Install this agent skill to your Project

npx add-skill https://github.com/Community-Access/accessibility-agents/tree/main/.github/skills/document-scanning

SKILL.md

Document Scanning

Supported File Types

| Extension | Type | Sub-Agent |
|-----------|------|-----------|
| .docx | Word document | word-accessibility |
| .xlsx | Excel workbook | excel-accessibility |
| .pptx | PowerPoint presentation | powerpoint-accessibility |
| .pdf | PDF document | pdf-accessibility |
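
The routing in the table above can be sketched as a simple lookup. This is a minimal illustration, not the skill's actual dispatch code: the names `SUB_AGENTS` and `sub_agent_for` are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import PurePath
from typing import Optional

# Extension -> sub-agent, copied from the Supported File Types table.
SUB_AGENTS = {
    ".docx": "word-accessibility",
    ".xlsx": "excel-accessibility",
    ".pptx": "powerpoint-accessibility",
    ".pdf": "pdf-accessibility",
}


def sub_agent_for(path: str) -> Optional[str]:
    """Return the sub-agent name for a file, or None if the type is unsupported."""
    # Assumption: extensions are matched case-insensitively, so "REPORT.PDF"
    # routes the same way as "report.pdf".
    return SUB_AGENTS.get(PurePath(path).suffix.lower())
```

Returning `None` for unsupported types lets the caller decide whether to skip the file silently or report it.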

File Discovery Commands

PowerShell (Windows)

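```powershell
# Non-recursive scan. Without -Recurse, -Include only takes effect when the
# path ends in a wildcard, so "\*" is appended to the folder.
Get-ChildItem -Path "<folder>\*" -File -Include *.docx,*.xlsx,*.pptx,*.pdf |
  Where-Object { $_.Name -notlike '~$*' }

# Recursive scan, skipping lock/temp/backup files and tooling directories
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf -Recurse |
  Where-Object { $_.Name -notlike '~$*' -and $_.Name -notlike '*.tmp' -and $_.Name -notlike '*.bak' } |
  Where-Object { $_.FullName -notmatch '[\\/](\.git|node_modules|__pycache__|\.vscode)[\\/]' }
```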

Bash (macOS/Linux)

```bash
# Non-recursive scan
find "<folder>" -maxdepth 1 -type f \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" -o -name "*.pdf" \) ! -name "~\$*"

# Recursive scan
find "<folder>" -type f \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" -o -name "*.pdf" \) \
  ! -name "~\$*" ! -name "*.tmp" ! -name "*.bak" \
  ! -path "*/.git/*" ! -path "*/node_modules/*" ! -path "*/__pycache__/*" ! -path "*/.vscode/*"
```

Delta Detection

Git-based

```bash
# Files changed by the most recent commit (HEAD~1 compared with HEAD)
git diff --name-only HEAD~1 HEAD -- '*.docx' '*.xlsx' '*.pptx' '*.pdf'

# Files changed since a specific tag
git diff --name-only <tag> HEAD -- '*.docx' '*.xlsx' '*.pptx' '*.pdf'

# Files added, copied, modified, or renamed in the last N days (replace N with a number)
git log --since="N days ago" --name-only --diff-filter=ACMR --pretty="" -- '*.docx' '*.xlsx' '*.pptx' '*.pdf' | sort -u
```

Timestamp-based (PowerShell)

```powershell
# Files modified since a specific date
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf -Recurse |
  Where-Object { $_.LastWriteTime -gt [datetime]"2025-01-01" }
```
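
The timestamp-based check is shown only for PowerShell. On other platforms the same idea might be sketched in Python as follows. The function name `modified_since` is hypothetical, and a naive `datetime` cutoff is interpreted as local time, which is an assumption about how the cutoff is supplied.

```python
import os
from datetime import datetime
from pathlib import Path

EXTENSIONS = {".docx", ".xlsx", ".pptx", ".pdf"}


def modified_since(folder, cutoff: datetime):
    """Return supported documents under folder whose mtime is later than cutoff."""
    cutoff_ts = cutoff.timestamp()  # naive datetimes are treated as local time
    changed = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in EXTENSIONS and path.stat().st_mtime > cutoff_ts:
                changed.append(path)
    return sorted(changed)
```

For example, `modified_since("<folder>", datetime(2025, 1, 1))` mirrors the PowerShell command above. Like that command, this sketch does not apply the skip patterns.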

Files to Skip

Always exclude these patterns during scanning:

  • ~$* - Office lock/temp files (created when a document is open)
  • *.tmp - Temporary files
  • *.bak - Backup files
  • Files inside .git/, node_modules/, .vscode/, __pycache__/ directories
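
Where neither shell is convenient, discovery and the exclusions listed above can be combined in one pass. The following is a rough Python sketch under stated assumptions: the `scan` function and the constant names are hypothetical, and extension matching is case-insensitive here, which differs from `find -name`.

```python
import os
from fnmatch import fnmatch
from pathlib import Path

EXTENSIONS = {".docx", ".xlsx", ".pptx", ".pdf"}
SKIP_NAMES = ("~$*", "*.tmp", "*.bak")
SKIP_DIRS = {".git", "node_modules", ".vscode", "__pycache__"}


def scan(folder, recursive=True):
    """Return supported documents under folder, applying the skip rules."""
    found = []
    for dirpath, dirnames, filenames in os.walk(folder):
        if recursive:
            # Prune excluded directories in place so os.walk never descends into them.
            dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        else:
            # Clearing the list stops os.walk after the top-level folder.
            dirnames[:] = []
        for name in filenames:
            if any(fnmatch(name, pattern) for pattern in SKIP_NAMES):
                continue
            if Path(name).suffix.lower() in EXTENSIONS:
                found.append(Path(dirpath) / name)
    return sorted(found)
```

Pruning `dirnames` avoids reading large directories such as `node_modules/` at all, rather than walking them and discarding the results afterwards.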

Scan Configuration Files

| File | Purpose |
|------|---------|
| .a11y-office-config.json | Rule enable/disable for Word, Excel, PowerPoint |
| .a11y-pdf-config.json | Rule enable/disable for PDF scanning |

Scan Profiles

| Profile | Rules | Severities | Use Case |
|---------|-------|------------|----------|
| Strict | All | Error, Warning, Tip | Public-facing, legally required documents |
| Moderate | All | Error, Warning | Most organizations |
| Minimal | All | Error only | Triaging large document libraries |
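
Since every profile runs all rules and differs only in which severities are reported, a profile amounts to a severity filter. A minimal sketch of that idea follows. The finding shape (a dict with a `severity` key) and the names `PROFILE_SEVERITIES` and `filter_findings` are assumptions made for illustration, not the sub-agents' real output format.

```python
# Severities reported by each profile, taken from the Scan Profiles table.
PROFILE_SEVERITIES = {
    "strict": ("error", "warning", "tip"),
    "moderate": ("error", "warning"),
    "minimal": ("error",),
}


def filter_findings(findings, profile):
    """Keep only the findings whose severity is reported under the given profile."""
    allowed = PROFILE_SEVERITIES[profile.lower()]
    # Hypothetical finding shape: {"rule": "...", "severity": "error"}
    return [f for f in findings if f["severity"].lower() in allowed]
```

An unknown profile name raises `KeyError`, which surfaces a misconfigured scan early instead of silently reporting nothing.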

Context Passing Format

When delegating to a sub-agent, always provide this context block:

```text
## Document Scan Context
- **File:** [full path]
- **Scan Profile:** [strict | moderate | minimal]
- **Severity Filter:** [error, warning, tip]
- **Disabled Rules:** [list or "none"]
- **User Notes:** [any specifics]
- **Part of Batch:** [yes/no - if yes, indicate X of Y]
```
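
An orchestrator could generate this block from structured values rather than typing it by hand. The helper below is one possible sketch. The function name `build_context`, its parameters, and the choice to write "none" for empty user notes are all assumptions, not part of the skill.

```python
def build_context(path, profile, severities, disabled_rules=(), notes="", batch=None):
    """Render the Document Scan Context block.

    batch is None for a single file, or an (index, total) tuple for a batch.
    """
    rules = ", ".join(disabled_rules) if disabled_rules else "none"
    batch_text = "no" if batch is None else f"yes - {batch[0]} of {batch[1]}"
    return "\n".join([
        "## Document Scan Context",
        f"- **File:** {path}",
        f"- **Scan Profile:** {profile}",
        f"- **Severity Filter:** {', '.join(severities)}",
        f"- **Disabled Rules:** {rules}",
        f"- **User Notes:** {notes or 'none'}",  # assumption: "none" when empty
        f"- **Part of Batch:** {batch_text}",
    ])
```

For the second of five files in a batch, `build_context("/docs/report.docx", "moderate", ["error", "warning"], batch=(2, 5))` ends with the line `- **Part of Batch:** yes - 2 of 5`.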
