document-conversion

Install this agent skill into your project:

npx add-skill https://github.com/Harryoung/efka/tree/main/skills/document-conversion

SKILL.md

Document Format Conversion

Convert various document formats to Markdown for knowledge base onboarding.

Supported Formats

| Format | Processing method |
|---|---|
| DOCX | Pandoc conversion, preserving formatting and images |
| DOC | LibreOffice → DOCX → Pandoc |
| PDF (electronic) | Fast conversion with PyMuPDF4LLM |
| PDF (scanned) | Online OCR with PaddleOCR-VL |
| PPTX | Conversion with pptx2md |
| PPT | LibreOffice → PPTX → pptx2md |
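The routing in the table can be pictured as a lookup from file extension to an ordered chain of tools. This is a hypothetical sketch: `conversion_chain`, `PIPELINES`, and the step labels are illustrative names, not the actual internals of smart_convert.py, and the `is_scanned_pdf` flag stands in for the script's automatic PDF-type detection.

```python
from pathlib import Path

# Hypothetical routing table mirroring the Supported Formats table.
# Tool names come from the table; the keys and step labels are illustrative.
PIPELINES: dict[str, list[str]] = {
    ".docx": ["pandoc"],
    ".doc": ["libreoffice:docx", "pandoc"],    # convert to DOCX first
    ".pptx": ["pptx2md"],
    ".ppt": ["libreoffice:pptx", "pptx2md"],   # convert to PPTX first
}


def conversion_chain(path: str, is_scanned_pdf: bool = False) -> list[str]:
    """Return the ordered tool chain for a file, chosen by its extension.

    PDFs are split by type. How the real script detects scanned PDFs is not
    described here, so in this sketch the caller supplies that flag.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return ["paddleocr-vl"] if is_scanned_pdf else ["pymupdf4llm"]
    if suffix not in PIPELINES:
        raise ValueError(f"Unsupported format: {suffix or '(no extension)'}")
    return list(PIPELINES[suffix])  # copy so callers cannot mutate the table
```

Keeping two-step formats (DOC, PPT) as chains that end in the same tool as their modern counterparts mirrors the table's design: legacy formats are normalized once, then share one converter.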

Usage

```bash
python .claude/skills/document-conversion/scripts/smart_convert.py \
    <temp_path> \
    --original-name "<original_filename>" \
    --json-output
```

Parameters:

  • <temp_path>: Path to the temporary uploaded file (e.g. /tmp/kb_upload_xxx.pptx)
  • --original-name: The original filename. Always pass it; it is used to generate the correct image directory name
  • --json-output: Print the result as JSON
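When the converter is driven from Python rather than a shell, the command can be assembled as an argument list, so filenames containing spaces or non-ASCII characters need no shell quoting. This is a minimal sketch: `build_command` and `run_conversion` are hypothetical helpers, and the script path is the one from the Usage example.

```python
import subprocess

# Script path from the Usage example; adjust if the skill lives elsewhere.
SCRIPT = ".claude/skills/document-conversion/scripts/smart_convert.py"


def build_command(temp_path: str, original_name: str, python: str = "python") -> list[str]:
    """Build the argv for smart_convert.py, always including both flags.

    Hypothetical helper; only --original-name and --json-output are
    documented options of the script.
    """
    if not original_name:
        # An empty name would reintroduce the wrong-image-path problem.
        raise ValueError("original_name is required")
    return [python, SCRIPT, temp_path, "--original-name", original_name, "--json-output"]


def run_conversion(temp_path: str, original_name: str) -> subprocess.CompletedProcess:
    """Run the converter and capture stdout (the JSON result) as text."""
    return subprocess.run(
        build_command(temp_path, original_name),
        capture_output=True,
        text=True,
        check=False,  # judge success by the JSON "success" field, not the exit code
    )
```

For a copy-pasteable shell form of the same command, `shlex.join(build_command("/tmp/kb_upload_xxx.pptx", "培训资料.pptx"))` quotes each argument as needed.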

Output Format

```json
{
  "success": true,
  "markdown_file": "/path/to/output.md",
  "images_dir": "original_filename_images",
  "image_count": 5,
  "input_file": "/path/to/input.pptx"
}
```

Processing Flow

  1. Run the conversion command (always with --original-name and --json-output)
  2. Parse the JSON output and check the success field
  3. If success is false, report the error and stop
  4. If success is true, record the generated Markdown file path and the image directory
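Steps 2 to 4 can be sketched in Python as follows. `handle_output` and `ConversionRecord` are hypothetical names; the field names come from the Output Format example. The fields of a failure payload are not documented here, so the sketch reports the whole object, and treating non-JSON output as a failure is an added assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass


@dataclass
class ConversionRecord:
    """What step 4 records: the generated Markdown file and its image directory."""
    markdown_file: str
    images_dir: str
    image_count: int


def handle_output(stdout: str) -> ConversionRecord:
    """Parse the converter's JSON output and branch on the "success" field.

    Raises RuntimeError on failure so the caller can report it and stop (step 3).
    """
    try:
        result = json.loads(stdout)
    except json.JSONDecodeError as exc:
        # No parseable JSON (e.g. the script crashed before printing); treat as failure.
        raise RuntimeError(f"converter did not return JSON: {exc}") from exc

    if not isinstance(result, dict) or not result.get("success"):
        # Failure payload fields are undocumented, so report the whole object.
        raise RuntimeError(f"conversion failed: {result}")

    return ConversionRecord(
        markdown_file=result["markdown_file"],
        images_dir=result["images_dir"],
        image_count=result.get("image_count", 0),
    )
```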

Important Notes

  • The image directory is named after the original filename (e.g. 培训资料_images/, where 培训资料 means "training materials")
  • Omitting --original-name produces incorrect image reference paths
  • The PDF type (electronic or scanned) is detected automatically; scanned PDFs take considerably longer to process (from tens of seconds up to minutes)
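The naming rule can be illustrated with a small sketch. It assumes the directory name is the filename stem plus `_images`, which matches this document's examples (original_filename_images, 培训资料_images/) but is inferred, not read from smart_convert.py. `images_dir_name` and `image_ref` are hypothetical helpers, and slide1.png is a made-up image name.

```python
from pathlib import PurePath


def images_dir_name(filename: str) -> str:
    """Derive an image directory name as <stem>_images.

    Assumption: rule inferred from the examples in this document.
    """
    return f"{PurePath(filename).stem}_images"


def image_ref(original_name: str, image_file: str) -> str:
    """Hypothetical relative path that a Markdown image link would use."""
    return f"{images_dir_name(original_name)}/{image_file}"
```

Under this assumption, a run without --original-name would take the name from the temporary file (`images_dir_name("/tmp/kb_upload_xxx.pptx")` gives kb_upload_xxx_images), which is one plausible way the incorrect image paths arise.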

Format Details

For detailed processing instructions for each format, see FORMATS.md.
