sanitize-text
Normalize raw text by removing excessive whitespace and non-printable characters and by standardizing Unicode. Use this to clean up text extracted from PDF or DOCX files before processing it with LLMs.
Install this agent skill in your project:
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/sanitize-text
SKILL.md
Sanitize Text
Overview
This skill cleans and normalizes raw text. It is essential for preprocessing text extracted from documents such as PDFs, which often contains encoding artifacts, excessive whitespace, or stray control characters.
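The kind of cleanup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the skill's actual sanitize.py: the function name sanitize_text, the choice of NFKC normalization, and the exact whitespace rules are all hypothetical.

```python
import re
import unicodedata


def sanitize_text(raw: str) -> str:
    """Hypothetical helper; the real sanitize.py may behave differently."""
    # Assumption: NFKC is the desired form. It folds compatibility
    # characters (ligatures, full-width forms, non-breaking spaces)
    # into their plain equivalents.
    text = unicodedata.normalize("NFKC", raw)

    # Use "\n" for every line ending.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Drop characters in the Unicode "C" (control/format/unassigned)
    # categories, keeping newlines and tabs. Note this also removes
    # zero-width joiners and soft hyphens, which some text may need.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )

    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)

    # Trim each line, then limit consecutive blank lines to one.
    text = "\n".join(line.strip() for line in text.split("\n"))
    text = re.sub(r"\n{3,}", "\n\n", text)

    return text.strip()
```

For example, under these assumptions a string such as "Hello\u00a0  world\x00" would come back as "Hello world": the non-breaking space becomes a regular space, the NUL byte is dropped, and the run of spaces collapses to one.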
Usage
Sanitize Script
Syntax:
python3 .agent/skills/sanitize-text/scripts/sanitize.py <input_file> [--output <output_file>]
Arguments:
input_file: Path to the file containing raw text.
--output: (Optional) Path to write cleaned text to. If omitted, prints to stdout.
Example:
python3 .agent/skills/sanitize-text/scripts/sanitize.py raw_resume.txt --output clean_resume.txt