Agent skill
skill-tester
This skill should be used whenever the user wants to test a skill's behavior, analyze how it uses the Claude API, inspect the inputs and outputs of its scripts, or run security and code review audits against its scripts. It should trigger even for casual phrases such as "test my skill", "analyze this skill", "audit skill scripts", "review skill for security issues", "what does this skill actually do when it runs", "inspect API calls from skill", "run a skill through its paces", or "check my skill for bugs or vulnerabilities". Also trigger when the user shows you a SKILL.md and asks you to evaluate, critique, or stress-test it.
Install this agent skill into your project:
npx add-skill https://github.com/ddunnock/claude-plugins/tree/main/skills/skill-tester
SKILL.md
Skill Tester & Analyzer
A meta-skill for deeply testing and auditing other Claude skills. It instruments test runs to capture raw API call traces, records all script stdin/stdout/stderr with timing, and runs deterministic security scans followed by dedicated security and code review subagents against any scripts embedded in the skill.
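To make the run-recording idea concrete, here is a minimal Python sketch. It is not the skill's actual implementation. The function name `record_script_run`, the JSON field names, and the 60-second default timeout are all assumptions made for illustration. The sketch runs one command with `subprocess.run`, feeds it stdin, times it with `time.perf_counter`, and appends one JSON object per run to a JSONL file. That is the same one-record-per-line shape `script_runs.jsonl` has in the session layout.

```python
import json
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from typing import Optional


def record_script_run(
    cmd: list,
    stdin_text: str,
    log_path: Path,
    cwd: Optional[Path] = None,
    timeout_s: float = 60.0,  # assumed per-script limit, not taken from the skill
) -> dict:
    """Run cmd with stdin_text, capture stdout/stderr and timing, and append
    one JSON record to log_path (hypothetical record schema)."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd,
        input=stdin_text,
        capture_output=True,
        text=True,
        cwd=cwd,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
    )
    record = {
        "cmd": cmd,
        "stdin": stdin_text,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "returncode": proc.returncode,
        "duration_ms": round((time.perf_counter() - start) * 1000, 2),
    }
    # JSONL: exactly one JSON object per line, appended so earlier runs are kept.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "script_runs.jsonl"
        upper = "import sys; print(sys.stdin.read().upper())"
        rec = record_script_run([sys.executable, "-c", upper], "hello", log)
        print(rec["stdout"].strip(), rec["returncode"])  # HELLO 0
```

Appending to the log file, instead of rewriting it, means a crash partway through a session still leaves every completed run on disk.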
Session Directory Layout
<report_root>/<skill_name>_<YYYYMMDD_HHMMSS>/
├── manifest.json # Validation results and session metadata (created by setup_test_env.py)
├── sandbox/ # Isolated workspace for script execution
├── inventory.json # Skill structure scan
├── scan_results.json # Deterministic security findings (B9 — runs first)
├── prompt_lint.json # Deterministic prompt quality findings (B11 — runs first)
├── prompt_review.json # AI prompt quality analysis (receives prompt_lint as input)
├── api_log.jsonl # All Claude API calls (one JSON object per line)
├── script_runs.jsonl # All script executions with I/O
├── security_report.json # AI security analysis (receives scan_results as input)
├── code_review.json # Code quality review
├── session_report.html # Claude Code session trace (API calls, tool use, conversation)
└── report.html # Unified interactive HTML report
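A short Python sketch shows how a session directory following this layout could be named and inspected. The helper names (`session_dir`, `read_jsonl`, `missing_artifacts`) are hypothetical, and the file list is simply copied from the tree above, leaving out the `sandbox/` directory. Which files actually exist depends on the mode. An Audit run, for example, executes no tests, so it may produce no `api_log.jsonl`.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

# File names from the session layout; which ones exist depends on the mode.
EXPECTED_FILES = (
    "manifest.json", "inventory.json", "scan_results.json", "prompt_lint.json",
    "prompt_review.json", "api_log.jsonl", "script_runs.jsonl",
    "security_report.json", "code_review.json", "session_report.html", "report.html",
)


def session_dir(report_root: Path, skill_name: str, now: datetime) -> Path:
    """Return <report_root>/<skill_name>_<YYYYMMDD_HHMMSS>."""
    return report_root / f"{skill_name}_{now.strftime('%Y%m%d_%H%M%S')}"


def read_jsonl(path: Path) -> list:
    """Parse a JSONL file (one JSON object per line), ignoring blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def missing_artifacts(session: Path) -> list:
    """List expected files that are not (yet) present in a session directory."""
    return [name for name in EXPECTED_FILES if not (session / name).exists()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = session_dir(Path(tmp), "demo", datetime(2024, 5, 1, 9, 30, 0))
        d.mkdir(parents=True)
        (d / "api_log.jsonl").write_text('{"a": 1}\n\n{"a": 2}\n', encoding="utf-8")
        print(d.name)  # demo_20240501_093000
        print(read_jsonl(d / "api_log.jsonl"))  # [{'a': 1}, {'a': 2}]
```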
Modes
| Mode | Description | Phases Run | Command |
|---|---|---|---|
| Full (default) | Complete analysis: scan → prompt-lint → test → security → review → report | All (2-9) | /st:run |
| Audit | Static analysis only, no test execution | 2-4, 6-7, 9 | /st:audit |
| Trace | Runtime capture only, no security/code review | 2, 5, 8, 9 | /st:trace |
| Report | Re-generate HTML from existing session data | 9 only | /st:report |
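The "Phases Run" column uses a compact notation such as `2-4, 6-7, 9`. A small Python helper can expand those specs so that modes can be compared directly. `expand_phases` and `MODE_PHASES` are illustrative names, not part of the skill.

```python
def expand_phases(spec: str) -> list:
    """Expand a phase spec such as "2-4, 6-7, 9" into [2, 3, 4, 6, 7, 9]."""
    phases = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"descending range: {part!r}")
            phases.update(range(lo, hi + 1))
        else:
            phases.add(int(part))
    return sorted(phases)


# Phase specs copied from the Modes table.
MODE_PHASES = {
    "full": expand_phases("2-9"),
    "audit": expand_phases("2-4, 6-7, 9"),
    "trace": expand_phases("2, 5, 8, 9"),
    "report": expand_phases("9"),
}

if __name__ == "__main__":
    print(MODE_PHASES["audit"])  # [2, 3, 4, 6, 7, 9]
    # Phases a full run performs that an audit skips:
    print(sorted(set(MODE_PHASES["full"]) - set(MODE_PHASES["audit"])))  # [5, 8]
```

Taking the set difference this way shows that Audit and Trace are complementary subsets of Full, and that both share the scan phase (2) and the report phase (9).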
Commands
| Command | Mode | Phases | Purpose |
|---|---|---|---|
| /st:init | All | 1 | Set up session: target, mode, prompts, report location |
| /st:run | Full | 2-9 | Execute all analysis phases |
| /st:audit | Audit | 2-4, 6-7, 9 | Static analysis only |
| /st:trace | Trace | 2, 5, 8, 9 | Runtime capture only |
| /st:report | Report | 9 | Regenerate HTML from session data |
| /st:status | N/A | N/A | Show session state |
| /st:resume | Any | Variable | Resume interrupted session |
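The table says only that `/st:resume` covers a "Variable" set of phases. One plausible model, sketched in Python, is that resume runs, in order, whichever phases of the original command have not yet completed. That behavior is an assumption, as are the names `COMMANDS` and `remaining_phases`. The phase lists are transcribed from the table with the ranges expanded by hand.

```python
# Command -> (mode, phases), transcribed from the Commands table with ranges expanded.
# /st:status and /st:resume are omitted: they do not map to a fixed phase list.
COMMANDS = {
    "/st:init": ("all", [1]),
    "/st:run": ("full", [2, 3, 4, 5, 6, 7, 8, 9]),
    "/st:audit": ("audit", [2, 3, 4, 6, 7, 9]),
    "/st:trace": ("trace", [2, 5, 8, 9]),
    "/st:report": ("report", [9]),
}


def remaining_phases(command: str, completed: set) -> list:
    """Phases of `command` not yet in `completed`, in execution order.

    Hypothetical model of /st:resume: assumes completed phases are skipped
    rather than re-run.
    """
    if command not in COMMANDS:
        raise ValueError(f"no fixed phase list for command: {command!r}")
    _mode, phases = COMMANDS[command]
    return [p for p in phases if p not in completed]


if __name__ == "__main__":
    # An interrupted full run that finished phases 2-4 would resume at phase 5.
    print(remaining_phases("/st:run", {2, 3, 4}))  # [5, 6, 7, 8, 9]
```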
Interpreting Results
Security Severity Levels
| Level | Meaning | Action |
|---|---|---|
| CRITICAL | Active exploit risk (e.g., shell injection, RCE, hardcoded production key) | Block: do not use the skill; fix immediately |
| HIGH | Likely data exposure or privilege escalation | Fix before production |
| MEDIUM | Defense-in-depth gap; not immediately exploitable | Fix in next iteration |
| LOW | Style/practice issue with minor security implications | Note in report |
| INFO | Observation, no risk | Informational only |
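Because the levels form a strict order, a set of findings can be triaged mechanically: sort them worst-first and let the most severe one decide the recommended action. The Python sketch below assumes each finding is a dict with a `"severity"` string field. That schema is hypothetical and is not taken from `security_report.json`. The demo finding titles are illustrative only.

```python
from collections import Counter
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the table, ordered so that higher means worse."""
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


ACTIONS = {
    Severity.CRITICAL: "Block: do not use the skill; fix immediately",
    Severity.HIGH: "Fix before production",
    Severity.MEDIUM: "Fix in next iteration",
    Severity.LOW: "Note in report",
    Severity.INFO: "Informational only",
}


def triage(findings: list) -> tuple:
    """Return (worst level, its action, findings ranked worst-first, counts by level).

    Each finding is assumed to be a dict with a "severity" string field.
    """
    def level(f: dict) -> Severity:
        return Severity[f["severity"].upper()]

    ranked = sorted(findings, key=level, reverse=True)
    worst = level(ranked[0]) if ranked else Severity.INFO
    counts = Counter(level(f).name for f in findings)
    return worst, ACTIONS[worst], ranked, counts


if __name__ == "__main__":
    demo = [
        {"severity": "low", "title": "unpinned dependency"},
        {"severity": "CRITICAL", "title": "shell=True with user input"},
        {"severity": "high", "title": "API key written to log"},
    ]
    worst, action, ranked, counts = triage(demo)
    print(worst.name, "->", action)
    print([f["title"] for f in ranked])
```

Using `IntEnum` lets the levels be compared and sorted directly, so no separate ranking table has to be maintained.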
Code Quality Score (0–10)
| Range | Interpretation |
|---|---|
| 9–10 | Production-ready |
| 7–8 | Minor improvements needed |
| 5–6 | Significant gaps — refactoring advised |
| < 5 | Major issues — rework required |
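The score bands translate directly into threshold checks. In the Python sketch below, the function name `interpret_score` is hypothetical. The table leaves fractional scores undefined, so the sketch assumes they fall into the lower band (8.5 counts as 7–8).

```python
def interpret_score(score: float) -> str:
    """Map a 0-10 code quality score to the interpretation bands in the table.

    Assumption: a fractional score falls into the lower band (8.5 -> 7-8).
    """
    if not 0 <= score <= 10:  # also rejects NaN, since NaN comparisons are False
        raise ValueError(f"score must be between 0 and 10, got {score!r}")
    if score >= 9:
        return "Production-ready"
    if score >= 7:
        return "Minor improvements needed"
    if score >= 5:
        return "Significant gaps: refactoring advised"
    return "Major issues: rework required"


if __name__ == "__main__":
    for s in (10, 8.5, 5, 4.9):
        print(s, interpret_score(s))
```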