Agent skill

skill-tester

This skill should be used whenever the user wants to test a skill's behavior, analyze how it uses the Claude API, inspect inputs and outputs from scripts, or run security and code-review audits against skill scripts. Trigger even on casual phrases such as "test my skill", "analyze this skill", "audit skill scripts", "review skill for security issues", "what does this skill actually do when it runs", "inspect API calls from skill", "run a skill through its paces", or "check my skill for bugs or vulnerabilities". Also trigger when the user shows you a SKILL.md and asks you to evaluate, critique, or stress-test it.


Install this agent skill to your Project

npx add-skill https://github.com/ddunnock/claude-plugins/tree/main/skills/skill-tester

SKILL.md

Skill Tester & Analyzer

A meta-skill for deeply testing and auditing other Claude skills. It instruments test runs to capture raw API call traces, records all script stdin/stdout/stderr with timing, and runs deterministic security scans followed by dedicated security and code review subagents against any scripts embedded in the skill.
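To make the I/O-recording idea concrete, here is a minimal sketch of how a script run could be captured with timing and appended to a JSONL log in the style of `script_runs.jsonl`. The `run_and_record` helper is hypothetical (not part of the skill's actual code) and only illustrates the capture pattern:

```python
import json
import subprocess
import time

def run_and_record(cmd, stdin_text, log_path):
    """Run a script, capture stdin/stdout/stderr and wall-clock timing,
    and append one JSON object per run to a JSONL log."""
    start = time.time()
    proc = subprocess.run(cmd, input=stdin_text, capture_output=True, text=True)
    record = {
        "cmd": cmd,
        "stdin": stdin_text,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "duration_s": round(time.time() - start, 3),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Because each run is a single self-contained JSON line, the log can be tailed live during a test session and replayed later without parsing state.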


Session Directory Layout

```
<report_root>/<skill_name>_<YYYYMMDD_HHMMSS>/
├── manifest.json          # Validation results and session metadata (created by setup_test_env.py)
├── sandbox/               # Isolated workspace for script execution
├── inventory.json         # Skill structure scan
├── scan_results.json      # Deterministic security findings (B9 — runs first)
├── prompt_lint.json       # Deterministic prompt quality findings (B11 — runs first)
├── prompt_review.json     # AI prompt quality analysis (receives prompt_lint as input)
├── api_log.jsonl          # All Claude API calls (one JSON object per line)
├── script_runs.jsonl      # All script executions with I/O
├── security_report.json   # AI security analysis (receives scan_results as input)
├── code_review.json       # Code quality review
├── session_report.html    # Claude Code session trace (API calls, tool use, conversation)
└── report.html            # Unified interactive HTML report
```
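Since `api_log.jsonl` holds one JSON object per line, session artifacts are easy to post-process. As a sketch (the `summarize_api_log` helper and the `model` field are assumptions, not guaranteed keys in the skill's actual log schema):

```python
import json
from pathlib import Path

def summarize_api_log(session_dir):
    """Count Claude API calls recorded in api_log.jsonl, grouped by a
    hypothetical 'model' field (one JSON object per line)."""
    log = Path(session_dir) / "api_log.jsonl"
    calls = [json.loads(line) for line in log.read_text().splitlines() if line.strip()]
    by_model = {}
    for call in calls:
        model = call.get("model", "unknown")
        by_model[model] = by_model.get(model, 0) + 1
    return {"total_calls": len(calls), "by_model": by_model}
```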

Modes

| Mode | Description | Phases Run | Command |
| --- | --- | --- | --- |
| Full (default) | Complete analysis: scan → prompt-lint → test → security → review → report | All (2–9) | `/st:run` |
| Audit | Static analysis only, no test execution | 2–4, 6–7, 9 | `/st:audit` |
| Trace | Runtime capture only, no security/code review | 2, 5, 8, 9 | `/st:trace` |
| Report | Re-generate HTML from existing session data | 9 only | `/st:report` |
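The mode-to-phase mapping above can be written down directly (phase numbers from the table; phase 1 is session setup via /st:init):

```python
# Phase sets per mode, transcribed from the modes table.
MODE_PHASES = {
    "full":   list(range(2, 10)),  # scan, prompt-lint, test, security, review, report
    "audit":  [2, 3, 4, 6, 7, 9],  # static analysis only; no test execution (phase 5)
    "trace":  [2, 5, 8, 9],        # runtime capture only; no security/code review
    "report": [9],                 # regenerate HTML from existing session data
}
```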

Commands

| Command | Mode | Phases | Purpose |
| --- | --- | --- | --- |
| `/st:init` | All | 1 | Set up session: target, mode, prompts, report location |
| `/st:run` | Full | 2–9 | Execute all analysis phases |
| `/st:audit` | Audit | 2–4, 6–7, 9 | Static analysis only |
| `/st:trace` | Trace | 2, 5, 8, 9 | Runtime capture only |
| `/st:report` | Report | 9 | Regenerate HTML from session data |
| `/st:status` | N/A | — | Show session state |
| `/st:resume` | Any | Variable | Resume interrupted session |

Interpreting Results

Security Severity Levels

| Level | Meaning | Action |
| --- | --- | --- |
| CRITICAL | Active exploit risk (e.g., shell injection, RCE, hardcoded production key) | Block: do not use the skill; fix immediately |
| HIGH | Likely data exposure or privilege escalation | Fix before production |
| MEDIUM | Defense-in-depth gap; not immediately exploitable | Fix in next iteration |
| LOW | Style/practice issue with minor security implications | Note in report |
| INFO | Observation, no risk | Informational only |
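The severity-to-action policy above is simple enough to automate, for example as a gate over the findings in `scan_results.json` or `security_report.json`. This `gate` helper is an illustrative sketch; it assumes each finding is a dict with a `severity` key, which may not match the skill's actual report schema:

```python
# Actions per severity level, transcribed from the table.
SEVERITY_ACTION = {
    "CRITICAL": "block",
    "HIGH": "fix-before-production",
    "MEDIUM": "fix-next-iteration",
    "LOW": "note",
    "INFO": "informational",
}

def gate(findings):
    """Return the strictest action implied by a list of findings,
    each assumed to carry a 'severity' key."""
    for level in ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"]:
        if any(f.get("severity") == level for f in findings):
            return SEVERITY_ACTION[level]
    return "clean"
```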

Code Quality Score (0–10)

| Range | Interpretation |
| --- | --- |
| 9–10 | Production-ready |
| 7–8 | Minor improvements needed |
| 5–6 | Significant gaps; refactoring advised |
| < 5 | Major issues; rework required |
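The same bands can be expressed as a small lookup, useful when processing `code_review.json` programmatically (a sketch; the function name and band strings are illustrative, not part of the skill):

```python
def interpret_score(score):
    """Map a 0-10 code quality score to its interpretation band."""
    if score >= 9:
        return "production-ready"
    if score >= 7:
        return "minor improvements needed"
    if score >= 5:
        return "significant gaps; refactoring advised"
    return "major issues; rework required"
```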
