Skill Tester & Analyzer

A meta-skill for deeply testing and auditing other Claude skills. It instruments test runs to capture raw API call traces and records every script's stdin, stdout, and stderr with timing. For any scripts embedded in the skill, it runs deterministic security scans first, then hands off to dedicated security and code review subagents.
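As a rough illustration of what recording a script run with its I/O and timing could look like, here is a minimal sketch. The function name `run_and_record` and every field name in the record are hypothetical; the skill's real log schema is not documented here.

```python
import json
import subprocess
import sys
import time
from pathlib import Path


def run_and_record(cmd, log_path, stdin_text=""):
    """Run cmd, capture its I/O and wall-clock duration, append one JSON line."""
    started = time.time()
    t0 = time.perf_counter()
    proc = subprocess.run(cmd, input=stdin_text, capture_output=True, text=True)
    record = {
        # Hypothetical field names, not the skill's actual schema.
        "cmd": cmd,
        "started_at": started,
        "duration_s": round(time.perf_counter() - t0, 6),
        "exit_code": proc.returncode,
        "stdin": stdin_text,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
    # Append mode keeps earlier runs intact: one JSON object per line.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

For example, `run_and_record([sys.executable, "-c", "print('ok')"], "script_runs.jsonl")` would append a single line describing that run.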

Session Directory Layout

<report_root>/<skill_name>_<YYYYMMDD_HHMMSS>/
├── manifest.json          # Validation results and session metadata (created by setup_test_env.py)
├── sandbox/               # Isolated workspace for script execution
├── inventory.json         # Skill structure scan
├── scan_results.json      # Deterministic security findings (B9; runs before the AI security analysis)
├── prompt_lint.json       # Deterministic prompt quality findings (B11; runs before the AI prompt review)
├── prompt_review.json     # AI prompt quality analysis (receives prompt_lint as input)
├── api_log.jsonl          # All Claude API calls (one JSON object per line)
├── script_runs.jsonl      # All script executions with I/O
├── security_report.json   # AI security analysis (receives scan_results as input)
├── code_review.json       # Code quality review
├── session_report.html    # Claude Code session trace (API calls, tool use, conversation)
└── report.html            # Unified interactive HTML report
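The layout above can be navigated with a couple of small helpers. This is a sketch only: `session_dir` and `read_jsonl` are hypothetical names, and it assumes the `.jsonl` files hold one JSON object per line as described.

```python
import json
from datetime import datetime
from pathlib import Path


def session_dir(report_root, skill_name, now=None):
    """Build <report_root>/<skill_name>_<YYYYMMDD_HHMMSS> without creating it."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(report_root) / f"{skill_name}_{stamp}"


def read_jsonl(path):
    """Parse a JSON Lines file such as api_log.jsonl, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A caller might then load `read_jsonl(session / "api_log.jsonl")` and `read_jsonl(session / "script_runs.jsonl")` to inspect a finished session.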

Modes

| Mode | Description | Phases Run | Command |
|------|-------------|------------|---------|
| Full (default) | Complete analysis: scan → prompt-lint → test → security → review → report | All (2-9) | `/st:run` |
| Audit | Static analysis only, no test execution | 2-4, 6-7, 9 | `/st:audit` |
| Trace | Runtime capture only, no security/code review | 2, 5, 8, 9 | `/st:trace` |
| Report | Re-generate HTML from existing session data | 9 only | `/st:report` |
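To make the phase selection concrete, the phase specs in the table can be expanded into explicit phase numbers. The sketch below is illustrative: `expand_phases`, `MODE_PHASES`, and `phases_for` are hypothetical names, and only the phase specs themselves come from the table.

```python
def expand_phases(spec):
    """Expand a phase spec like '2-4, 6-7, 9' into a sorted list of ints."""
    phases = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            phases.update(range(int(lo), int(hi) + 1))
        else:
            phases.add(int(part))
    return sorted(phases)


# Phase specs copied from the Modes table; the dict itself is illustrative.
MODE_PHASES = {
    "full": "2-9",
    "audit": "2-4, 6-7, 9",
    "trace": "2, 5, 8, 9",
    "report": "9",
}


def phases_for(mode):
    """Return the phases a mode runs, in ascending order."""
    return expand_phases(MODE_PHASES[mode])
```

Under these assumptions, `phases_for("audit")` yields `[2, 3, 4, 6, 7, 9]`: Audit skips phase 5 (test execution) and phase 8.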

Commands

| Command | Mode | Phases | Purpose |
|---------|------|--------|---------|
| `/st:init` | All | 1 | Set up session: target, mode, prompts, report location |
| `/st:run` | Full | 2-9 | Execute all analysis phases |
| `/st:audit` | Audit | 2-4, 6-7, 9 | Static analysis only |
| `/st:trace` | Trace | 2, 5, 8, 9 | Runtime capture only |
| `/st:report` | Report | 9 | Regenerate HTML from session data |
| `/st:status` | N/A | None | Show session state |
| `/st:resume` | Any | Variable | Resume interrupted session |

Interpreting Results

Security Severity Levels

| Level | Meaning | Action |
|-------|---------|--------|
| `CRITICAL` | Active exploit risk (e.g., shell injection, RCE, hardcoded production key) | Block: do not use skill; fix immediately |
| `HIGH` | Likely data exposure or privilege escalation | Fix before production |
| `MEDIUM` | Defense-in-depth gap; not immediately exploitable | Fix in next iteration |
| `LOW` | Style/practice issue with minor security implications | Note in report |
| `INFO` | Observation, no risk | Informational only |
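One way to act on these levels programmatically is to rank them and gate on the worst one found. The sketch below assumes each finding is a dict with a `"severity"` key, which is a hypothetical shape; the actual structure of `security_report.json` is not specified here.

```python
# Levels ordered from most to least severe, as listed in the table.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"]
_RANK = {level: i for i, level in enumerate(SEVERITY_ORDER)}


def worst_severity(findings):
    """Return the most severe level present, or None for an empty list."""
    levels = [f["severity"] for f in findings]
    return min(levels, key=_RANK.__getitem__) if levels else None


def should_block(findings):
    """Per the table, any CRITICAL finding means the skill should not be used."""
    return worst_severity(findings) == "CRITICAL"
```

A stricter gate, for example one that also blocks on `HIGH` before a production release, would compare `_RANK` values against a chosen threshold instead.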

Code Quality Score (0–10)

| Range | Interpretation |
|-------|----------------|
| 9–10 | Production-ready |
| 7–8 | Minor improvements needed |
| 5–6 | Significant gaps; refactoring advised |
| < 5 | Major issues; rework required |
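The bands can be expressed as a small lookup. The table only lists whole-number ranges, so this sketch assumes each band's lower bound acts as a threshold (a score of 8.5 would fall into the 7–8 band); `interpret_score` is a hypothetical helper name.

```python
def interpret_score(score):
    """Map a 0-10 code quality score to its interpretation band."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    # Assumption: each band's lower bound is treated as a threshold.
    if score >= 9:
        return "Production-ready"
    if score >= 7:
        return "Minor improvements needed"
    if score >= 5:
        return "Significant gaps; refactoring advised"
    return "Major issues; rework required"
```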
