Agent skill
spec-tests
Intent-based specification tests evaluated by LLM-as-judge. Use when the user asks to "create spec tests", "write intent tests", "TDD with intent", "natural language tests", or wants tests that capture WHY, not just WHAT. NOT pytest/jest/unittest - natural language specs Claude evaluates.
Install this agent skill in your project:

```bash
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/spec-tests
```
SKILL.md
Spec Tests: Intent-Based Testing for LLM Development
Spec tests are intent-based specifications that Claude evaluates as judge. They capture WHY something matters—making them cheat-proof for LLM-driven development.
The TDD Flow
1. Plan → Define what you're building
2. Spec (red) → Write intent tests (they fail - no implementation yet)
3. Implement → Build the feature
4. Spec (green) → Tests pass (Claude confirms intent is satisfied)
Test File Format
````markdown
# Feature Name

## Test Group

### Test Case Name
Intent statement explaining WHY this test matters. What user need does it serve?
What breaks if this doesn't work?

```
Given [precondition]
When [action]
Then [expected outcome]
```
````
Structure: H2 = test group, H3 = test case, intent = required statement, code block = expected behavior.
Critical: Intent statement must appear immediately above the code block, between the H3 header and the assertion block. Section-level intent does not count—each test case needs its own WHY directly before its code block.
Each test must include a fenced code block. Missing code blocks fail with [missing-assertion].
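For example, a complete case in this format might read as follows (the feature and values are hypothetical, shown only to make the structure concrete):

````markdown
# Slug Generation

## Formatting

### Spaces Become Hyphens
URLs with raw spaces get percent-encoded and look broken when shared.
Readable slugs are a product requirement for shareable links.

```
Given the title "Hello World"
When slugify() is called
Then it returns "hello-world"
```
````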
Test Location & Targets
Spec tests live in specs/tests/ and declare their target(s) via frontmatter.
Single target:

```markdown
---
target: src/auth.py
---

# Authentication Tests
```

Multiple targets:

```markdown
---
target:
  - src/auth.py
  - src/session.py
---

# Authentication Flow
```
Directory structure (name files by feature/spec, not by target path):

```
specs/tests/
  authentication.md      ← target: [src/auth.py, src/session.py]
  intent-requirement.md  ← target: [SKILL.md]
  api-validation.md      ← target: [src/api/validate.py]
```
Frontmatter is required. Missing `target:` causes immediate failure with `[missing-target]`.
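As a rough illustration, target validation could look like the sketch below (assumes PyYAML; the function name and error handling are hypothetical, not the actual run_tests_claude.py):

```python
# Illustrative sketch of frontmatter validation (not the actual runner).
import sys
import yaml  # assumes PyYAML is available

def parse_targets(spec_path: str) -> list[str]:
    text = open(spec_path, encoding="utf-8").read()
    if not text.startswith("---"):
        sys.exit(f"[missing-target] {spec_path}: no frontmatter block")
    # Frontmatter is the YAML between the first two '---' delimiters.
    _, frontmatter, _ = text.split("---", 2)
    meta = yaml.safe_load(frontmatter) or {}
    target = meta.get("target")
    if not target:
        sys.exit(f"[missing-target] {spec_path}: frontmatter lacks 'target:'")
    # 'target' may be a single path or a list of paths.
    return [target] if isinstance(target, str) else list(target)
```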
Running Tests
Copy the runner files to your project:

```bash
cp "${CLAUDE_PLUGIN_ROOT}/scripts/run_tests_claude.py" specs/tests/
cp "${CLAUDE_PLUGIN_ROOT}/scripts/judge_prompt.md" specs/tests/
```
Run tests:

```bash
python specs/tests/run_tests_claude.py specs/tests/authentication.md                   # Single spec
python specs/tests/run_tests_claude.py specs/tests/                                    # All specs
python specs/tests/run_tests_claude.py specs/tests/auth.md --test "Valid Credentials"  # Single test
```

Uses `claude -p` (your subscription, no API key needed).
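How the runner talks to the CLI is an implementation detail, but a minimal sketch looks roughly like this (assuming `claude -p` print mode; the helper name and error handling are illustrative, not the actual run_tests_claude.py):

```python
# Illustrative sketch: shell out to the Claude CLI in print mode (not the real runner).
import subprocess

def judge(prompt: str, model: str = "sonnet", timeout: int = 300) -> str:
    # `claude -p` prints one response for the given prompt and exits.
    result = subprocess.run(
        ["claude", "-p", prompt, "--model", model],
        capture_output=True, text=True, timeout=timeout,
    )
    result.check_returncode()  # non-zero exit means the CLI call failed
    return result.stdout
```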
Options:

| Flag | Purpose |
|---|---|
| `--target FILE` | Override frontmatter target |
| `--model MODEL` | Claude model (default: sonnet) |
| `--test "Name"` | Run only named test |
Timeout: 60-300 seconds per test.
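If you were wiring these flags yourself, the CLI surface implied by the table maps directly onto argparse (a sketch, not the runner's actual source):

```python
# Sketch of the CLI surface implied by the flags table (illustrative only).
import argparse

parser = argparse.ArgumentParser(description="Run spec tests with Claude as judge.")
parser.add_argument("path", help="Spec file or directory of spec files")
parser.add_argument("--target", metavar="FILE", help="Override frontmatter target")
parser.add_argument("--model", default="sonnet", help="Claude model (default: sonnet)")
parser.add_argument("--test", metavar="NAME", help="Run only the named test")
args = parser.parse_args()
```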
Why Intent Matters
LLMs can "game" tests by changing them instead of fixing code.
Without intent (fails with `[missing-intent]`):
````markdown
### Completes Quickly

```
elapsed < 50ms
```
````
The LLM thinks: "50 seems arbitrary; change it to 100." The user gets a laggy editor.
With intent:
````markdown
### Completes Quickly
Users perceive delays over 50ms as laggy. This runs on every keystroke.
The 50ms target is a UX requirement, not negotiable.

```
Given a keystroke event
When process_keystroke() is called
Then it completes in under 50ms
```
````
Claude-as-judge evaluates: does the implementation satisfy the UX requirement? Relaxing the threshold → `[intent-violated]`.
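If you build your own runner, one workable convention is to have the judge prompt demand a machine-readable verdict line and parse it like this (the `PASS`/`FAIL [code]` format is an assumption here; the real response format is documented in references/evaluation.md):

```python
# Sketch: parse a judge verdict line such as "PASS" or "FAIL [intent-violated] ...".
import re

def parse_verdict(response: str) -> tuple[bool, str | None]:
    for line in response.splitlines():
        if line.strip() == "PASS":
            return True, None
        match = re.match(r"FAIL\s+\[([a-z-]+)\]", line.strip())
        if match:
            return False, match.group(1)  # error code, e.g. "intent-violated"
    raise ValueError("judge response contained no verdict line")
```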
Intent properties:
- Required: missing intent → `[missing-intent]` before evaluation
- Per-test: each test needs its own WHY above the code block
- Business-focused: why users/product care, not technical details
- Evaluative: catches "legal but wrong" solutions
Reference Files
For detailed patterns, consult:
- `references/evaluation.md`: error codes, response format, strictness rules, alternative runners, template variables
- `references/multi-target.md`: writing tests for multiple targets, multi-file Given syntax
- `references/examples.md`: complete examples, porting tests across languages
- `references/meta-content.md`: testing prompt files and directive-like content
Checklist
- Each test has intent statement explaining WHY
- Intent is business/user focused
- Expected behavior is clear
- Each test includes a fenced assertion code block
- One behavior per test case
- Multi-target specs: each test starts with `Given the <target> file`
Missing intent = immediate failure. The runner rejects tests without intent statements before evaluating behavior.
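A pre-flight lint in that spirit scans each H3 case for an intent paragraph and a fenced assertion block before any judging happens. A minimal sketch (the real runner's checks may be stricter):

```python
# Sketch: reject spec tests missing intent or assertion blocks before evaluation.
import re

def lint_spec(markdown: str) -> list[str]:
    errors = []
    # Everything before the first H3 header (title, group headers) is skipped.
    for case in re.split(r"^### ", markdown, flags=re.MULTILINE)[1:]:
        name, _, body = case.partition("\n")
        fence = body.find("```")
        if fence == -1:
            errors.append(f"[missing-assertion] {name.strip()}")
        elif not body[:fence].strip():
            # No prose between the H3 header and the code block.
            errors.append(f"[missing-intent] {name.strip()}")
    return errors
```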