Agent skill

dev-review

This skill should be used as REQUIRED Phase 6 of /dev workflow when the implementation is complete and needs code review. Combines spec compliance and code quality checks with confidence-based filtering.

Stars 163
Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/dev-review

SKILL.md

Contents

  • Prerequisites - Test Output Gate
  • The Iron Law of Review
  • Red Flags - STOP Immediately If You Think
  • Review Focus Areas
  • Confidence Scoring
  • Required Output Structure
  • Agent Invocation
  • Quality Standards

Code Review

Single-pass code review combining spec compliance and quality checks. Uses confidence-based filtering to report only high-priority issues.

Do NOT start review without test evidence.

Before reviewing, verify these preconditions:

  1. .claude/LEARNINGS.md contains actual test output
  2. Tests were run (not just written)
  3. Test output shows PASS (not SKIP, not assumed)

What Counts as Test Evidence

Valid Evidence NOT Valid
meson test output with results "Tests should pass"
pytest output showing PASS "I wrote tests"
Screenshot of working UI "It looks correct"
Playwright snapshot showing expected state "User can verify"
D-Bus command output "The feature works"
E2E test output with user flow verified "Unit tests pass" (for UI changes)

FOR USER-FACING CHANGES: Unit test evidence is INSUFFICIENT.

Before approving user-facing changes, verify:

  1. Unit tests pass (necessary but not sufficient)
  2. E2E tests pass (required for approval)
  3. Visual evidence exists (screenshots/snapshots for UI)
Change Type Unit Evidence E2E Evidence Approval?
Internal refactor N/A ✅ APPROVE
API change ❌ Missing ❌ BLOCKED
UI change ❌ Missing ❌ BLOCKED
User workflow ❌ Missing ❌ BLOCKED

Return BLOCKED if E2E evidence is missing for user-facing changes.

"Unit tests pass" without E2E for UI changes is NOT approvable. </EXTREMELY-IMPORTANT>

Gate Check

Check LEARNINGS.md for test output:

bash
rg -E "(PASS|OK|SUCCESS|\d+ passed)" .claude/LEARNINGS.md

If no test output is found, STOP and return to /dev-implement.

"It should work" is NOT evidence. Test output IS evidence. </EXTREMELY-IMPORTANT>

You MUST report only issues with >= 80% confidence. This is not negotiable.

Before reporting ANY issue, complete these verification steps:

  1. Verify it's not a false positive
  2. Verify it's not a pre-existing issue
  3. Assign a confidence score
  4. Report only if score >= 80

You MUST apply this rule even when encountering:

  • "This looks suspicious"
  • "I think this might be wrong"
  • "The style seems inconsistent"
  • "I would have done it differently"

You MUST discard any low-confidence issue found during review. </EXTREMELY-IMPORTANT>

Red Flags - STOP Immediately If You Think:

Thought Why It's Wrong Do Instead
"Tests probably pass" You don't have evidence - absence of evidence is not evidence Check LEARNINGS.md for actual output
"This looks wrong" Your vague suspicion ≠ evidence Find concrete proof or discard
"I would do it differently" Your style preference ≠ bug Check if it violates project guidelines
"This might cause problems" Your "might" = < 80% confidence Find proof or discard
"Pre-existing but should be fixed" You're out of scope Score it 0 and discard
"User can test it" Your manual testing is less reliable than automation Return to implement phase

Review Focus Areas

Test Evidence (Check First!)

  • LEARNINGS.md contains actual test command output
  • Tests show PASS/OK (not SKIP, FAIL, or missing)
  • UI changes have screenshot/snapshot evidence
  • All test types run (unit, integration, UI as applicable)
  • E2E tests exist and pass for user-facing changes
  • E2E test simulates actual user flow, not just component render

Spec Compliance

  • All requirements from .claude/SPEC.md are implemented
  • Acceptance criteria are met
  • No requirements were skipped or partially implemented
  • Edge cases mentioned in spec are handled

Code Quality

  • Code is simple and DRY (no unnecessary duplication)
  • Logic is correct (no bugs, handles edge cases)
  • Codebase conventions followed (naming, patterns, structure)
  • Error handling is complete
  • No security vulnerabilities detected

Confidence Scoring

Rate each potential issue from 0-100:

Score Meaning
0 False positive or pre-existing issue
25 Might be real, might not. Stylistic without guideline backing
50 Real issue but nitpick or rare in practice
75 Verified real issue, impacts functionality
100 Absolutely certain, confirmed with direct evidence

CRITICAL: Only report issues with confidence >= 80.

Required Output Structure

markdown
## Code Review: [Feature/Change Name]
Reviewing: [files/scope being reviewed]

### Test Evidence Verified
- Unit tests: [PASS/FAIL/MISSING] - [paste key output line]
- Integration: [PASS/FAIL/N/A]
- UI/Visual: [Screenshot taken / Snapshot verified / N/A]

### Critical Issues (Confidence >= 90)

#### [Issue Title] (Confidence: XX)

**Location:** `file/path.ext:line_number`

**Problem:** Clear description of the issue

**Fix:**
```[language]
// Specific code fix

Important Issues (Confidence 80-89)

[Same format as Critical Issues]

Summary

Verdict: APPROVED | CHANGES REQUIRED | BLOCKED (no test evidence)

[If APPROVED] The reviewed code meets project standards. Tests pass. No issues with confidence >= 80 detected.

[If CHANGES REQUIRED] X critical issues and Y important issues must be addressed before proceeding.

[If BLOCKED] Cannot approve without test evidence. Return to /dev-implement and run tests.


## Agent Invocation

Spawn Task agent for review execution:

Task(subagent_type="general-purpose"): "Review implementation against .claude/SPEC.md.

FIRST: Check .claude/LEARNINGS.md for test output. Return BLOCKED immediately if no test output is found.

Complete single-pass review covering:

  1. Test evidence - tests actually run and pass?
  2. Spec compliance - all requirements met?
  3. Code quality - simple, correct, follows conventions?

Confidence score each issue (0-100). Report only issues with >= 80 confidence. Return structured output per /dev-review format."


## Honesty Requirement

<EXTREMELY-IMPORTANT>
**You approving without test evidence is LYING.**

An "APPROVED" verdict means YOU assert:
- Tests actually ran (not "should work")
- Test output shows PASS (not SKIP, not assumed)
- Evidence exists and YOU verified it (not trusted reports)

You approving without test evidence is not "efficiency" - it is LYING about code quality.

**BLOCKED is honest. Your fake APPROVED is fraud.**
</EXTREMELY-IMPORTANT>

## Rationalization Prevention

STOP - you're about to rationalize if these thoughts arise—they indicate dishonest approval:

| Thought | Reality |
|---------|---------|
| "Tests probably pass" | Your probably ≠ evidence. Check LEARNINGS.md. |
| "I saw the code, it looks right" | Your looking ≠ running. Find test output. |
| "User is waiting for approval" | They want honest approval. You return BLOCKED if needed. |
| "It's a small change" | Your size estimate doesn't matter. Small changes break things. Require evidence. |
| "I trust the implementer" | Your trust doesn't replace verification. You verify evidence. |
| "I'll approve and they can fix later" | You block now or bugs ship to users. |
| "Review is just a formality" | Review is the LAST GATE before bugs ship. You execute seriously. |

## Quality Standards

- **Test evidence is mandatory** - do not approve without test output
- Do not report style preferences lacking project guideline backing
- Do not report pre-existing issues (confidence = 0)
- Make each reported issue immediately actionable
- Use absolute file paths with line numbers in reports
- Treat uncertainty as below 80 confidence

## Phase Complete

After review completes:

**If APPROVED:** Immediately invoke the dev-verify skill:

Read("${CLAUDE_PLUGIN_ROOT}/lib/skills/dev-verify/SKILL.md")


**If CHANGES REQUIRED:** Return to `/dev-implement` to fix reported issues.

**If BLOCKED:** Return to `/dev-implement` to collect test evidence.

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results