Agent skill
verify
Rigorous self-assessment checklist before marking any task as complete. Use when about to claim task completion, before final commit, when user asks "is it done?", or when transitioning from implementation to reporting. Prevents premature completion claims by requiring evidence for every assertion.
Install this agent skill to your Project
npx add-skill https://github.com/Jamie-BitFlight/claude_skills/tree/main/.claude/skills/verify
SKILL.md
Verification Protocol
Workflow Reference: See Master Workflow for how this skill fits into the verification stage of the agentic workflow.
STOP. You are NOT done yet. Generate this checklist and provide EVIDENCE for every item.
1. Task Type & Strategy
- Type: FIX / FEATURE / REFACTOR / DOCS / INVESTIGATION
- Strategy: Executable verification vs. Static verification?
2. The "WORKS" Check
flowchart TD
Start(["Begin WORKS Check -- Section 2"]) --> Q{"Task type?"}
Q -->|"Executable code -- compiled, scripted, or CLI-run"| A1["Execution check<br>Terminal output showing successful run<br>(exit code 0 is NOT enough)"]
Q -->|"Static asset -- docs, configs, analysis"| B1["Accuracy check<br>Verified against source code or schema?"]
A1 --> A2["Real data check<br>Ran changed code path against real data<br>not just read the diff?"]
A2 --> A3["Regression check<br>Evidence that existing tests still pass?"]
A3 --> A4["Edge case check<br>Evidence of testing failure scenarios?"]
A4 --> AEvidence["Record code evidence<br>execution output, real data test,<br>test results, edge case result"]
B1 --> B2["Clarity check<br>Follows the established format?"]
B2 --> B3["Validity check<br>Links and references resolve?"]
B3 --> BEvidence["Record static evidence<br>accuracy check method,<br>format standard, link validation method"]
AEvidence --> Done(["WORKS Check complete -- proceed to Section 3"])
BEvidence --> Done
A. For Code (Executable)
- Execution: Terminal output showing successful run? (Exit code 0 is NOT enough)
- Real data: Ran the changed code path against real data, not just read the diff?
- Regression: Evidence that existing tests still pass?
- Edge Cases: Evidence of testing failure scenarios?
EVIDENCE:
- Execution output: [paste actual output]
- Real data test: [command run, input used, output observed]
- Test results: [paste test output]
- Edge case tested: [describe scenario and result]
B. For Static Assets (Docs, Configs, Analysis)
- Accuracy: Verified against source code/schema?
- Clarity: Does it follow the established format?
- Validity: Do links/references resolve?
EVIDENCE:
- Accuracy check: [how verified]
- Format compliance: [standard followed]
- Links validated: [method used]
3. The "FIXED" Check
For bug fixes specifically:
- Reproduction: Did I observe the pre-fix state?
- Resolution: Does the original problem NO LONGER occur?
EVIDENCE:
- Pre-fix behavior: [what was observed]
- Post-fix behavior: [what is now observed]
- Regression test added: [yes/no, location]
4. Quality Gates
- Pre-commit hooks passed?
- Linting passed? (Necessary, but not sufficient)
- Type checking passed? (if applicable)
EVIDENCE:
- Pre-commit: [output or "not configured"]
- Linting: [tool and result]
- Type check: [tool and result]
5. Proportional Response Check
If the task has an issue-classification field in its metadata, verify the response matched the issue type. If no issue-classification is present, mark N/A and proceed.
flowchart TD
Start(["Begin Proportional Response Check"]) --> Q1{"issue-classification<br>present in task metadata?"}
Q1 -->|"absent"| Skip["SKIP -- existing WORKS/FIXED/Quality Gates apply"]
Q1 -->|"present"| Q2{"Classification type?"}
Q2 -->|"procedural"| P["Sweep completeness<br>Codebase search returns zero<br>remaining instances of the pattern"]
Q2 -->|"defect"| D["Root cause addressed<br>Fix targets root cause from evidence chain<br>+ scenario in scenario-target succeeds"]
Q2 -->|"recurring-pattern"| R["Guardrail added<br>New gate/check exists AND<br>covers the defect CLASS not just instance"]
Q2 -->|"missing-guardrail"| M["Gate gap filled<br>Guardrail triggers in the<br>exposing scenario"]
Q2 -->|"unbounded-design"| U["Design implemented<br>Matches chosen direction +<br>trade-offs documented"]
P --> Evidence
D --> Evidence
R --> Evidence
M --> Evidence
U --> Evidence
Skip --> Done(["Proportional Check complete"])
Evidence["Record proportional evidence"] --> Done
EVIDENCE:
- Issue Classification: [type or "not classified"]
- Scenario Target: [scenario -> improvement, or "not specified"]
- Proportional Check: [PASS/FAIL/N/A]
- Check detail: [what was verified and result]
6. Agent Delegation Verification
When work was delegated to a sub-agent, the agent's success report is NOT evidence.
- VCS diff reviewed:
git diffshows the expected changes? - Changes verified: Read the modified files — content matches intent?
- Tests run independently: Ran the verification command yourself, not trusting the agent's claim?
EVIDENCE:
- Agent report: [what agent claimed]
- VCS diff: [files changed, scope matches expectation]
- Independent verification: [command run, output observed]
If no agents were used, mark N/A and proceed.
7. Honesty Check
- Did I verify the full scope?
- Am I distinguishing between "should work" and "verified to work"?
- Destination check: Did I read the target state after writing? (Tool output claiming success is not evidence — the state of the destination is.)
- Can I answer YES to: "I have VALIDATED this output in its intended context"?
Rationalization Prevention
If any of these thoughts occur, STOP and run the verification command:
| Rationalization | Response |
|---|---|
| "Should work now" | Run the verification command |
| "I'm confident" | Confidence is not evidence |
| "Just this once" | No exceptions |
| "Linter passed so build passes" | Linter does not check compilation |
| "Agent said success" | Verify independently (Section 6) |
| "I'm tired" | Exhaustion is not an excuse |
| "Partial check is enough" | Partial check proves nothing about the whole |
| "Different words so rule doesn't apply" | Spirit over letter |
Red flags in your own output — if you catch yourself writing any of these, the gate has not been passed:
- "should", "seems to", "looks correct"
- Expressions of satisfaction before verification ("Done!", "Perfect!")
- About to commit/push/PR without fresh command output in this message
The Golden Rule
If you cannot demonstrate it working in practice with evidence, it is NOT done.
| Claim | Required Evidence |
|---|---|
| "Code works" | Terminal output showing execution against real data |
| "Tests pass" | Actual test output, not assumption |
| "Bug fixed" | Before/after comparison |
| "Data synced" | Read the destination after writing — not the tool output |
| "Docs accurate" | Cross-reference with source |
| "Config valid" | Validation command output |
| "Root cause fixed" | Evidence chain from grooming + fix addresses root cause claim |
| "Guardrail added" | New gate/check exists and triggers in exposing scenario |
| "Agent completed" | VCS diff reviewed + independent verification command run |
Quick Reference
VERIFICATION SUMMARY:
Task Type: [FIX/FEATURE/REFACTOR/DOCS/INVESTIGATION]
Works Check: [PASS/FAIL] - Evidence: ___
Fixed Check: [PASS/FAIL/N/A] - Evidence: ___
Proportional Check: [PASS/FAIL/N/A] - Evidence: ___
Quality Gates: [PASS/FAIL] - Evidence: ___
Agent Delegation: [PASS/FAIL/N/A] - Evidence: ___
Honesty Check: [PASS/FAIL]
VERDICT: [COMPLETE / NOT COMPLETE - reason]
Didn't find tool you were looking for?