Install this skill into your project:
npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/ariegoldkin/evidence-verification
Evidence-Based Verification Skill
Version: 1.0.0
Type: Quality Assurance
Auto-activate: Code review, task completion, production deployment
Overview
This skill teaches agents how to collect and verify evidence before marking tasks complete. Inspired by production-grade development practices, it ensures all claims are backed by executable proof: test results, coverage metrics, build success, and deployment verification.
Key Principle: Show, don't tell. No task is complete without verifiable evidence.
When to Use This Skill
Auto-Activate Triggers
- Completing code implementation
- Finishing code review
- Marking tasks complete in Squad mode
- Before agent handoff
- Production deployment verification
Manual Activation
- When user requests "verify this works"
- Before creating pull requests
- During quality assurance reviews
- When troubleshooting failures
Core Concepts
1. Evidence Types
Test Evidence
- Exit code (must be 0 for success)
- Test suite results (passed/failed/skipped)
- Coverage percentage (if available)
- Test duration
Build Evidence
- Build exit code (0 = success)
- Compilation errors/warnings
- Build artifacts created
- Build duration
Deployment Evidence
- Deployment status (success/failed)
- Environment deployed to
- Health check results
- Rollback capability verified
Code Quality Evidence
- Linter results (errors/warnings)
- Type checker results
- Security scan results
- Accessibility audit results
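The four evidence types above can be modeled as a tagged union so an agent cannot confuse, say, a build result with a test result. This is a minimal TypeScript sketch; the type names and fields (`kind`, `exitCode`, `healthCheckPassed`, etc.) are illustrative assumptions, not a schema defined by this skill.

```typescript
// Illustrative model of the four evidence types; all names are hypothetical.
type TestEvidence = {
  kind: "test";
  exitCode: number;
  passed: number;
  failed: number;
  skipped: number;
  coveragePercent?: number; // only when the runner reports coverage
  durationSeconds: number;
};

type BuildEvidence = {
  kind: "build";
  exitCode: number;
  errors: number;
  warnings: number;
  artifacts: string[];
  durationSeconds: number;
};

type DeploymentEvidence = {
  kind: "deployment";
  status: "success" | "failed";
  environment: string;
  healthCheckPassed: boolean;
  rollbackVerified: boolean;
};

type CodeQualityEvidence = {
  kind: "quality";
  tool: string; // linter, type checker, security or accessibility scanner
  exitCode: number;
  errors: number;
  warnings: number;
};

type Evidence = TestEvidence | BuildEvidence | DeploymentEvidence | CodeQualityEvidence;

// Each evidence type passes only on its own success signal.
function isPassing(e: Evidence): boolean {
  switch (e.kind) {
    case "test":
      return e.exitCode === 0 && e.failed === 0;
    case "build":
    case "quality":
      return e.exitCode === 0;
    case "deployment":
      return e.status === "success" && e.healthCheckPassed;
  }
}
```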
2. Evidence Collection Protocol
## Evidence Collection Steps
1. **Identify Verification Points**
- What needs to be proven?
- What could go wrong?
- What does "complete" mean?
2. **Execute Verification**
- Run tests
- Run build
- Run linters
- Check deployments
3. **Capture Results**
- Record exit codes
- Save output snippets
- Note timestamps
- Document environment
4. **Store Evidence**
- Add to shared context
- Reference in task completion
- Link to artifacts
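Steps 2 and 3 of the protocol can be sketched for a Node.js runtime. This is a minimal illustration, assuming Node; `captureEvidence` and `EvidenceRecord` are hypothetical names. It uses Node's `spawnSync` to run a command, waits for it to exit, and records the exit code, the first 10 lines of output, a UTC timestamp, and the environment.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical record for one verification run.
interface EvidenceRecord {
  command: string;
  exitCode: number; // -1 if the process failed to start or was killed by a signal
  outputSnippet: string; // first 10 lines of stdout followed by stderr
  timestamp: string; // ISO 8601 UTC, taken when the run started
  environment: string;
}

function captureEvidence(cmd: string, args: string[]): EvidenceRecord {
  const timestamp = new Date().toISOString();
  // spawnSync blocks until the process exits and reports its exit status.
  const result = spawnSync(cmd, args, { encoding: "utf8" });
  const output = `${result.stdout ?? ""}${result.stderr ?? ""}`;
  return {
    command: [cmd, ...args].join(" "),
    // status is null when there is no normal exit code; never report that as 0.
    exitCode: result.status ?? -1,
    outputSnippet: output.split("\n").slice(0, 10).join("\n"),
    timestamp,
    environment: `Node ${process.version}, ${process.platform}`,
  };
}
```

For example, `captureEvidence("npm", ["test"])` yields a record whose `exitCode` can be copied straight into the templates below. Mapping a missing status to `-1` is a deliberate choice: an unknown result must never be mistaken for success.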
3. Verification Standards
Minimum Evidence Requirements:
- ✅ At least ONE verification type executed
- ✅ Exit code captured (0 = pass, non-zero = fail)
- ✅ Timestamp recorded
- ✅ Evidence stored in context
Production-Grade Requirements:
- ✅ Tests run with exit code 0
- ✅ Coverage ≥70% (or project standard)
- ✅ Build succeeds with exit code 0
- ✅ No critical linter errors
- ✅ Security scan passes
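The minimum requirements lend themselves to a mechanical check before anything else is evaluated. A minimal sketch, assuming each run is recorded with a check name, exit code, and timestamp (the `RunRecord` shape and `missingMinimumEvidence` are hypothetical); whether evidence was stored in context has to be checked separately.

```typescript
// Hypothetical minimal record of one verification run.
interface RunRecord {
  check: string;
  exitCode?: number;
  timestamp?: string;
}

// Returns unmet minimum requirements; an empty list means the minimum is met.
function missingMinimumEvidence(runs: RunRecord[]): string[] {
  const problems: string[] = [];
  if (runs.length === 0) problems.push("no verification executed");
  for (const r of runs) {
    if (typeof r.exitCode !== "number" || !Number.isInteger(r.exitCode)) {
      problems.push(`${r.check}: exit code not captured`);
    }
    if (!r.timestamp || Number.isNaN(Date.parse(r.timestamp))) {
      problems.push(`${r.check}: timestamp missing or invalid`);
    }
  }
  return problems;
}
```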
Evidence Collection Templates
Template 1: Test Evidence
Use this template when running tests:
## Test Evidence
**Command:** `npm test` (or equivalent)
**Exit Code:** 0 ✅ / non-zero ❌
**Duration:** X seconds
**Results:**
- Tests passed: X
- Tests failed: X
- Tests skipped: X
- Coverage: X%
**Output Snippet:**
[First 10 lines of test output]
**Timestamp:** YYYY-MM-DD HH:MM:SS
**Environment:** Node vX.X.X, OS, etc.
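Filling the template from recorded values, rather than typing it by hand, keeps the report tied to what actually ran. A minimal sketch; `TestRun` and `renderTestEvidence` are hypothetical names, and the values are assumed to come from a real test run.

```typescript
// Hypothetical input; every value should come from an actual run.
interface TestRun {
  command: string;
  exitCode: number;
  durationSeconds: number;
  passed: number;
  failed: number;
  skipped: number;
  coveragePercent?: number;
  outputSnippet: string;
  timestamp: string;
  environment: string;
}

function renderTestEvidence(r: TestRun): string {
  const status = r.exitCode === 0 ? "✅" : "❌";
  return [
    "## Test Evidence",
    `**Command:** \`${r.command}\``,
    `**Exit Code:** ${r.exitCode} ${status}`,
    `**Duration:** ${r.durationSeconds} seconds`,
    "**Results:**",
    `- Tests passed: ${r.passed}`,
    `- Tests failed: ${r.failed}`,
    `- Tests skipped: ${r.skipped}`,
    // Omit the coverage line instead of inventing a number.
    ...(r.coveragePercent === undefined ? [] : [`- Coverage: ${r.coveragePercent}%`]),
    "**Output Snippet:**",
    r.outputSnippet,
    `**Timestamp:** ${r.timestamp}`,
    `**Environment:** ${r.environment}`,
  ].join("\n");
}
```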
Template 2: Build Evidence
Use this template when building:
## Build Evidence
**Command:** `npm run build` (or equivalent)
**Exit Code:** 0 ✅ / non-zero ❌
**Duration:** X seconds
**Artifacts Created:**
- dist/bundle.js (XXX KB)
- dist/styles.css (XXX KB)
**Errors:** X
**Warnings:** X
**Output Snippet:**
[First 10 lines of build output]
**Timestamp:** YYYY-MM-DD HH:MM:SS
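"Artifacts Created" is only evidence if it is compared against what the build was supposed to produce. A minimal sketch, assuming the expected artifact paths are known and file sizes have already been read (for instance with `fs.statSync`); `summarizeArtifacts` is a hypothetical helper, and 1 KB is taken as 1024 bytes.

```typescript
interface Artifact {
  path: string;
  bytes: number;
}

// Formats found artifacts as in the template and lists expected paths that are missing.
// Assumption: 1 KB = 1024 bytes, rounded to the nearest whole KB.
function summarizeArtifacts(expected: string[], found: Artifact[]): { lines: string; missing: string[] } {
  const foundPaths = new Set(found.map((a) => a.path));
  return {
    lines: found.map((a) => `- ${a.path} (${Math.round(a.bytes / 1024)} KB)`).join("\n"),
    missing: expected.filter((p) => !foundPaths.has(p)),
  };
}
```

A non-empty `missing` list means the build did not really succeed, even if it exited with 0.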
Template 3: Code Quality Evidence
Use this template for linting and type checking:
## Code Quality Evidence
**Linter:** ESLint / Ruff / etc.
**Command:** `npm run lint`
**Exit Code:** 0 ✅ / non-zero ❌
**Errors:** X
**Warnings:** X
**Type Checker:** TypeScript / mypy / etc.
**Command:** `npm run typecheck`
**Exit Code:** 0 ✅ / non-zero ❌
**Type Errors:** X
**Timestamp:** YYYY-MM-DD HH:MM:SS
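Error and warning counts are easier to capture reliably from machine-readable output than from console text. A minimal sketch, assuming ESLint is run with its JSON formatter (for example `npx eslint . --format json`) and that only the per-file `errorCount` and `warningCount` fields are needed. `summarizeEslint` is a hypothetical helper. Record the exit code separately; ESLint exits with 1 when it finds errors.

```typescript
// Subset of one entry in ESLint's JSON formatter output.
interface EslintFileResult {
  filePath: string;
  errorCount: number;
  warningCount: number;
}

// Totals errors and warnings across all files for the Code Quality Evidence template.
function summarizeEslint(jsonText: string): { errors: number; warnings: number } {
  const results = JSON.parse(jsonText) as EslintFileResult[];
  return results.reduce(
    (acc, r) => ({ errors: acc.errors + r.errorCount, warnings: acc.warnings + r.warningCount }),
    { errors: 0, warnings: 0 },
  );
}
```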
Template 4: Combined Evidence Report
Use this comprehensive template for task completion:
## Task Completion Evidence
### Task: [Task description]
### Agent: [Agent name]
### Completed: YYYY-MM-DD HH:MM:SS
### Verification Results
| Check | Command | Exit Code | Result |
|-------|---------|-----------|--------|
| Tests | `npm test` | 0 | ✅ 45 passed, 0 failed |
| Build | `npm run build` | 0 | ✅ Bundle created (234 KB) |
| Linter | `npm run lint` | 0 | ✅ No errors, 2 warnings |
| Types | `npm run typecheck` | 0 | ✅ No type errors |
### Coverage
- Statements: 87%
- Branches: 82%
- Functions: 90%
- Lines: 86%
### Evidence Files
- Test output: `.claude/quality-gates/evidence/tests-2025-XX-XX.log`
- Build output: `.claude/quality-gates/evidence/build-2025-XX-XX.log`
### Conclusion
All verification checks passed. Task ready for review.
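The Verification Results table and the Conclusion can be derived from the same records, so the conclusion can never claim more than the table shows. A minimal sketch with hypothetical names (`CheckResult`, `renderResultsTable`, `conclusion`); the failure wording is illustrative.

```typescript
// Hypothetical per-check record for the Verification Results table.
interface CheckResult {
  check: string;
  command: string;
  exitCode: number;
  summary: string;
}

function renderResultsTable(results: CheckResult[]): string {
  const rows = results.map(
    (r) => `| ${r.check} | \`${r.command}\` | ${r.exitCode} | ${r.exitCode === 0 ? "✅" : "❌"} ${r.summary} |`,
  );
  return ["| Check | Command | Exit Code | Result |", "|-------|---------|-----------|--------|", ...rows].join("\n");
}

// Success may be claimed only when at least one check ran and every check exited with 0.
function conclusion(results: CheckResult[]): string {
  return results.length > 0 && results.every((r) => r.exitCode === 0)
    ? "All verification checks passed. Task ready for review."
    : "Verification incomplete or failing. Task not ready for review.";
}
```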
Step-by-Step Workflows
Workflow 1: Code Implementation Verification
When: After writing code for a feature or bug fix
Steps:
1. **Save all files**: ensure changes are written to disk
2. **Run tests**: `npm test` (or `pytest`, `cargo test`, `go test ./...`, etc.)
   - Capture the exit code
   - Note passed/failed counts
   - Record coverage if available
3. **Run build** (if applicable): `npm run build` (or `cargo build`, `go build`, etc.)
   - Capture the exit code
   - Note any errors/warnings
   - Verify artifacts were created
4. **Run linter**: `npm run lint` (or `ruff check`, `cargo clippy`, `golangci-lint run`)
   - Capture the exit code
   - Note errors/warnings
5. **Run type checker** (if applicable): `npm run typecheck` (or `mypy`, `tsc --noEmit`)
   - Capture the exit code
   - Note type errors
6. **Document evidence**
   - Use Template 4 (Combined Evidence Report)
   - Add it to shared context under `quality_evidence`
   - Reference it in the task completion message
7. **Mark task complete** (only if all evidence passes)
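The workflow can be expressed as an ordered list of steps where "(if applicable)" steps are skipped rather than faked. A minimal sketch with hypothetical names (`Step`, `verifyImplementation`); each `run` would wrap a real command and return its exit code. It deliberately runs every applicable step instead of stopping at the first failure, so the evidence report is complete.

```typescript
// A step returns its exit code; in practice it would wrap a real command run.
interface Step {
  name: string;
  applicable: boolean; // e.g. false for "build" in a project without a build step
  run: () => number;
}

interface StepOutcome {
  name: string;
  exitCode: number;
}

// Runs all applicable steps and allows completion only if each exited with 0.
function verifyImplementation(steps: Step[]): { outcomes: StepOutcome[]; canComplete: boolean } {
  const outcomes = steps
    .filter((s) => s.applicable)
    .map((s) => ({ name: s.name, exitCode: s.run() }));
  return { outcomes, canComplete: outcomes.length > 0 && outcomes.every((o) => o.exitCode === 0) };
}
```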
Workflow 2: Code Review Verification
When: Reviewing another agent's code or a user's PR
Steps:
1. **Read the code changes**
2. **Verify tests exist**
   - Are there tests for new functionality?
   - Do tests cover edge cases?
   - Are existing tests updated?
3. **Run tests**
   - Execute the test suite
   - Verify exit code 0
   - Check that coverage didn't decrease
4. **Check build**
   - Ensure the project still builds
   - No new build errors
5. **Verify code quality**
   - Run linter
   - Run type checker
   - Check for security issues
6. **Document review evidence**
   - Use Template 3 (Code Quality Evidence)
   - Note any issues found
   - Add to context
7. **Approve or request changes**
   - Approve only if all evidence passes
   - If issues are found, document them with evidence
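"Check that coverage didn't decrease" means comparing the change against a baseline, not just reading one number. A minimal sketch, assuming coverage percentages are available for the base branch and the change under review; `coverageRegressions` and its `tolerance` parameter are hypothetical.

```typescript
// Coverage as percentages (0-100), e.g. from a coverage summary report.
type Coverage = { statements: number; branches: number; functions: number; lines: number };

// Lists each metric that dropped versus the base branch.
// `tolerance` (percentage points) is a hypothetical allowance for rounding noise.
function coverageRegressions(base: Coverage, head: Coverage, tolerance = 0): string[] {
  return (Object.keys(base) as (keyof Coverage)[])
    .filter((k) => head[k] < base[k] - tolerance)
    .map((k) => `${k}: ${base[k]}% -> ${head[k]}%`);
}
```

An empty result supports approval; any entry belongs in the review evidence as a concrete reason to request changes.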
Workflow 3: Production Deployment Verification
When: Deploying to production or staging
Steps:
1. **Pre-deployment checks**
   - All tests pass (exit code 0)
   - Build succeeds
   - No critical linter errors
   - Security scan passes
2. **Execute deployment**
   - Run the deployment command
   - Capture its output
3. **Post-deployment checks**
   - Health check endpoint responds
   - Application starts successfully
   - No immediate errors in logs
   - Smoke tests pass
4. **Document deployment evidence**
   ## Deployment Evidence
   **Environment:** production
   **Timestamp:** YYYY-MM-DD HH:MM:SS
   **Version:** vX.X.X
   **Pre-Deployment:**
   - Tests: ✅ Exit 0
   - Build: ✅ Exit 0
   - Security: ✅ No critical issues
   **Deployment:**
   - Command: `kubectl apply -f deployment.yaml`
   - Exit Code: 0 ✅
   **Post-Deployment:**
   - Health Check: ✅ 200 OK
   - Smoke Tests: ✅ All passed
   - Error Rate: <0.1%
5. **Verify rollback capability**
   - Ensure the previous version can be restored
   - Document the rollback procedure
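The pre- and post-deployment checks can be folded into one decision that lists every reason the deployment is not yet verified. A minimal sketch with hypothetical names (`DeployCheckInput`, `deploymentProblems`); the 200 status and the 0.1% error-rate ceiling mirror the example evidence above and should be adjusted to the project's own thresholds.

```typescript
interface DeployCheckInput {
  preDeployExitCodes: Record<string, number>; // e.g. { tests: 0, build: 0 }
  deployExitCode: number;
  healthStatus: number; // HTTP status returned by the health check endpoint
  smokeTestsPassed: boolean;
  errorRate: number; // fraction of failed requests, so 0.001 = 0.1%
}

// Returns every reason the deployment cannot be considered verified (empty = verified).
function deploymentProblems(e: DeployCheckInput): string[] {
  const problems: string[] = [];
  for (const [check, code] of Object.entries(e.preDeployExitCodes)) {
    if (code !== 0) problems.push(`pre-deployment ${check} exited with ${code}`);
  }
  if (e.deployExitCode !== 0) problems.push(`deployment command exited with ${e.deployExitCode}`);
  if (e.healthStatus !== 200) problems.push(`health check returned ${e.healthStatus}`);
  if (!e.smokeTestsPassed) problems.push("smoke tests failed");
  if (e.errorRate >= 0.001) problems.push(`error rate ${(e.errorRate * 100).toFixed(2)}% is not below 0.1%`);
  return problems;
}
```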
Evidence Storage
Where to Store Evidence
Shared Context (Primary)
{
"quality_evidence": {
"tests_run": true,
"test_exit_code": 0,
"coverage_percent": 87,
"build_success": true,
"build_exit_code": 0,
"linter_errors": 0,
"linter_warnings": 2,
"timestamp": "2025-11-02T10:30:00Z"
}
}
Evidence Files (Secondary)
- Directory: `.claude/quality-gates/evidence/`
- One file per verification run
- Format: `{type}-{timestamp}.log`
- Example: `tests-2025-11-02-103000.log`
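The file-naming convention can be generated rather than typed by hand. A minimal sketch, assuming the timestamp part is `YYYY-MM-DD-HHMMSS` as in the example and is written in UTC (the time zone is an assumption); `evidenceFileName` is a hypothetical helper.

```typescript
// Builds "{type}-{timestamp}.log", e.g. tests-2025-11-02-103000.log.
// Assumption: timestamps are rendered in UTC.
function evidenceFileName(type: string, at: Date): string {
  const p = (n: number) => String(n).padStart(2, "0");
  const date = `${at.getUTCFullYear()}-${p(at.getUTCMonth() + 1)}-${p(at.getUTCDate())}`;
  const time = `${p(at.getUTCHours())}${p(at.getUTCMinutes())}${p(at.getUTCSeconds())}`;
  return `${type}-${date}-${time}.log`;
}
```

Joined with the evidence directory, this gives a path such as `.claude/quality-gates/evidence/tests-2025-11-02-103000.log`.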
Task Completion Messages
- Include evidence summary
- Link to detailed evidence files
- Example: "Task complete. Tests passed (exit 0, 87% coverage), build succeeded."
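The completion message can be built from the recorded evidence so it never says more than the evidence supports. A minimal sketch; `EvidenceSummary` and `completionMessage` are hypothetical, and the wording follows the example message above.

```typescript
interface EvidenceSummary {
  testExitCode: number;
  coveragePercent?: number;
  buildExitCode?: number; // undefined when the project has no build step
}

// One-line summary in the style of the example completion message.
function completionMessage(e: EvidenceSummary): string {
  const cov = e.coveragePercent === undefined ? "" : `, ${e.coveragePercent}% coverage`;
  const tests = `Tests ${e.testExitCode === 0 ? "passed" : "failed"} (exit ${e.testExitCode}${cov})`;
  const build =
    e.buildExitCode === undefined
      ? ""
      : `, build ${e.buildExitCode === 0 ? "succeeded" : `failed (exit ${e.buildExitCode})`}`;
  const allPassed = e.testExitCode === 0 && (e.buildExitCode ?? 0) === 0;
  return `${allPassed ? "Task complete." : "Task NOT complete."} ${tests}${build}.`;
}
```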
Quality Standards
Minimum Acceptable
- ✅ Tests executed with captured exit code
- ✅ Timestamp recorded
- ✅ Evidence stored in context
Production-Grade
- ✅ Tests pass (exit code 0)
- ✅ Coverage ≥70% (or project standard)
- ✅ Build succeeds (exit code 0)
- ✅ No critical linter errors
- ✅ Type checker passes
- ✅ Security scan shows no critical issues
Gold Standard
- ✅ All production-grade requirements
- ✅ Coverage ≥80%
- ✅ No linter warnings
- ✅ Performance benchmarks within thresholds
- ✅ Accessibility audit passes (WCAG 2.1 AA)
- ✅ Integration tests pass
- ✅ Deployment verification complete
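Because each tier includes the one below it, the tiers can be checked in order. A minimal sketch; `Gathered`, `Tier`, and `qualityTier` are hypothetical, and treating a check that was not run (`undefined`) as not passing is an assumption of this sketch.

```typescript
// Hypothetical flattened view of collected evidence; undefined = not run.
interface Gathered {
  testExitCode?: number;
  timestamp?: string;
  storedInContext: boolean;
  coveragePercent?: number;
  buildExitCode?: number;
  criticalLintErrors?: number;
  lintWarnings?: number;
  typeCheckExitCode?: number;
  criticalSecurityIssues?: number;
  benchmarksWithinThresholds?: boolean;
  accessibilityPassed?: boolean;
  integrationTestsPassed?: boolean;
  deploymentVerified?: boolean;
}

type Tier = "none" | "minimum" | "production" | "gold";

function qualityTier(g: Gathered, coverageStandard = 70): Tier {
  const minimum = g.testExitCode !== undefined && !!g.timestamp && g.storedInContext;
  if (!minimum) return "none";
  const production =
    g.testExitCode === 0 &&
    (g.coveragePercent ?? 0) >= coverageStandard &&
    g.buildExitCode === 0 &&
    g.criticalLintErrors === 0 &&
    g.typeCheckExitCode === 0 &&
    g.criticalSecurityIssues === 0;
  if (!production) return "minimum";
  const gold =
    (g.coveragePercent ?? 0) >= 80 &&
    g.lintWarnings === 0 &&
    g.benchmarksWithinThresholds === true &&
    g.accessibilityPassed === true &&
    g.integrationTestsPassed === true &&
    g.deploymentVerified === true;
  return gold ? "gold" : "production";
}
```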
Common Pitfalls
❌ Don't Skip Evidence Collection
Bad:
"I've implemented the login feature. It should work correctly."
Good:
"I've implemented the login feature. Evidence:
- Tests: Exit code 0, 12 tests passed, 0 failed
- Build: Exit code 0, no errors
- Coverage: 89%
Task complete with verification."
❌ Don't Fake Evidence
Bad:
"Tests passed" (without actually running them)
Good:
"Tests passed. Exit code: 0
Command: npm test
Output: Test Suites: 3 passed, 3 total
Timestamp: 2025-11-02 10:30:15"
❌ Don't Ignore Failed Evidence
Bad:
"Build failed with exit code 1, but the code looks correct so marking complete."
Good:
"Build failed with exit code 1. Errors:
- TypeError: Cannot read property 'id' of undefined (line 42)
Fixing the error now before marking complete."
❌ Don't Collect Evidence Only Once
Bad:
"Tests passed yesterday, so the code is still good."
Good:
"Re-running tests after today's changes.
New evidence: Exit code 0, 45 tests passed, coverage 87%"
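Stale evidence can be detected by comparing when it was captured with when the code last changed. A minimal sketch; `isStale` is a hypothetical helper, and how the last-change time is obtained (newest file modification time, latest commit time, etc.) is left to the caller.

```typescript
// Evidence is stale if the code changed after the evidence was captured.
function isStale(evidenceTimestamp: string, lastChange: Date): boolean {
  const captured = Date.parse(evidenceTimestamp);
  // An unparseable timestamp cannot prove freshness, so it counts as stale.
  return Number.isNaN(captured) || captured < lastChange.getTime();
}
```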
Integration with Other Systems
Context System Integration
Evidence is automatically tracked in shared context:
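// Context structure includes (interface name is illustrative;
// fields mirror the JSON example above):
interface SharedContext {
  quality_evidence?: {
    tests_run: boolean;
    test_exit_code?: number;
    coverage_percent?: number;
    build_success?: boolean;
    build_exit_code?: number;
    linter_errors?: number;
    linter_warnings?: number;
    timestamp: string; // ISO 8601, e.g. "2025-11-02T10:30:00Z"
  };
}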
Quality Gates Integration
Evidence collection feeds into quality gates:
- Quality gates check if evidence exists
- Block task completion if evidence missing
- Escalate if evidence shows failures
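The three gate behaviors map onto a small decision function over the `quality_evidence` shape. A minimal sketch; `qualityGate` and `GateDecision` are hypothetical, and counting any linter error as a failure is an assumption (a project may only escalate on critical ones).

```typescript
// Subset of the shared-context quality_evidence fields.
interface QualityEvidence {
  tests_run: boolean;
  test_exit_code?: number;
  build_success?: boolean;
  linter_errors?: number;
  timestamp: string;
}

type GateDecision = "pass" | "block" | "escalate";

// Missing evidence blocks completion; evidence that shows a failure escalates.
function qualityGate(evidence: QualityEvidence | undefined): GateDecision {
  if (!evidence || !evidence.tests_run || evidence.test_exit_code === undefined) return "block";
  const failed =
    evidence.test_exit_code !== 0 ||
    evidence.build_success === false ||
    (evidence.linter_errors ?? 0) > 0;
  return failed ? "escalate" : "pass";
}
```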
Squad Mode Integration
In parallel execution:
- Each agent collects evidence independently
- Studio Coach validates evidence before sync
- Blocked tasks don't waste parallel cycles
Quick Reference
Evidence Collection Checklist
Before marking task complete:
- [ ] Tests executed
- [ ] Test exit code captured (0 = pass)
- [ ] Build executed (if applicable)
- [ ] Build exit code captured (0 = pass)
- [ ] Code quality checks run (linter, types)
- [ ] Evidence documented with timestamp
- [ ] Evidence added to shared context
- [ ] Evidence summary in completion message
Common Commands by Language/Framework
JavaScript/TypeScript:
npm test # Run tests
npm run build # Build project
npm run lint # Run ESLint
npm run typecheck # Run TypeScript compiler
Python:
pytest # Run tests
pytest --cov # Run tests with coverage (requires the pytest-cov plugin)
ruff check . # Run linter
mypy . # Run type checker
Rust:
cargo test # Run tests
cargo build # Build project
cargo clippy # Run linter
Go:
go test ./... # Run tests
go build # Build project
golangci-lint run # Run linter
Examples
See /skills/evidence-verification/examples/ for:
- Sample evidence reports
- Real-world verification scenarios
- Integration examples
Version History
v1.0.0 - Initial release
- Core evidence collection templates
- Verification workflows
- Quality standards
- Integration with context system
Remember: Evidence-first development prevents hallucinations, ensures production quality, and builds confidence. When in doubt, collect more evidence, not less.