Agent skills
maestro:verification

Agent skill

maestro:verification

Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always

View SKILL.md on GitHub Repository

Stars 26

Forks 4

Install this agent skill to your Project

npx add-skill https://github.com/ReinaMacCredy/maestro/tree/main/skills/built-in/maestro:verification

SKILL.md

Verification Before Completion

Overview

Claiming work is complete without verification is dishonesty, not efficiency.

Core principle: Evidence before claims, always.

Violating the letter of this rule is violating the spirit of this rule.

When to Use

Always before:

Claiming a task is complete
Committing code
Creating a pull request
Merging a worktree
Moving to the next task
Reporting status to your human partner
Marking a feature complete

Three scopes of verification:

Scope	When	What to verify
Task	After completing a single task	Build, tests, lint for changed code
Phase	After merging multiple tasks in a plan phase	Cross-module integration, no regressions
Feature	Before `feature-complete`	Full test suite, all acceptance criteria, end-to-end

Thinking "verification scope doesn't matter"? It does. A task-level check misses integration bugs. A feature-level check on every commit wastes time.

The Iron Law

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

If you haven't run the verification command in this message, you cannot claim it passes.

"Fresh" means:

Run after your latest change (not before it)
Full output examined (not truncated or skimmed)
Exit code checked (not assumed)

The Gate Function

BEFORE claiming any status or expressing satisfaction:

1. IDENTIFY: What commands prove this claim?
   - Build: compiler/bundler exit code
   - Tests: test runner output with pass/fail counts
   - Lint: linter output with error/warning counts
   - Types: type checker output
   - Integration: cross-module test suite
   - Acceptance: line-by-line requirement check

2. RUN: Execute EVERY required command (fresh, complete)
   - Do not rely on cached results
   - Do not run partial suites
   - Do not skip "slow" checks

3. READ: Full output for each command
   - Check exit code (0 = success, non-zero = failure)
   - Count failures, errors, warnings
   - Read error messages -- do not skim

4. CROSS-CHECK: Does output match the claim?
   - Zero failures AND zero errors = pass
   - Any failure or error = fail (even if "unrelated")
   - Warnings warrant investigation

5. REPORT: State claim WITH evidence
   - If PASS: "Build passes (exit 0), 47/47 tests pass, 0 lint errors"
   - If FAIL: "Build fails: 2 type errors in src/parser.ts:34,56"

Skip any step = lying, not verifying

Task Verification

After completing a single task, verify the task's own deliverables.

What to Run

bash

# 1. Build -- does it compile?
bun run build
# Check: exit 0, no errors

# 2. Tests -- do they pass?
bun test
# Check: all pass, 0 failures, 0 errors

# 3. Lint -- is the code clean?
bun run lint
# Check: 0 errors (warnings acceptable if pre-existing)

# 4. Type check (if separate from build)
bun run typecheck
# Check: 0 errors

How to Interpret Results

$ bun test PASS src/parser.test.ts (12 tests) PASS src/cli.test.ts (8 tests) Tests: 20 passed, 0 failed

Clear pass. State: "Build passes, 20/20 tests pass."
</Good>

<Bad>

$ bun run build Build completed in 1.2s

$ bun test PASS src/parser.test.ts (12 tests) FAIL src/cli.test.ts x handles empty input (expected "error" got undefined) Tests: 19 passed, 1 failed

One failure. Do NOT say "tests mostly pass" or "just one unrelated failure."
State: "1 test fails: cli.test.ts 'handles empty input'. Investigating."
</Bad>

### Task Verification Checklist

- [ ] Build exits 0
- [ ] All tests pass (exact count reported)
- [ ] Lint reports 0 errors
- [ ] Type check reports 0 errors
- [ ] Changed files reviewed in diff
- [ ] Task acceptance criteria checked line-by-line

## Phase Verification

After merging multiple tasks that form a plan phase, verify integration.

### What to Run

Everything from Task Verification, plus:

```bash
# 1. Full test suite (not just changed files)
bun test
# Check: ALL tests pass, including ones from other tasks

# 2. Integration tests specifically
bun test --grep "integration"
# Check: cross-module interactions work

# 3. Git status -- clean working tree
git status
# Check: no untracked files that should be committed
# Check: no uncommitted changes

# 4. Diff against main -- review all changes in the phase
git diff main...HEAD --stat
# Check: only expected files changed
# Check: no accidental inclusions (lockfiles, generated code, secrets)

Phase-Specific Checks

Phase type	Additional verification
API changes	Contract tests pass, no breaking changes to consumers
Schema changes	Migration runs clean, rollback tested
Dependency updates	Full build from clean state, no version conflicts
Refactoring	Behavior tests unchanged, coverage not decreased
New feature	Feature flag works in both states (on/off)

Phase Verification Checklist

All task-level verifications pass
Full test suite passes (not just per-task tests)
No untracked or uncommitted changes
Diff review: only expected files changed
Cross-module integration verified
No new warnings introduced
Phase acceptance criteria from plan checked line-by-line

Feature Verification

Before calling feature-complete, verify everything end-to-end.

What to Run

Everything from Phase Verification, plus:

bash

# 1. Clean build from scratch
rm -rf node_modules dist .cache
bun install
bun run build
# Check: builds from clean state

# 2. Full test suite
bun test
# Check: every test passes

# 3. End-to-end / smoke test
# Run the actual feature manually or via E2E tests
bun test:e2e
# Check: feature works as specified

# 4. Diff against main -- full feature review
git log main..HEAD --oneline
git diff main...HEAD --stat
# Check: all commits are intentional
# Check: no debug code, console.logs, TODO hacks

Feature Verification Checklist

Clean build from scratch succeeds
Full test suite passes
E2E / smoke tests pass
Every acceptance criterion verified with evidence
No debug code or temporary hacks in diff
No secrets, credentials, or .env files in diff
Documentation updated (if required by plan)
Breaking changes documented (if any)
Rollback strategy confirmed (feature flag, revert plan)

Post-Merge Verification

After merging a worktree back to the main branch, the merge itself can introduce problems.

The Merge Verification Protocol

1. MERGE: Complete the worktree merge

2. BUILD: Immediately build on the target branch
   - Merge conflicts resolved incorrectly cause build failures
   - "It built in the worktree" does not mean it builds after merge

3. TEST: Run full test suite on target branch
   - Tests that passed in isolation may fail when combined
   - Other worktrees' tests may break from your changes

4. DIFF: Review the merge commit
   - git diff HEAD~1 -- verify only expected changes landed
   - Check for conflict markers (<<<<<<< ======= >>>>>>>)
   - Check for duplicate code from bad conflict resolution

5. REPORT: State merge result with evidence
   - "Merged task-3. Build passes, 47/47 tests pass on main."
   - NOT "Merge complete" (no evidence)

Common Merge Failures

Symptom	Cause	Fix
Build fails after merge	Conflict resolved incorrectly	Review conflict markers, fix, rebuild
Tests fail after merge	Two tasks changed same behavior	Determine correct behavior, update test + code
New warnings after merge	Import order, unused vars from merge	Clean up merged code
Missing files after merge	Conflict deleted file incorrectly	Check git log for the file, restore

Cross-Worktree Verification

When multiple worktrees are active (parallel task execution), verify that work in one worktree does not conflict with another.

Before Starting a New Worktree

bash

# Check what other worktrees are active
maestro status
# Note: which files are being modified in other active tasks

Before Merging Parallel Worktrees

bash

# 1. Merge first worktree
maestro merge --task <first-task>

# 2. Verify on main
bun run build && bun test

# 3. Merge second worktree
maestro merge --task <second-task>

# 4. Verify again -- this is where integration breaks surface
bun run build && bun test

# 5. If tests fail: the second merge introduced a conflict
# Do NOT blame the first merge. Investigate the interaction.

Cross-Worktree Red Flags

Two tasks modifying the same file
Two tasks changing the same API
One task adding a dependency another task also adds (version conflict)
One task renaming what another task imports

Handling Verification Failures

When verification fails, follow this decision tree:

Verification failed
  |
  +--> Is the failure in YOUR changed code?
  |     |
  |     +--> YES: Fix it. Re-verify. Do not proceed until green.
  |     |
  |     +--> NO: Is it a pre-existing failure?
  |           |
  |           +--> YES: Document it. Verify YOUR changes don't make it worse.
  |           |         Proceed only if pre-existing failure is unrelated.
  |           |
  |           +--> NO: Is it caused by a parallel merge?
  |                 |
  |                 +--> YES: Coordinate with the other task.
  |                 |         Fix before proceeding.
  |                 |
  |                 +--> NO: Investigate further.
  |                           Do not proceed until understood.
  |
  +--> NEVER: Ignore failures. "Unrelated" failures are usually related.
  +--> NEVER: Skip re-verification after fixing. The fix might break something else.
  +--> NEVER: Mark task complete with known failures.

Flaky Test Protocol

Test fails intermittently?

1. Run 3 times. If it fails 2+ times, it is a REAL failure. Fix it.
2. If it fails 1/3 times:
   - Document the flaky test
   - Investigate root cause (timing, state leakage, external dependency)
   - Do NOT ignore it. Flaky tests mask real failures.
3. If test is flaky AND unrelated to your changes:
   - Document in task report
   - Verify your changes with the flaky test excluded
   - File a follow-up to fix the flaky test

Common Failures

Claim	Requires	Not Sufficient
Tests pass	Test command output: 0 failures	Previous run, "should pass"
Linter clean	Linter output: 0 errors	Partial check, extrapolation
Build succeeds	Build command: exit 0	Linter passing, logs look good
Bug fixed	Test original symptom: passes	Code changed, assumed fixed
Regression test works	Red-green cycle verified	Test passes once
Agent completed	VCS diff shows changes	Agent reports "success"
Requirements met	Line-by-line checklist	Tests passing
Merge clean	Build + tests on target branch	"No conflicts"
Feature complete	All acceptance criteria with evidence	"All tasks done"

Red Flags -- STOP

Signs You Are About to Skip Verification

Using "should", "probably", "seems to"
Expressing satisfaction before verification ("Great!", "Perfect!", "Done!")
About to commit/push/PR without running commands
Trusting agent success reports without checking diff
Relying on partial verification ("tests pass" without build check)
Thinking "just this once"
Tired and wanting work to be over
ANY wording implying success without having run verification

Signs Verification Is Incomplete

Only ran build, not tests
Only ran tests for changed files, not full suite
Did not check exit code (command printed errors but you did not scroll up)
Ran commands but did not read output carefully
Verified in worktree but not after merge
Checked code changes but not acceptance criteria
Ran verification once but made changes after

Signs Verification Is Unreliable

Tests pass but coverage is low on changed code
Build passes but with warnings you did not investigate
Tests pass but they test mocks, not real behavior
"All tests pass" but you added no new tests for new behavior
Verification passes but you cannot explain what each check proved
Same test suite has been passing for weeks with no new tests added

Rationalization Prevention

Excuse	Reality
"Should work now"	RUN the verification
"I'm confident"	Confidence is not evidence
"Just this once"	No exceptions
"Linter passed"	Linter is not compiler
"Agent said success"	Verify independently
"I'm tired"	Exhaustion is not an excuse
"Partial check is enough"	Partial proves nothing
"Different words so rule doesn't apply"	Spirit over letter
"Tests passed in the worktree"	Verify after merge
"It's just a refactor"	Refactors break things. Verify.
"No tests changed so they still pass"	Run them anyway. Prove it.
"The CI will catch it"	CI is a backstop, not a substitute
"I'll verify the next one more carefully"	Verify this one now

Verification Commands by Stack

Adapt to your project. The principle is the same everywhere.

Stack	Build	Test	Lint	Types
TypeScript/Bun	`bun run build`	`bun test`	`bun run lint`	`bun run typecheck`
TypeScript/Node	`npm run build`	`npm test`	`npm run lint`	`npx tsc --noEmit`
Python/uv	N/A	`uv run pytest`	`uv run ruff check`	`uv run mypy .`
Rust	`cargo build`	`cargo test`	`cargo clippy`	(included in build)
Go	`go build ./...`	`go test ./...`	`golangci-lint run`	(included in build)
Java/Gradle	`./gradlew build`	`./gradlew test`	`./gradlew check`	(included in build)

When Stuck

Problem	Solution
Don't know what to verify	Re-read the task spec. Every requirement = one verification item.
Verification takes too long	Run targeted tests first (`bun test <file>`), then full suite. Never skip full suite.
Can't reproduce a failure	Clean state: `rm -rf node_modules dist && bun install && bun run build && bun test`
Flaky test blocking progress	Follow the Flaky Test Protocol above.
Pre-existing failures confuse results	Document them. Diff test results before/after your changes.
Not sure if failure is related	`git stash && bun test` -- if it fails without your changes, it is pre-existing.
Verification passes but behavior seems wrong	Manual smoke test. Automated tests can have blind spots.

Integration with Other Skills

Skill	How verification interacts
maestro:tdd	TDD provides red-green cycle for individual tests. Verification ensures the full suite passes after TDD work.
maestro:review	Review checks code quality. Verification checks code correctness. Both required before completion.
maestro:implement	Implementation produces code. Verification proves the code works. Never skip between them.

The Bottom Line

Run the command. Read the output. THEN claim the result.

Three scopes, one principle: evidence before claims.

Task: build + test + lint + acceptance criteria
Phase: full suite + integration + diff review
Feature: clean build + full suite + E2E + every acceptance criterion

No shortcuts. No exceptions. This is non-negotiable.

Maintainer

ReinaMacCredy Core maintainer

Source details

Full Name: ReinaMacCredy/maestro
Branch: main
Path in repo: skills/built-in/maestro:verification
Topics: claude-code mcp typescript workflow ai-agents orchestration plugin

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

ReinaMacCredy/maestro

maestro-skill-author

Create, update, or debug maestro built-in skills. Covers SKILL.md frontmatter, reference directory structure, step-file architecture, build-time embedding, naming conventions, alias management, and registry validation. Use when creating a new maestro built-in skill, modifying an existing SKILL.md, adding reference files, debugging skill loading failures, updating the skills registry, or working on the skills full port. Also use when frontmatter validation fails, skills don't appear in skill-list, or reference files fail to load.

26 4

Explore

ReinaMacCredy/maestro

maestro:brainstorming

Use before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation.

26 4

Explore

ReinaMacCredy/maestro

mcp-builder

Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).

26 4

Explore

ReinaMacCredy/maestro

maestro:plan-review-loop

Deep-review any plan (maestro, Codex, Claude Code plan mode, or plain markdown) using iterative subagent review loops with BMAD-inspired adversarial edge-case discovery. Spawns reviewer subagents that find issues using pre-mortem, inversion, and red-team techniques, auto-fixes them with structured fix strategies, and re-reviews until the plan passes with zero actionable issues. Use when the user says 'review the plan', 'deep review', 'check the plan thoroughly', 'review loop', 'validate before approving', or wants rigorous plan validation before execution. Also use proactively before plan-approve when the plan is complex or high-risk.

26 4

Explore

ReinaMacCredy/maestro

maestro:research

Structured research workflow for maestro features. Guides tool selection across three tiers (codebase exploration, Context7 for library docs, NotebookLM for deep analysis), defines research patterns, finding organization via memory_write, and completion criteria. Use during the research pipeline stage after feature_create and before plan_write. Also use when investigating a problem space, comparing technical approaches, gathering context on unfamiliar code, or needing to understand external library APIs before making architectural decisions.

26 4

Explore

ReinaMacCredy/maestro

cli-for-agents

Designs or reviews CLIs so coding agents can run them reliably: non-interactive flags, layered --help with examples, stdin/pipelines, fast actionable errors, idempotency, dry-run, and predictable structure. Use when building a CLI, adding commands, writing --help, or when the user mentions agents, terminals, or automation-friendly CLIs.

26 4

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Verification Before Completion

Overview

When to Use

The Iron Law

The Gate Function

Task Verification

What to Run

How to Interpret Results

Phase-Specific Checks

Phase Verification Checklist

Feature Verification

What to Run

Feature Verification Checklist

Post-Merge Verification

The Merge Verification Protocol

Common Merge Failures

Cross-Worktree Verification

Before Starting a New Worktree

Before Merging Parallel Worktrees

Cross-Worktree Red Flags

Handling Verification Failures

Flaky Test Protocol

Common Failures

Red Flags -- STOP

Signs You Are About to Skip Verification

Signs Verification Is Incomplete

Signs Verification Is Unreliable

Rationalization Prevention

Verification Commands by Stack

When Stuck

Integration with Other Skills

The Bottom Line

Recommended Agent Skills

maestro-skill-author

maestro:brainstorming

mcp-builder

maestro:plan-review-loop

maestro:research

cli-for-agents