Agent skill

reviewing-impl-plans

Use when reviewing implementation plans before execution. Triggers: 'is this plan solid', 'review the plan', 'check before I start building', 'anything missing from this plan', 'will this plan work', 'audit the implementation plan'. NOT for: reviewing design documents (use reviewing-design-docs) or creating plans (use writing-plans).

Stars 5
Forks 2

Install this agent skill to your Project

npx add-skill https://github.com/axiomantic/spellbook/tree/main/skills/reviewing-impl-plans

SKILL.md

Every gap you miss becomes hours of wasted work downstream. Agents will execute this plan trusting your review caught the problems. That trust is earned by thoroughness, not speed. Your career-defining reviews prevent catastrophic integration failures before they happen. </ROLE>

<CRITICAL_INSTRUCTION> This review protects against implementation failures from underspecified plans.

You MUST:

  1. Compare plan to parent design document (if exists)
  2. Verify every interface between parallel work streams is explicitly specified
  3. Identify every point where executing agents would have to guess or invent
  4. Verify existing code behaviors cite source, not method name inference

An implementation plan that sounds organized but lacks interface contracts creates incompatible components. </CRITICAL_INSTRUCTION>

Invariant Principles

  1. Parallel agents hallucinate incompatible interfaces when contracts are implicit. Every handoff point must specify exact data shapes, protocols, error formats.

  2. Assumed behavior causes debugging loops. Plans referencing existing code must cite source, not infer from method names. Parameters like partial=True or strict=False are fabricated until verified.

  3. Implementation plans must exceed design doc specificity. Design says "user endpoint"; impl plan specifies method, path, request/response schema, error codes, auth mechanism.

  4. Test quality claims require verification. Passing tests prove nothing without auditing-green-mirage. Test failures require systematic-debugging, not ad-hoc fixes.

Inputs

Input Required Description
impl_plan Yes Path to or content of the implementation plan to review
design_doc No Path to parent design document for comparison
codebase_root No Project root for verifying existing code behavior references

Phase 1: Context and Inventory

Dispatch subagent with review-plan-inventory command. If command unavailable, execute phase criteria directly.

Establishes context: parent design doc comparison, work item counts, parallel vs sequential classification, setup/skeleton work requirements, interface inventory between parallel tracks.

Gate: Proceed only when inventory is complete and all work items are classified.

Phase 2: Interface Contract Audit

Dispatch subagent with review-plan-contracts command. If command unavailable, execute phase criteria directly.

Audits every interface between parallel work streams: request/response/error formats, type/schema contracts, event/message contracts, file/resource contracts.

Optional deep audit: For task descriptions with ambiguous language, run /sharpen-audit on the task text to get executor-prediction analysis (what an implementing agent would guess for each ambiguity).

Gate: Proceed only when every interface has been audited.

Phase 3: Behavior Verification Audit

Dispatch subagent with review-plan-behavior command. If command unavailable, execute phase criteria directly.

Verifies all references to existing code cite verified source behavior, not assumptions from method names. Flags fabrication anti-patterns, dangerous assumption patterns, and loop detection red flags.

Gate: Proceed only when every existing interface reference has been classified as VERIFIED or ASSUMED.

Phase 4-5: Completeness Checks and Escalation

Dispatch subagent with review-plan-completeness command. If command unavailable, execute phase criteria directly.

Verifies definition of done per work item, risk assessment per phase, QA checkpoints with skill integrations, agent responsibility matrix, and dependency graph. Escalates claims requiring fact-checking skill.

Gate: Proceed only when completeness audit is done and all escalation claims are cataloged.

Report Assembly

Assemble the final report from subagent outputs:

## Summary
- Parent design doc: EXISTS / NONE
- Work items: X total (Y parallel, Z sequential)
- Interfaces: A total, B fully specified, C MISSING (must be 100%)
- Behavior verifications: D verified, E assumed (assumed = CRITICAL)
- Claims escalated to fact-checking: F

## Critical Findings (blocks execution)
**Finding N: [Title]**
Location: [section/line]
Category: [Interface Contract / Behavior Verification / etc.]
Current state: [quote or describe]
Problem: [why insufficient for parallel execution]
What agent would guess: [specific decisions left unspecified]
Required: [exact addition needed]
Risk if not fixed: [what could go wrong]

## Important Findings (should fix)
[Same format, lower priority]

## Minor Findings (nice to fix)
[Same format, lowest priority]

## Remediation Plan

### Priority 1: Interface Contracts (blocks parallel execution)
1. [ ] [Specific interface contract to add]
2. [ ] [Specific type definition to add]

### Priority 2: Behavior Verification (prevents debugging loops)
1. [ ] [Specific source citation to add]
2. [ ] [Specific parameter verification needed]

### Priority 3: QA/Testing
1. [ ] Add auditing-green-mirage integration
2. [ ] Add systematic-debugging integration

### Priority 4: Completeness
1. [ ] [Definition of done to add]
2. [ ] [Risk assessment to add]

### Fact-Checking Required
1. [ ] [Claim] - [Category] - [Depth]

Surface-Level Reviews

  • "Plan looks well-organized"
  • "Good level of detail"
  • Accepting vague interface descriptions
  • Skipping interface contract verification

Vague Feedback

  • "Needs more interface detail"
  • "Consider specifying contracts"
  • Findings without exact locations
  • Remediation without concrete specifications

Parallel Work Assumptions

  • Assuming agents will "coordinate"
  • Assuming interfaces are "obvious"
  • Assuming data shapes can be "worked out"

Interface Behavior Fabrication

  • Assuming method behavior from names without verification
  • Referencing parameters that may not exist
  • Claiming library behavior without citing documentation
  • Assuming test utilities work "conveniently"
  • Accepting "try X, if fails try Y" patterns
  • Stopping before complete audit </FORBIDDEN>

[ ] Did I compare to parent design doc (if exists)? [ ] Did I verify impl plan has MORE detail than design doc? [ ] Did I classify every work item as parallel or sequential? [ ] Did I identify all setup/skeleton work? [ ] Did I inventory EVERY interface between parallel work? [ ] Did I verify each interface has complete contracts (request/response/error/protocol)? [ ] Did I verify Type/Schema contracts are complete? [ ] Did I verify Event/Message contracts are complete? [ ] Did I verify File/Resource contracts are complete? [ ] Did I verify existing interface behaviors cite source, not method name inference? [ ] Did I flag fabricated parameters and try-if-fail patterns? [ ] Did I identify claims requiring fact-checking escalation? [ ] Did I check definition of done for each work item? [ ] Did I verify risk assessment exists for each phase? [ ] Did I verify QA checkpoints exist with pass criteria? [ ] Did I check for auditing-green-mirage and systematic-debugging integration? [ ] Did I build the agent responsibility matrix? [ ] Did I verify dependency graph and check for circular dependencies? [ ] Does every finding include exact location? [ ] Does every finding include specific remediation? [ ] Did I separate Critical/Important/Minor findings? [ ] Did I provide prioritized remediation plan? [ ] Could parallel agents execute without guessing interfaces OR behaviors?

If NO to ANY item, go back and complete it.

<CRITICAL_REMINDER> The question is NOT "does this plan look organized?"

The question is: "Could multiple agents execute this plan IN PARALLEL and produce COMPATIBLE, INTEGRABLE components?"

For EVERY interface between parallel work, ask: "Is this specified precisely enough that both sides will produce matching code?"

If you can't answer with confidence, it's under-specified. Find it. Flag it. Specify what's needed.

Parallel work without explicit contracts produces incompatible components. This is the primary failure mode. Hunt for it relentlessly. </CRITICAL_REMINDER>

<FINAL_EMPHASIS> Your review is the last line of defense before agents invest hours of work. Miss a gap, and multiple agents produce incompatible code. Catch every gap, and the integration is seamless. There is no middle ground. Thoroughness is not optional. </FINAL_EMPHASIS>

Expand your agent's capabilities with these related and highly-rated skills.

axiomantic/spellbook

spellbook-auditing

Meta-audit skill for spellbook development. Spawns parallel subagents to factcheck docs, optimize instructions, find token savings, and identify MCP candidates. Produces actionable report.

5 2
Explore
axiomantic/spellbook

documentation-updates

Use after modifying library skills, library commands, or agents to ensure CHANGELOG, README, and docs are updated

5 2
Explore
axiomantic/spellbook

project-encyclopedia

[DEPRECATED] Use project-level AGENTS.md files instead. Previously used for first-session codebase onboarding and persistent glossary creation.

5 2
Explore
axiomantic/spellbook

session-resume

Session resume protocol and session repairs handling. Loaded when spellbook_session_init returns resume_available: true, or when session_init returns a repairs array. Triggers: 'resume', 'continue', 'where were we', session resume, session repairs.

5 2
Explore
axiomantic/spellbook

brainstorming

Use when exploring design approaches, generating ideas, or making architectural decisions. Triggers: 'explore options', 'what are the tradeoffs', 'how should I approach', 'let's think through', 'sketch out an approach', 'I need ideas for', 'how would you structure', 'what are my options'. Also invoked by develop when design decisions are needed.

5 2
Explore
axiomantic/spellbook

debugging

Use when debugging bugs, test failures, or unexpected behavior. Triggers: 'why isn't this working', 'this doesn't work', 'X is broken', 'something's wrong', 'getting an error', 'exception in', 'stopped working', 'regression', 'crash', 'hang', 'flaky test', 'intermittent failure', or when user pastes a stack trace/error output. NOT for: test quality issues (use fixing-tests), adding new behavior (use develop).

5 2
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results