Agent skill
formal-verify
Continuous formal verification of architectural constraints and code quality. Use when asked to verify, audit, or validate codebase integrity. Runs automatically via hooks on every edit (structural) and pre-commit (full). Catches ownership violations, boundary crossings, state machine bugs, and code smells that grep ratchets miss. Triggers: "verify", "formal verify", "check architecture", "audit code quality", "run verification", "/verify", "/verify --bootstrap", "/verify --grade".
Install this agent skill to your Project
npx add-skill https://github.com/petekp/agent-skills/tree/main/skills/formal-verify
Metadata
Additional technical details for this skill
- author
- petekp
- version
- 0.1.0
SKILL.md
formal-verify
Use this skill when architectural intent matters more than "it compiles."
This skill runs a three-layer verification loop:
- Layer 1: structural verification over extracted AST facts and declarative rules
- Layer 2: behavioral verification over Z3Py protocol specs and TLA+/Apalache state-machine specs
- Layer 3: elegance auditing over complexity, consistency, and craft heuristics
The layers are intentionally tiered:
- every edit: Layer 1 only, fast enough for continuous feedback
- slice checkpoint: Layers 1 and 2
- pre-commit and manual
/verify: all three layers
Quick Start
Bootstrap a target project with:
/verify --bootstrap
Bootstrap runs four phases:
- Install dependencies and create
.verifier/ - Discover architectural rules from docs and code shape
- Interview the user in plain English about ambiguities
- Validate the initial rules against the current codebase
Commands
/verifyRuns all layers in verbose mode and prints a unified report./verify --bootstrapInstalls dependencies, creates.verifier/, and scaffolds the first rule set./verify --evolveChecks for drift between architectural docs and existing verification specs./verify --gradeRuns Layer 3 only and reports the current elegance grade.
How Verification Runs
Layer 1: Structural
The runner extracts facts from Rust and Swift source files, then checks
structural.yaml rules such as:
- only module X may cross boundary Y
- modules matching pattern Z must implement interface W
- all modules must not reference legacy identifiers
Structural checks are the default PostToolUse hook because they are the fastest.
Layer 2: Behavioral
Behavioral verification covers state transitions and protocol contracts:
- TLA+/Apalache for temporal properties, liveness, and interleavings
- Z3Py spec files for contracts, invariants, and cross-boundary data guarantees
Use this layer at slice checkpoints, before risky merges, and whenever a change touches coordination logic or cross-language contracts.
Layer 3: Elegance
Elegance auditing scores code for:
- complexity
- consistency
- craft
It produces a grade and line-level deductions so the agent can clean up code, not just make it technically correct.
Violation Handling
When a violation is found, tailor the output to the audience:
- agent output: counterexample, diagnosis, concrete fix suggestion
- human output: counterexample and diagnosis only
If the agent fails to resolve the same violation three times, stop the fix loop and escalate with:
- the original rule
- the counterexample
- the three attempted fixes
- what still appears to block a correct repair
Project Structure Created In The Target Repo
Bootstrap creates and maintains:
.verifier/
├── structural.yaml
├── elegance.yaml
├── specs/
├── facts/
└── reports/
structural.yamlstores declarative Layer 1 ruleselegance.yamlstores thresholds and grade policyspecs/stores Z3Py and TLA+ behavioral specsfacts/caches extracted AST factsreports/stores the most recent verification outputs
facts/ and reports/ should be gitignored in the target project.
Operating Guidance
- Run
/verifybefore claiming a migration is complete. - Run
/verify --gradewhen the code is correct but still feels rough. - Prefer updating rules and specs over weakening them when the architecture evolves intentionally.
- Keep
SKILL.mdfocused on orchestration; pull detailed mechanics from the references below.
References
@references/layer1-structural.mdFact extraction, Z3 encoding, reachability, and incremental invalidation.@references/layer2-behavioral.mdWhen to use TLA+/Apalache versus Z3Py, plus spec execution contracts.@references/layer3-elegance.mdMetric families, grading, thresholds, and the Layer 3 sub-module layout.@references/constraint-yaml-spec.mdStructural rule schema, selectors, assertions, and fact pattern operators.@references/bootstrap-process.mdThe install, discover, interview, validate bootstrap workflow.@references/agent-feedback-loop.mdHook integration, violation injection, retries, and escalation policy.@references/spec-authoring-guide.mdTranslating plain-English architectural intent into formal specs.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
multi-model-meta-analysis
Synthesize outputs from multiple AI models into a comprehensive, verified assessment. Use when: (1) User pastes feedback/analysis from multiple LLMs (Claude, GPT, Gemini, etc.) about code or a project, (2) User wants to consolidate model outputs into a single reliable document, (3) User needs conflicting model claims resolved against actual source code. This skill verifies model claims against the codebase, resolves contradictions with evidence, and produces a more reliable assessment than any single model.
capture-learning
Analyze recent conversation context and capture learnings to project knowledge files (for project-specific insights) or skills/commands/subagents (for cross-project patterns). Use when the user asks to "capture this learning", "update the docs with this", "remember this for next time", "document this issue", "add this to CLAUDE.md", "save this knowledge", or "update project knowledge". Also triggers after resolving build/setup issues, discovering non-obvious patterns, or completing debugging sessions with valuable insights.
optimize-agent-docs
Build a retrieval-optimized knowledge layer over agent documentation in dotfiles (.claude, .codex, .cursor, .aider). Use when asked to "optimize docs", "improve agent knowledge", "make docs more efficient", or when documentation has accumulated and retrieval feels inefficient. Generates a manifest mapping task-contexts to knowledge chunks, optimizes information density, and creates compiled artifacts for efficient agent consumption.
agent-changelog
Compile an agent-optimized changelog by cross-referencing git history with plans and documentation. Use when asked to "update changelog", "compile history", "document project evolution", or proactively after major milestones, architectural changes, or when stale/deprecated information is detected that could confuse coding agents.
literate-guide
Create a narrative guide to a codebase or feature in the style of Knuth's Literate Programming — code and prose interwoven as a single essay, ordered for human understanding rather than compiler needs. Use when the user asks to 'explain this codebase as a story', 'write a literate guide', 'create a narrative walkthrough', 'tell the story of this code', 'Knuth-style documentation', 'weave a guide for this feature', or when they want deep, readable documentation that treats the program as literature. Also trigger when someone wants a document that a thoughtful reader could follow from start to finish and come away understanding both WHAT the code does and WHY every design choice was made.
autonomous-agent-readiness
Assess a codebase's readiness for autonomous agent development and provide tailored recommendations. Use when asked to evaluate how well a project supports unattended agent execution, assess development practices for agent autonomy, audit infrastructure for agent reliability, or improve a codebase for autonomous agent workflows. Triggers on requests like "assess this project for agent readiness", "how autonomous-ready is this codebase", "evaluate agent infrastructure", or "improve development practices for agents".
Didn't find tool you were looking for?