Agent skill
hypothesis-debugging
Structured code debugging through hypothesis formation and falsification planning. Use when diagnosing bugs, unexpected behaviour, or system failures where the root cause is unclear. Produces a hypothesis document for execution by another agent rather than performing the investigation directly. Triggers on requests to debug issues, diagnose problems, investigate failures, or create debugging plans.
Install this agent skill to your Project
npx add-skill https://github.com/leynos/agent-helper-scripts/tree/main/skills/hypothesis-debugging
SKILL.md
Hypothesis-Driven Debugging
Generate a structured debugging document that identifies candidate root causes and provides falsification plans for each. The output document instructs a separate execution agent; do not perform the investigation yourself.
Philosophical Foundation
Apply Popperian falsificationism: hypotheses cannot be proven true, only disproven. Design tests that could definitively rule out each hypothesis rather than confirm it. A good falsification test produces a clear negative result if the hypothesis is wrong.
Process
1. Gather Context
Before forming hypotheses, collect:
- Symptom description: What behaviour is observed vs expected?
- Reproduction conditions: When does it occur? Intermittent or consistent?
- Recent changes: Deployments, configuration changes, dependency updates
- Error artefacts: Stack traces, logs, error messages, screenshots
- Environmental factors: OS, runtime versions, network conditions
If information is missing, note gaps in the output document.
2. Form Hypotheses
Generate 1–5 hypotheses ranked by plausibility. Each hypothesis must be:
- Specific: Name the component, function, or interaction suspected
- Falsifiable: A concrete test could disprove it
- Independent: Falsifying one should not automatically falsify others
Common hypothesis categories:
| Category | Examples |
|---|---|
| State | Race condition, stale cache, corrupted data |
| Input | Malformed payload, encoding issue, boundary case |
| Environment | Missing dependency, version mismatch, resource exhaustion |
| Logic | Off-by-one, incorrect predicate, missing null check |
| Integration | API contract violation, timeout, auth failure |
Avoid vague hypotheses ("something wrong with the database"). Pin down the specific failure mode.
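As a minimal sketch of how the criteria above might be encoded, the record below captures one hypothesis and ranks a set by plausibility. The `Hypothesis` class, its field names, and the numeric `plausibility` score are illustrative assumptions; the skill itself does not prescribe a data format.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Hypothesis categories taken from the table above."""
    STATE = "State"
    INPUT = "Input"
    ENVIRONMENT = "Environment"
    LOGIC = "Logic"
    INTEGRATION = "Integration"


@dataclass(frozen=True)
class Hypothesis:
    statement: str       # the specific failure mode suspected
    component: str       # component, function, or interaction named
    category: Category
    plausibility: float  # assumed 0.0-1.0 score, used only for ranking


def rank_hypotheses(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Return hypotheses ordered from most to least plausible."""
    if not 1 <= len(hypotheses) <= 5:
        raise ValueError("expected between 1 and 5 hypotheses")
    for h in hypotheses:
        # A hypothesis that names no component is too vague to falsify.
        if not h.component.strip():
            raise ValueError(f"hypothesis names no component: {h.statement!r}")
    return sorted(hypotheses, key=lambda h: h.plausibility, reverse=True)
```

Rejecting an empty `component` is one cheap way to catch the "something wrong with the database" style of hypothesis before it reaches the plan.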
3. Design Falsification Plans
For each hypothesis, specify:
- Prediction: If this hypothesis is correct, what observable outcome follows?
- Falsification test: What action would produce a contradicting observation?
- Expected negative result: What outcome would disprove the hypothesis?
- Tooling required: Commands, scripts, or instrumentation needed
- Confidence impact: How decisively would a negative result rule this out?
Prefer tests that are:
- Quick to execute
- Minimally invasive
- Deterministic rather than probabilistic
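The plan fields and test preferences above could be sketched as a record plus a sort key. The `est_minutes`, `invasive`, and `deterministic` fields are hypothetical additions for illustration, and ranking determinism ahead of invasiveness ahead of speed is an assumption; the skill lists the three preferences without ordering them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FalsificationPlan:
    prediction: str                # outcome that follows if the hypothesis holds
    test: str                      # action that could contradict the prediction
    expected_negative_result: str  # observation that disproves the hypothesis
    tooling: tuple[str, ...]       # commands, scripts, or instrumentation
    confidence_impact: str         # how decisively a negative result rules it out
    est_minutes: float             # assumed rough cost estimate
    invasive: bool                 # True if the test alters the system under study
    deterministic: bool            # False if the test must be repeated to trust it


def execution_order(plans: list[FalsificationPlan]) -> list[FalsificationPlan]:
    """Order tests: deterministic first, then non-invasive, then quickest."""
    return sorted(
        plans,
        key=lambda p: (not p.deterministic, p.invasive, p.est_minutes),
    )
```

Because `False` sorts before `True`, deterministic and non-invasive tests rise to the top and estimated duration breaks the remaining ties.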
4. Output Document
Generate a Markdown document following the template in assets/debugging-plan.md. Save it to the working directory as debugging-plan-{timestamp}.md.
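The output filename could be built as in the sketch below. The skill does not specify the timestamp format, so the compact UTC form used here is an assumption, as is the helper name `plan_path`.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def plan_path(directory: Path, now: Optional[datetime] = None) -> Path:
    """Build debugging-plan-{timestamp}.md inside the given directory."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumed format: compact UTC timestamp, which sorts lexicographically
    # and contains no characters that are awkward in filenames.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return directory / f"debugging-plan-{stamp}.md"
```

Calling `plan_path(Path.cwd())` yields a path in the working directory; passing `now` explicitly keeps the function testable.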
Quality Criteria
A well-formed debugging plan exhibits:
- Mutual exclusivity: Hypotheses do not overlap, so falsifying one leaves the others standing
- Collective exhaustiveness: Hypotheses cover the likely failure space
- Ordered efficiency: Cheapest decisive tests appear first
- Clear success criteria: The executing agent knows when to stop
Anti-Patterns
- Confirmation bias: Designing tests that can only succeed, not fail
- Hypothesis creep: Adding new hypotheses during execution rather than revising the plan
- Coupling: Tests that cannot isolate individual hypotheses
- Vagueness: "Check the logs" without specifying what pattern would falsify the hypothesis
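To illustrate the difference between "check the logs" and a falsifiable log check, the sketch below states the pattern a hypothesis predicts and treats its absence as the negative result. The function name, the log lines, and the "pool exhausted" message are all hypothetical.

```python
import re


def is_falsified(log_lines: list[str], predicted_pattern: str) -> bool:
    """Return True when no log line matches the pattern the hypothesis predicts."""
    pattern = re.compile(predicted_pattern)
    return not any(pattern.search(line) for line in log_lines)


# Hypothetical hypothesis: request timeouts are caused by connection pool
# exhaustion, so a "pool exhausted" warning should precede each timeout.
sample_logs = [
    "12:00:01 WARN pool exhausted (size=10)",
    "12:00:02 ERROR request timed out",
]
print(is_falsified(sample_logs, r"pool exhausted"))
```

An absent pattern only counts as falsification if the relevant log level is known to be enabled during the test window, so the plan should state that precondition alongside the pattern.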
References
references/examples.md: Worked examples of hypothesis-falsification pairs across common debugging scenarios (API timeouts, flaky tests, memory leaks)