Agent skill

hypothesis-debugging

Structured code debugging through hypothesis formation and falsification planning. Use when diagnosing bugs, unexpected behaviour, or system failures where the root cause is unclear. Produces a hypothesis document for execution by another agent rather than performing the investigation directly. Triggers on requests to debug issues, diagnose problems, investigate failures, or create debugging plans.

Install this agent skill in your project:

npx add-skill https://github.com/leynos/agent-helper-scripts/tree/main/skills/hypothesis-debugging

SKILL.md

Hypothesis-Driven Debugging

Generate a structured debugging document that identifies candidate root causes and provides falsification plans for each. The output document instructs a separate execution agent; do not perform the investigation yourself.

Philosophical Foundation

Apply Popperian falsificationism: hypotheses cannot be proven true, only disproven. Design tests that could definitively rule out each hypothesis rather than confirm it. A good falsification test produces a clear negative result if the hypothesis is wrong.

Process

1. Gather Context

Before forming hypotheses, collect:

  • Symptom description: What behaviour is observed vs expected?
  • Reproduction conditions: When does it occur? Intermittent or consistent?
  • Recent changes: Deployments, configuration changes, dependency updates
  • Error artefacts: Stack traces, logs, error messages, screenshots
  • Environmental factors: OS, runtime versions, network conditions

If information is missing, note gaps in the output document.
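One way to make the gap check mechanical is to hold the five context items in a small record and list whichever are still empty. The sketch below is illustrative only: the `DebugContext` class, its field names, and the sample values are assumptions, not part of the skill.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DebugContext:
    """Hypothetical container for the five context items listed above."""

    symptom: Optional[str] = None
    reproduction: Optional[str] = None
    recent_changes: Optional[str] = None
    error_artefacts: Optional[str] = None
    environment: Optional[str] = None

    def gaps(self) -> list[str]:
        """Return the names of items that are missing or blank."""
        return [
            f.name
            for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]


# Made-up sample values for illustration.
ctx = DebugContext(
    symptom="Checkout returns HTTP 500; expected HTTP 200",
    reproduction="Intermittent, roughly one request in ten",
)
print(ctx.gaps())  # ['recent_changes', 'error_artefacts', 'environment']
```

The returned names can be copied into the output document as the list of known gaps.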

2. Form Hypotheses

Generate 1–5 hypotheses ranked by plausibility. Each hypothesis must be:

  • Specific: Name the component, function, or interaction suspected
  • Falsifiable: A concrete test could disprove it
  • Independent: Falsifying one should not automatically falsify others

Common hypothesis categories:

| Category | Examples |
| --- | --- |
| State | Race condition, stale cache, corrupted data |
| Input | Malformed payload, encoding issue, boundary case |
| Environment | Missing dependency, version mismatch, resource exhaustion |
| Logic | Off-by-one, incorrect predicate, missing null check |
| Integration | API contract violation, timeout, auth failure |

Avoid vague hypotheses ("something wrong with the database"). Pin down the specific failure mode.
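A hypothesis list can be sketched as plain data: each entry names a component and a category, and the list is sorted by an estimated plausibility. Everything here is an assumption made for illustration, including the `Hypothesis` fields, the 0.0 to 1.0 plausibility scale, and the sample hypotheses.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The five categories from the table above."""

    STATE = "State"
    INPUT = "Input"
    ENVIRONMENT = "Environment"
    LOGIC = "Logic"
    INTEGRATION = "Integration"


@dataclass(frozen=True)
class Hypothesis:
    statement: str       # the specific failure mode suspected
    component: str       # the component, function, or interaction named
    category: Category
    plausibility: float  # assumed scale: 0.0 (unlikely) to 1.0 (likely)


def rank(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Return hypotheses ordered from most to least plausible."""
    if not 1 <= len(hypotheses) <= 5:
        raise ValueError("expected between 1 and 5 hypotheses")
    return sorted(hypotheses, key=lambda h: h.plausibility, reverse=True)


# Made-up candidates; the component names are hypothetical.
candidates = [
    Hypothesis("Token refresh races with request dispatch",
               "auth client", Category.STATE, 0.3),
    Hypothesis("Pagination loop skips the final page",
               "list_orders()", Category.LOGIC, 0.6),
]
for h in rank(candidates):
    print(f"{h.plausibility:.1f} {h.category.value}: {h.statement}")
```

Requiring a `component` field is a cheap guard against vague entries: "something wrong with the database" has nothing specific to put there.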

3. Design Falsification Plans

For each hypothesis, specify:

  1. Prediction: If this hypothesis is correct, what observable outcome follows?
  2. Falsification test: What action would produce a contradicting observation?
  3. Expected negative result: What outcome would disprove the hypothesis?
  4. Tooling required: Commands, scripts, or instrumentation needed
  5. Confidence impact: How decisively would a negative result rule this out?

Prefer tests that are:

  • Quick to execute
  • Minimally invasive
  • Deterministic rather than probabilistic
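The five items above map naturally onto a record with one field per item. The following is a minimal sketch under stated assumptions: the `FalsificationPlan` class, the `verdict` helper, and the stale-cache scenario are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FalsificationPlan:
    """Hypothetical record with one field per item in the list above."""

    prediction: str
    falsification_test: str
    expected_negative_result: str
    tooling_required: tuple[str, ...]
    confidence_impact: str


def verdict(negative_result_observed: bool) -> str:
    """A surviving hypothesis is not proven, only not yet ruled out."""
    return "falsified" if negative_result_observed else "survives"


# Made-up plan for a hypothesis such as "a stale cache entry is served".
plan = FalsificationPlan(
    prediction="With the cache bypassed, the endpoint returns the fresh value",
    falsification_test="Repeat the failing request with the cache bypassed",
    expected_negative_result="The stale value is still returned",
    tooling_required=("HTTP client", "a way to bypass the cache"),
    confidence_impact="High: a stale response without the cache rules it out",
)
print(verdict(negative_result_observed=True))  # falsified
```

Note that `verdict` never returns "confirmed": in keeping with the falsificationist framing, the best a hypothesis can do is survive the test.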

4. Output Document

Generate a Markdown document following the template in assets/debugging-plan.md. Save to the working directory as debugging-plan-{timestamp}.md.
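The skill does not say what format `{timestamp}` should take. The sketch below assumes a compact UTC stamp (`YYYYMMDDTHHMMSSZ`), which sorts chronologically and avoids characters that are awkward in filenames; the `plan_path` helper is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def plan_path(directory: Path, now: datetime) -> Path:
    """Build debugging-plan-{timestamp}.md using an assumed UTC stamp format."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return directory / f"debugging-plan-{stamp}.md"


# A fixed instant keeps the example reproducible.
fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(plan_path(Path.cwd(), fixed).name)  # debugging-plan-20240102T030405Z.md
```

In real use, pass `datetime.now(timezone.utc)` as `now`.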

Quality Criteria

A well-formed debugging plan exhibits:

  • Mutual exclusivity: Hypotheses name distinct causes, so falsifying one leaves the others standing
  • Collective exhaustiveness: Hypotheses cover the likely failure space
  • Ordered efficiency: Cheapest decisive tests appear first
  • Clear success criteria: The executing agent knows when to stop
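"Cheapest decisive tests first" can be read as a two-level sort: decisive tests before indecisive ones, then by cost within each group. This is one possible interpretation rather than the skill's definition; the `PlannedTest` fields and the sample tests are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedTest:
    name: str
    cost_minutes: int  # assumed cost measure: estimated minutes to run
    decisive: bool     # would a negative result clearly rule the hypothesis out?


def order_tests(tests: list[PlannedTest]) -> list[PlannedTest]:
    """Decisive tests first, cheapest first within each group."""
    return sorted(tests, key=lambda t: (not t.decisive, t.cost_minutes))


# Made-up tests for illustration.
tests = [
    PlannedTest("Run an hour-long load test", 60, True),
    PlannedTest("Skim recent logs", 2, False),
    PlannedTest("Replay the captured payload", 5, True),
]
for t in order_tests(tests):
    print(t.name)
# Replay the captured payload
# Run an hour-long load test
# Skim recent logs
```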

Anti-Patterns

  • Confirmation bias: Designing tests that can only succeed, not fail
  • Hypothesis creep: Adding new hypotheses ad hoc during execution rather than through a deliberate revision of the plan
  • Coupling: Tests that cannot isolate individual hypotheses
  • Vagueness: "Check the logs" without specifying what pattern would falsify
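To contrast with "check the logs", a log-based step can name the exact pattern the hypothesis predicts and treat its absence as the falsifying observation. The sketch below is a hedged illustration: `falsified_by_absence` is a hypothetical helper and the log lines are made up. Absence is only meaningful if the logs cover the failing window at sufficient verbosity.

```python
import re
from typing import Iterable


def falsified_by_absence(log_lines: Iterable[str], pattern: str) -> bool:
    """Return True when no line matches, i.e. the predicted evidence is absent."""
    regex = re.compile(pattern)
    return not any(regex.search(line) for line in log_lines)


# Made-up log lines for illustration only.
logs = [
    "12:00:01 INFO request id=41 status=200",
    "12:00:02 ERROR request id=42 upstream timeout after 30s",
]

# Hypothesis: failures are caused by upstream timeouts.
print(falsified_by_absence(logs, r"upstream timeout"))  # False
# Hypothesis: failures are caused by rejected credentials.
print(falsified_by_absence(logs, r"status=401"))        # True
```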

References

  • references/examples.md: Worked examples of hypothesis-falsification pairs across common debugging scenarios (API timeouts, flaky tests, memory leaks)
