Agent skill

differential-session-runner

Run or continue a differential debugging session between two implementations, traces, captures, or outputs. Record artifact identity, exact commands, first mismatch progression, findings, validation, and next probe in a durable session log.


Install this agent skill to your Project

npx add-skill https://github.com/alchemiststudiosDOTai/harness-engineering/tree/main/skills/differential-session-runner

SKILL.md

Differential Session Runner

Use this skill when debugging requires a durable evidence trail rather than ad hoc notes.

This skill is for workflows where you are comparing:

  • original vs rewrite
  • implementation A vs implementation B
  • baseline trace vs candidate trace
  • before vs after replay output
  • captured artifact vs regenerated artifact

The goal is not only to investigate but also to leave behind a session artifact that another operator or agent can continue.

When to Use

Use this skill when the user asks to:

  • continue a differential debugging session
  • compare two implementations and record mismatches
  • create or update a debugging session log
  • track replay / trace / capture divergence over time
  • document what changed between mismatched and cleared runs

Core Principle

Every differential investigation should produce a reusable evidence packet.

A good session artifact lets another operator answer:

  • what exact artifact was investigated?
  • how was it identified?
  • what commands were run?
  • where did the first mismatch appear?
  • what was learned?
  • what changed?
  • what validation proves the current state?
  • what should happen next?

Preferred Storage

If the repo already has a native evidence location, use it.

Examples:

  • docs/.../sessions/
  • docs/chunks/
  • analysis/.../reports/
  • existing session indexes

If the repo does not already have a native convention, write to:

  • memory-bank/evidence/YYYY-MM-DD_HH-MM-SS_<topic>-session.md
  • optional index: memory-bank/evidence/index.md
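The fallback naming convention above can be generated rather than typed by hand. The following is a minimal sketch; the function name `session_path` and the slug rule (lowercase, non-alphanumerics collapsed to hyphens) are assumptions for illustration, not part of the skill.

```python
import re
from datetime import datetime
from pathlib import Path


def session_path(topic: str, now: datetime,
                 root: Path = Path("memory-bank/evidence")) -> Path:
    """Build <root>/YYYY-MM-DD_HH-MM-SS_<topic>-session.md."""
    # Assumed normalization: lowercase the topic and collapse anything that
    # is not a letter or digit into a single hyphen so it is filename-safe.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return root / f"{stamp}_{slug}-session.md"
```

Passing the timestamp in explicitly, instead of calling `datetime.now()` inside the function, keeps the result reproducible when a session is re-created from notes.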

Workflow

1. Load the repo's evidence convention first

Before creating anything, search for:

  • playbooks / runbooks
  • session indexes
  • prior session files
  • report directories
  • chunk/evidence templates

Read the relevant guidance and continue the repo's existing pattern.

2. Identify the artifact under investigation

Capture the strongest available identity for the artifact:

  • artifact path
  • SHA256 / content hash
  • commit SHA
  • case name / run id / replay id
  • timestamp, if needed to disambiguate

If a content hash is possible, record it early and use it as the main session identity.
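One way to obtain the content hash is to stream the artifact through SHA-256. This is a sketch using only the Python standard library; the helper name `sha256_of` and the 1 MiB chunk size are illustrative choices.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks matters for large traces and captures, which may not fit comfortably in memory.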

3. Decide whether to continue or create

Search existing sessions for the artifact identity.

  • If a session already exists for the same artifact/hash, append to that session.
  • If the artifact identity is new, create a new session and update the session index if the repo has one.
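The continue-or-create decision can be sketched as a plain text search over prior session files. The assumptions here are loud ones: sessions are Markdown files under a single evidence directory, the identity string appears verbatim in the session body, and a file named `index.md` is an index rather than a session. A repo with a different convention needs a different search.

```python
from pathlib import Path
from typing import Optional


def find_session(evidence_dir: Path, identity: str) -> Optional[Path]:
    """Return the first session file that mentions the identity, else None."""
    if not evidence_dir.is_dir():
        return None
    for candidate in sorted(evidence_dir.rglob("*.md")):
        if candidate.name == "index.md":
            # Assumed convention: the index lists sessions, it is not one.
            continue
        text = candidate.read_text(encoding="utf-8", errors="replace")
        if identity in text:
            return candidate
    return None
```

A returned path means "append to that session"; `None` means "create a new session and update the index if the repo has one".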

4. Record baseline comparison commands

Capture the exact commands used for the first comparison step.

Examples:

```bash
python compare.py --baseline out/a.json --candidate out/b.json
pytest tests/test_replay.py -k case_17
mytool diff trace_a.cdt trace_b.cdt
```

Never summarize commands loosely. Record them exactly.
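Recording commands exactly is easier when the log entry is derived from the same argument list that was executed. The sketch below assumes commands are run without a shell; `run_and_record` and `as_log_line` are hypothetical helper names, and using the exit code as the recorded result is one possible choice.

```python
import shlex
import subprocess
from typing import Sequence, Tuple


def run_and_record(argv: Sequence[str]) -> Tuple[str, int]:
    """Run argv without a shell; return the exact command line and exit code."""
    completed = subprocess.run(list(argv), capture_output=True, text=True)
    # shlex.join quotes each argument, so the recorded string can be pasted
    # back into a POSIX shell and reproduce the same argument list.
    return shlex.join(argv), completed.returncode


def as_log_line(command: str, exit_code: int) -> str:
    """Format one log bullet, with the exit code as the recorded result."""
    return f"- `{command}` → exit {exit_code}"
```

Because the logged string comes from the executed `argv`, the log cannot drift from what actually ran. The same pair of values also suits validation entries, where the result must sit next to the command.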

5. Record first mismatch progression

Capture the first relevant divergence and, if applicable, how it moved over time.

Examples:

  • first bad tick
  • first mismatched field
  • first unexpected output line
  • first extra/missing RNG draw
  • first snapshot diff

If later fixes move the mismatch frontier, append the new progression rather than deleting the old one.
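For line-oriented or tick-oriented outputs, the first divergence can be located with a simple pairwise scan. This is a minimal sketch that assumes both sides are already loaded as sequences of strings; `first_mismatch` is an illustrative name, and real traces may need a format-aware comparison instead.

```python
from itertools import zip_longest
from typing import Optional, Sequence, Tuple

Mismatch = Tuple[int, Optional[str], Optional[str]]


def first_mismatch(baseline: Sequence[str],
                   candidate: Sequence[str]) -> Optional[Mismatch]:
    """Return (index, baseline_item, candidate_item) at the first divergence.

    A None item means that side ended early (an extra or missing entry).
    Returns None when the two sequences are identical.
    """
    pairs = zip_longest(baseline, candidate, fillvalue=None)
    for index, (left, right) in enumerate(pairs):
        if left != right:
            return index, left, right
    return None
```

To preserve the progression, append each new result to the session log after every fix instead of replacing the previous entry.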

6. Record key findings

Write findings as evidence-backed observations, not guesses.

Good findings:

  • identify the mismatched subsystem or callsite
  • name the field or branch that diverged
  • note whether the issue self-heals or persists
  • connect the mismatch to a concrete code path or state transition

7. Record landed changes separately

If code changes are made during the investigation, capture them in a separate section:

  • files changed
  • short description of each change
  • tests added or updated

If no changes were made, state that explicitly.

8. Record validation

List the validation commands and their results.

Example results:

  • replay passes end-to-end
  • diff result is now clean
  • targeted tests pass
  • health check passes

Do not write "fixed" without a validation section.

9. Set the next probe

Every session should end with one of:

  • cleared / no next probe required
  • one explicit next investigation target
  • one blocked dependency or missing tool/input

Recommended Session Template

```markdown
---
title: "<topic> – Differential Session"
phase: Evidence
date: "YYYY-MM-DD HH:MM:SS"
owner: "<agent_or_user>"
tags: [evidence, differential, <topic>]
---

## Artifact
- Path: `<path>`
- Identity: `<sha256|commit|case-id>`

## Baseline Commands
- `<exact command 1>`
- `<exact command 2>`

## First Mismatch Progression
- baseline: `<first mismatch>`
- after fix 1: `<new frontier>`
- after fix 2: `<cleared|new mismatch>`

## Key Findings
- finding 1
- finding 2

## Landed Changes
- `path/to/file` → change summary
- `tests/...` → validation coverage added

## Validation
- `<command>` → `<result>`
- `<command>` → `<result>`

## Outcome / Next Probe
- `<cleared | next probe | blocked reason>`
```

Good Session Behavior

  • preserve earlier mismatch states instead of overwriting them
  • use hashes or artifact IDs to avoid ambiguous session naming
  • record exact commands so another operator can replay the same evidence path
  • separate findings from fixes
  • separate fixes from validation

Bad Session Behavior

  • "Investigated replay issue and fixed some stuff"
  • omitting the artifact identity
  • omitting commands
  • replacing prior mismatch history with the latest state only
  • claiming success without validation evidence

Output Requirements

A completed session artifact should make handoff possible with no hidden context.

It must include:

  • artifact identity
  • exact commands
  • first mismatch progression
  • key findings
  • landed changes or explicit no-change note
  • validation evidence
  • next probe or clear outcome
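These requirements can be checked mechanically before handoff. The sketch below assumes the session file uses the section headings from the recommended template; a repo with its own template would need a different heading list. `missing_sections` is a hypothetical helper, and it only checks that sections exist, not that their contents are adequate.

```python
from typing import List

# Headings taken from the recommended session template.
REQUIRED_SECTIONS = (
    "## Artifact",
    "## Baseline Commands",
    "## First Mismatch Progression",
    "## Key Findings",
    "## Landed Changes",
    "## Validation",
    "## Outcome / Next Probe",
)


def missing_sections(session_text: str) -> List[str]:
    """Return the required headings that do not appear in the session text."""
    headings = {
        line.strip()
        for line in session_text.splitlines()
        if line.startswith("## ")
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```

An empty result means every required section is present; anything else names the gaps to fill before the session is handed off.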

Handoff

After updating the session artifact:

  • if the investigation uncovered code work, hand off to plan-phase
  • if code work is already scoped in a plan, hand off to execute-phase
  • if the investigation is complete, send the user the session path and a one-line status
