Agent skill

root-cause-tracing

Use when errors occur deep in execution and you need to trace back to find the original trigger - systematically traces bugs backward through call stack, adding instrumentation when needed, to identify source of invalid data or incorrect behavior

Stars 13
Forks 6

Install this agent skill to your Project

npx add-skill https://github.com/NickCrew/Claude-Cortex/tree/main/skills/root-cause-tracing

SKILL.md

Root Cause Tracing

Overview

Bugs often manifest deep in the call stack (git init in wrong directory, file created in wrong location, database opened with wrong path). Your instinct is to fix where the error appears, but that's treating a symptom.

Core principle: Trace backward through the call chain until you find the original trigger, then fix at the source.

When to Use

dot
digraph when_to_use {
    "Bug appears deep in stack?" [shape=diamond];
    "Can trace backwards?" [shape=diamond];
    "Fix at symptom point" [shape=box];
    "Trace to original trigger" [shape=box];
    "BETTER: Also add defense-in-depth" [shape=box];

    "Bug appears deep in stack?" -> "Can trace backwards?" [label="yes"];
    "Can trace backwards?" -> "Trace to original trigger" [label="yes"];
    "Can trace backwards?" -> "Fix at symptom point" [label="no - dead end"];
    "Trace to original trigger" -> "BETTER: Also add defense-in-depth";
}

Use when:

  • Error happens deep in execution (not at entry point)
  • Stack trace shows long call chain
  • Unclear where invalid data originated
  • Need to find which test/code triggers the problem

The Tracing Process

1. Observe the Symptom

Error: git init failed in /Users/jesse/project/packages/core

2. Find Immediate Cause

What code directly causes this?

typescript
await execFileAsync('git', ['init'], { cwd: projectDir });

3. Ask: What Called This?

typescript
WorktreeManager.createSessionWorktree(projectDir, sessionId)
  → called by Session.initializeWorkspace()
  → called by Session.create()
  → called by test at Project.create()

4. Keep Tracing Up

What value was passed?

  • projectDir = '' (empty string!)
  • Empty string as cwd resolves to process.cwd()
  • That's the source code directory!

5. Find Original Trigger

Where did empty string come from?

typescript
const context = setupCoreTest(); // Returns { tempDir: '' }
Project.create('name', context.tempDir); // Accessed before beforeEach!

Adding Stack Traces

When you can't trace manually, add instrumentation:

typescript
// Before the problematic operation
async function gitInit(directory: string) {
  const stack = new Error().stack;
  console.error('DEBUG git init:', {
    directory,
    cwd: process.cwd(),
    nodeEnv: process.env.NODE_ENV,
    stack,
  });

  await execFileAsync('git', ['init'], { cwd: directory });
}

Critical: Use console.error() in tests (not logger - may not show)

Run and capture:

bash
npm test 2>&1 | grep 'DEBUG git init'

Analyze stack traces:

  • Look for test file names
  • Find the line number triggering the call
  • Identify the pattern (same test? same parameter?)

Finding Which Test Causes Pollution

If something appears during tests but you don't know which test:

Use the bisection script: @find-polluter.sh

bash
./find-polluter.sh '.git' 'src/**/*.test.ts'

Runs tests one-by-one, stops at first polluter. See script for usage.

Real Example: Empty projectDir

Symptom: .git created in packages/core/ (source code)

Trace chain:

  1. git init runs in process.cwd() ← empty cwd parameter
  2. WorktreeManager called with empty projectDir
  3. Session.create() passed empty string
  4. Test accessed context.tempDir before beforeEach
  5. setupCoreTest() returns { tempDir: '' } initially

Root cause: Top-level variable initialization accessing empty value

Fix: Made tempDir a getter that throws if accessed before beforeEach

Also added defense-in-depth:

  • Layer 1: Project.create() validates directory
  • Layer 2: WorkspaceManager validates not empty
  • Layer 3: NODE_ENV guard refuses git init outside tmpdir
  • Layer 4: Stack trace logging before git init

Key Principle

dot
digraph principle {
    "Found immediate cause" [shape=ellipse];
    "Can trace one level up?" [shape=diamond];
    "Trace backwards" [shape=box];
    "Is this the source?" [shape=diamond];
    "Fix at source" [shape=box];
    "Add validation at each layer" [shape=box];
    "Bug impossible" [shape=doublecircle];
    "NEVER fix just the symptom" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];

    "Found immediate cause" -> "Can trace one level up?";
    "Can trace one level up?" -> "Trace backwards" [label="yes"];
    "Can trace one level up?" -> "NEVER fix just the symptom" [label="no"];
    "Trace backwards" -> "Is this the source?";
    "Is this the source?" -> "Trace backwards" [label="no - keeps going"];
    "Is this the source?" -> "Fix at source" [label="yes"];
    "Fix at source" -> "Add validation at each layer";
    "Add validation at each layer" -> "Bug impossible";
}

NEVER fix just where the error appears. Trace back to find the original trigger.

Stack Trace Tips

In tests: Use console.error() not logger - logger may be suppressed Before operation: Log before the dangerous operation, not after it fails Include context: Directory, cwd, environment variables, timestamps Capture stack: new Error().stack shows complete call chain

Real-World Impact

From debugging session (2025-10-03):

  • Found root cause through 5-level trace
  • Fixed at source (getter validation)
  • Added 4 layers of defense
  • 1847 tests passed, zero pollution

Expand your agent's capabilities with these related and highly-rated skills.

NickCrew/Claude-Cortex

claude-consult

Consult Claude specialist agents during implementation for codebase understanding, pattern checking, security review, debugging help, and more. Use this skill whenever you're unsure about conventions, stuck on a failure, or need expert input before writing code. Does not replace the formal review gates in agent-loops — this is for mid-implementation consultation.

13 6
Explore
NickCrew/Claude-Cortex

doc-quality-review

Assess documentation quality across readability, consistency, audience fit, and prose clarity. Produces a scored review with actionable findings. This skill should be used before releases, during doc reviews, or when documentation feels unclear or inconsistent.

13 6
Explore
NickCrew/Claude-Cortex

event-driven-architecture

Event-driven architecture patterns with event sourcing, CQRS, and message-driven communication. Use when designing distributed systems, microservices communication, or systems requiring eventual consistency and scalability.

13 6
Explore
NickCrew/Claude-Cortex

prompt-engineering

Optimize prompts for LLMs and AI systems with structured techniques, evaluation patterns, and synthetic test data generation. Use when building AI features, improving agent performance, or crafting system prompts.

13 6
Explore
NickCrew/Claude-Cortex

compliance-audit

Regulatory compliance auditing across GDPR, HIPAA, PCI DSS, SOC 2, and ISO frameworks with automated evidence collection and gap analysis. Use when conducting compliance assessments, preparing for certifications, or implementing regulatory controls.

13 6
Explore
NickCrew/Claude-Cortex

react-performance-optimization

React performance optimization patterns using memoization, code splitting, and efficient rendering strategies. Use when optimizing slow React applications, reducing bundle size, or improving user experience with large datasets.

13 6
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results