
graph-evolution

Compares Trailmark code graphs at two source code snapshots (git commits, tags, or directories) to surface security-relevant structural changes. Detects new attack paths, complexity shifts, blast radius growth, taint propagation changes, and privilege boundary modifications that text diffs miss. Use when comparing code between commits or tags, analyzing structural evolution, detecting attack surface growth, reviewing what changed between audit snapshots, or finding security-relevant changes that text diffs miss.


Install this agent skill into your project:

```bash
npx add-skill https://github.com/trailofbits/skills/tree/main/plugins/trailmark/skills/graph-evolution
```


Graph Evolution

Builds Trailmark code graphs at two source snapshots and computes a structural diff. Surfaces security-relevant changes that text-level diffs miss: new attack paths, complexity shifts, blast radius growth, taint propagation changes, and privilege boundary modifications.

When to Use

  • Comparing two git refs to understand what structurally changed
  • Auditing a range of commits for security-relevant evolution
  • Detecting new attack paths created by code changes
  • Finding functions whose blast radius or complexity grew silently
  • Identifying taint propagation changes across refactors
  • Pre-release structural comparison (tag-to-tag or branch-to-branch)

When NOT to Use

  • Line-level code review (use differential-review for text-diff analysis)
  • Single-snapshot analysis (use the trailmark skill directly)
  • Diagram generation from a single snapshot (use the diagramming-code skill)
  • Mutation testing triage (use the genotoxic skill)

Rationalizations to Reject

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "We just need the structural diff, skip pre-analysis" | Without pre-analysis, you miss taint changes, blast radius growth, and privilege boundary shifts | Run engine.preanalysis() on both snapshots |
| "Text diff covers what changed" | Text diffs miss new attack paths, transitive complexity shifts, and subgraph membership changes | Use structural diff to complement text diff |
| "Only added nodes matter" | Removed security functions and shifted privilege boundaries are equally dangerous | Review removals and modifications, not just additions |
| "Low-severity structural changes can be ignored" | INFO-level changes (dead code removal) can mask removed security checks | Classify every change; review removals for replaced functionality |
| "One snapshot's graph is enough for comparison" | Single-snapshot analysis can't detect evolution; you need both before and after | Always build and export both graphs |
| "Tool isn't installed, I'll compare manually" | Manual comparison misses the transitive attack paths, taint propagation, and subgraph membership changes that graph analysis catches | Install trailmark first |

Prerequisites

trailmark must be installed. If uv run trailmark fails, run:

```bash
uv pip install trailmark
```

DO NOT fall back to "manual comparison" or reading source files as a substitute for running trailmark. The tool must be installed and used programmatically. If installation fails, report the error.
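
As a pre-flight check before Phase 2, the snippet below tests whether the `trailmark` package is importable and whether `uv` is on `PATH`. It is a minimal sketch: `module_available` and `preflight` are hypothetical helpers, and they inspect the current Python interpreter, which may not be the environment `uv run` resolves, so a failing `uv run trailmark` remains the authoritative signal.

```python
import importlib.util
import shutil


def module_available(name: str) -> bool:
    """True if a top-level module or package can be imported by this interpreter."""
    # find_spec returns None (without importing anything) when the package is missing.
    return importlib.util.find_spec(name) is not None


def preflight() -> dict:
    """Report the two prerequisites the Phase 2 API calls and Phase 3 script rely on."""
    return {
        "trailmark_importable": module_available("trailmark"),
        "uv_on_path": shutil.which("uv") is not None,
    }
```

If `trailmark_importable` is False, run the install command and report any error instead of falling back to manual comparison.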


Quick Start

```bash
# Compare two git refs (e.g., tags, branches, commits)
# 1. Build graphs at each snapshot
# 2. Run pre-analysis on both
# 3. Compute structural diff
# 4. Generate report

# Step-by-step: see Workflow below
```

Decision Tree

```
├─ Need to understand what each metric means?
│  └─ Read: references/evolution-metrics.md
│
├─ Need the report output format?
│  └─ Read: references/report-format.md
│
├─ Already have two graph JSON exports?
│  └─ Jump to Phase 3 (run graph_diff.py directly)
│
└─ Starting from two git refs?
   └─ Start at Phase 1
```

Workflow

Graph Evolution Progress:
- [ ] Phase 1: Create snapshots (git worktrees)
- [ ] Phase 2: Build graphs + pre-analysis on both snapshots
- [ ] Phase 3: Compute structural diff
- [ ] Phase 4: Interpret diff and generate report
- [ ] Phase 5: Clean up worktrees

Phase 1: Create Snapshots

Use git worktrees to get clean copies of each ref without disturbing the working tree.

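```bash
# Create temp directories for worktrees
BEFORE_DIR=$(mktemp -d)
AFTER_DIR=$(mktemp -d)

# Create worktrees (run from repo root). --detach checks out each ref as a
# detached HEAD, so a branch that is already checked out elsewhere still works.
git worktree add --detach "$BEFORE_DIR" {before_ref}
git worktree add --detach "$AFTER_DIR" {after_ref}
```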

If comparing two directories instead of git refs, skip this phase and use the directory paths directly in Phase 2.
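
The Phase 1 setup and the Phase 5 cleanup can also be paired in a Python context manager, so the worktrees are removed even if graph building fails partway. This is a sketch, not part of the skill: `snapshot_worktrees` is a hypothetical helper. It passes `--detach` so that a ref which is already checked out as a branch (for example the current `main`) can still be added, and it uses `git worktree remove --force` so untracked files left inside a snapshot do not block cleanup.

```python
import contextlib
import shutil
import subprocess
import tempfile
from collections.abc import Iterator
from pathlib import Path


def _git(repo: Path, *args: str) -> None:
    """Run a git command against repo; raise CalledProcessError on failure."""
    subprocess.run(["git", "-C", str(repo), *args], check=True, capture_output=True, text=True)


@contextlib.contextmanager
def snapshot_worktrees(repo, before_ref: str, after_ref: str) -> Iterator[tuple[Path, Path]]:
    """Check out two refs as detached worktrees; remove them on exit."""
    repo = Path(repo)
    # git accepts an existing *empty* directory as a worktree path.
    before_dir = Path(tempfile.mkdtemp(prefix="trailmark_before_"))
    after_dir = Path(tempfile.mkdtemp(prefix="trailmark_after_"))
    added = []
    try:
        for path, ref in ((before_dir, before_ref), (after_dir, after_ref)):
            _git(repo, "worktree", "add", "--detach", str(path), ref)
            added.append(path)
        yield before_dir, after_dir
    finally:
        for path in added:
            _git(repo, "worktree", "remove", "--force", str(path))
        # Delete any temp directory that never became a worktree.
        for path in (before_dir, after_dir):
            shutil.rmtree(path, ignore_errors=True)
```

Usage from the repo root: `with snapshot_worktrees(".", "{before_ref}", "{after_ref}") as (before_dir, after_dir):`, then run the Phase 2 builds inside the `with` block.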

Phase 2: Build Graphs and Run Pre-Analysis

Build Trailmark graphs for both snapshots and run pre-analysis on each. Pre-analysis computes blast radius, taint propagation, and privilege boundaries, and enumerates entrypoints.

```python
import os
import tempfile

from trailmark.query.api import QueryEngine


def build_and_export(target_dir, language, output_path):
    """Build graph, run pre-analysis, export JSON; return the graph summary."""
    engine = QueryEngine.from_directory(target_dir, language=language)
    # Pre-analysis populates blast radius, taint, privilege boundary,
    # and entrypoint data before export.
    engine.preanalysis()
    with open(output_path, "w") as f:
        f.write(engine.to_json())
    return engine.summary()


# Shared scratch directory for both exports (reused in Phase 3).
work_dir = tempfile.mkdtemp(prefix="trailmark_evolution_")
before_json = os.path.join(work_dir, "before_graph.json")
after_json = os.path.join(work_dir, "after_graph.json")

before_summary = build_and_export("{before_dir}", "{lang}", before_json)
after_summary = build_and_export("{after_dir}", "{lang}", after_json)
print(before_summary)
print(after_summary)
```

Verify both graphs built successfully by checking the summary output. If either fails, check that the language parameter matches the codebase and that trailmark supports all file types present.

Phase 3: Compute Structural Diff

Run the diff script on the two exported JSON files (using the same work_dir from Phase 2):

```bash
uv run {baseDir}/scripts/graph_diff.py \
    --before "{before_json}" \
    --after "{after_json}" > "{work_dir}/evolution_diff.json"
```

The output JSON contains:

| Key | Contents |
|---|---|
| summary_delta | Changes in node/edge/entrypoint counts |
| nodes.added | New functions, classes, methods |
| nodes.removed | Deleted functions, classes, methods |
| nodes.modified | Functions with changed cyclomatic complexity (CC), parameters, return type, or span |
| edges.added | New call/inheritance/import relationships |
| edges.removed | Deleted relationships |
| subgraphs | Per-subgraph membership changes (tainted, high_blast_radius, etc.) |
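
Conceptually, the node, edge, and subgraph sections of this output are set differences between the two exports. The sketch below illustrates that idea; it is not graph_diff.py. It assumes a simplified export shape (`nodes` as a dict keyed by node ID, `edges` as a list of `source`/`target`/`kind` dicts, `subgraphs` as lists of node IDs) and hypothetical attribute names such as `cc`. The real `engine.to_json()` schema may differ, and entrypoint counts are left out of `summary_delta`.

```python
from typing import Any

# Attributes whose change marks a node as "modified" (hypothetical field names).
TRACKED_ATTRS = ("cc", "params", "return_type", "span")


def _edge_key(edge: dict[str, Any]) -> tuple[str, str, str]:
    """Identify an edge by its endpoints and relationship kind."""
    return (edge["source"], edge["target"], edge.get("kind", "call"))


def _membership(graph: dict[str, Any], name: str) -> set[str]:
    return set(graph.get("subgraphs", {}).get(name, []))


def structural_diff(before: dict[str, Any], after: dict[str, Any]) -> dict[str, Any]:
    b_nodes, a_nodes = before["nodes"], after["nodes"]
    b_edges = {_edge_key(e) for e in before["edges"]}
    a_edges = {_edge_key(e) for e in after["edges"]}

    # Nodes present in both snapshots whose tracked attributes differ.
    modified = {}
    for node_id in sorted(b_nodes.keys() & a_nodes.keys()):
        old, new = b_nodes[node_id], a_nodes[node_id]
        changes = {
            attr: {"before": old.get(attr), "after": new.get(attr)}
            for attr in TRACKED_ATTRS
            if old.get(attr) != new.get(attr)
        }
        if changes:
            modified[node_id] = changes

    # Which nodes entered or left each named subgraph (e.g. "tainted").
    names = sorted(before.get("subgraphs", {}).keys() | after.get("subgraphs", {}).keys())
    subgraphs = {
        name: {
            "entered": sorted(_membership(after, name) - _membership(before, name)),
            "left": sorted(_membership(before, name) - _membership(after, name)),
        }
        for name in names
    }

    return {
        "summary_delta": {
            "nodes": len(a_nodes) - len(b_nodes),
            "edges": len(a_edges) - len(b_edges),
        },
        "nodes": {
            "added": sorted(a_nodes.keys() - b_nodes.keys()),
            "removed": sorted(b_nodes.keys() - a_nodes.keys()),
            "modified": modified,
        },
        "edges": {"added": sorted(a_edges - b_edges), "removed": sorted(b_edges - a_edges)},
        "subgraphs": subgraphs,
    }
```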

Phase 4: Interpret Diff and Generate Report

Read the diff JSON and generate a security-focused markdown report. See references/report-format.md for the full template.

Interpretation priorities (highest to lowest):

  1. New tainted paths — nodes entering the tainted subgraph, especially if they also appear in added edges targeting sensitive functions
  2. Privilege boundary changes — new or removed trust transitions
  3. Attack surface growth — new entrypoints, especially untrusted_external
  4. Blast radius increases — nodes entering high_blast_radius
  5. Complexity spikes — CC increases > 3 on tainted or entrypoint-reachable nodes
  6. Structural additions — new nodes and edges (review needed)
  7. Structural removals — verify removed security functions were replaced
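
Priority 5 is mechanical enough to script. A minimal sketch, assuming `nodes.modified` maps each node ID to per-attribute `{"before": ..., "after": ...}` pairs with a `cc` key, and that the caller supplies the set of node IDs that are tainted or entrypoint-reachable in the "after" snapshot. Both shapes, and the helper name `complexity_spikes`, are assumptions rather than the documented graph_diff.py output.

```python
from collections.abc import Mapping


def complexity_spikes(
    modified: Mapping[str, Mapping[str, Mapping[str, int]]],
    watched: set[str],
    threshold: int = 3,
) -> list[tuple[str, int]]:
    """Return (node_id, cc_delta) for watched nodes whose CC rose by more than threshold."""
    spikes = []
    for node_id, changes in modified.items():
        cc = changes.get("cc")
        if cc is None or node_id not in watched:
            continue
        delta = cc["after"] - cc["before"]
        if delta > threshold:
            spikes.append((node_id, delta))
    # Largest increase first; ties broken by node ID for a stable report order.
    return sorted(spikes, key=lambda item: (-item[1], item[0]))
```

A typical call would pass `watched = tainted_after | entrypoint_reachable_after`, both hypothetical sets built from the "after" graph.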

Cross-reference structural changes with git diff {before_ref}..{after_ref} to add source-level context to findings.
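
One way to attach that source context is to intersect the changed line ranges from the text diff with each changed node's span. This sketch assumes node spans are available as `(path, start_line, end_line)` tuples for the "after" snapshot (the span format is an assumption) and that git's default `a/`/`b/` path prefixes are in use. Hunk headers follow the unified diff form `@@ -a,b +c,d @@`, where an omitted count means 1.

```python
import re
from collections import defaultdict

# Unified diff hunk header: @@ -old_start[,old_count] +new_start[,new_count] @@
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text: str) -> dict[str, list[tuple[int, int]]]:
    """Map each post-change path to inclusive (start, end) line ranges touched by hunks."""
    ranges = defaultdict(list)
    current, in_header = None, False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            current, in_header = None, True
        elif in_header and line.startswith("+++ "):
            path = line[4:]
            # "+++ /dev/null" marks a deleted file: nothing to map on the after side.
            current = path[2:] if path.startswith("b/") else None
        elif (m := HUNK_RE.match(line)) and current:
            in_header = False
            start = int(m.group(1))
            count = int(m.group(2)) if m.group(2) is not None else 1
            # count == 0 is a pure deletion; anchor it at the adjacent line.
            ranges[current].append((start, start + max(count, 1) - 1))
    return dict(ranges)


def nodes_touched(spans: dict[str, tuple[str, int, int]], diff_text: str) -> list[str]:
    """Return IDs of nodes whose (path, start, end) span overlaps a changed range."""
    changes = changed_lines(diff_text)
    return sorted(
        node_id
        for node_id, (path, start, end) in spans.items()
        if any(lo <= end and start <= hi for lo, hi in changes.get(path, []))
    )
```

Feed it the output of `git diff -U0 {before_ref}..{after_ref}`; `-U0` drops context lines, so each hunk range covers only changed lines.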

Severity classification:

| Severity | Structural Signal |
|---|---|
| CRITICAL | New tainted path to sensitive function, removed auth boundary |
| HIGH | New entrypoint + high blast radius, large CC increase on tainted node |
| MEDIUM | New trust-boundary-crossing edges, moderate CC increase |
| LOW | Added nodes without entrypoint reachability |
| INFO | Dead code removal, complexity reductions |
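
A removal can be both INFO-level dead code and a removed auth check, so applying the rows top-down (first match wins) keeps the higher severity. The sketch below encodes the table that way. The `Signals` flags are hypothetical labels for the table's conditions, and the numeric cut-offs are assumptions: the table only says "large" and "moderate", so `MODERATE_CC_DELTA = 4` mirrors the "> 3" threshold from interpretation priority 5, and `LARGE_CC_DELTA = 10` is an arbitrary placeholder. Combinations the table does not cover return `None` for manual triage.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cut-offs; the table only distinguishes "large" from "moderate".
LARGE_CC_DELTA = 10
MODERATE_CC_DELTA = 4


@dataclass
class Signals:
    """Hypothetical per-finding flags derived from the structural diff."""
    new_tainted_path_to_sensitive: bool = False
    removed_auth_boundary: bool = False
    new_entrypoint: bool = False
    high_blast_radius: bool = False
    tainted: bool = False
    crosses_trust_boundary: bool = False
    added_node: bool = False
    entrypoint_reachable: bool = False
    dead_code_removal: bool = False
    cc_delta: int = 0


def classify(s: Signals) -> Optional[str]:
    """Apply the severity table top-down; the first matching row wins."""
    if s.new_tainted_path_to_sensitive or s.removed_auth_boundary:
        return "CRITICAL"
    if (s.new_entrypoint and s.high_blast_radius) or (s.tainted and s.cc_delta >= LARGE_CC_DELTA):
        return "HIGH"
    if s.crosses_trust_boundary or s.cc_delta >= MODERATE_CC_DELTA:
        return "MEDIUM"
    if s.added_node and not s.entrypoint_reachable:
        return "LOW"
    if s.dead_code_removal or s.cc_delta < 0:
        return "INFO"
    # The table does not cover this combination; leave it for manual triage.
    return None
```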

For detailed metric definitions, see references/evolution-metrics.md.

Phase 5: Clean Up

Remove git worktrees after the report is written:

```bash
git worktree remove "{before_dir}"
git worktree remove "{after_dir}"
```

Diff Script Reference

```bash
uv run {baseDir}/scripts/graph_diff.py [OPTIONS]
```

| Argument | Default | Description |
|---|---|---|
| --before | required | Path to the "before" graph JSON |
| --after | required | Path to the "after" graph JSON |
| --indent | 2 | JSON output indentation |

Input format: Trailmark JSON exports from engine.to_json(). Output: JSON structural diff to stdout.
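
When the workflow runs in the same Python session as Phase 2, the script can be invoked through `subprocess` and its stdout parsed directly. A minimal sketch: `run_graph_diff` and `save_diff` are hypothetical helpers, only the three documented flags are passed, and the `runner` prefix defaults to `uv run` but can be swapped (for example for `sys.executable`) if the script's dependencies are already installed.

```python
import json
import subprocess
from collections.abc import Sequence
from pathlib import Path


def run_graph_diff(
    script: str,
    before_json: str,
    after_json: str,
    indent: int = 2,
    runner: Sequence[str] = ("uv", "run"),
) -> dict:
    """Invoke graph_diff.py and parse the JSON diff it writes to stdout."""
    cmd = [*runner, script, "--before", before_json, "--after", after_json, "--indent", str(indent)]
    # check=True raises CalledProcessError (with stderr captured) on a non-zero exit.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def save_diff(diff: dict, work_dir: str) -> Path:
    """Write the diff next to the graph exports, as in Phase 3."""
    out = Path(work_dir) / "evolution_diff.json"
    out.write_text(json.dumps(diff, indent=2))
    return out
```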


Quality Checklist

Before delivering the report:

  • Both graphs built successfully (check summaries)
  • Pre-analysis ran on both snapshots
  • Structural diff computed (non-empty diff JSON)
  • All subgraph changes interpreted (tainted, blast radius, etc.)
  • Critical findings include evidence (node IDs, edge diffs)
  • Severity levels assigned to all findings
  • Source-level context added via git diff cross-reference
  • Worktrees cleaned up (or temp dirs removed)
  • Report written to GRAPH_EVOLUTION_*.md

Integration

trailmark skill: Phase 2 uses the trailmark API for graph building and pre-analysis. All trailmark query patterns work on either snapshot's engine.

differential-review skill: Use graph-evolution for structural analysis, differential-review for line-level code review. The two are complementary — graph-evolution finds attack paths that text diffs miss, while differential-review provides git blame context and micro-adversarial analysis.

genotoxic skill: If graph-evolution reveals new high-CC tainted nodes, feed them to genotoxic for mutation testing triage.

diagramming-code skill: Generate before/after diagrams to visualize structural changes. Use call-graph or data-flow diagrams focused on changed nodes.


Supporting Documentation

  • references/evolution-metrics.md — What each structural metric means and why it matters for security
  • references/report-format.md — Report template, severity classification, and example findings
