Agent skill
path-tracing-reverse
This skill provides guidance for reverse engineering compiled binaries to produce equivalent source code. It applies when tasks require analyzing executables, extracting algorithms and constants, and recreating identical program behavior in source form. Use when the goal is byte-for-byte or pixel-perfect reproduction of binary output.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/path-tracing-reverse
SKILL.md
Path Tracing Reverse Engineering
Overview
This skill guides the systematic reverse engineering of compiled binaries to produce functionally identical source code. The primary challenge is achieving exact output reproduction, not approximate similarity. Common applications include recreating graphics programs (ray tracers, path tracers), understanding proprietary algorithms, and recovering lost source code.
Critical Success Criteria
Before beginning, establish clear success criteria:
- Exact output match: "identical" means byte-for-byte identical, not visually similar
- File size parity: Output files must match in size (header + data)
- Checksum verification: Use `md5sum` or `sha256sum` to verify exact matches
- No tolerance for approximation: A 99% match is still a failure if 100% is required
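The checksum criterion can also be scripted. The following is a minimal sketch, assuming both outputs already exist on disk; the helper names `sha256_of` and `outputs_identical` are illustrative and not part of the skill.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(reference_path, candidate_path):
    """True only when both files hash to the same digest."""
    return sha256_of(reference_path) == sha256_of(candidate_path)
```

A call such as `outputs_identical("expected.ppm", "actual.ppm")` then gives a single pass/fail answer that leaves no room for "close enough".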
Systematic Approach
Phase 1: Output Format Analysis
Start with the output format before analyzing the algorithm. Mismatched output formatting causes file size differences that are independent of algorithmic correctness.
- Capture reference output:
  ```bash
  ./mystery > reference_output.ppm
  ls -la reference_output.ppm            # Note exact file size
  xxd reference_output.ppm | head -20    # Examine header bytes
  ```
- Analyze header format:
  - For PPM: Check exact spacing, newlines, and number formatting
  - Compare: `P6\n800 600\n255\n` vs `P6 800 600 255\n`
  - Whitespace differences change the header bytes and can change the file size (the two headers above are both 15 bytes yet still differ)
- Verify pixel data layout:
  ```bash
  xxd -s 15 reference_output.ppm | head  # Skip the 15-byte header, view raw pixels
  ```
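Rather than hard-coding the header length, it can be measured from the reference bytes. This is a sketch under stated assumptions: the file is a binary `P6` PPM without `#` comments in the header, and `parse_ppm_header` is a hypothetical helper name.

```python
def parse_ppm_header(data):
    """Parse a binary PPM (P6) header from raw bytes.

    Returns (width, height, maxval, header_size), where header_size is the
    offset of the first pixel byte. Header comments (# ...) are not handled.
    """
    if data[:2] != b"P6":
        raise ValueError("not a P6 PPM file")
    pos = 2
    values = []
    while len(values) < 3:
        # Skip the whitespace that separates header tokens
        while data[pos:pos + 1].isspace():
            pos += 1
        start = pos
        while data[pos:pos + 1].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("malformed header")
        values.append(int(data[start:pos]))
    # Exactly one whitespace byte follows maxval before the pixel data
    header_size = pos + 1
    width, height, maxval = values
    return width, height, maxval, header_size
```

Both header spellings from the comparison above parse to the same values and the same 15-byte length, which is why the raw header bytes still need to be compared directly.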
Phase 2: Binary Analysis Setup
Create a systematic disassembly workspace:
- Extract symbol information:
  ```bash
  nm ./mystery | grep -E "^[0-9a-f]+ T" > functions.txt
  strings ./mystery > strings.txt
  objdump -t ./mystery > symbols.txt
  ```
- Generate complete disassembly:
  ```bash
  objdump -d ./mystery > disassembly.txt
  objdump -s -j .rodata ./mystery > rodata.txt   # Read-only data
  objdump -s -j .data ./mystery > data.txt       # Initialized data
  ```
- Identify main algorithm structure:
  ```bash
  objdump -d ./mystery | grep -A 50 "<main>:" > main_function.txt
  ```
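`grep -A 50` cuts `main` off after 50 lines, which may not cover the whole function. One possible alternative is to split the full disassembly by function. The sketch below assumes the usual GNU `objdump -d` text layout (a `address <name>:` header followed by `address:` instruction lines); `split_functions` is an illustrative name.

```python
import re

# "0000000000401136 <main>:" style function headers
FUNC_HEADER = re.compile(r"^[0-9a-f]+ <([^>]+)>:\s*$")
# "  401136:\t55 ..." style instruction lines
INSTRUCTION = re.compile(r"^\s*[0-9a-f]+:\s")


def split_functions(disassembly_text):
    """Map each function name to the list of its instruction lines."""
    functions = {}
    current = None
    for line in disassembly_text.splitlines():
        header = FUNC_HEADER.match(line)
        if header:
            current = header.group(1)
            functions[current] = []
        elif current is not None and INSTRUCTION.match(line):
            functions[current].append(line.strip())
    return functions
```

With `disassembly.txt` from the step above, `split_functions(open("disassembly.txt").read())["main"]` would return every line of `main` regardless of its length.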
Phase 3: Constant Extraction
Extract ALL constants systematically before writing any code:
- Float constants: Located in the `.rodata` section
  ```python
  import struct

  # Convert little-endian hex bytes to a 32-bit float
  hex_bytes = bytes.fromhex('0000803f')  # Example: 1.0f
  value = struct.unpack('<f', hex_bytes)[0]
  ```
- Integer constants: Often embedded in instructions
  ```bash
  # Single quotes keep \$ intact so grep matches a literal "$0x"
  grep -E 'mov.*\$0x' disassembly.txt  # Find immediate values
  ```
- Create constant catalog: Document every constant with its:
  - Memory address
  - Raw hex value
  - Decoded value (int/float/double)
  - Suspected purpose
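The catalog can be seeded automatically from the `.rodata` dump. This sketch assumes the usual GNU `objdump -s` row layout (address, then a 36-character hex column of four 4-byte groups, then an ASCII column) and a little-endian target; `parse_hexdump` and `float_catalog` are hypothetical names. Every aligned word is decoded as a float, so many entries will be noise until a purpose is assigned to them.

```python
import re
import struct

# Address, one space, then the fixed-width hex column (assumed 36 characters)
ROW = re.compile(r"^\s*([0-9a-f]+) (.{1,36})")


def parse_hexdump(objdump_text):
    """Return (start_address, raw_bytes) rows parsed from `objdump -s` text."""
    rows = []
    for line in objdump_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        hex_digits = m.group(2).replace(" ", "")
        try:
            rows.append((int(m.group(1), 16), bytes.fromhex(hex_digits)))
        except ValueError:
            continue  # not a data row (or the hex column was not padded)
    return rows


def float_catalog(rows):
    """Decode every aligned 4-byte word as a little-endian float32.

    Each entry is (address, raw hex, decoded value).
    """
    catalog = []
    for address, raw in rows:
        for offset in range(0, len(raw) - 3, 4):
            word = raw[offset:offset + 4]
            value = struct.unpack("<f", word)[0]
            catalog.append((address + offset, word.hex(), value))
    return catalog
```

The same loop with a step of 8 and the `"<d"` format would list candidate doubles; the "suspected purpose" column still has to be filled in by reading the code that loads each address.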
Phase 4: Algorithm Reconstruction
Reconstruct the algorithm methodically:
- Map function call graph:
  - Identify all `call` instructions in main
  - Trace each called function
  - Document parameters and return values
- Trace data flow:
  - Follow register usage through functions
  - Identify loop structures (counters, bounds)
  - Map memory accesses to array/struct operations
- Handle floating-point operations:
  - Check if code uses SSE/AVX or x87 FPU
  - Note precision: `float` (32-bit) vs `double` (64-bit)
  - SSE: `movss`, `addss`, `mulss` = single precision
  - SSE2: `movsd`, `addsd`, `mulsd` = double precision
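Two of the checks above lend themselves to a quick script: listing direct call targets and counting scalar SSE instructions by precision. This is a rough sketch assuming AT&T-syntax `objdump -d` text for one function (for example the contents of `main_function.txt`); the function names are illustrative, and indirect calls such as `call *%rax` are not detected.

```python
import re
from collections import Counter

# Direct calls look like "call   401136 <trace>" (or "callq" in older binutils)
CALL_TARGET = re.compile(r"\bcallq?\s+[0-9a-f]+ <([^>+]+)")
# Scalar SSE arithmetic/moves: the final letter is s (single) or d (double);
# an optional leading v covers the AVX-encoded forms
SCALAR_FP = re.compile(r"\bv?(?:mov|add|sub|mul|div|sqrt)s([sd])\b")


def call_targets(function_text):
    """List the distinct direct call targets in first-seen order."""
    seen = []
    for name in CALL_TARGET.findall(function_text):
        if name not in seen:
            seen.append(name)
    return seen


def fp_precision_counts(function_text):
    """Count scalar SSE instructions by precision suffix."""
    counts = Counter(SCALAR_FP.findall(function_text))
    return {"single": counts["s"], "double": counts["d"]}
```

A function dominated by `single` counts suggests `float` variables in the reconstruction; a mix usually points at conversions that need to be reproduced in the same places.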
Phase 5: Incremental Verification
Never write the entire solution at once. Verify components individually:
- Background/base case first:
  - Render only the background (sky, ground)
  - Compare specific pixel coordinates
  - Achieve 100% match on background before adding objects
- Pixel-by-pixel debugging:
  ```python
  # Create comparison script
  def compare_pixels(ref_file, test_file, header_size, width):
      """Report the first differing byte and the pixel it belongs to."""
      with open(ref_file, 'rb') as f1, open(test_file, 'rb') as f2:
          ref = f1.read()
          test = f2.read()
      if len(ref) != len(test):
          print(f"Size mismatch: {len(ref)} vs {len(test)} bytes")
      # Find first difference
      for i, (r, t) in enumerate(zip(ref, test)):
          if r != t:
              pixel = (i - header_size) // 3
              x, y = pixel % width, pixel // width
              print(f"First diff at byte {i}, pixel ({x},{y})")
              print(f"Expected: {r}, Got: {t}")
              return
  ```
- Coordinate-specific testing:
  ```bash
  # Extract specific pixel from PPM
  # At offset = header_size + (y * width + x) * 3
  header_size=15; width=800; x=10; y=20   # example values; substitute your own
  xxd -s $((header_size + (y * width + x) * 3)) -l 3 reference_output.ppm
  ```
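The offset formula above translates directly into a small lookup. This sketch assumes 8-bit channels (maxval of 255 or less, so 3 bytes per pixel); `pixel_at` is an illustrative name.

```python
def pixel_at(data, header_size, width, x, y):
    """Return the (r, g, b) tuple at column x, row y of raw P6 data."""
    offset = header_size + (y * width + x) * 3
    r, g, b = data[offset:offset + 3]
    return r, g, b


# Tiny in-memory example: a 2x2 image whose pixel bytes are 0..11
example = b"P6\n2 2\n255\n" + bytes(range(12))
bottom_right = pixel_at(example, header_size=11, width=2, x=1, y=1)
```

Calling it on both the reference and the candidate output at the same coordinates makes it easy to check a handful of known background pixels before comparing whole files.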
Common Pitfalls
Output Format Errors
- Whitespace in headers: PPM allows various separators; match exactly
- Numeric formatting: `printf("%d", n)` vs `printf("%3d", n)`
- Line endings: Unix LF vs Windows CRLF
- Trailing content: Extra newlines or padding
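One way to pin down the header format is to generate the plausible variants and test which one the reference output actually starts with. The candidate set and the 800x600 dimensions below are assumptions for illustration only; the `%4d` entry stands in for any padded numeric format.

```python
width, height, maxval = 800, 600, 255  # assumed example dimensions

# Hypothetical header spellings covering the pitfalls listed above
candidates = {
    "newline-separated": b"P6\n%d %d\n%d\n" % (width, height, maxval),
    "single-line": b"P6 %d %d %d\n" % (width, height, maxval),
    "CRLF line endings": b"P6\r\n%d %d\r\n%d\r\n" % (width, height, maxval),
    "padded (%4d)": b"P6\n%4d %4d\n%d\n" % (width, height, maxval),
}


def matching_headers(reference_bytes, candidates):
    """Return the names of candidate headers that prefix the reference output."""
    return [name for name, header in candidates.items()
            if reference_bytes.startswith(header)]
```

If no candidate matches, the header bytes from `xxd` are the ground truth and the candidate list needs another entry.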
Floating-Point Mismatches
- Precision mismatch: Using `double` when binary uses `float`
- Rounding modes: Compiler optimizations may change rounding
- Order of operations: `(a + b) + c` vs `a + (b + c)` differs in FP
- Library differences: `sin()`, `sqrt()` implementations vary
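Two of these effects are easy to observe directly. The sketch below uses Python floats (which are doubles) and a hypothetical helper `f32` that rounds a value to single precision through `struct`; it illustrates the pitfalls and is not a model of any particular binary.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Order of operations: the same three addends, grouped differently
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
grouping_changes_result = left != right

# Precision: 0.1 stored as a float is not the same number as 0.1 stored as a double
float_differs_from_double = f32(0.1) != 0.1
```

When prototyping in Python before writing the final source, wrapping each intermediate result in `f32` is one way to approximate single-precision arithmetic; the authoritative check remains the byte comparison against the reference output.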
Algorithmic Assumptions
- Premature pattern matching: Don't assume "ray tracer" means standard formulas
- Missing components: Multiple light sources, reflections, ambient terms
- Coordinate systems: Left-handed vs right-handed, y-up vs y-down
- Iteration order: Row-major vs column-major pixel traversal
Verification Failures
- Visual comparison is insufficient: Images may look identical but differ by 1-2 RGB values
- Partial matches are failures: 25% match means 75% wrong
- File size differences indicate format issues: Address these first
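Differences of 1-2 RGB values are invisible to the eye but trivial to measure. This is a minimal sketch of a difference summary over two outputs already loaded as bytes; `diff_stats` is an illustrative name.

```python
def diff_stats(ref, test):
    """Summarize byte-level differences between two equal-length outputs."""
    if len(ref) != len(test):
        # A size mismatch is a format problem; fix it before comparing pixels
        raise ValueError(f"size mismatch: {len(ref)} vs {len(test)} bytes")
    deltas = [abs(r - t) for r, t in zip(ref, test) if r != t]
    matched = len(ref) - len(deltas)
    return {
        "differing_bytes": len(deltas),
        "max_delta": max(deltas, default=0),
        "match_percent": 100.0 * matched / len(ref) if ref else 100.0,
    }
```

A small `max_delta` with many differing bytes tends to point at precision or rounding, while large deltas clustered in one region suggest a missing or wrong scene component.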
Verification Strategy
Automated Testing Harness
Create this script early and use it consistently:
```bash
#!/bin/bash
# verify.sh - Compile, run, and compare
gcc -static -o reversed mystery.c -lm
./mystery > expected.ppm
./reversed > actual.ppm
echo "File sizes:"
ls -la expected.ppm actual.ppm
echo "Checksums:"
md5sum expected.ppm actual.ppm
if cmp -s expected.ppm actual.ppm; then
    echo "SUCCESS: Files are identical"
else
    echo "FAILURE: Files differ"
    cmp -l expected.ppm actual.ppm | head -20
fi
```
Progressive Debugging
When outputs differ:
- Verify file sizes first - format issues vs algorithm issues
- Find first differing byte - localize the problem
- Convert byte offset to coordinates - identify which pixel/component
- Compare expected vs actual at that location - understand the discrepancy
- Trace the calculation - work backward to find the bug
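The offset-to-coordinate step can be sketched as follows, assuming 8-bit RGB data in row-major order. `locate_byte` is an illustrative name. Note that `cmp -l` reports byte positions starting at 1, so subtract 1 before passing them in.

```python
def locate_byte(offset, header_size, width):
    """Map a 0-based file offset to (x, y, channel) for 8-bit RGB pixel data.

    Returns None when the offset falls inside the header.
    """
    if offset < header_size:
        return None
    index = offset - header_size
    pixel, channel = divmod(index, 3)   # 3 bytes per pixel
    y, x = divmod(pixel, width)         # rows are stored top to bottom
    return x, y, "RGB"[channel]
```

A `None` result means the mismatch is a formatting problem, which should be resolved before any pixel-level tracing.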
Checkpoint Validation
At each phase, verify:
- Header format matches exactly
- Background pixels match (no objects)
- Object boundaries are correct
- Lighting/shading values match
- Final checksum matches
Tool Reference
Essential tools for binary analysis:
| Tool | Purpose |
|---|---|
| `objdump -d` | Disassembly |
| `objdump -s -j .rodata` | Read-only data section |
| `nm` | Symbol table |
| `strings` | Embedded strings |
| `xxd` | Hex dump |
| `gdb` | Dynamic analysis |
| `ltrace` | Library call tracing |
| `strace` | System call tracing |
Resources
This skill includes reference materials to support reverse engineering tasks:
references/
- `reverse_engineering_checklist.md` - Step-by-step verification checklist
- `float_extraction.md` - Guide to extracting floating-point constants from binaries