# test-diff-analyzer

Analyze test differences between runs to identify flaky tests and consistency issues. Use to find tests that fail intermittently.
Install this agent skill into your project:

```bash
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/testing/test-diff-analyzer
```
## Analyze Test Differences Between Runs

Compare test results across multiple runs to identify flaky tests.
### When to Use

- Test passes locally but fails in CI
- Test sometimes passes, sometimes fails (flaky test)
- Need to understand test consistency issues
- Comparing test results before/after code changes
- Debugging intermittent test failures
### Quick Reference

```bash
# Run tests and capture output
pixi run mojo test -I . tests/ > /tmp/test_run_1.log

# Compare two test runs
diff -u /tmp/test_run_1.log /tmp/test_run_2.log

# Count how often each failure line appears across logs
grep "FAILED" /tmp/test_run_*.log | sort | uniq -c

# Show tests that sometimes pass, sometimes fail: dedupe (test, status)
# pairs, then list tests that appear with more than one status
grep -h "FAILED\|PASSED" /tmp/test_run_*.log | sort -u | cut -d: -f1 | sort | uniq -d
```
### Analysis Workflow

1. **Collect baseline**: run the test suite locally N times
2. **Collect CI data**: gather test results from recent CI runs
3. **Compare outputs**: diff the logs between runs
4. **Identify flaky tests**: flag tests with inconsistent results
5. **Find patterns**: determine when a test fails vs. passes
6. **Root cause**: look for timing, randomness, or resource issues
7. **Remediation**: fix or isolate the flaky test
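Steps 1–4 of the workflow can be sketched end to end. The log line format (`tests/<name>.mojo: PASSED|FAILED`) and the `/tmp` paths are assumptions; substitute whatever your runner actually emits. Synthetic logs stand in for real `pixi run mojo test` output so the pipeline is runnable as-is:

```shell
# Sketch of steps 1-4: collect N logs, then flag tests whose results vary.
# Assumed log line format: "tests/<name>.mojo: PASSED" or ": FAILED".
# Two synthetic logs stand in for real test-runner output.
mkdir -p /tmp/flaky_demo
printf 'tests/a.mojo: PASSED\ntests/b.mojo: FAILED\n' > /tmp/flaky_demo/run_1.log
printf 'tests/a.mojo: PASSED\ntests/b.mojo: PASSED\n' > /tmp/flaky_demo/run_2.log

# Count failures per test across all runs; a count strictly between
# 0 and N marks a flaky test (here: b fails in 1 of 2 runs).
grep -h 'FAILED' /tmp/flaky_demo/run_*.log | cut -d: -f1 | sort | uniq -c
```

In a real session, the `printf` lines would be replaced by a loop that runs the suite N times, redirecting each run to its own `run_<i>.log`.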
### Flaky Test Indicators

**Timing issues:**

- Test passes when run in isolation
- Test fails when run with other tests
- Timeout values are too aggressive
- Race conditions in setup/teardown
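To check the first two indicators, rerun the suspect test in isolation and tally outcomes. `run_one_test` below is a hypothetical stand-in; wire it to your real single-test command (for a Mojo suite, something like `pixi run mojo test -I . tests/test_suspect.mojo`, which is an assumption):

```shell
# Rerun one suspect test in isolation N times and tally the outcomes.
# `run_one_test` is a placeholder; point it at your real test command.
run_one_test() { true; }   # stand-in that always passes

pass=0; fail=0
for i in $(seq 1 20); do
  if run_one_test > /dev/null 2>&1; then
    pass=$((pass + 1))
  else
    fail=$((fail + 1))
  fi
done
echo "pass=$pass fail=$fail"
```

If the test passes 20/20 in isolation but fails inside the full suite, suspect test interaction or shared state rather than the test body itself.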
**Randomness issues:**

- Random seed not fixed
- Hash ordering varies between runs
- Dictionary/set iteration order is not deterministic
- Floating-point precision differs across platforms
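Some of these sources can be pinned from the environment before a run. `PYTHONHASHSEED` and `TZ` are real variables honored by Python-based tooling; whether the Mojo test harness reads them, and the `--seed` flag shown in the comment, are assumptions for illustration:

```shell
# Pin environment-level sources of nondeterminism before the run.
export PYTHONHASHSEED=0   # stable hash ordering in Python-based tooling
export TZ=UTC             # stable timestamps regardless of host timezone
# A fixed RNG seed is framework-specific; hypothetical flag for illustration:
#   pixi run mojo test -I . tests/ --seed 42
echo "PYTHONHASHSEED=$PYTHONHASHSEED TZ=$TZ"
```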
**Resource issues:**

- Test passes locally but fails in CI
- Test fails under resource constraints
- Intermittent out-of-memory errors
- Behavior depends on available disk space
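Resource-dependent failures are easier to reproduce locally if the shell's limits are lowered to approximate the CI runner. This uses POSIX `ulimit` inside a command substitution's subshell so the rest of the session is unaffected; which limits actually mimic your CI is an assumption you must check against your runner:

```shell
# Approximate CI resource pressure; the limit dies with the subshell.
msg=$(
  ulimit -t 60 2>/dev/null      # cap CPU seconds (if the shell permits)
  echo "cpu limit: $(ulimit -t)"
)
echo "$msg"
```

Running the suite inside such a subshell can turn "fails only in CI" into a locally reproducible failure.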
### Output Format

Report the analysis with:

- **Flaky Tests**: tests with inconsistent results
- **Consistency Score**: pass rate across runs (e.g., an 80% pass rate)
- **Failure Patterns**: when and how tests fail
- **Impact**: how many test runs are affected
- **Root Cause Hypothesis**: what likely causes the instability
- **Recommendations**: how to fix or isolate each flaky test
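The consistency score can be computed directly from the logs. The line format and paths are the same assumptions as in the Quick Reference; three synthetic runs make the snippet runnable as-is:

```shell
# Per-test pass rate across N runs, from logs shaped like
# "tests/<name>.mojo: PASSED" or ": FAILED". Three synthetic runs for demo.
mkdir -p /tmp/score_demo
printf 'tests/a.mojo: PASSED\ntests/b.mojo: FAILED\n' > /tmp/score_demo/run_1.log
printf 'tests/a.mojo: PASSED\ntests/b.mojo: PASSED\n' > /tmp/score_demo/run_2.log
printf 'tests/a.mojo: PASSED\ntests/b.mojo: FAILED\n' > /tmp/score_demo/run_3.log

# Tally per-test totals and passes, then print the pass rate.
awk -F': ' '
  { total[$1]++; if ($2 == "PASSED") pass[$1]++ }
  END { for (t in total) printf "%s %.0f%% pass rate\n", t, 100 * pass[t] / total[t] }
' /tmp/score_demo/run_*.log | sort
```

Any score below 100% but above 0% indicates a flaky test rather than a consistently broken one.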
### Error Handling

| Problem | Solution |
|---|---|
| Different environment | Run in a controlled environment (e.g., Docker) |
| Insufficient data | Run more iterations to establish a pattern |
| No failure info | Enable debug output; increase verbosity |
| External dependencies | Mock or isolate external services |
| Timing-dependent | Add explicit waits or retry logic |
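For the timing-dependent row, a small retry wrapper is a common stopgap while the root cause is being fixed. This is a generic POSIX-shell sketch, not part of the skill itself:

```shell
# retry MAX CMD... : run CMD until it succeeds, at most MAX attempts,
# sleeping 1s between attempts; returns 1 if all attempts fail.
retry() {
  max=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      return 1
    fi
    sleep 1
    n=$((n + 1))
  done
}

retry 3 true && echo "succeeded"
```

A hypothetical invocation would look like `retry 3 pixi run mojo test -I . tests/test_suspect.mojo`; note that retries mask flakiness rather than fix it, so keep the flaky test flagged for a real repair.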
### References

- See `mojo-test-runner` for test execution options
- See `extract-test-failures` for failure analysis
- See `CLAUDE.md` for test standards and the TDD workflow