safe-refactoring
Change code structure without changing behavior, with zero tolerance for behavioral changes
Safe Refactoring for Scientific Code
Overview
Change code structure without changing behavior. Zero tolerance for behavioral changes during refactoring.
Core principle: Establish a baseline, refactor, then verify an exact match: test and snapshot results must be identical, and numerical outputs may differ only by floating-point noise (< 1e-14).
Announce at start: "I'm using the safe-refactoring skill to restructure this code."
When to Use This Skill
Use for:
- Improving code readability without changing logic
- Extracting reusable functions
- Renaming variables/functions for clarity
- Reorganizing code structure
- Performance optimization (without changing numerical behavior)
Don't use for:
- Changing behavior or algorithms (use scientific-tdd instead)
- Adding new features (use scientific-tdd instead)
- Fixing bugs (use scientific-tdd or fix directly with tests)
Process Checklist
Copy to TodoWrite:
Safe Refactoring Progress:
- [ ] Run full test suite (establish baseline)
- [ ] Run snapshot tests (establish baseline)
- [ ] Capture coverage report
- [ ] Perform refactoring
- [ ] Run full test suite (must match baseline exactly)
- [ ] Run snapshot tests (must match baseline exactly)
- [ ] Compare coverage (should stay same or improve)
- [ ] Run quality checks (ruff + black)
- [ ] Verify no numerical differences
- [ ] Commit refactoring
Strict Rules
ZERO tolerance for:
- Any test that passed before and fails after
- Any test that failed before and passes after (suggests test was broken)
- Any snapshot differences (not even floating-point noise)
- Decreased test coverage
- Any behavioral changes
If any of these occur: Revert and investigate why.
Detailed Steps
Step 1: Run Full Test Suite (Baseline)
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -v 2>&1 | tee /tmp/baseline_tests.txt
Record:
- Total tests: grep "passed" /tmp/baseline_tests.txt
- Any failures (if refactoring existing code with known issues)
- Test execution time
Expected: All tests pass (or document any known failures)
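To make the recorded baseline easy to compare later, the counts and execution time can be pulled out of the saved output. This is a minimal sketch, assuming pytest's usual final summary line (for example `1 failed, 426 passed in 12.34s`); the function name `summarize` is hypothetical and not part of the skill.

```python
import re

# Matches fragments such as "427 passed" or "2 failed" in pytest's final summary line.
_COUNT = re.compile(r"(\d+) (passed|failed|errors?|skipped|xfailed|xpassed)")
# Matches the trailing duration, e.g. " in 12.34s".
_DURATION = re.compile(r" in ([\d.]+)s")


def summarize(output: str) -> dict:
    """Return outcome counts and wall time from the last summary line of pytest output."""
    summary_lines = [
        line
        for line in output.splitlines()
        if _COUNT.search(line) and _DURATION.search(line)
    ]
    if not summary_lines:
        raise ValueError("no pytest summary line found")
    last = summary_lines[-1]
    counts = {name: int(number) for number, name in _COUNT.findall(last)}
    counts["seconds"] = float(_DURATION.search(last).group(1))
    return counts
```

For the baseline captured above, the input would be the text of /tmp/baseline_tests.txt.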
Step 2: Run Snapshot Tests (Baseline)
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -m snapshot -v 2>&1 | tee /tmp/baseline_snapshots.txt
CRITICAL: Snapshots must match exactly after refactoring.
Expected: All snapshot tests pass
Step 3: Capture Coverage Report
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest --cov=non_local_detector --cov-report=term --cov-report=json:coverage_baseline.json
Record: Coverage percentage for files being refactored
Why: Coverage should not decrease during refactoring (ideally improves)
Step 4: Perform Refactoring
Refactoring techniques:
- Extract function:

  ```python
  # Before
  def complex_function():
      # ... 50 lines of code
      result = x * 2 + y
      # ... more code
      return final_result

  # After
  def complex_function():
      # ... code
      result = _calculate_intermediate(x, y)
      # ... code
      return final_result

  def _calculate_intermediate(x, y):
      return x * 2 + y
  ```

- Rename for clarity:

  ```python
  # Before
  def f(x):
      return x * 2

  # After
  def calculate_doubled_value(value):
      return value * 2
  ```

- Reorganize structure:

  ```python
  # Before: All in one file
  # After: Separated into modules
  # - core_logic.py
  # - utilities.py
  # - validation.py
  ```

- Optimize performance (numerically equivalent):

  ```python
  # Before
  for i in range(n):
      result[i] = f(x[i])

  # After (JAX)
  result = jax.vmap(f)(x)
  ```
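The extract-function technique can be made concrete with a small runnable pair of before/after functions and a check that they agree. This is only a sketch: the function names and the tax example are hypothetical and chosen for brevity, not taken from the project.

```python
def total_cost_before(prices, tax_rate):
    """Original version: the tax arithmetic is inlined."""
    subtotal = sum(prices)
    return subtotal + subtotal * tax_rate


def _apply_tax(subtotal, tax_rate):
    """Extracted helper: same expression, same operation order."""
    return subtotal + subtotal * tax_rate


def total_cost_after(prices, tax_rate):
    """Refactored version: delegates to the extracted helper."""
    return _apply_tax(sum(prices), tax_rate)


# Characterization check: outputs are compared with ==, not with a tolerance.
cases = [([1.5, 2.25, 3.0], 0.07), ([], 0.2), ([0.1, 0.2, 0.3], 0.0)]
for prices, rate in cases:
    assert total_cost_before(prices, rate) == total_cost_after(prices, rate)
```

Because the helper evaluates the same expression in the same order, the results agree exactly, which is what the baseline comparison in the later steps relies on.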
During refactoring:
- Make small, incremental changes
- Test after each change if possible
- Keep numerical operations identical
- Maintain exact same algorithms
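One reason to keep numerical operations identical is that floating-point addition is not associative, so a rewrite that is algebraically the same can still change the result. The sketch below uses hypothetical helper names to show a reordered sum that differs from the original in the last bit.

```python
def sum_forward(values):
    """Accumulate in the original order."""
    total = 0.0
    for value in values:
        total += value
    return total


def sum_backward(values):
    """Mathematically the same sum, accumulated in the opposite order."""
    total = 0.0
    for value in reversed(values):
        total += value
    return total


values = [0.1, 0.2, 0.3]
forward = sum_forward(values)
backward = sum_backward(values)

# The two sums are not bit-identical, even though the gap is far below 1e-14.
assert forward != backward
assert abs(forward - backward) < 1e-14
```

A change like this stays under the 1e-14 threshold, yet it would still show up in any comparison that checks for exact equality.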
Step 5: Run Full Test Suite (Verify Match)
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -v 2>&1 | tee /tmp/refactored_tests.txt
Compare to baseline:
diff /tmp/baseline_tests.txt /tmp/refactored_tests.txt
MUST verify:
- Same number of tests run
- Same tests pass
- Same tests fail (if any)
- Similar execution time (within 20%)
If differences:
- Any new test failures: REVERT IMMEDIATELY
- Any new test passes: Investigate (test was broken?)
- Different test count: Investigate (tests missing or duplicated?)
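A plain diff of verbose pytest output also reports lines that differ only in timing, so it can help to compare per-test outcomes instead. The following is a sketch under stated assumptions: it expects the default `-v` line format (`path::test_name PASSED [ 50%]`), test IDs without spaces, and no pytest-xdist prefixes. The names `parse_outcomes` and `changed_outcomes` are hypothetical.

```python
import re

# A verbose pytest line looks like: "tests/test_x.py::test_name PASSED [ 50%]"
_RESULT = re.compile(r"^(\S+::\S+)\s+(PASSED|FAILED|ERROR|SKIPPED|XFAIL|XPASS)\b")


def parse_outcomes(output: str) -> dict:
    """Map each test ID to its outcome, ignoring timings and progress markers."""
    outcomes = {}
    for line in output.splitlines():
        match = _RESULT.match(line)
        if match:
            outcomes[match.group(1)] = match.group(2)
    return outcomes


def changed_outcomes(baseline: str, refactored: str) -> dict:
    """Return {test_id: (before, after)} for every test whose status differs.

    A test present in only one run is reported with None on the other side.
    """
    before = parse_outcomes(baseline)
    after = parse_outcomes(refactored)
    return {
        test_id: (before.get(test_id), after.get(test_id))
        for test_id in sorted(before.keys() | after.keys())
        if before.get(test_id) != after.get(test_id)
    }
```

An empty result means the same tests ran with the same outcomes; any entry corresponds to one of the cases listed above (new failure, new pass, or a changed test count).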
Step 6: Run Snapshot Tests (Verify Match)
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -m snapshot -v 2>&1 | tee /tmp/refactored_snapshots.txt
CRITICAL: Must match baseline EXACTLY.
Expected: All snapshot tests pass, no differences
If snapshot differences:
- DO NOT UPDATE SNAPSHOTS
- Investigate why behavior changed
- This is NOT a refactoring if behavior changed
- Revert and reconsider approach
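The source does not show how the project's snapshot tests store their data. As an illustration of what an exact match can mean for floating-point output, the sketch below hashes each value's exact hexadecimal form, so a change in the last bit alters the fingerprint. The `fingerprint` helper and the `posterior` key are hypothetical.

```python
import hashlib
import json


def fingerprint(results: dict) -> str:
    """Hash a mapping of name -> list of floats using each float's exact hex form."""
    exact = {
        name: [float(value).hex() for value in values]
        for name, values in results.items()
    }
    payload = json.dumps(exact, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


baseline = fingerprint({"posterior": [0.25, 0.5, 0.25]})
unchanged = fingerprint({"posterior": [0.25, 0.5, 0.25]})
# 2**-54 is the spacing between 0.25 and the next representable float.
drifted = fingerprint({"posterior": [0.25, 0.5, 0.25 + 2**-54]})

assert baseline == unchanged
assert baseline != drifted
```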
Step 7: Compare Coverage
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest --cov=non_local_detector --cov-report=term --cov-report=json:coverage_refactored.json
Compare:
# If you have jq installed
jq '.totals.percent_covered' coverage_baseline.json
jq '.totals.percent_covered' coverage_refactored.json
Expected:
- Coverage stays same or improves
- Never decreases
If coverage decreased:
- Some code paths no longer tested
- Investigate and fix or revert
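The total percentage can hide a drop in one file that is offset by a gain in another. A per-file comparison of the two JSON reports could look like the sketch below, which assumes the coverage.py JSON layout (`files` → path → `summary` → `percent_covered`); the function names are hypothetical.

```python
import json


def load_report(path: str) -> dict:
    """Read a coverage JSON report such as coverage_baseline.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def coverage_drops(baseline: dict, refactored: dict) -> dict:
    """Return {path: (before, after)} for files whose percent_covered decreased.

    Files that appear in only one report (for example, after code moved to a
    new module) are skipped here and need a manual look.
    """
    drops = {}
    for path, entry in baseline["files"].items():
        if path not in refactored["files"]:
            continue
        before = entry["summary"]["percent_covered"]
        after = refactored["files"][path]["summary"]["percent_covered"]
        if after < before:
            drops[path] = (before, after)
    return drops
```

With the two reports from Steps 3 and 7 loaded through `load_report`, an empty result from `coverage_drops` means no file that exists in both runs lost coverage.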
Step 8: Run Quality Checks
/Users/edeno/miniconda3/envs/non_local_detector/bin/ruff check src/
/Users/edeno/miniconda3/envs/non_local_detector/bin/ruff format src/
/Users/edeno/miniconda3/envs/non_local_detector/bin/black src/
Expected: All checks pass
Fix any issues: Refactoring is a good opportunity to improve code quality
Step 9: Verify No Numerical Differences
For mathematical code, verify numerical equivalence:
# Run golden regression
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest \
src/non_local_detector/tests/test_golden_regression.py -v
Expected: Exact match (or differences < 1e-14)
If differences > 1e-14:
- This is NOT a pure refactoring
- Behavior has changed
- Use numerical-validation skill instead
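The 1e-14 threshold can be checked directly on saved outputs. The sketch below is dependency-free and treats the threshold as an absolute tolerance, which is an assumption: the text does not say whether the bound is absolute or relative. The function names are hypothetical.

```python
import math


def max_abs_difference(baseline, refactored):
    """Largest element-wise |a - b| between two equal-length sequences of floats."""
    if len(baseline) != len(refactored):
        raise ValueError("outputs have different lengths")
    worst = 0.0
    for a, b in zip(baseline, refactored):
        if a == b:
            continue  # identical values, including matching infinities
        if math.isnan(a) and math.isnan(b):
            continue  # NaN in the same position on both sides
        difference = abs(a - b)
        if math.isnan(difference):
            return math.inf  # NaN on one side only counts as a mismatch
        worst = max(worst, difference)
    return worst


def is_pure_refactoring(baseline, refactored, atol=1e-14):
    """True when every element agrees to within the absolute tolerance."""
    return max_abs_difference(baseline, refactored) < atol
```

For array outputs, `numpy.testing.assert_allclose` with `rtol=0` and `atol=1e-14` expresses the same absolute check.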
Step 10: Commit Refactoring
Only commit if ALL checks pass:
git add <refactored_files> <test_files>
git commit -m "refactor: improve <component> code structure

- Extract <function> for reusability
- Rename <variable> for clarity
- Reorganize <module> structure

No behavioral changes:
- All tests pass (N tests)
- Snapshots unchanged
- Coverage: X% → Y%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>"
Performance Optimization Refactoring
When optimizing for performance:
- Capture performance baseline:

  ```bash
  pytest --durations=10 > /tmp/baseline_durations.txt
  ```

- Make optimization
- Verify numerical equivalence (use numerical-validation skill)
- Measure performance improvement:

  ```bash
  pytest --durations=10 > /tmp/optimized_durations.txt
  ```

- Document improvement:

  ```
  Optimization: Use JAX vmap instead of for loop
  Speedup: 3.2x (450ms → 140ms)
  Numerical difference: < 1e-14 (verified)
  ```
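The speedup figure in a report like this one is the ratio of the two measured times. Here is a small sketch using only the standard library; `best_time` and `speedup_line` are hypothetical helpers, and the 450 ms and 140 ms inputs are the example numbers from the report, not new measurements.

```python
import timeit


def best_time(func, repeat=5, number=10):
    """Best per-call wall time in seconds over several repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def speedup_line(label, before_s, after_s):
    """Format a speedup line from two timings given in seconds."""
    ratio = before_s / after_s
    return f"{label}: {ratio:.1f}x ({before_s * 1e3:.0f}ms → {after_s * 1e3:.0f}ms)"


print(speedup_line("Speedup", 0.450, 0.140))  # Speedup: 3.2x (450ms → 140ms)
```

Taking the minimum over repeats reduces the influence of background load on the measurement; the same callable should be timed before and after the optimization with identical inputs.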
Integration with Other Skills
- Before refactoring: Consider if change actually needs new behavior (use scientific-tdd instead)
- With numerical-validation: If refactoring mathematical code, use numerical-validation to verify equivalence
- With jax skill: When optimizing JAX code, use jax skill for best practices
Example Workflow
Task: Extract position decoding logic into reusable function
1. Baseline:
- Run pytest: 427 passed, 0 failed
- Run snapshots: 15 passed, 0 failed
- Coverage: 69%
2. Refactor:
- Extract _decode_position_from_posterior() function
- Update 3 call sites to use new function
- No logic changes, just extraction
3. Verify:
- Run pytest: 427 passed, 0 failed ✓
- Run snapshots: 15 passed, 0 failed ✓
- Coverage: 69% (unchanged) ✓
4. Quality:
- Ruff: All checks pass ✓
- Black: Formatted ✓
5. Commit:
"refactor: extract position decoding into reusable function"
Red Flags
STOP and revert if:
- Any test changes status (pass → fail or fail → pass)
- Any snapshot differences appear
- Coverage decreases
- Numerical differences > 1e-14
- You're tempted to update snapshots
- You're adding new logic (use scientific-tdd instead)
Safe to proceed if:
- All tests match baseline exactly
- No snapshot changes
- Coverage same or better
- Code quality improves
- No new functionality added
Common Mistakes
"It's just a small behavioral change"
- No such thing in refactoring
- Any behavioral change = not refactoring
- Use scientific-tdd for behavioral changes
"I'll update the snapshots since the new output is better"
- That's not refactoring, it's changing behavior
- Refactoring = zero snapshot changes
- Use scientific-tdd if output should change
"Tests are slow, I'll skip them"
- Never skip tests during refactoring
- Tests are your safety net
- Without tests, you can't verify it's a refactoring
"Coverage went down but the code is better"
- Better code shouldn't lose coverage
- Investigate why coverage decreased
- Fix or revert