Agent skill

safe-refactoring

Change code structure without changing behavior, with zero tolerance for behavioral changes

View SKILL.md on GitHub Repository

Stars 163

Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/safe-refactoring

SKILL.md

Safe Refactoring for Scientific Code

Overview

Change code structure without changing behavior. Zero tolerance for behavioral changes during refactoring.

Core principle: Establish baseline, refactor, verify exact match (within floating-point noise).

Announce at start: "I'm using the safe-refactoring skill to restructure this code."

When to Use This Skill

Use for:

Improving code readability without changing logic
Extracting reusable functions
Renaming variables/functions for clarity
Reorganizing code structure
Performance optimization (without changing numerical behavior)

Don't use for:

Changing behavior or algorithms (use scientific-tdd instead)
Adding new features (use scientific-tdd instead)
Fixing bugs (use scientific-tdd or fix directly with tests)

Process Checklist

Copy to TodoWrite:

Safe Refactoring Progress:
- [ ] Run full test suite (establish baseline)
- [ ] Run snapshot tests (establish baseline)
- [ ] Capture coverage report
- [ ] Perform refactoring
- [ ] Run full test suite (must match baseline exactly)
- [ ] Run snapshot tests (must match baseline exactly)
- [ ] Compare coverage (should stay same or improve)
- [ ] Run quality checks (ruff + black)
- [ ] Verify no numerical differences
- [ ] Commit refactoring

Strict Rules

ZERO tolerance for:

Any test that passed before and fails after
Any test that failed before and passes after (suggests test was broken)
Any snapshot differences (not even floating-point noise)
Decreased test coverage
Any behavioral changes

If any of these occur: Revert and investigate why.

Detailed Steps

Step 1: Run Full Test Suite (Baseline)

bash

/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -v 2>&1 | tee /tmp/baseline_tests.txt

Record:

Total tests: grep "passed" /tmp/baseline_tests.txt
Any failures (if refactoring existing code with known issues)
Test execution time

Expected: All tests pass (or document any known failures)

Step 2: Run Snapshot Tests (Baseline)

bash

/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -m snapshot -v 2>&1 | tee /tmp/baseline_snapshots.txt

CRITICAL: Snapshots must match exactly after refactoring.

Expected: All snapshot tests pass

Step 3: Capture Coverage Report

bash

/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest --cov=non_local_detector --cov-report=term --cov-report=json:coverage_baseline.json

Record: Coverage percentage for files being refactored

Why: Coverage should not decrease during refactoring (ideally improves)

Step 4: Perform Refactoring

Refactoring techniques:

Extract function:

python

# Before
def complex_function():
    # ... 50 lines of code
    result = x * 2 + y
    # ... more code
    return final_result

# After
def complex_function():
    # ... code
    result = _calculate_intermediate(x, y)
    # ... code
    return final_result

def _calculate_intermediate(x, y):
    return x * 2 + y

Rename for clarity:

python

# Before
def f(x):
    return x * 2

# After
def calculate_doubled_value(value):
    return value * 2

Reorganize structure:

python

# Before: All in one file

# After: Separated into modules
# - core_logic.py
# - utilities.py
# - validation.py

Optimize performance (numerically equivalent):

python

# Before
for i in range(n):
    result[i] = f(x[i])

# After (JAX)
result = jax.vmap(f)(x)

During refactoring:

Make small, incremental changes
Test after each change if possible
Keep numerical operations identical
Maintain exact same algorithms

Step 5: Run Full Test Suite (Verify Match)

bash

/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -v 2>&1 | tee /tmp/refactored_tests.txt

Compare to baseline:

bash

diff /tmp/baseline_tests.txt /tmp/refactored_tests.txt

MUST verify:

Same number of tests run
Same tests pass
Same tests fail (if any)
Similar execution time (within 20%)

If differences:

Any new test failures: REVERT IMMEDIATELY
Any new test passes: Investigate (test was broken?)
Different test count: Investigate (tests missing or duplicated?)

Step 6: Run Snapshot Tests (Verify Match)

bash

/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -m snapshot -v 2>&1 | tee /tmp/refactored_snapshots.txt

CRITICAL: Must match baseline EXACTLY.

Expected: All snapshot tests pass, no differences

If snapshot differences:

DO NOT UPDATE SNAPSHOTS
Investigate why behavior changed
This is NOT a refactoring if behavior changed
Revert and reconsider approach

Step 7: Compare Coverage

bash

/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest --cov=non_local_detector --cov-report=term --cov-report=json:coverage_refactored.json

Compare:

bash

# If you have jq installed
jq '.totals.percent_covered' coverage_baseline.json
jq '.totals.percent_covered' coverage_refactored.json

Expected:

Coverage stays same or improves
Never decreases

If coverage decreased:

Some code paths no longer tested
Investigate and fix or revert

Step 8: Run Quality Checks

bash

/Users/edeno/miniconda3/envs/spectral_connectivity/bin/ruff check src/
/Users/edeno/miniconda3/envs/spectral_connectivity/bin/ruff format src/
/Users/edeno/miniconda3/envs/non_local_detector/bin/black src/

Expected: All checks pass

Fix any issues: Refactoring is good opportunity to improve code quality

Step 9: Verify No Numerical Differences

For mathematical code, verify numerical equivalence:

bash

# Run golden regression
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest \
  src/non_local_detector/tests/test_golden_regression.py -v

Expected: Exact match (or differences < 1e-14)

If differences > 1e-14:

This is NOT a pure refactoring
Behavior has changed
Use numerical-validation skill instead

Step 10: Commit Refactoring

Only commit if ALL checks pass:

bash

git add <refactored_files> <test_files>
git commit -m "refactor: improve <component> code structure

- Extract <function> for reusability
- Rename <variable> for clarity
- Reorganize <module> structure

No behavioral changes:
- All tests pass (N tests)
- Snapshots unchanged
- Coverage: X% → Y%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>"

Performance Optimization Refactoring

When optimizing for performance:

Capture performance baseline:

bash

pytest --durations=10 > /tmp/baseline_durations.txt

Make optimization
Verify numerical equivalence (use numerical-validation skill)

Measure performance improvement:

bash

pytest --durations=10 > /tmp/optimized_durations.txt

Document improvement:

Optimization: Use JAX vmap instead of for loop
Speedup: 3.2x (450ms → 140ms)
Numerical difference: < 1e-14 (verified)

Integration with Other Skills

Before refactoring: Consider if change actually needs new behavior (use scientific-tdd instead)
With numerical-validation: If refactoring mathematical code, use numerical-validation to verify equivalence
With jax skill: When optimizing JAX code, use jax skill for best practices

Example Workflow

Task: Extract position decoding logic into reusable function

1. Baseline:
   - Run pytest: 427 passed, 0 failed
   - Run snapshots: 15 passed, 0 failed
   - Coverage: 69%

2. Refactor:
   - Extract _decode_position_from_posterior() function
   - Update 3 call sites to use new function
   - No logic changes, just extraction

3. Verify:
   - Run pytest: 427 passed, 0 failed ✓
   - Run snapshots: 15 passed, 0 failed ✓
   - Coverage: 69% (unchanged) ✓

4. Quality:
   - Ruff: All checks pass ✓
   - Black: Formatted ✓

5. Commit:
   "refactor: extract position decoding into reusable function"

Red Flags

STOP and revert if:

Any test changes status (pass → fail or fail → pass)
Any snapshot differences appear
Coverage decreases
Numerical differences > 1e-14
You're tempted to update snapshots
You're adding new logic (use scientific-tdd instead)

Safe to proceed if:

All tests match baseline exactly
No snapshot changes
Coverage same or better
Code quality improves
No new functionality added

Common Mistakes

"It's just a small behavioral change"

No such thing in refactoring
Any behavioral change = not refactoring
Use scientific-tdd for behavioral changes

"I'll update the snapshots since the new output is better"

That's not refactoring, it's changing behavior
Refactoring = zero snapshot changes
Use scientific-tdd if output should change

"Tests are slow, I'll skip them"

Never skip tests during refactoring
Tests are your safety net
Without tests, you can't verify it's a refactoring

"Coverage went down but the code is better"

Better code shouldn't lose coverage
Investigate why coverage decreased
Fix or revert

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/safe-refactoring
License: MIT License

Featured Tools

Join Our Newsletter

Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.

163 31

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Safe Refactoring for Scientific Code

Overview

When to Use This Skill

Process Checklist

Strict Rules

Detailed Steps

Step 1: Run Full Test Suite (Baseline)

Step 2: Run Snapshot Tests (Baseline)

Step 3: Capture Coverage Report

Step 4: Perform Refactoring

Step 5: Run Full Test Suite (Verify Match)

Step 6: Run Snapshot Tests (Verify Match)

Step 7: Compare Coverage

Step 8: Run Quality Checks

Step 9: Verify No Numerical Differences

Step 10: Commit Refactoring

Performance Optimization Refactoring

Integration with Other Skills

Example Workflow

Red Flags

Common Mistakes

Recommended Agent Skills

agent-ops-spec

agent-ops-state

agent-ops-spec

agent-ops-testing

agent-ops-testing

agent-ops-state