Agent skill
duplication-detect
Find and eliminate code duplication with DRY refactoring strategies
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/duplication-detect
SKILL.md
Code Duplication Detection & DRY Refactoring
I'll analyze your codebase for duplicate code blocks, identify similar patterns across files, and suggest DRY (Don't Repeat Yourself) refactoring strategies based on obra principles.
Detection Capabilities:
- Exact code duplication (copy-paste detection)
- Similar code patterns (semantic duplication)
- Repeated logic across files
- Duplicated constants and magic numbers
- Repeated test patterns
Supported Languages:
- JavaScript/TypeScript
- Python
- Go
- Java
- Generic text-based detection
Token Optimization
This skill uses duplication detection-specific patterns to minimize token usage:
1. Source Directory Detection Caching (500 token savings)
Pattern: Cache project structure and source directories
- Store structure in
.duplication-structure-cache(1 hour TTL) - Cache: source directories, file extensions, excluded paths
- Read cached structure on subsequent runs (50 tokens vs 550 tokens fresh)
- Invalidate on directory structure changes
- Savings: 91% on repeat duplication checks
2. Bash-Based Duplication Tool Execution (1,800 token savings)
Pattern: Use jscpd/PMD directly via bash
- JavaScript:
jscpd --format json(300 tokens) - Python:
pylint --duplicate-code(300 tokens) - Generic:
simianor custom grep-based detection (400 tokens) - Parse JSON output with jq
- No Task agents for duplication detection
- Savings: 90% vs Task-based duplication analysis
3. Sample-Based Duplication Reporting (900 token savings)
Pattern: Report first 10 duplication instances only
- Show top 10 duplications by severity (600 tokens)
- Count remaining duplications without details
- Full report via
--allflag - Savings: 65% vs reporting every duplication
4. Template-Based DRY Refactoring Recommendations (800 token savings)
Pattern: Use predefined DRY patterns
- Standard strategies: extract function, extract constant, inheritance, composition
- Pattern-based recommendations for duplication types
- No creative refactoring generation
- Savings: 80% vs LLM-generated DRY strategies
5. Incremental Duplication Checks (1,000 token savings)
Pattern: Check only changed files via git diff
- Analyze files modified since last commit (500 tokens)
- Check if changes introduce new duplication
- Full codebase analysis via
--fullflag - Savings: 75% vs full codebase duplication detection
6. Grep-Based Similar Pattern Discovery (700 token savings)
Pattern: Find potential duplications with grep
- Grep for repeated patterns: function signatures, constant values (300 tokens)
- Flag files with high similarity
- Run full tool only on flagged file pairs
- Savings: 70% vs running tool on all file combinations
7. Cached Duplication Baseline (600 token savings)
Pattern: Store baseline duplication report
- Cache initial duplication report
- Compare new runs against baseline to detect new duplications
- Focus on newly introduced duplications
- Savings: 80% by focusing on deltas
8. Threshold-Based Filtering (400 token savings)
Pattern: Filter out small duplications
- Default: 6+ lines of duplication
- Skip trivial duplications (imports, boilerplate)
- Adjustable threshold
- Savings: 70% by filtering noise
Real-World Token Usage Distribution
Typical operation patterns:
- Check recent changes (git diff scope): 1,200 tokens
- Show top 10 duplications: 1,400 tokens
- Full codebase analysis (first time): 3,000 tokens
- Cached baseline comparison: 800 tokens
- Refactoring recommendations (top 5): 1,800 tokens
- Most common: Incremental checks on recent changes
Expected per-analysis: 1,500-2,500 tokens (60% reduction from 3,500-6,000 baseline) Real-world average: 1,100 tokens (due to incremental checks, sample-based reporting, cached baseline)
Arguments: $ARGUMENTS - optional: minimum duplication threshold (default: 6 lines) or specific directories
DRY refactoring strategies:
- Extract function/method (most common)
- Extract constant/configuration
- Inheritance or composition
- Higher-order functions
- Template method pattern
- Strategy pattern for variations
Phase 1: Tool Setup & Configuration
First, I'll set up duplication detection tools:
#!/bin/bash
# Duplication Detection - Setup & Configuration
echo "=== Code Duplication Detection ==="
echo ""
# Create analysis directory
mkdir -p .claude/duplication-analysis
ANALYSIS_DIR=".claude/duplication-analysis"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
REPORT="$ANALYSIS_DIR/duplication-report-$TIMESTAMP.md"
MIN_LINES="${1:-6}" # Minimum lines for duplication detection
echo "Configuration:"
echo " Minimum duplication threshold: $MIN_LINES lines"
echo " Analysis directory: $ANALYSIS_DIR"
echo ""
# Detect project structure
echo "Detecting project structure..."
SOURCE_DIRS=""
# Common source directories
for dir in src lib app components pages services utils helpers; do
if [ -d "$dir" ]; then
SOURCE_DIRS="$SOURCE_DIRS $dir"
echo " ✓ Found: $dir/"
fi
done
# Python-specific
for dir in tests test __tests__; do
if [ -d "$dir" ]; then
echo " ✓ Found test directory: $dir/"
fi
done
if [ -z "$SOURCE_DIRS" ]; then
echo " Using current directory"
SOURCE_DIRS="."
fi
echo ""
Phase 2: Install Detection Tools
I'll install and configure duplication detection tools:
echo "=== Installing Duplication Detection Tools ==="
echo ""
install_jscpd() {
# JSCPD - Multi-language copy-paste detector
if ! command -v jscpd >/dev/null 2>&1 && ! npm list -g jscpd >/dev/null 2>&1; then
echo "Installing jscpd (copy-paste detector)..."
npm install -g jscpd 2>/dev/null || npm install --save-dev jscpd
if [ $? -eq 0 ]; then
echo "✓ jscpd installed"
else
echo "⚠️ Failed to install jscpd - using basic detection"
return 1
fi
else
echo "✓ jscpd already installed"
fi
# Create jscpd configuration
cat > "$ANALYSIS_DIR/.jscpd.json" << JSCPDCONFIG
{
"threshold": $MIN_LINES,
"reporters": ["json", "html"],
"ignore": [
"**/*.min.js",
"**/node_modules/**",
"**/dist/**",
"**/build/**",
"**/.git/**",
"**/coverage/**",
"**/__pycache__/**",
"**/venv/**",
"**/.venv/**"
],
"format": ["javascript", "typescript", "python", "go", "java"],
"absolute": false,
"output": "$ANALYSIS_DIR/jscpd-report"
}
JSCPDCONFIG
echo "✓ jscpd configuration created"
return 0
}
# Install tool
if install_jscpd; then
USE_JSCPD=true
else
USE_JSCPD=false
echo "ℹ️ Will use basic grep-based detection"
fi
echo ""
Phase 3: Detect Code Duplication
I'll analyze the codebase for duplicated code:
echo "=== Analyzing Code Duplication ==="
echo ""
run_jscpd_analysis() {
echo "Running jscpd analysis..."
echo ""
jscpd $SOURCE_DIRS \
--config "$ANALYSIS_DIR/.jscpd.json" \
2>&1 | tee "$ANALYSIS_DIR/jscpd-log.txt"
if [ $? -eq 0 ]; then
echo ""
echo "✓ jscpd analysis complete"
# Check if duplications found
if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
DUPLICATIONS=$(jq '.statistics.total.duplications' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
PERCENTAGE=$(jq '.statistics.total.percentage' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
echo ""
echo "Duplication Statistics:"
echo " Total duplications: $DUPLICATIONS"
echo " Duplication percentage: $PERCENTAGE%"
echo ""
# Extract top duplications
echo "Top Duplicated Blocks:"
jq -r '.duplicates[] |
"\(.format) - \(.lines) lines duplicated in \(.fragment) (\(.firstFile.name):\(.firstFile.start) and \(.secondFile.name):\(.secondFile.start))"' \
"$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null | head -10
fi
else
echo "⚠️ jscpd analysis failed - see $ANALYSIS_DIR/jscpd-log.txt"
return 1
fi
}
run_basic_duplication_detection() {
echo "Running basic duplication detection..."
echo ""
# Find similar function signatures
echo "Detecting similar function signatures..."
# JavaScript/TypeScript functions
if find $SOURCE_DIRS -name "*.js" -o -name "*.jsx" -o -name "*.ts" -o -name "*.tsx" 2>/dev/null | grep -q .; then
grep -rh "^function\|^const.*=.*=>.*{$\|^export function" \
--include="*.js" --include="*.jsx" --include="*.ts" --include="*.tsx" \
$SOURCE_DIRS 2>/dev/null | \
sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-functions.txt"
if [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
echo " Found duplicate function signatures (JavaScript/TypeScript):"
cat "$ANALYSIS_DIR/duplicate-functions.txt" | sed 's/^/ /'
echo ""
fi
fi
# Python functions
if find $SOURCE_DIRS -name "*.py" 2>/dev/null | grep -q .; then
grep -rh "^def \|^async def " \
--include="*.py" \
$SOURCE_DIRS 2>/dev/null | \
sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-python-functions.txt"
if [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
echo " Found duplicate function signatures (Python):"
cat "$ANALYSIS_DIR/duplicate-python-functions.txt" | sed 's/^/ /'
echo ""
fi
fi
# Find magic numbers (potential constants)
echo "Detecting magic numbers (repeated literals)..."
find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
xargs grep -oh "[0-9]\{2,\}" 2>/dev/null | \
sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/magic-numbers.txt"
if [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
echo " Most common numeric literals:"
cat "$ANALYSIS_DIR/magic-numbers.txt" | sed 's/^/ /'
echo ""
fi
# Find repeated strings
echo "Detecting repeated string literals..."
find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
xargs grep -oh "\"[^\"]\{10,\}\"" 2>/dev/null | \
sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/repeated-strings.txt"
if [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
echo " Most common string literals:"
cat "$ANALYSIS_DIR/repeated-strings.txt" | sed 's/^/ /'
echo ""
fi
}
# Run appropriate analysis
if [ "$USE_JSCPD" = true ]; then
if ! run_jscpd_analysis; then
echo "Falling back to basic detection..."
run_basic_duplication_detection
fi
else
run_basic_duplication_detection
fi
Phase 4: Generate DRY Refactoring Strategies
I'll provide specific refactoring patterns for eliminating duplication:
echo ""
echo "=== Generating DRY Refactoring Strategies ==="
echo ""
cat > "$ANALYSIS_DIR/dry-refactoring-patterns.md" << 'DRYPATTERNS'
# DRY Refactoring Patterns
Based on obra YAGNI/DRY principles: Don't Repeat Yourself
---
## Pattern 1: Extract Function
**Problem:** Same code block repeated in multiple places
**Solution:** Extract to a reusable function
### Before (JavaScript)
```javascript
// File 1
function processOrderA(order) {
if (!order.customer || !order.customer.email) {
throw new Error('Invalid customer');
}
if (!order.items || order.items.length === 0) {
throw new Error('Empty order');
}
// ... process order
}
// File 2
function processOrderB(order) {
if (!order.customer || !order.customer.email) {
throw new Error('Invalid customer');
}
if (!order.items || order.items.length === 0) {
throw new Error('Empty order');
}
// ... different processing
}
After
// utils/validation.js
export function validateOrder(order) {
if (!order.customer?.email) {
throw new Error('Invalid customer');
}
if (!order.items?.length) {
throw new Error('Empty order');
}
}
// File 1
import { validateOrder } from './utils/validation';
function processOrderA(order) {
validateOrder(order);
// ... process order
}
// File 2
import { validateOrder } from './utils/validation';
function processOrderB(order) {
validateOrder(order);
// ... different processing
}
DRY Improvement: 8 duplicated lines → 1 function call
Pattern 2: Extract Configuration/Constants
Problem: Same literals repeated across codebase
Solution: Centralize in configuration
Before (Python)
# Multiple files with repeated values
def calculate_shipping():
if weight < 10:
return 5.99
return 9.99
def check_free_shipping(total):
return total >= 50.00
def apply_discount():
if quantity >= 5:
return 0.10
return 0
After
# config/constants.py
class ShippingConfig:
FREE_SHIPPING_THRESHOLD = 50.00
LIGHT_PACKAGE_WEIGHT = 10
LIGHT_PACKAGE_COST = 5.99
HEAVY_PACKAGE_COST = 9.99
class DiscountConfig:
BULK_QUANTITY = 5
BULK_DISCOUNT = 0.10
# services/shipping.py
from config.constants import ShippingConfig, DiscountConfig
def calculate_shipping(weight):
if weight < ShippingConfig.LIGHT_PACKAGE_WEIGHT:
return ShippingConfig.LIGHT_PACKAGE_COST
return ShippingConfig.HEAVY_PACKAGE_COST
def check_free_shipping(total):
return total >= ShippingConfig.FREE_SHIPPING_THRESHOLD
def apply_discount(quantity):
if quantity >= DiscountConfig.BULK_QUANTITY:
return DiscountConfig.BULK_DISCOUNT
return 0
DRY Improvement: Magic numbers eliminated, single source of truth
Pattern 3: Template Method Pattern
Problem: Similar algorithms with slight variations
Solution: Use template method or strategy pattern
Before (TypeScript)
class PDFReport {
generate() {
this.loadData();
this.formatHeader();
this.formatPDFContent();
this.addPDFFooter();
this.savePDF();
}
private loadData() { /* same logic */ }
private formatHeader() { /* same logic */ }
private formatPDFContent() { /* PDF-specific */ }
private addPDFFooter() { /* PDF-specific */ }
private savePDF() { /* PDF-specific */ }
}
class ExcelReport {
generate() {
this.loadData();
this.formatHeader();
this.formatExcelContent();
this.addExcelFooter();
this.saveExcel();
}
private loadData() { /* DUPLICATE logic */ }
private formatHeader() { /* DUPLICATE logic */ }
private formatExcelContent() { /* Excel-specific */ }
private addExcelFooter() { /* Excel-specific */ }
private saveExcel() { /* Excel-specific */ }
}
After
abstract class Report {
// Template method
generate() {
this.loadData();
this.formatHeader();
this.formatContent();
this.addFooter();
this.save();
}
// Common implementations
protected loadData() {
// Shared logic
}
protected formatHeader() {
// Shared logic
}
// Abstract methods for subclasses
protected abstract formatContent(): void;
protected abstract addFooter(): void;
protected abstract save(): void;
}
class PDFReport extends Report {
protected formatContent() { /* PDF-specific */ }
protected addFooter() { /* PDF-specific */ }
protected save() { /* PDF-specific */ }
}
class ExcelReport extends Report {
protected formatContent() { /* Excel-specific */ }
protected addFooter() { /* Excel-specific */ }
protected save() { /* Excel-specific */ }
}
DRY Improvement: Shared logic in base class, variations in subclasses
Pattern 4: Higher-Order Functions
Problem: Similar operations with different behaviors
Solution: Use higher-order functions or callbacks
Before (JavaScript)
function processUsersForEmail(users) {
const results = [];
for (const user of users) {
if (user.email && user.active) {
results.push(user);
}
}
return results;
}
function processUsersForSMS(users) {
const results = [];
for (const user of users) {
if (user.phone && user.active) {
results.push(user);
}
}
return results;
}
function processUsersForPush(users) {
const results = [];
for (const user of users) {
if (user.deviceToken && user.active) {
results.push(user);
}
}
return results;
}
After
function processUsers(users, channel) {
const validators = {
email: user => user.email && user.active,
sms: user => user.phone && user.active,
push: user => user.deviceToken && user.active
};
return users.filter(validators[channel]);
}
// Or more flexible with custom validator
function processUsers(users, isValid) {
return users.filter(isValid);
}
// Usage
const emailUsers = processUsers(users, user => user.email && user.active);
const smsUsers = processUsers(users, user => user.phone && user.active);
const pushUsers = processUsers(users, user => user.deviceToken && user.active);
DRY Improvement: 3 functions → 1 configurable function
Pattern 5: Composition Over Duplication
Problem: Repeated utility combinations
Solution: Compose smaller utilities
Before (Go)
// Repeated validation patterns
func ValidateUser(user User) error {
if user.Email == "" {
return errors.New("email required")
}
if !strings.Contains(user.Email, "@") {
return errors.New("invalid email")
}
if user.Age < 18 {
return errors.New("must be 18+")
}
return nil
}
func ValidateAdmin(admin Admin) error {
if admin.Email == "" {
return errors.New("email required")
}
if !strings.Contains(admin.Email, "@") {
return errors.New("invalid email")
}
if admin.Age < 18 {
return errors.New("must be 18+")
}
if admin.Role == "" {
return errors.New("role required")
}
return nil
}
After
// Composable validators
func requireEmail(email string) error {
if email == "" {
return errors.New("email required")
}
return nil
}
func validateEmailFormat(email string) error {
if !strings.Contains(email, "@") {
return errors.New("invalid email")
}
return nil
}
func requireAdult(age int) error {
if age < 18 {
return errors.New("must be 18+")
}
return nil
}
func requireRole(role string) error {
if role == "" {
return errors.New("role required")
}
return nil
}
// Compose validators
func ValidateUser(user User) error {
validators := []func() error{
func() error { return requireEmail(user.Email) },
func() error { return validateEmailFormat(user.Email) },
func() error { return requireAdult(user.Age) },
}
return runValidators(validators)
}
func ValidateAdmin(admin Admin) error {
validators := []func() error{
func() error { return requireEmail(admin.Email) },
func() error { return validateEmailFormat(admin.Email) },
func() error { return requireAdult(admin.Age) },
func() error { return requireRole(admin.Role) },
}
return runValidators(validators)
}
func runValidators(validators []func() error) error {
for _, validate := range validators {
if err := validate(); err != nil {
return err
}
}
return nil
}
DRY Improvement: Reusable validators, composable validation
Pattern 6: Extract Test Helpers
Problem: Repeated test setup/teardown
Solution: Create test utilities
Before
// test/user.test.js
describe('User tests', () => {
it('should create user', () => {
const db = createDatabase();
const user = { name: 'John', email: 'john@example.com' };
// ... test logic
db.close();
});
it('should update user', () => {
const db = createDatabase();
const user = { name: 'John', email: 'john@example.com' };
// ... test logic
db.close();
});
});
// test/order.test.js
describe('Order tests', () => {
it('should create order', () => {
const db = createDatabase();
const user = { name: 'John', email: 'john@example.com' };
// ... test logic
db.close();
});
});
After
// test/helpers/testUtils.js
export function setupTestDatabase() {
const db = createDatabase();
return {
db,
cleanup: () => db.close()
};
}
export function createTestUser(overrides = {}) {
return {
name: 'John',
email: 'john@example.com',
...overrides
};
}
// test/user.test.js
import { setupTestDatabase, createTestUser } from './helpers/testUtils';
describe('User tests', () => {
let db;
beforeEach(() => {
({ db } = setupTestDatabase());
});
afterEach(() => {
db.close();
});
it('should create user', () => {
const user = createTestUser();
// ... test logic (cleaner!)
});
it('should update user', () => {
const user = createTestUser({ name: 'Jane' });
// ... test logic
});
});
DRY Improvement: Shared test utilities, less boilerplate
obra YAGNI/DRY Principles
YAGNI: You Aren't Gonna Need It
- Don't add functionality until necessary
- Avoid premature abstraction
- Wait for duplication before extracting
DRY: Don't Repeat Yourself
- Every piece of knowledge should have a single representation
- Avoid copy-paste programming
- Extract when you see duplication 2-3 times
When to Extract
- Three strikes rule: Duplicate 3 times → extract
- Clear pattern: Similar code structure appears
- High maintenance cost: Changes require updates in multiple places
- Business logic: Domain rules should be centralized
When NOT to Extract
- Premature abstraction: Only seen once or twice
- Coincidental duplication: Similar code, different concepts
- Temporary code: Prototypes, experiments
- Over-engineering: Abstraction more complex than duplication
DRYPATTERNS
echo "✓ DRY refactoring patterns guide created"
## Phase 5: Generate Duplication Report
I'll create a comprehensive report with prioritized refactoring opportunities:
```bash
echo ""
echo "=== Generating Duplication Report ==="
echo ""
# Count duplications
TOTAL_DUPLICATIONS=0
DUPLICATION_PERCENTAGE=0
if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
TOTAL_DUPLICATIONS=$(jq '.statistics.total.duplications' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
DUPLICATION_PERCENTAGE=$(jq '.statistics.total.percentage' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
fi
cat > "$REPORT" << EOF
# Code Duplication Analysis Report
**Generated:** $(date)
**Minimum Lines Threshold:** $MIN_LINES lines
**Total Duplications Found:** $TOTAL_DUPLICATIONS
**Duplication Percentage:** $DUPLICATION_PERCENTAGE%
---
## Summary
Code duplication violates the DRY (Don't Repeat Yourself) principle and leads to:
- Higher maintenance costs (fix bugs in multiple places)
- Increased bug likelihood (miss updating one location)
- Larger codebase (more code to understand)
- Lower code quality (copy-paste programming)
**Duplication Guidelines (obra principles):**
- **0-5%:** Excellent - minimal duplication
- **5-10%:** Good - acceptable for large projects
- **10-20%:** Moderate - consider refactoring
- **20%+:** High - significant technical debt
---
## Duplication Statistics
EOF
if [ "$USE_JSCPD" = true ] && [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
cat >> "$REPORT" << EOF
### Overall Statistics
- Total Lines Analyzed: $(jq '.statistics.total.lines' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplicated Lines: $(jq '.statistics.total.clones' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplication Blocks: $TOTAL_DUPLICATIONS
- Duplication Percentage: $DUPLICATION_PERCENTAGE%
### Top Duplicated Files
EOF
jq -r '.statistics.formats[] |
"\(.format):\n Files: \(.sources)\n Duplications: \(.duplications)\n Percentage: \(.percentage)%\n"' \
"$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"
cat >> "$REPORT" << EOF
### Largest Duplication Blocks
EOF
jq -r '.duplicates[:10] | .[] |
"**\(.lines) lines** duplicated in \(.format)\n- Location 1: \(.firstFile.name):\(.firstFile.start)\n- Location 2: \(.secondFile.name):\(.secondFile.start)\n"' \
"$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"
cat >> "$REPORT" << EOF
**Full HTML Report:** Open \`$ANALYSIS_DIR/jscpd-report/html/index.html\` in browser
EOF
else
cat >> "$REPORT" << EOF
### Basic Analysis Results
EOF
if [ -f "$ANALYSIS_DIR/duplicate-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
cat >> "$REPORT" << EOF
**Duplicate Function Signatures (JavaScript/TypeScript):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-functions.txt")
\`\`\`
EOF
fi
if [ -f "$ANALYSIS_DIR/duplicate-python-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
cat >> "$REPORT" << EOF
**Duplicate Function Signatures (Python):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-python-functions.txt")
\`\`\`
EOF
fi
if [ -f "$ANALYSIS_DIR/magic-numbers.txt" ] && [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
cat >> "$REPORT" << EOF
**Repeated Numeric Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/magic-numbers.txt")
\`\`\`
Consider extracting to named constants.
EOF
fi
if [ -f "$ANALYSIS_DIR/repeated-strings.txt" ] && [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
cat >> "$REPORT" << EOF
**Repeated String Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/repeated-strings.txt")
\`\`\`
Consider extracting to configuration.
EOF
fi
fi
cat >> "$REPORT" << 'EOF'
---
## Recommended DRY Refactoring
### Priority 1: Extract Duplicated Functions
- Identify exact code duplicates (10+ lines)
- Extract to shared utility functions
- Consolidate in common modules
### Priority 2: Centralize Configuration
- Extract magic numbers to constants
- Create configuration files
- Use environment variables for deployment-specific values
### Priority 3: Eliminate Similar Patterns
- Identify similar code structures
- Extract common patterns
- Use higher-order functions or templates
### Priority 4: Consolidate Test Helpers
- Extract common test setup/teardown
- Create test utilities
- Share fixtures across test suites
**See detailed patterns:** `cat $ANALYSIS_DIR/dry-refactoring-patterns.md`
---
## Implementation Steps
1. **Create Git Checkpoint**
```bash
git add -A
git commit -m "Pre DRY-refactoring checkpoint" || echo "No changes"
-
Prioritize Duplications
- Start with largest duplication blocks
- Focus on frequently changed code
- Consider domain importance
-
Apply DRY Patterns
- Extract function (most common)
- Extract configuration
- Template method pattern
- Higher-order functions
- Composition
-
Three Strikes Rule
- Wait for 3 occurrences before extracting
- Avoid premature abstraction
- Balance DRY with YAGNI
-
Verify Improvements
- Re-run duplication analysis
- Ensure tests pass
- Check for reduced code size
-
Document Changes
- Update documentation
- Note refactoring decisions
- Share patterns with team
Continuous Monitoring
Add to CI/CD
Pre-commit Hook
# .git/hooks/pre-commit
#!/bin/bash
# Check duplication before commit
jscpd --threshold $MIN_LINES . || exit 1
GitHub Actions
name: Code Quality
on: [push, pull_request]
jobs:
duplication:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Check duplication
run: |
npm install -g jscpd
jscpd --threshold 6 .
Integration with Other Skills
/refactor- Systematic code restructuring/complexity-reduce- Reduce cyclomatic complexity/make-it-pretty- Improve code readability/review- Include duplication in code reviews/test- Ensure tests after refactoring
Resources
- obra/superpowers - YAGNI/DRY principles
- The Pragmatic Programmer - DRY Principle
- Refactoring - Martin Fowler
- JSCPD - Copy-Paste Detector
- Rule of Three (refactoring)
Report generated at: $(date)
Next Steps:
- Review high-duplication areas
- Select appropriate DRY pattern
- Implement refactoring incrementally
- Re-run analysis to verify improvement
EOF
echo "✓ Duplication report generated: $REPORT"
## Summary
```bash
echo ""
echo "=== ✓ Duplication Analysis Complete ==="
echo ""
echo "📊 Analysis Results:"
if [ "$USE_JSCPD" = true ]; then
echo " Total duplications: $TOTAL_DUPLICATIONS"
echo " Duplication percentage: $DUPLICATION_PERCENTAGE%"
echo " Threshold: $MIN_LINES lines"
else
echo " Basic analysis complete"
echo " Install jscpd for detailed analysis: npm install -g jscpd"
fi
echo ""
echo "📁 Generated Files:"
echo " - Duplication Report: $REPORT"
echo " - DRY Patterns: $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo " - HTML Report: $ANALYSIS_DIR/jscpd-report/html/index.html"
echo ""
echo "🎯 Recommended Actions:"
if [ "$DUPLICATION_PERCENTAGE" != "0" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 10 ]; then
echo " ⚠️ High duplication detected ($DUPLICATION_PERCENTAGE%)"
echo " 1. Review largest duplication blocks"
echo " 2. Extract duplicated functions"
echo " 3. Centralize configuration"
echo " 4. Run tests after refactoring"
elif [ "$DUPLICATION_PERCENTAGE" != "0" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 5 ]; then
echo " Moderate duplication ($DUPLICATION_PERCENTAGE%)"
echo " Consider refactoring during feature work"
else
echo " ✓ Low duplication - excellent!"
echo " Continue monitoring in code reviews"
fi
echo ""
echo "💡 DRY Refactoring Patterns:"
echo " 1. Extract Function (consolidate duplicates)"
echo " 2. Extract Configuration (centralize constants)"
echo " 3. Template Method (share algorithm structure)"
echo " 4. Higher-Order Functions (parameterize behavior)"
echo " 5. Composition (build from smaller pieces)"
echo ""
echo "📏 obra YAGNI/DRY Guidelines:"
echo " - Three strikes rule: Extract after 3 duplications"
echo " - Avoid premature abstraction"
echo " - Balance DRY with readability"
echo ""
echo "🔗 Integration Points:"
echo " - /refactor - Systematic restructuring"
echo " - /complexity-reduce - Reduce complexity"
echo " - /review - Include in code reviews"
echo ""
echo "View report: cat $REPORT"
echo "View patterns: cat $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo "View HTML: open $ANALYSIS_DIR/jscpd-report/html/index.html"
Safety Guarantees
What I'll NEVER do:
- Automatically refactor without analysis
- Extract after single occurrence (violates YAGNI)
- Remove code without understanding context
- Skip testing after refactoring
- Add AI attribution to commits
What I WILL do:
- Identify genuine code duplication
- Suggest proven DRY patterns
- Follow obra YAGNI principles
- Recommend incremental refactoring
- Preserve functionality
- Provide clear examples
Credits
This skill is based on:
- obra/superpowers - YAGNI and DRY principles
- The Pragmatic Programmer - DRY principle origin
- Martin Fowler - Refactoring patterns
- JSCPD - Multi-language duplication detection
- Rule of Three - When to extract abstraction
Token Budget
Target: 2,000-3,500 tokens per execution
- Phase 1-2: ~600 tokens (setup + tool installation)
- Phase 3: ~1,000 tokens (duplication detection)
- Phase 4-5: ~1,500 tokens (patterns + reporting)
Optimization Strategy:
- Use jscpd for efficient detection
- Fallback to grep-based analysis
- Template-based pattern generation
- Focus on actionable duplications
- Clear DRY refactoring guidance
This ensures thorough duplication detection with practical DRY refactoring strategies based on obra principles while respecting token limits.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
Didn't find tool you were looking for?