Agent skills
duplication-detect

Agent skill

duplication-detect

Find and eliminate code duplication with DRY refactoring strategies

View SKILL.md on GitHub Repository

Stars 163

Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/duplication-detect

SKILL.md

Code Duplication Detection & DRY Refactoring

I'll analyze your codebase for duplicate code blocks, identify similar patterns across files, and suggest DRY (Don't Repeat Yourself) refactoring strategies based on obra principles.

Detection Capabilities:

Exact code duplication (copy-paste detection)
Similar code patterns (semantic duplication)
Repeated logic across files
Duplicated constants and magic numbers
Repeated test patterns

Supported Languages:

JavaScript/TypeScript
Python
Go
Java
Generic text-based detection

Token Optimization

This skill uses duplication detection-specific patterns to minimize token usage:

1. Source Directory Detection Caching (500 token savings)

Pattern: Cache project structure and source directories

Store structure in .duplication-structure-cache (1 hour TTL)
Cache: source directories, file extensions, excluded paths
Read cached structure on subsequent runs (50 tokens vs 550 tokens fresh)
Invalidate on directory structure changes
Savings: 91% on repeat duplication checks

2. Bash-Based Duplication Tool Execution (1,800 token savings)

Pattern: Use jscpd/PMD directly via bash

JavaScript: jscpd --format json (300 tokens)
Python: pylint --duplicate-code (300 tokens)
Generic: simian or custom grep-based detection (400 tokens)
Parse JSON output with jq
No Task agents for duplication detection
Savings: 90% vs Task-based duplication analysis

3. Sample-Based Duplication Reporting (900 token savings)

Pattern: Report first 10 duplication instances only

Show top 10 duplications by severity (600 tokens)
Count remaining duplications without details
Full report via --all flag
Savings: 65% vs reporting every duplication

4. Template-Based DRY Refactoring Recommendations (800 token savings)

Pattern: Use predefined DRY patterns

Standard strategies: extract function, extract constant, inheritance, composition
Pattern-based recommendations for duplication types
No creative refactoring generation
Savings: 80% vs LLM-generated DRY strategies

5. Incremental Duplication Checks (1,000 token savings)

Pattern: Check only changed files via git diff

Analyze files modified since last commit (500 tokens)
Check if changes introduce new duplication
Full codebase analysis via --full flag
Savings: 75% vs full codebase duplication detection

6. Grep-Based Similar Pattern Discovery (700 token savings)

Pattern: Find potential duplications with grep

Grep for repeated patterns: function signatures, constant values (300 tokens)
Flag files with high similarity
Run full tool only on flagged file pairs
Savings: 70% vs running tool on all file combinations

7. Cached Duplication Baseline (600 token savings)

Pattern: Store baseline duplication report

Cache initial duplication report
Compare new runs against baseline to detect new duplications
Focus on newly introduced duplications
Savings: 80% by focusing on deltas

8. Threshold-Based Filtering (400 token savings)

Pattern: Filter out small duplications

Default: 6+ lines of duplication
Skip trivial duplications (imports, boilerplate)
Adjustable threshold
Savings: 70% by filtering noise

Real-World Token Usage Distribution

Typical operation patterns:

Check recent changes (git diff scope): 1,200 tokens
Show top 10 duplications: 1,400 tokens
Full codebase analysis (first time): 3,000 tokens
Cached baseline comparison: 800 tokens
Refactoring recommendations (top 5): 1,800 tokens
Most common: Incremental checks on recent changes

Expected per-analysis: 1,500-2,500 tokens (60% reduction from 3,500-6,000 baseline) Real-world average: 1,100 tokens (due to incremental checks, sample-based reporting, cached baseline)

Arguments: $ARGUMENTS - optional: minimum duplication threshold (default: 6 lines) or specific directories

DRY refactoring strategies:

Extract function/method (most common)
Extract constant/configuration
Inheritance or composition
Higher-order functions
Template method pattern
Strategy pattern for variations

Phase 1: Tool Setup & Configuration

First, I'll set up duplication detection tools:

bash

#!/bin/bash
# Duplication Detection - Setup & Configuration

echo "=== Code Duplication Detection ==="
echo ""

# Create analysis directory
mkdir -p .claude/duplication-analysis
ANALYSIS_DIR=".claude/duplication-analysis"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
REPORT="$ANALYSIS_DIR/duplication-report-$TIMESTAMP.md"
MIN_LINES="${1:-6}"  # Minimum lines for duplication detection

echo "Configuration:"
echo "  Minimum duplication threshold: $MIN_LINES lines"
echo "  Analysis directory: $ANALYSIS_DIR"
echo ""

# Detect project structure
echo "Detecting project structure..."
SOURCE_DIRS=""

# Common source directories
for dir in src lib app components pages services utils helpers; do
    if [ -d "$dir" ]; then
        SOURCE_DIRS="$SOURCE_DIRS $dir"
        echo "  ✓ Found: $dir/"
    fi
done

# Python-specific
for dir in tests test __tests__; do
    if [ -d "$dir" ]; then
        echo "  ✓ Found test directory: $dir/"
    fi
done

if [ -z "$SOURCE_DIRS" ]; then
    echo "  Using current directory"
    SOURCE_DIRS="."
fi

echo ""

Phase 2: Install Detection Tools

I'll install and configure duplication detection tools:

bash

echo "=== Installing Duplication Detection Tools ==="
echo ""

install_jscpd() {
    # JSCPD - Multi-language copy-paste detector
    if ! command -v jscpd >/dev/null 2>&1 && ! npm list -g jscpd >/dev/null 2>&1; then
        echo "Installing jscpd (copy-paste detector)..."
        npm install -g jscpd 2>/dev/null || npm install --save-dev jscpd

        if [ $? -eq 0 ]; then
            echo "✓ jscpd installed"
        else
            echo "⚠️  Failed to install jscpd - using basic detection"
            return 1
        fi
    else
        echo "✓ jscpd already installed"
    fi

    # Create jscpd configuration
    cat > "$ANALYSIS_DIR/.jscpd.json" << JSCPDCONFIG
{
    "threshold": $MIN_LINES,
    "reporters": ["json", "html"],
    "ignore": [
        "**/*.min.js",
        "**/node_modules/**",
        "**/dist/**",
        "**/build/**",
        "**/.git/**",
        "**/coverage/**",
        "**/__pycache__/**",
        "**/venv/**",
        "**/.venv/**"
    ],
    "format": ["javascript", "typescript", "python", "go", "java"],
    "absolute": false,
    "output": "$ANALYSIS_DIR/jscpd-report"
}
JSCPDCONFIG

    echo "✓ jscpd configuration created"
    return 0
}

# Install tool
if install_jscpd; then
    USE_JSCPD=true
else
    USE_JSCPD=false
    echo "ℹ️  Will use basic grep-based detection"
fi

echo ""

Phase 3: Detect Code Duplication

I'll analyze the codebase for duplicated code:

bash

echo "=== Analyzing Code Duplication ==="
echo ""

run_jscpd_analysis() {
    echo "Running jscpd analysis..."
    echo ""

    jscpd $SOURCE_DIRS \
        --config "$ANALYSIS_DIR/.jscpd.json" \
        2>&1 | tee "$ANALYSIS_DIR/jscpd-log.txt"

    if [ $? -eq 0 ]; then
        echo ""
        echo "✓ jscpd analysis complete"

        # Check if duplications found
        if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
            DUPLICATIONS=$(jq '.statistics.total.duplications' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
            PERCENTAGE=$(jq '.statistics.total.percentage' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)

            echo ""
            echo "Duplication Statistics:"
            echo "  Total duplications: $DUPLICATIONS"
            echo "  Duplication percentage: $PERCENTAGE%"
            echo ""

            # Extract top duplications
            echo "Top Duplicated Blocks:"
            jq -r '.duplicates[] |
                "\(.format) - \(.lines) lines duplicated in \(.fragment) (\(.firstFile.name):\(.firstFile.start) and \(.secondFile.name):\(.secondFile.start))"' \
                "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null | head -10
        fi
    else
        echo "⚠️  jscpd analysis failed - see $ANALYSIS_DIR/jscpd-log.txt"
        return 1
    fi
}

run_basic_duplication_detection() {
    echo "Running basic duplication detection..."
    echo ""

    # Find similar function signatures
    echo "Detecting similar function signatures..."

    # JavaScript/TypeScript functions
    if find $SOURCE_DIRS -name "*.js" -o -name "*.jsx" -o -name "*.ts" -o -name "*.tsx" 2>/dev/null | grep -q .; then
        grep -rh "^function\|^const.*=.*=>.*{$\|^export function" \
            --include="*.js" --include="*.jsx" --include="*.ts" --include="*.tsx" \
            $SOURCE_DIRS 2>/dev/null | \
            sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-functions.txt"

        if [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
            echo "  Found duplicate function signatures (JavaScript/TypeScript):"
            cat "$ANALYSIS_DIR/duplicate-functions.txt" | sed 's/^/    /'
            echo ""
        fi
    fi

    # Python functions
    if find $SOURCE_DIRS -name "*.py" 2>/dev/null | grep -q .; then
        grep -rh "^def \|^async def " \
            --include="*.py" \
            $SOURCE_DIRS 2>/dev/null | \
            sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-python-functions.txt"

        if [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
            echo "  Found duplicate function signatures (Python):"
            cat "$ANALYSIS_DIR/duplicate-python-functions.txt" | sed 's/^/    /'
            echo ""
        fi
    fi

    # Find magic numbers (potential constants)
    echo "Detecting magic numbers (repeated literals)..."

    find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
        xargs grep -oh "[0-9]\{2,\}" 2>/dev/null | \
        sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/magic-numbers.txt"

    if [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
        echo "  Most common numeric literals:"
        cat "$ANALYSIS_DIR/magic-numbers.txt" | sed 's/^/    /'
        echo ""
    fi

    # Find repeated strings
    echo "Detecting repeated string literals..."

    find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
        xargs grep -oh "\"[^\"]\{10,\}\"" 2>/dev/null | \
        sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/repeated-strings.txt"

    if [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
        echo "  Most common string literals:"
        cat "$ANALYSIS_DIR/repeated-strings.txt" | sed 's/^/    /'
        echo ""
    fi
}

# Run appropriate analysis
if [ "$USE_JSCPD" = true ]; then
    if ! run_jscpd_analysis; then
        echo "Falling back to basic detection..."
        run_basic_duplication_detection
    fi
else
    run_basic_duplication_detection
fi

Phase 4: Generate DRY Refactoring Strategies

I'll provide specific refactoring patterns for eliminating duplication:

bash

echo ""
echo "=== Generating DRY Refactoring Strategies ==="
echo ""

cat > "$ANALYSIS_DIR/dry-refactoring-patterns.md" << 'DRYPATTERNS'
# DRY Refactoring Patterns

Based on obra YAGNI/DRY principles: Don't Repeat Yourself

---

## Pattern 1: Extract Function

**Problem:** Same code block repeated in multiple places

**Solution:** Extract to a reusable function

### Before (JavaScript)
```javascript
// File 1
function processOrderA(order) {
    if (!order.customer || !order.customer.email) {
        throw new Error('Invalid customer');
    }
    if (!order.items || order.items.length === 0) {
        throw new Error('Empty order');
    }
    // ... process order
}

// File 2
function processOrderB(order) {
    if (!order.customer || !order.customer.email) {
        throw new Error('Invalid customer');
    }
    if (!order.items || order.items.length === 0) {
        throw new Error('Empty order');
    }
    // ... different processing
}

After

javascript

// utils/validation.js
export function validateOrder(order) {
    if (!order.customer?.email) {
        throw new Error('Invalid customer');
    }
    if (!order.items?.length) {
        throw new Error('Empty order');
    }
}

// File 1
import { validateOrder } from './utils/validation';

function processOrderA(order) {
    validateOrder(order);
    // ... process order
}

// File 2
import { validateOrder } from './utils/validation';

function processOrderB(order) {
    validateOrder(order);
    // ... different processing
}

DRY Improvement: 8 duplicated lines → 1 function call

Pattern 2: Extract Configuration/Constants

Problem: Same literals repeated across codebase

Solution: Centralize in configuration

Before (Python)

python

# Multiple files with repeated values
def calculate_shipping():
    if weight < 10:
        return 5.99
    return 9.99

def check_free_shipping(total):
    return total >= 50.00

def apply_discount():
    if quantity >= 5:
        return 0.10
    return 0

After

python

# config/constants.py
class ShippingConfig:
    FREE_SHIPPING_THRESHOLD = 50.00
    LIGHT_PACKAGE_WEIGHT = 10
    LIGHT_PACKAGE_COST = 5.99
    HEAVY_PACKAGE_COST = 9.99

class DiscountConfig:
    BULK_QUANTITY = 5
    BULK_DISCOUNT = 0.10

# services/shipping.py
from config.constants import ShippingConfig, DiscountConfig

def calculate_shipping(weight):
    if weight < ShippingConfig.LIGHT_PACKAGE_WEIGHT:
        return ShippingConfig.LIGHT_PACKAGE_COST
    return ShippingConfig.HEAVY_PACKAGE_COST

def check_free_shipping(total):
    return total >= ShippingConfig.FREE_SHIPPING_THRESHOLD

def apply_discount(quantity):
    if quantity >= DiscountConfig.BULK_QUANTITY:
        return DiscountConfig.BULK_DISCOUNT
    return 0

DRY Improvement: Magic numbers eliminated, single source of truth

Pattern 3: Template Method Pattern

Problem: Similar algorithms with slight variations

Solution: Use template method or strategy pattern

Before (TypeScript)

typescript

class PDFReport {
    generate() {
        this.loadData();
        this.formatHeader();
        this.formatPDFContent();
        this.addPDFFooter();
        this.savePDF();
    }

    private loadData() { /* same logic */ }
    private formatHeader() { /* same logic */ }
    private formatPDFContent() { /* PDF-specific */ }
    private addPDFFooter() { /* PDF-specific */ }
    private savePDF() { /* PDF-specific */ }
}

class ExcelReport {
    generate() {
        this.loadData();
        this.formatHeader();
        this.formatExcelContent();
        this.addExcelFooter();
        this.saveExcel();
    }

    private loadData() { /* DUPLICATE logic */ }
    private formatHeader() { /* DUPLICATE logic */ }
    private formatExcelContent() { /* Excel-specific */ }
    private addExcelFooter() { /* Excel-specific */ }
    private saveExcel() { /* Excel-specific */ }
}

After

typescript

abstract class Report {
    // Template method
    generate() {
        this.loadData();
        this.formatHeader();
        this.formatContent();
        this.addFooter();
        this.save();
    }

    // Common implementations
    protected loadData() {
        // Shared logic
    }

    protected formatHeader() {
        // Shared logic
    }

    // Abstract methods for subclasses
    protected abstract formatContent(): void;
    protected abstract addFooter(): void;
    protected abstract save(): void;
}

class PDFReport extends Report {
    protected formatContent() { /* PDF-specific */ }
    protected addFooter() { /* PDF-specific */ }
    protected save() { /* PDF-specific */ }
}

class ExcelReport extends Report {
    protected formatContent() { /* Excel-specific */ }
    protected addFooter() { /* Excel-specific */ }
    protected save() { /* Excel-specific */ }
}

DRY Improvement: Shared logic in base class, variations in subclasses

Pattern 4: Higher-Order Functions

Problem: Similar operations with different behaviors

Solution: Use higher-order functions or callbacks

Before (JavaScript)

javascript

function processUsersForEmail(users) {
    const results = [];
    for (const user of users) {
        if (user.email && user.active) {
            results.push(user);
        }
    }
    return results;
}

function processUsersForSMS(users) {
    const results = [];
    for (const user of users) {
        if (user.phone && user.active) {
            results.push(user);
        }
    }
    return results;
}

function processUsersForPush(users) {
    const results = [];
    for (const user of users) {
        if (user.deviceToken && user.active) {
            results.push(user);
        }
    }
    return results;
}

After

javascript

function processUsers(users, channel) {
    const validators = {
        email: user => user.email && user.active,
        sms: user => user.phone && user.active,
        push: user => user.deviceToken && user.active
    };

    return users.filter(validators[channel]);
}

// Or more flexible with custom validator
function processUsers(users, isValid) {
    return users.filter(isValid);
}

// Usage
const emailUsers = processUsers(users, user => user.email && user.active);
const smsUsers = processUsers(users, user => user.phone && user.active);
const pushUsers = processUsers(users, user => user.deviceToken && user.active);

DRY Improvement: 3 functions → 1 configurable function

Pattern 5: Composition Over Duplication

Problem: Repeated utility combinations

Solution: Compose smaller utilities

Before (Go)

// Repeated validation patterns
func ValidateUser(user User) error {
    if user.Email == "" {
        return errors.New("email required")
    }
    if !strings.Contains(user.Email, "@") {
        return errors.New("invalid email")
    }
    if user.Age < 18 {
        return errors.New("must be 18+")
    }
    return nil
}

func ValidateAdmin(admin Admin) error {
    if admin.Email == "" {
        return errors.New("email required")
    }
    if !strings.Contains(admin.Email, "@") {
        return errors.New("invalid email")
    }
    if admin.Age < 18 {
        return errors.New("must be 18+")
    }
    if admin.Role == "" {
        return errors.New("role required")
    }
    return nil
}

After

// Composable validators
func requireEmail(email string) error {
    if email == "" {
        return errors.New("email required")
    }
    return nil
}

func validateEmailFormat(email string) error {
    if !strings.Contains(email, "@") {
        return errors.New("invalid email")
    }
    return nil
}

func requireAdult(age int) error {
    if age < 18 {
        return errors.New("must be 18+")
    }
    return nil
}

func requireRole(role string) error {
    if role == "" {
        return errors.New("role required")
    }
    return nil
}

// Compose validators
func ValidateUser(user User) error {
    validators := []func() error{
        func() error { return requireEmail(user.Email) },
        func() error { return validateEmailFormat(user.Email) },
        func() error { return requireAdult(user.Age) },
    }
    return runValidators(validators)
}

func ValidateAdmin(admin Admin) error {
    validators := []func() error{
        func() error { return requireEmail(admin.Email) },
        func() error { return validateEmailFormat(admin.Email) },
        func() error { return requireAdult(admin.Age) },
        func() error { return requireRole(admin.Role) },
    }
    return runValidators(validators)
}

func runValidators(validators []func() error) error {
    for _, validate := range validators {
        if err := validate(); err != nil {
            return err
        }
    }
    return nil
}

DRY Improvement: Reusable validators, composable validation

Pattern 6: Extract Test Helpers

Problem: Repeated test setup/teardown

Solution: Create test utilities

Before

javascript

// test/user.test.js
describe('User tests', () => {
    it('should create user', () => {
        const db = createDatabase();
        const user = { name: 'John', email: 'john@example.com' };
        // ... test logic
        db.close();
    });

    it('should update user', () => {
        const db = createDatabase();
        const user = { name: 'John', email: 'john@example.com' };
        // ... test logic
        db.close();
    });
});

// test/order.test.js
describe('Order tests', () => {
    it('should create order', () => {
        const db = createDatabase();
        const user = { name: 'John', email: 'john@example.com' };
        // ... test logic
        db.close();
    });
});

After

javascript

// test/helpers/testUtils.js
export function setupTestDatabase() {
    const db = createDatabase();
    return {
        db,
        cleanup: () => db.close()
    };
}

export function createTestUser(overrides = {}) {
    return {
        name: 'John',
        email: 'john@example.com',
        ...overrides
    };
}

// test/user.test.js
import { setupTestDatabase, createTestUser } from './helpers/testUtils';

describe('User tests', () => {
    let db;

    beforeEach(() => {
        ({ db } = setupTestDatabase());
    });

    afterEach(() => {
        db.close();
    });

    it('should create user', () => {
        const user = createTestUser();
        // ... test logic (cleaner!)
    });

    it('should update user', () => {
        const user = createTestUser({ name: 'Jane' });
        // ... test logic
    });
});

DRY Improvement: Shared test utilities, less boilerplate

obra YAGNI/DRY Principles

YAGNI: You Aren't Gonna Need It

Don't add functionality until necessary
Avoid premature abstraction
Wait for duplication before extracting

DRY: Don't Repeat Yourself

Every piece of knowledge should have a single representation
Avoid copy-paste programming
Extract when you see duplication 2-3 times

When to Extract

Three strikes rule: Duplicate 3 times → extract
Clear pattern: Similar code structure appears
High maintenance cost: Changes require updates in multiple places
Business logic: Domain rules should be centralized

When NOT to Extract

Premature abstraction: Only seen once or twice
Coincidental duplication: Similar code, different concepts
Temporary code: Prototypes, experiments
Over-engineering: Abstraction more complex than duplication

DRYPATTERNS

echo "✓ DRY refactoring patterns guide created"


## Phase 5: Generate Duplication Report

I'll create a comprehensive report with prioritized refactoring opportunities:

```bash
echo ""
echo "=== Generating Duplication Report ==="
echo ""

# Count duplications
TOTAL_DUPLICATIONS=0
DUPLICATION_PERCENTAGE=0

if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
    TOTAL_DUPLICATIONS=$(jq '.statistics.total.duplications' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
    DUPLICATION_PERCENTAGE=$(jq '.statistics.total.percentage' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
fi

cat > "$REPORT" << EOF
# Code Duplication Analysis Report

**Generated:** $(date)
**Minimum Lines Threshold:** $MIN_LINES lines
**Total Duplications Found:** $TOTAL_DUPLICATIONS
**Duplication Percentage:** $DUPLICATION_PERCENTAGE%

---

## Summary

Code duplication violates the DRY (Don't Repeat Yourself) principle and leads to:
- Higher maintenance costs (fix bugs in multiple places)
- Increased bug likelihood (miss updating one location)
- Larger codebase (more code to understand)
- Lower code quality (copy-paste programming)

**Duplication Guidelines (obra principles):**
- **0-5%:** Excellent - minimal duplication
- **5-10%:** Good - acceptable for large projects
- **10-20%:** Moderate - consider refactoring
- **20%+:** High - significant technical debt

---

## Duplication Statistics

EOF

if [ "$USE_JSCPD" = true ] && [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
    cat >> "$REPORT" << EOF
### Overall Statistics
- Total Lines Analyzed: $(jq '.statistics.total.lines' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplicated Lines: $(jq '.statistics.total.clones' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplication Blocks: $TOTAL_DUPLICATIONS
- Duplication Percentage: $DUPLICATION_PERCENTAGE%

### Top Duplicated Files

EOF
    jq -r '.statistics.formats[] |
        "\(.format):\n  Files: \(.sources)\n  Duplications: \(.duplications)\n  Percentage: \(.percentage)%\n"' \
        "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"

    cat >> "$REPORT" << EOF

### Largest Duplication Blocks

EOF
    jq -r '.duplicates[:10] | .[] |
        "**\(.lines) lines** duplicated in \(.format)\n- Location 1: \(.firstFile.name):\(.firstFile.start)\n- Location 2: \(.secondFile.name):\(.secondFile.start)\n"' \
        "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"

    cat >> "$REPORT" << EOF

**Full HTML Report:** Open \`$ANALYSIS_DIR/jscpd-report/html/index.html\` in browser

EOF
else
    cat >> "$REPORT" << EOF
### Basic Analysis Results

EOF
    if [ -f "$ANALYSIS_DIR/duplicate-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
        cat >> "$REPORT" << EOF
**Duplicate Function Signatures (JavaScript/TypeScript):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-functions.txt")
\`\`\`

EOF
    fi

    if [ -f "$ANALYSIS_DIR/duplicate-python-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
        cat >> "$REPORT" << EOF
**Duplicate Function Signatures (Python):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-python-functions.txt")
\`\`\`

EOF
    fi

    if [ -f "$ANALYSIS_DIR/magic-numbers.txt" ] && [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
        cat >> "$REPORT" << EOF
**Repeated Numeric Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/magic-numbers.txt")
\`\`\`
Consider extracting to named constants.

EOF
    fi

    if [ -f "$ANALYSIS_DIR/repeated-strings.txt" ] && [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
        cat >> "$REPORT" << EOF
**Repeated String Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/repeated-strings.txt")
\`\`\`
Consider extracting to configuration.

EOF
    fi
fi

cat >> "$REPORT" << 'EOF'
---

## Recommended DRY Refactoring

### Priority 1: Extract Duplicated Functions
- Identify exact code duplicates (10+ lines)
- Extract to shared utility functions
- Consolidate in common modules

### Priority 2: Centralize Configuration
- Extract magic numbers to constants
- Create configuration files
- Use environment variables for deployment-specific values

### Priority 3: Eliminate Similar Patterns
- Identify similar code structures
- Extract common patterns
- Use higher-order functions or templates

### Priority 4: Consolidate Test Helpers
- Extract common test setup/teardown
- Create test utilities
- Share fixtures across test suites

**See detailed patterns:** `cat $ANALYSIS_DIR/dry-refactoring-patterns.md`

---

## Implementation Steps

1. **Create Git Checkpoint**
   ```bash
   git add -A
   git commit -m "Pre DRY-refactoring checkpoint" || echo "No changes"

Prioritize Duplications
- Start with largest duplication blocks
- Focus on frequently changed code
- Consider domain importance
Apply DRY Patterns
- Extract function (most common)
- Extract configuration
- Template method pattern
- Higher-order functions
- Composition
Three Strikes Rule
- Wait for 3 occurrences before extracting
- Avoid premature abstraction
- Balance DRY with YAGNI
Verify Improvements
- Re-run duplication analysis
- Ensure tests pass
- Check for reduced code size
Document Changes
- Update documentation
- Note refactoring decisions
- Share patterns with team

Continuous Monitoring

Add to CI/CD

Pre-commit Hook

bash

# .git/hooks/pre-commit
#!/bin/bash
# Check duplication before commit
jscpd --threshold $MIN_LINES . || exit 1

GitHub Actions

yaml

name: Code Quality
on: [push, pull_request]
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Check duplication
        run: |
          npm install -g jscpd
          jscpd --threshold 6 .

Integration with Other Skills

/refactor - Systematic code restructuring
/complexity-reduce - Reduce cyclomatic complexity
/make-it-pretty - Improve code readability
/review - Include duplication in code reviews
/test - Ensure tests after refactoring

Resources

Report generated at: $(date)

Next Steps:

Review high-duplication areas
Select appropriate DRY pattern
Implement refactoring incrementally
Re-run analysis to verify improvement

EOF

echo "✓ Duplication report generated: $REPORT"


## Summary

```bash
echo ""
echo "=== ✓ Duplication Analysis Complete ==="
echo ""
echo "📊 Analysis Results:"
if [ "$USE_JSCPD" = true ]; then
    echo "  Total duplications: $TOTAL_DUPLICATIONS"
    echo "  Duplication percentage: $DUPLICATION_PERCENTAGE%"
    echo "  Threshold: $MIN_LINES lines"
else
    echo "  Basic analysis complete"
    echo "  Install jscpd for detailed analysis: npm install -g jscpd"
fi
echo ""
echo "📁 Generated Files:"
echo "  - Duplication Report: $REPORT"
echo "  - DRY Patterns: $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo "  - HTML Report: $ANALYSIS_DIR/jscpd-report/html/index.html"
echo ""
echo "🎯 Recommended Actions:"
if [ "$DUPLICATION_PERCENTAGE" != "0" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 10 ]; then
    echo "  ⚠️  High duplication detected ($DUPLICATION_PERCENTAGE%)"
    echo "  1. Review largest duplication blocks"
    echo "  2. Extract duplicated functions"
    echo "  3. Centralize configuration"
    echo "  4. Run tests after refactoring"
elif [ "$DUPLICATION_PERCENTAGE" != "0" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 5 ]; then
    echo "  Moderate duplication ($DUPLICATION_PERCENTAGE%)"
    echo "  Consider refactoring during feature work"
else
    echo "  ✓ Low duplication - excellent!"
    echo "  Continue monitoring in code reviews"
fi
echo ""
echo "💡 DRY Refactoring Patterns:"
echo "  1. Extract Function (consolidate duplicates)"
echo "  2. Extract Configuration (centralize constants)"
echo "  3. Template Method (share algorithm structure)"
echo "  4. Higher-Order Functions (parameterize behavior)"
echo "  5. Composition (build from smaller pieces)"
echo ""
echo "📏 obra YAGNI/DRY Guidelines:"
echo "  - Three strikes rule: Extract after 3 duplications"
echo "  - Avoid premature abstraction"
echo "  - Balance DRY with readability"
echo ""
echo "🔗 Integration Points:"
echo "  - /refactor - Systematic restructuring"
echo "  - /complexity-reduce - Reduce complexity"
echo "  - /review - Include in code reviews"
echo ""
echo "View report: cat $REPORT"
echo "View patterns: cat $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo "View HTML: open $ANALYSIS_DIR/jscpd-report/html/index.html"

Safety Guarantees

What I'll NEVER do:

Automatically refactor without analysis
Extract after single occurrence (violates YAGNI)
Remove code without understanding context
Skip testing after refactoring
Add AI attribution to commits

What I WILL do:

Identify genuine code duplication
Suggest proven DRY patterns
Follow obra YAGNI principles
Recommend incremental refactoring
Preserve functionality
Provide clear examples

Credits

This skill is based on:

obra/superpowers - YAGNI and DRY principles
The Pragmatic Programmer - DRY principle origin
Martin Fowler - Refactoring patterns
JSCPD - Multi-language duplication detection
Rule of Three - When to extract abstraction

Token Budget

Target: 2,000-3,500 tokens per execution

Phase 1-2: ~600 tokens (setup + tool installation)
Phase 3: ~1,000 tokens (duplication detection)
Phase 4-5: ~1,500 tokens (patterns + reporting)

Optimization Strategy:

Use jscpd for efficient detection
Fallback to grep-based analysis
Template-based pattern generation
Focus on actionable duplications
Clear DRY refactoring guidance

This ensures thorough duplication detection with practical DRY refactoring strategies based on obra principles while respecting token limits.

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/duplication-detect
License: MIT License

Featured Tools

Join Our Newsletter

Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.

163 31

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Code Duplication Detection & DRY Refactoring

Token Optimization

1. Source Directory Detection Caching (500 token savings)

2. Bash-Based Duplication Tool Execution (1,800 token savings)

3. Sample-Based Duplication Reporting (900 token savings)

4. Template-Based DRY Refactoring Recommendations (800 token savings)

5. Incremental Duplication Checks (1,000 token savings)

6. Grep-Based Similar Pattern Discovery (700 token savings)

7. Cached Duplication Baseline (600 token savings)

8. Threshold-Based Filtering (400 token savings)

Real-World Token Usage Distribution

Phase 1: Tool Setup & Configuration

Phase 2: Install Detection Tools

Phase 3: Detect Code Duplication

Phase 4: Generate DRY Refactoring Strategies

After

Pattern 2: Extract Configuration/Constants

Before (Python)

After

Pattern 3: Template Method Pattern

Before (TypeScript)

After

Pattern 4: Higher-Order Functions

Before (JavaScript)

After

Pattern 5: Composition Over Duplication

Before (Go)

After

Pattern 6: Extract Test Helpers

Before

After

obra YAGNI/DRY Principles

YAGNI: You Aren't Gonna Need It

DRY: Don't Repeat Yourself

When to Extract

When NOT to Extract

Continuous Monitoring

Add to CI/CD

Pre-commit Hook

GitHub Actions

Integration with Other Skills

Resources

Safety Guarantees

Credits

Token Budget

Recommended Agent Skills

agent-ops-spec

agent-ops-state

agent-ops-spec

agent-ops-testing

agent-ops-testing

agent-ops-state