resolve-duplicates

Guidelines for identifying duplicate dictionary entries, selecting which to keep, and safely removing unneeded ones.

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/resolve-duplicates

SKILL.md

Resolving Duplicate Entries

Use this skill when duplicate entries are detected during validation, or when you suspect multiple entries exist for the same word.

What Counts as a Duplicate?

Duplicates are entries with the same reading AND same headword. The validation script (build/validate.py) checks for this:

python

key = (reading, headword)  # Duplicates share both values

NOT duplicates (these are valid separate entries):

Same reading, different headword (homophones): {橋|はし} vs {箸|はし} vs {端|はし}
Same headword, different reading: {行く|いく} vs {行く|ゆく}
Related but distinct words: {見る|みる} vs {見える|みえる}
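
The rule above can be sketched in a few lines of Python. This is only an illustration of the (reading, headword) key, not the actual logic of build/validate.py; the "headword" and "id" field names and the sample IDs are assumptions made for the example.

```python
from collections import defaultdict

def find_duplicates(entries):
    """Group entries by (reading, headword); return only keys seen more than once."""
    groups = defaultdict(list)
    for entry in entries:
        # "headword" and "id" are assumed field names, used here for illustration
        key = (entry["reading"], entry["headword"])
        groups[key].append(entry["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

entries = [
    {"id": "00123", "headword": "食べる", "reading": "たべる"},
    {"id": "04567", "headword": "食べる", "reading": "たべる"},  # duplicate of 00123
    {"id": "00200", "headword": "橋", "reading": "はし"},
    {"id": "00201", "headword": "箸", "reading": "はし"},  # homophone, not a duplicate
    {"id": "00300", "headword": "行く", "reading": "いく"},
    {"id": "00301", "headword": "行く", "reading": "ゆく"},  # same headword, other reading
]

print(find_duplicates(entries))
# {('たべる', '食べる'): ['00123', '04567']}
```

Only the 食べる pair is reported; the homophones and the two readings of 行く produce distinct keys and stay separate.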

Step 1: Identify Duplicates

Run validation to find duplicates:

bash

python3 build/validate.py 2>&1 | grep -i "duplicate"

Or search manually:

bash

# Find entries with the same reading
grep -r '"reading": "たべる"' entries/

# Find entries with similar headwords
grep -r '食べる' entries/

Step 2: Compare the Duplicate Entries

Read both (or all) entries carefully. Check:

Are they truly the same lexical item?
- Same part of speech?
- Same core meaning?
- If they cover different senses of a polysemous word, they should be ONE entry with multiple definitions, not separate entries.
If they're NOT the same (rare):
- Different parts of speech (noun vs verb homographs)
- Genuinely different words that happen to share writing
- In this case, differentiate by adjusting headwords or adding disambiguating notes
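
When comparing entries by hand, it can help to see only the fields that differ. The helper below is a minimal sketch that assumes each entry has been loaded into a plain dict; the "pos", "examples", and "notes" field names are hypothetical.

```python
def diff_entries(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field whose values differ."""
    fields = sorted(set(a) | set(b))
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}

# Hypothetical entries; the field names are assumptions for illustration
older = {"headword": "食べる", "reading": "たべる", "pos": "verb",
         "examples": ["パンを食べる。"]}
newer = {"headword": "食べる", "reading": "たべる", "pos": "verb",
         "examples": ["魚を食べる。"], "notes": "Ichidan verb."}

for field, (old_value, new_value) in diff_entries(older, newer).items():
    print(field, old_value, new_value, sep=" | ")
```

A matching part of speech with differing examples, as here, suggests one lexical item entered twice. A differing part of speech would point to the rare "not the same" case.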

Step 3: Select Which Entry to Keep

Choose the entry with better quality. Evaluate:

Criterion	What to prefer
Definition depth	More complete explanations
Example quality	Natural, varied, useful examples
Notes richness	Covers usage, collocations, learner traps
Cross-references	Links to related entries
Vocabulary tier	Has appropriate tier assigned (if any)
Furigana	All kanji properly annotated

Tie-breaker: Keep the entry with the lower ID number (older entry).
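
As a rough illustration of the selection rule, the sketch below counts content as a crude proxy for quality and applies the lower-ID tie-breaker. The scoring function and all field names are assumptions; counting items does not replace reading the entries and judging the criteria above.

```python
def quality_score(entry):
    """Crude proxy: count definitions, examples, cross-references, and a notes field."""
    return (
        len(entry.get("definitions", []))
        + len(entry.get("examples", []))
        + len(entry.get("cross_references", []))
        + (1 if entry.get("notes") else 0)
    )

def pick_keeper(candidates):
    """Highest score wins; on a tie, the lower numeric ID (older entry) wins."""
    return min(candidates, key=lambda e: (-quality_score(e), int(e["id"])))

# Hypothetical entries with equal scores, so the tie-breaker decides
a = {"id": "04567", "definitions": ["to eat"], "examples": ["パンを食べる。"]}
b = {"id": "00123", "definitions": ["to eat"], "examples": ["魚を食べる。"]}
print(pick_keeper([a, b])["id"])  # 00123

# A richer entry wins regardless of its ID
c = {"id": "09999", "definitions": ["to eat"], "examples": ["魚を食べる。"],
     "notes": "Ichidan verb."}
print(pick_keeper([a, b, c])["id"])  # 09999
```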

Step 4: Merge Content (If Needed)

Before deleting, check if the inferior entry has unique content worth preserving:

Unique examples not in the better entry
Additional usage notes or collocations
Cross-references to other entries
Different vocabulary tier assignment (choose the more accurate one)

If so, edit the keeper entry first to incorporate the valuable content:

bash

# Read both entries
cat entries/00000/00123_taberu.json
cat entries/04500/04567_taberu.json

# Edit the keeper to add any missing valuable content
# Then delete the duplicate
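
If the entries are plain JSON objects, the merge step could look like the sketch below. It assumes "examples" and "cross_references" are lists (hypothetical field names) and only appends items the keeper lacks. Notes and tier assignments need a human decision and are left alone.

```python
def merge_unique(keeper, duplicate, fields=("examples", "cross_references")):
    """Return a copy of keeper with list items from duplicate appended if missing."""
    merged = dict(keeper)
    for field in fields:
        combined = list(keeper.get(field, []))
        for item in duplicate.get(field, []):
            if item not in combined:
                combined.append(item)
        if combined:
            merged[field] = combined
    return merged

# Hypothetical entries; the field names are assumptions for illustration
keeper = {"id": "00123", "examples": ["パンを食べる。"]}
duplicate = {"id": "04567",
             "examples": ["パンを食べる。", "朝ごはんを食べる。"],
             "cross_references": ["飲む"]}

merged = merge_unique(keeper, duplicate)
print(merged["examples"])          # ['パンを食べる。', '朝ごはんを食べる。']
print(merged["cross_references"])  # ['飲む']
```

The function returns a new dict rather than mutating the keeper, so the result can be reviewed before it is written back to the keeper's file.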

Step 5: Delete the Duplicate

Use the delete-entry skill for safe deletion. The process:

Delete the entry file:

bash

rm entries/path/to/duplicate_entry.json

Update indexes:

bash

python3 build/update_indexes.py

Rebuild the flat file:

bash

python3 build/build_flat.py

Verify deletion:

bash

python3 build/validate.py

Complete Workflow Example

bash

# 1. Find duplicates
python3 build/validate.py 2>&1 | grep -i "duplicate"

# 2. Read both entries (example)
cat entries/00000/00123_taberu.json
cat entries/04500/04567_taberu.json

# 3. Decide: Keep 00123_taberu (better examples), delete 04567_taberu

# 4. Check if 04567_taberu has content worth merging
# (If yes, edit 00123_taberu first to add the content)

# 5. Delete the duplicate
rm entries/04500/04567_taberu.json

# 6. Update indexes and rebuild
python3 build/update_indexes.py
python3 build/build_flat.py

# 7. Validate
python3 build/validate.py

Handling Cross-References to Deleted Entries

If other entries reference the deleted duplicate:

Search for references:

bash

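# List files that contain a cross_references field and also mention the reading.
# (Filtering one grep's output with another only works if both strings sit on the same line.)
grep -rl --include="*.json" 'cross_references' entries/ | xargs grep -l 'たべる'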

Update references to point to the kept entry's details (or remove if no longer relevant)
The delete-entry skill provides detailed guidance on this.
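
The same search can be sketched in Python when a line-oriented grep is too coarse. The entry schema is not shown in this document, so the sketch serializes whatever is stored under "cross_references" and looks for a target string. The "id" field, the paths, and the shape of the references are all assumptions.

```python
import json

def find_referrers(entries_by_path, target):
    """Return paths of entries whose cross_references mention target (substring match)."""
    hits = []
    for path, entry in entries_by_path.items():
        refs = entry.get("cross_references")
        if refs is None:
            continue
        # Serialize so the check works whether refs are strings, lists, or objects
        if target in json.dumps(refs, ensure_ascii=False):
            hits.append(path)
    return sorted(hits)

# Hypothetical in-memory entries keyed by file path
entries_by_path = {
    "entries/00000/00456_nomu.json": {"headword": "飲む", "cross_references": ["04567"]},
    "entries/00000/00789_miru.json": {"headword": "見る", "cross_references": ["00012"]},
    "entries/00000/00999_hashi.json": {"headword": "橋"},
}

print(find_referrers(entries_by_path, "04567"))
# ['entries/00000/00456_nomu.json']
```

On a real tree, the dict could be built by walking `pathlib.Path("entries").rglob("*.json")` and loading each file with `json.load`. Because this is a substring match, use the identifier the project actually stores in cross-references and review each hit by hand.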

Prevention

To avoid creating duplicates in the future:

Always search before creating:

bash

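# "newword" and "headword_kanji" are placeholders: substitute the reading and headword you plan to add
grep -r '"reading": "newword"' entries/
grep -r 'headword_kanji' entries/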

Check candidate_words.json - words there may already have entries
Run validation frequently during entry creation sessions

Checklist

Duplicates identified (same reading AND headword)
Confirmed entries are for the same lexical item
Selected which entry to keep (better quality)
Merged any valuable unique content to keeper
Deleted duplicate entry file
Updated any cross-references pointing to deleted entry
Ran update_indexes.py
Ran build_flat.py
Validation passes with no duplicate errors

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/resolve-duplicates
License: MIT License
