Agent skill
Cleaning Up Research Sessions
Safely remove intermediate files from completed research sessions while preserving important data
Install this agent skill to your Project
npx add-skill https://github.com/kthorn/research-superpower/tree/main/skills/research/cleaning-up-research-sessions
SKILL.md
Cleaning Up Research Sessions
Overview
Remove intermediate files created during research workflow while preserving all important data.
Core principle: Conservative cleanup with user confirmation. Never delete anything important.
When to Use
Use this skill when:
- Research session is complete and consolidated
- Preparing to archive or share research session folder
- Research folder has accumulated temporary/intermediate files
- User explicitly asks to clean up
When NOT to use:
- Research is still in progress
- User hasn't reviewed final outputs yet
- Unsure what files are safe to delete
Files That Are ALWAYS KEPT
NEVER delete these (protected list):
Core outputs:
SUMMARY.md- Enhanced findings with methodologyrelevant-papers.json- Filtered relevant paperspapers-reviewed.json- Complete screening historypapers/directory - All PDFs and supplementary filescitations/citation-graph.json- Citation relationships
Methodology documentation:
screening-criteria.json- Rubric definition (if exists)test-set.json- Rubric validation papers (if exists)abstracts-cache.json- Cached abstracts for re-screening (if exists)rubric-changelog.md- Rubric version history (if exists)
Auxiliary documentation (if exists):
README.md- Project overviewTOP_PRIORITY_PAPERS.md- Curated priority listevaluated-papers.json- Rich structured data
Project configuration:
.claude/directory - Permissions and settings*.pyhelper scripts that were created - Keep for reproducibility
Files That May Be Cleaned Up
Candidates for removal (with confirmation):
Intermediate search results:
initial-search-results.json- Raw PubMed results before screening- Safe to delete: Data is in papers-reviewed.json
- Reason to keep: Shows raw search results for reproducibility
Temporary files:
*.tmpfiles*.swpfiles (vim swap files).DS_Store(macOS)__pycache__/(Python cache)*.pyc(Python compiled)
Log files:
*.logfilesdebug-*.txtfiles
Cleanup Workflow
Step 1: Analyze Research Session
cd research-sessions/YYYY-MM-DD-description/
# List all files with sizes
find . -type f -exec ls -lh {} \; | awk '{print $5, $9}' | sort -rh
Identify files by category:
- Core outputs (MUST keep)
- Methodology files (SHOULD keep)
- Intermediate files (candidates for cleanup)
- Temporary files (safe to delete)
Step 2: Present Cleanup Plan to User
Show what will be deleted:
๐งน Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/
Files to KEEP (protected):
โ
SUMMARY.md (45 KB)
โ
relevant-papers.json (12 KB)
โ
papers-reviewed.json (28 KB)
โ
papers/ (14 PDFs, 32 MB)
โ
citations/citation-graph.json (5 KB)
โ
screening-criteria.json (2 KB)
โ
abstracts-cache.json (156 KB)
Files that CAN be removed (intermediate):
๐๏ธ initial-search-results.json (8 KB) - Raw PubMed results
๐๏ธ .DS_Store (6 KB) - macOS metadata
Total space to recover: 14 KB
Proceed with cleanup? (y/n/review)
Options:
y- Delete intermediate filesn- Cancel cleanup, keep everythingreview- Show contents of each file before deciding
Step 3: Confirm Deletions
Before deleting ANY file:
- Verify it's not in protected list
- Check file isn't referenced in SUMMARY.md
- Confirm with user one more time
Example confirmation:
About to delete:
- initial-search-results.json (8 KB)
This file contains raw PubMed search results. The data is preserved in
papers-reviewed.json, so this is safe to delete.
Confirm deletion? (y/n)
Step 4: Perform Cleanup
Delete confirmed files:
# Move to trash instead of rm (safer)
# On macOS:
mv initial-search-results.json ~/.Trash/
# On Linux:
mv initial-search-results.json ~/.local/share/Trash/files/
# Or use rm if user confirms
rm initial-search-results.json
Report results:
โ
Cleanup complete!
Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
Space recovered: 14 KB
Protected files preserved:
- All 8 core files kept
- All 14 PDFs kept
- All methodology documentation kept
Step 5: Verify Integrity
After cleanup, verify critical files:
# Check core files exist
test -f SUMMARY.md && echo "โ SUMMARY.md"
test -f relevant-papers.json && echo "โ relevant-papers.json"
test -f papers-reviewed.json && echo "โ papers-reviewed.json"
test -d papers && echo "โ papers/ directory"
# Verify JSON files are valid
jq empty relevant-papers.json && echo "โ relevant-papers.json valid JSON"
jq empty papers-reviewed.json && echo "โ papers-reviewed.json valid JSON"
Report to user:
โ
Integrity check passed
- All core files present
- All JSON files valid
- All PDFs intact
Special Cases
Case 1: Large abstracts-cache.json
If abstracts-cache.json is very large (>100 MB):
โ ๏ธ abstracts-cache.json is 256 MB
This file enables re-screening if you update the rubric. Options:
1. Keep (recommended if you might refine rubric)
2. Compress (gzip to ~50 MB, can decompress later)
3. Delete (only if research is final and won't be updated)
Choice? (1/2/3)
If user chooses compress:
gzip abstracts-cache.json
# Creates abstracts-cache.json.gz
echo "Compressed abstracts-cache.json to $(du -h abstracts-cache.json.gz | cut -f1)"
Case 2: Helper Scripts
If user created helper scripts during research:
๐ Found helper scripts:
- screen_papers.py (created for batch screening)
- deep_dive_papers.py (created for data extraction)
These scripts document your methodology. Recommendations:
- Keep for reproducibility
- Add comments if not already documented
- Reference in SUMMARY.md under "Reproducibility" section
Keep scripts? (y/n)
Case 3: Multiple Research Sessions
If cleaning up multiple sessions:
# Find all research sessions
find research-sessions/ -maxdepth 1 -type d
# For each session:
for session in research-sessions/*/; do
echo "Analyzing: $session"
# Run cleanup analysis
done
Ask user:
Found 5 completed research sessions.
Clean up all sessions? (y/n/select)
- y: Analyze and clean all sessions
- n: Cancel
- select: Choose which sessions to clean
Safety Mechanisms
Protected File List
Maintain hardcoded list of patterns to NEVER delete:
PROTECTED_PATTERNS = [
'SUMMARY.md',
'relevant-papers.json',
'papers-reviewed.json',
'papers/*.pdf',
'papers/*.zip',
'citations/citation-graph.json',
'screening-criteria.json',
'test-set.json',
'abstracts-cache.json',
'rubric-changelog.md',
'README.md',
'TOP_PRIORITY_PAPERS.md',
'evaluated-papers.json',
'*.py', # Helper scripts
'.claude/*', # Project settings
]
Before deleting any file:
def is_protected(filepath):
"""Check if file matches any protected pattern"""
for pattern in PROTECTED_PATTERNS:
if fnmatch(filepath, pattern):
return True
return False
# Never delete protected files
if is_protected(file_to_delete):
print(f"โ ๏ธ ERROR: {file_to_delete} is protected and cannot be deleted")
return
Dry Run Mode
Always show what will be deleted before doing it:
# Dry run (show only, don't delete)
echo "DRY RUN - No files will be deleted"
for file in $candidate_files; do
if is_safe_to_delete "$file"; then
echo "Would delete: $file ($(du -h $file | cut -f1))"
fi
done
echo ""
echo "Proceed with actual deletion? (y/n)"
Integration with Other Skills
After answering-research-questions workflow:
- Complete Phase 8 (consolidation)
- User reviews SUMMARY.md and relevant-papers.json
- Optionally: Run cleaning-up-research-sessions
- Archive or share research folder
Add to answering-research-questions Phase 8:
### Optional: Cleanup
After reviewing outputs, optionally clean up intermediate files:
"Research session is complete. Would you like me to clean up intermediate files?
I'll show you what will be deleted before removing anything."
If yes: Use `cleaning-up-research-sessions` skill
Common Mistakes
Deleting papers-reviewed.json: This is the deduplication database - NEVER delete โ Always protect with hardcoded list Deleting abstracts-cache.json: Needed for re-screening โ Ask user, default to keep Deleting helper scripts: Important for reproducibility โ Keep by default, ask if user wants to remove Not showing user what will be deleted: User needs to see the plan โ Always show dry run first Deleting without confirmation: Too risky โ Always ask for final confirmation Not verifying after cleanup: Files could be corrupted โ Run integrity check after deletion
Quick Reference
| Action | Command |
|---|---|
| List all files with sizes | find . -type f -exec ls -lh {} \; | sort -rh |
| Check if JSON valid | jq empty file.json |
| Move to trash (macOS) | mv file ~/.Trash/ |
| Move to trash (Linux) | mv file ~/.local/share/Trash/files/ |
| Compress large file | gzip file |
| Calculate folder size | du -sh . |
Success Criteria
Cleanup successful when:
- All protected files still exist and are valid
- Only confirmed intermediate files deleted
- User knows what was deleted and why
- Space recovered is reported
- Integrity check passed
- Research session is still fully functional
Example Session
User: "Can you clean up the research session folder?"
Agent: "I'll analyze the research session and show you what can be safely cleaned up."
[Analyzes folder]
Agent: "๐งน Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/
Files to KEEP (protected):
โ
SUMMARY.md (45 KB) - Final findings
โ
relevant-papers.json (12 KB) - Filtered results
โ
papers-reviewed.json (28 KB) - Complete history
โ
papers/ (14 PDFs, 32 MB) - Downloaded papers
โ
citations/citation-graph.json (5 KB) - Citation network
โ
screening-criteria.json (2 KB) - Rubric definition
โ
abstracts-cache.json (156 KB) - Cached abstracts
Files that CAN be removed:
๐๏ธ initial-search-results.json (8 KB) - Raw PubMed results
โ Data is preserved in papers-reviewed.json
๐๏ธ .DS_Store (6 KB) - macOS metadata
โ Not needed
Total space to recover: 14 KB
Proceed with cleanup? (y/n/review)"
User: "y"
Agent: "Final confirmation - About to delete:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
All protected files will be preserved. Confirm? (y/n)"
User: "y"
Agent: [Deletes files]
"โ
Cleanup complete!
Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
Space recovered: 14 KB
Running integrity check...
โ
All core files intact
โ
All JSON files valid
โ
All 14 PDFs present
Research session is ready for archiving or sharing."
Next Steps
After cleanup:
- Research folder is clean and ready to archive
- Share folder with collaborators
- Move to long-term storage
- Continue with follow-up research if needed
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
Getting Started with Research Superpowers
Introduction to literature search & review skills - systematic paper finding, screening, extraction, and citation traversal
Subagent-Driven Literature Review
Use parallel subagents for large-scale paper screening and deep dive analysis
Building Paper Screening Rubrics
Collaboratively build and refine paper screening rubrics through brainstorming, test-driven development, and iterative feedback
Searching Scientific Literature
PubMed search with keyword optimization, result parsing, and metadata extraction
Checking ChEMBL for Structured SAR Data
Check if medicinal chemistry papers are in ChEMBL database to access curated bioactivity data
Answering Research Questions
Main orchestration workflow for systematic literature research - search, evaluate, traverse, synthesize
Didn't find tool you were looking for?