Traversing Citation Networks
Smart backward and forward citation following via Semantic Scholar, with relevance filtering and deduplication
Install this agent skill to your Project
npx add-skill https://github.com/kthorn/research-superpower/tree/main/skills/research/traversing-citations
Overview
Intelligently follow citations backward (references) and forward (citing papers) using the Semantic Scholar API.
Core principle: Only follow citations relevant to the user's query. Avoid exponential explosion by filtering before traversing.
When to Use
Use this skill when:
- Found a highly relevant paper (score ≥ 7)
- Need to find related work
- User asks "what papers cite this?"
- Building comprehensive understanding of a topic
When NOT to use:
- Paper scored < 7 (not relevant enough to follow)
- Already at 50 papers (check with user first)
- Citations look off-topic from abstract
Citation Traversal Strategy
1. Get Paper ID from Semantic Scholar
Lookup by DOI:
curl "https://api.semanticscholar.org/graph/v1/paper/DOI:10.1234/example.2023?fields=paperId,title,year"
Response:
{
  "paperId": "abc123def456",
  "title": "Paper Title",
  "year": 2023
}
Save paperId - needed for citations/references queries
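The lookup above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `paper_lookup_url` and `fetch_paper_id` are hypothetical and not part of this skill or of any Semantic Scholar client.

```python
import json
import urllib.parse
import urllib.request

S2_BASE = "https://api.semanticscholar.org/graph/v1"


def paper_lookup_url(doi, fields=("paperId", "title", "year")):
    """Build the DOI lookup URL shown in the curl example (hypothetical helper)."""
    # Leave ":" and "/" unescaped so the DOI appears as it does in the curl example.
    paper_ref = urllib.parse.quote(f"DOI:{doi}", safe=":/")
    return f"{S2_BASE}/paper/{paper_ref}?fields={','.join(fields)}"


def fetch_paper_id(doi, timeout=30):
    """Return the paperId for a DOI. Performs a network request."""
    with urllib.request.urlopen(paper_lookup_url(doi), timeout=timeout) as resp:
        return json.load(resp)["paperId"]
```

Calling `paper_lookup_url("10.1234/example.2023")` reproduces the URL from the curl example; `fetch_paper_id` is the only function that touches the network.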
2. Backward Traversal (References)
Get references from paper:
curl "https://api.semanticscholar.org/graph/v1/paper/abc123def456/references?fields=contexts,intents,title,year,abstract,externalIds&limit=100"
Response format:
{
  "data": [
    {
      "citedPaper": {
        "paperId": "xyz789",
        "title": "Referenced Paper Title",
        "year": 2020,
        "abstract": "...",
        "externalIds": {
          "DOI": "10.5678/referenced.2020",
          "PubMed": "87654321"
        }
      },
      "contexts": [
        "...as described in previous work [15]...",
        "...we used the method from [15] to..."
      ],
      "intents": ["methodology", "background"]
    }
  ]
}
Filter for relevance:
For each reference, check:
- Context keywords: Do citation contexts mention user's query terms?
- Example: If user asks about "IC50 values", look for contexts mentioning "IC50", "activity", "potency"
- Title match: Does title contain relevant keywords?
- Intent: Is intent "methodology" or "result" (more relevant) vs "background" (less relevant)?
Scoring:
- Context keywords match: +3 points
- Title keywords match: +2 points
- Intent is methodology/result: +2 points
- Recent (< 5 years old): +1 point
Only add to queue if score ≥ 5
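The backward-traversal rubric above could be expressed roughly as follows. This is a sketch, not the skill's own implementation: it assumes plain case-insensitive substring matching for "keywords match" (the skill does not say how matching is done), and `score_reference` is a hypothetical name.

```python
from datetime import date


def score_reference(ref, query_terms, current_year=None):
    """Score one item from the /references response with the rubric above."""
    current_year = current_year or date.today().year
    terms = [t.lower() for t in query_terms]
    paper = ref.get("citedPaper") or {}
    contexts = " ".join(ref.get("contexts") or []).lower()
    title = (paper.get("title") or "").lower()
    intents = set(ref.get("intents") or [])
    year = paper.get("year")

    score = 0
    if any(t in contexts for t in terms):
        score += 3  # context keywords match
    if any(t in title for t in terms):
        score += 2  # title keywords match
    if intents & {"methodology", "result"}:
        score += 2  # cited for its method or result, not only as background
    if year is not None and current_year - year < 5:
        score += 1  # recent (< 5 years old)
    return score
```

A reference would then be queued only when `score_reference(ref, terms) >= 5`. Passing `current_year` explicitly keeps results reproducible across runs.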
3. Forward Traversal (Citations)
Get papers citing this one:
curl "https://api.semanticscholar.org/graph/v1/paper/abc123def456/citations?fields=title,year,abstract,externalIds&limit=100"
Response format:
{
  "data": [
    {
      "citingPaper": {
        "paperId": "def456ghi",
        "title": "Newer Paper Citing This",
        "year": 2024,
        "abstract": "We extended the work of [original paper]...",
        "externalIds": {
          "DOI": "10.9012/citing.2024"
        }
      }
    }
  ]
}
Filter for relevance:
For each citing paper:
- Title match: Keywords present in title?
- Abstract match: User's query terms in abstract?
- Recency: Newer papers often build on findings (prioritize < 2 years)
- Citation count: If available (add citationCount to the requested fields), highly cited papers are more likely to be relevant
Scoring:
- Title keywords match: +3 points
- Abstract keywords match: +2 points
- Recent (< 2 years): +2 points
- Moderate recency (2-5 years): +1 point
Only add to queue if score ≥ 5
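A comparable sketch for the forward-traversal rubric is shown below. As before, substring matching is an assumption, and `score_citing_paper` is a hypothetical name. The two recency bands are treated as mutually exclusive, so the maximum score is 7.

```python
def score_citing_paper(item, query_terms, current_year):
    """Score one item from the /citations response with the rubric above."""
    terms = [t.lower() for t in query_terms]
    paper = item.get("citingPaper") or {}
    title = (paper.get("title") or "").lower()
    abstract = (paper.get("abstract") or "").lower()
    year = paper.get("year")

    score = 0
    if any(t in title for t in terms):
        score += 3  # title keywords match
    if any(t in abstract for t in terms):
        score += 2  # abstract keywords match
    if year is not None:
        age = current_year - year
        if age < 2:
            score += 2  # recent (< 2 years)
        elif age <= 5:
            score += 1  # moderate recency (2-5 years)
    return score
```

With a threshold of 5, a citing paper needs a title match plus either an abstract match or a publication date within the last two years.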
4. Deduplication
Before adding to queue:
Check papers-reviewed.json:
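for paper in candidates:  # candidates: papers that passed relevance filtering
    doi = (paper.get("externalIds") or {}).get("DOI")  # externalIds may be missing
    if doi in papers_reviewed:
        continue  # already processed
    queue.append(paper)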
CRITICAL: After evaluating any paper from citation traversal, add it to papers-reviewed.json regardless of score. This prevents re-processing the same paper from multiple sources.
Track citation relationship in citations/citation-graph.json:
{
  "10.1234/example.2023": {
    "references": ["10.5678/ref1.2020", "10.5678/ref2.2021"],
    "cited_by": ["10.9012/cite1.2024", "10.9012/cite2.2024"]
  }
}
CRITICAL: Use ONLY citation-graph.json for citation tracking. Do NOT create custom files like forward_citation_pmids.txt or citation_analysis.md. All findings go in SUMMARY.md.
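The bookkeeping described in this step might look like the sketch below. It assumes papers-reviewed.json holds a JSON object keyed by DOI; the skill does not specify that file's layout, so the `{"score": ...}` value is an illustrative guess. All three helper names are hypothetical.

```python
import json
from pathlib import Path


def load_json(path, default):
    """Read a JSON file, or return `default` if it does not exist yet."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else default


def record_edge(graph, source_doi, other_doi, direction):
    """Add one relationship; direction is "references" or "cited_by"."""
    entry = graph.setdefault(source_doi, {"references": [], "cited_by": []})
    if other_doi not in entry[direction]:  # avoid duplicate edges
        entry[direction].append(other_doi)
    return graph


def mark_reviewed(papers_reviewed, doi, score):
    """Record EVERY evaluated paper, whatever its score (assumed layout)."""
    papers_reviewed[doi] = {"score": score}
    return papers_reviewed
```

After updating the in-memory structures, they would be written back with `Path(...).write_text(json.dumps(graph, indent=2))`, so that citation-graph.json remains the only citation-tracking file.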
5. Process Queue
Add relevant citations to processing queue:
{
  "doi": "10.5678/referenced.2020",
  "title": "Referenced Paper",
  "relevance_score": 7,
  "source": "backward_from:10.1234/example.2023",
  "context": "Method citation - describes IC50 measurement protocol"
}
Then:
- Evaluate using the evaluating-paper-relevance skill
- If relevant, extract data and potentially traverse its citations too
Smart Traversal Limits
To avoid explosion:
- Only traverse papers scoring ≥ 7 in initial evaluation
- Only follow citations scoring ≥ 5 in relevance filtering
- Limit traversal depth to 2 levels (original → references → references of references)
- Check with user after every 50 papers total
Breadth-first strategy:
- Get all references + citations for current paper
- Filter and score them
- Add high-scoring ones to queue
- Process next paper in queue
- Repeat until queue empty or hit limit
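The breadth-first strategy and its limits can be sketched as a small queue loop. This is a simplified illustration: `get_neighbors` and `score` are hypothetical callables standing in for the API calls and relevance filter, and the separate "≥ 7 on evaluation" gate is omitted for brevity.

```python
from collections import deque


def traverse(seed_id, get_neighbors, score, min_score=5, max_depth=2, max_papers=50):
    """Breadth-first traversal with a depth limit and a paper-count checkpoint.

    get_neighbors(paper_id) -> iterable of candidate ids (references + citations)
    score(candidate_id)     -> relevance score from the filtering step
    """
    seen = {seed_id}
    queue = deque([(seed_id, 0)])
    processed = []
    while queue and len(processed) < max_papers:
        paper_id, depth = queue.popleft()
        processed.append(paper_id)
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for cand in get_neighbors(paper_id):
            if cand in seen:
                continue  # deduplicate across sources
            seen.add(cand)  # remember it even if it scores low
            if score(cand) >= min_score:
                queue.append((cand, depth + 1))
    return processed
```

When the function returns because `max_papers` was reached rather than because the queue emptied, that is the point to check with the user before continuing.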
Progress Reporting
Report as you traverse:
🔗 Analyzing citations for: "Original Paper Title"
→ Found 45 references, 12 look relevant
→ Found 23 citing papers, 8 look relevant
→ Adding 20 papers to queue
📄 [51/127] Following reference: "Method for measuring IC50"
Source: Referenced by original paper in Methods section
Abstract score: 7 → Fetching full text...
API Rate Limiting
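Semantic Scholar limits (as documented when this skill was written; limits change, so confirm against the current API documentation):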
- Free tier: 100 requests per 5 minutes
- With API key: 1000 requests per 5 minutes
Be efficient:
- Request multiple fields in one call (?fields=title,abstract,externalIds,year)
- Use limit=100 to get more results per request
- Cache responses - don't re-fetch same paper
If rate limited:
- Wait 5 minutes
- Report to user: "⏸️ Rate limited by Semantic Scholar API. Waiting 5 minutes..."
- Consider getting API key for higher limits
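The "if rate limited" steps could be wrapped in a retry helper like the sketch below. `fetch_with_backoff` is a hypothetical name; the `opener`, `sleep`, and `notify` parameters exist only so the function can be exercised without a network or a real five-minute wait.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, opener=urllib.request.urlopen, sleep=time.sleep,
                       notify=print, wait_seconds=300, max_retries=3):
    """Fetch a URL, waiting and retrying when the API answers HTTP 429."""
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, or if retries ran out.
            if err.code != 429 or attempt == max_retries:
                raise
            notify("⏸️ Rate limited by Semantic Scholar API. Waiting 5 minutes...")
            sleep(wait_seconds)
```

Combining this with a local response cache keyed by URL would address the "don't re-fetch same paper" point as well.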
Integration with Other Skills
After traversing citations:
- Queue now has N new papers to evaluate
- For each, use the evaluating-paper-relevance skill
- If relevant, extract to SUMMARY.md
- If highly relevant (≥ 7), traverse its citations too
- Update citation-graph.json to track relationships
Quick Reference
| Task | API Endpoint |
|---|---|
| Get paper by DOI | GET /graph/v1/paper/DOI:{doi}?fields=paperId,title |
| Get references | GET /graph/v1/paper/{paperId}/references?fields=contexts,title,abstract,externalIds |
| Get citations | GET /graph/v1/paper/{paperId}/citations?fields=title,abstract,externalIds |
| Check if processed | Look up DOI in papers-reviewed.json |
| Filter relevance | Score based on context/title/intent/recency |
Relevance Filtering Checklist
Before adding citation to queue:
- Check if already in papers-reviewed.json (skip if yes)
- Score based on context/title keywords (need ≥ 5)
- Verify external ID (DOI or PMID) exists
- Add source tracking ("backward_from:DOI" or "forward_from:DOI")
- Add to queue with metadata
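The checklist above can be folded into one function that either returns a queue entry or rejects the candidate. This is a sketch under assumptions: `make_queue_entry` is a hypothetical name, papers_reviewed is assumed to be keyed by DOI, and how to key a paper that has only a PMID is left open because the skill does not specify it.

```python
def make_queue_entry(paper, score, source_doi, direction, papers_reviewed,
                     context="", min_score=5):
    """Return a queue entry dict, or None if the candidate fails the checklist.

    direction is "backward" or "forward", giving the source tag format above.
    """
    ids = paper.get("externalIds") or {}
    doi, pmid = ids.get("DOI"), ids.get("PubMed")
    if not (doi or pmid):
        return None  # no external ID to track the paper by
    if doi in papers_reviewed:
        return None  # already processed
    if score < min_score:
        return None  # not relevant enough to follow
    return {
        "doi": doi,  # None when only a PMID is available
        "title": paper.get("title"),
        "relevance_score": score,
        "source": f"{direction}_from:{source_doi}",
        "context": context,
    }
```

Entries returned by this function match the queue format shown in the Process Queue step.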
Common Mistakes
- Not tracking all evaluated papers: Only adding relevant papers to papers-reviewed.json → Add EVERY paper after evaluation to prevent re-review
- Creating custom analysis files: Making forward_citation_pmids.txt, CITATION_ANALYSIS.md, etc. → Use ONLY citation-graph.json and SUMMARY.md
- Following all citations: Exponential explosion → Filter before adding to queue
- Ignoring context: Citation might be tangential → Read context strings
- Not deduplicating: Re-process same papers → Always check papers-reviewed.json before and after evaluation
- Too deep: Following 5+ levels → Limit to 2 levels, check with user
- Missing forward citations: Only checking references → Use both backward and forward
- No rate limiting awareness: API blocks you → Add delays, handle 429 errors
Example Workflow
1. User asks: "Find selectivity data for BTK inhibitors"
2. Search finds Paper A (score: 9, has great IC50 data)
3. Traverse citations for Paper A:
- References: 45 total, 12 relevant (mention "selectivity", "IC50")
- Citations: 23 total, 8 relevant (newer papers on BTK)
4. Add 20 papers to queue
5. Evaluate first queued paper (score: 8)
6. Extract data, traverse its citations (add 5 more)
7. Continue until queue empty or user says stop
Next Steps
After traversing citations:
- Process queued papers with evaluating-paper-relevance
- Update SUMMARY.md with new findings
- Check if reached checkpoint (50 papers or 5 minutes)
- If checkpoint: ask user to continue or stop