Searching Scientific Literature
PubMed search with keyword optimization, result parsing, and metadata extraction
Install this agent skill to your Project
npx add-skill https://github.com/kthorn/research-superpower/tree/main/skills/research/searching-literature
Overview
Search PubMed for scientific literature using optimized queries. Extract metadata and prepare papers for relevance evaluation.
Core principle: Cast a wide enough net to find relevant papers, but use targeted keywords to keep results manageable.
When to Use
Use this skill when:
- Starting a new research question
- User asks "find papers about..."
- Need initial paper set for evaluation
- Searching for specific methods, compounds, diseases, techniques
Search Strategy
1. Parse User Query
Extract:
- Keywords: Main concepts (e.g., "BTK inhibitor", "selectivity", "kinase")
- Data types: What user needs (IC50 values, methods, structures, results)
- Constraints: Date ranges, specific journals, author names
- Synonyms: Alternative terms (e.g., "Bruton's tyrosine kinase" = "BTK")
2. Construct PubMed Query
Boolean operators:
- AND - narrow results (must have both terms)
- OR - broaden results (either term)
- NOT - exclude terms
Example queries:
"BTK inhibitor"[Title/Abstract] AND selectivity[Title/Abstract]
("kinase inhibitor" OR "protein kinase") AND (selectivity OR "off-target")
"ibrutinib"[Title/Abstract] AND ("IC50" OR "inhibitory concentration")
Field tags:
- [Title/Abstract] - search title and abstract only
- [Title] - title only (more precise)
- [Author] - specific author
- [Journal] - specific journal
- [Date - Publication] (short form [dp]) - publication date or range, e.g. 2020:2024[dp]
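The query construction above can be sketched in Python. This is a minimal illustration, not part of the skill itself: the helper names (`quote_term`, `or_group`, `build_query`) are hypothetical, and the quoting rule (quote any term containing a non-alphanumeric character) is an assumption chosen to reproduce the example queries.

```python
from typing import List, Optional


def quote_term(term: str) -> str:
    """Quote a term as a phrase if it contains spaces, hyphens, or other symbols."""
    term = term.strip()
    if term.startswith('"') or term.isalnum():
        return term
    return f'"{term}"'


def or_group(synonyms: List[str], field: Optional[str] = None) -> str:
    """Join synonyms for one concept with OR, optionally adding a field tag."""
    tag = f"[{field}]" if field else ""
    parts = [f"{quote_term(s)}{tag}" for s in synonyms]
    if len(parts) == 1:
        return parts[0]
    return "(" + " OR ".join(parts) + ")"


def build_query(concepts: List[List[str]], field: Optional[str] = None) -> str:
    """AND together one OR-group per concept (each concept is a list of synonyms)."""
    return " AND ".join(or_group(c, field) for c in concepts)


if __name__ == "__main__":
    print(build_query([["BTK inhibitor"], ["selectivity"]], field="Title/Abstract"))
    print(build_query([["kinase inhibitor", "protein kinase"],
                       ["selectivity", "off-target"]]))
```

Keeping each concept as its own list of synonyms makes it easy to broaden a search (add a synonym to a list) or narrow it (add another concept) without rewriting the query string by hand.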
3. Execute Search
API endpoint:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?\
db=pubmed&\
term=YOUR_QUERY&\
retmax=100&\
retmode=json&\
sort=relevance
Parameters:
- db=pubmed - search the PubMed database
- term= - your query (URL-encode spaces and special characters)
- retmax=100 - maximum number of results (start with 100)
- retmode=json - return JSON
- sort=relevance - most relevant first (or pub_date for newest)
Example bash:
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=BTK+inhibitor+selectivity&retmax=100&retmode=json&sort=relevance"
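For queries containing quotes, brackets, or slashes, hand-encoding the URL is error-prone. The sketch below builds the same esearch URL with Python's standard library; `esearch_url` and `fetch_json` are hypothetical helper names, and `fetch_json` is shown only for completeness (it performs a real network request and is subject to the rate limits described under Error Handling).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query: str, retmax: int = 100, sort: str = "relevance") -> str:
    """Build an esearch URL; urlencode turns spaces into '+' and escapes special chars."""
    params = {
        "db": "pubmed",
        "term": query,
        "retmax": retmax,
        "retmode": "json",
        "sort": sort,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a URL and decode the JSON body (network call, not rate limited here)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(esearch_url('"BTK inhibitor"[Title/Abstract] AND selectivity[Title/Abstract]'))
```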
Response format:
{
"esearchresult": {
"count": "156",
"retmax": "100",
"idlist": ["12345678", "87654321", ...]
}
}
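A response of this shape can be unpacked as follows. This is a small sketch with a hypothetical helper name (`parse_esearch`); it relies only on the `count` and `idlist` fields shown above and converts `count` from a string to an integer.

```python
from typing import List, Tuple


def parse_esearch(payload: dict) -> Tuple[int, List[str]]:
    """Return (total hit count, PMIDs actually returned) from an esearch JSON payload."""
    result = payload.get("esearchresult", {})
    count = int(result.get("count", 0))
    pmids = list(result.get("idlist", []))
    return count, pmids


if __name__ == "__main__":
    sample = {"esearchresult": {"count": "156", "retmax": "100",
                                "idlist": ["12345678", "87654321"]}}
    count, pmids = parse_esearch(sample)
    # count can exceed len(pmids) because retmax caps the returned list
    print(f"Found {count} papers, retrieved {len(pmids)} PMIDs")
```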
4. Fetch Paper Metadata
API endpoint:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?\
db=pubmed&\
id=12345678,87654321&\
retmode=json
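When a search returns many PMIDs, the esummary call can be split into batches. The sketch below is one way to do that; `esummary_urls` is a hypothetical helper, and the default batch size of 200 IDs per request is an assumption rather than something this skill specifies.

```python
from typing import List
from urllib.parse import urlencode

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_urls(pmids: List[str], batch_size: int = 200) -> List[str]:
    """Build one esummary URL per batch of comma-separated PMIDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    urls = []
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        params = {"db": "pubmed", "id": ",".join(batch), "retmode": "json"}
        # safe="," keeps the commas between PMIDs readable in the URL
        urls.append(f"{ESUMMARY_URL}?{urlencode(params, safe=',')}")
    return urls


if __name__ == "__main__":
    for url in esummary_urls(["12345678", "87654321"]):
        print(url)
```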
Extract from response:
- Title
- Authors (list)
- Journal name
- Publication date
- Abstract (requires a separate efetch call; esummary does not return abstract text)
- PMID
- DOI (if available in articleids)
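For the abstract, a common route is efetch with `rettype=abstract&retmode=xml`. The sketch below parses such a response; `efetch_url` and `parse_abstracts` are hypothetical helpers, and the element paths (`PubmedArticle`, `MedlineCitation/PMID`, `Article/Abstract/AbstractText` with an optional `Label` attribute) are assumptions based on the usual PubMed XML layout, so verify them against a real response before relying on this.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids: List[str]) -> str:
    """Build an efetch URL that returns abstracts as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids),
              "rettype": "abstract", "retmode": "xml"}
    return f"{EFETCH_URL}?{urlencode(params, safe=',')}"


def parse_abstracts(xml_text: str) -> Dict[str, str]:
    """Map each PMID to its abstract text, joining labelled sections in order."""
    root = ET.fromstring(xml_text)
    abstracts = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        parts = []
        for node in article.iterfind("MedlineCitation/Article/Abstract/AbstractText"):
            # itertext() also collects text nested inside inline markup such as <i>
            text = "".join(node.itertext()).strip()
            label = node.get("Label")
            parts.append(f"{label}: {text}" if label else text)
        if pmid:
            abstracts[pmid] = " ".join(parts)
    return abstracts
```

Papers without an abstract map to an empty string, which lets downstream screening flag them rather than fail.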
Getting DOI from PMID:
"articleids": [
{"idtype": "pubmed", "value": "12345678"},
{"idtype": "doi", "value": "10.1234/example.2023"}
]
If DOI missing:
- Use PMID as fallback identifier
- Try to resolve DOI via PubMed Central or publisher APIs later
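The DOI lookup and PMID fallback can be sketched as two small functions. The names `extract_doi` and `paper_key` are hypothetical; the only structure assumed is the `articleids` list of `idtype`/`value` entries shown above.

```python
from typing import List, Optional


def extract_doi(articleids: List[dict]) -> Optional[str]:
    """Return the DOI from an esummary articleids list, or None if absent."""
    for entry in articleids:
        if entry.get("idtype") == "doi" and entry.get("value"):
            return entry["value"]
    return None


def paper_key(pmid: str, doi: Optional[str]) -> str:
    """Prefer the DOI as identifier; fall back to the PMID when no DOI exists."""
    return doi if doi else pmid


if __name__ == "__main__":
    ids = [{"idtype": "pubmed", "value": "12345678"},
           {"idtype": "doi", "value": "10.1234/example.2023"}]
    print(paper_key("12345678", extract_doi(ids)))
```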
Output Format
Create list of paper objects:
[
{
"pmid": "12345678",
"doi": "10.1234/example.2023",
"title": "Selective BTK inhibitors for autoimmune diseases",
"authors": ["Smith J", "Doe A", "Johnson B"],
"journal": "Nature Chemical Biology",
"year": "2023",
"abstract": "We developed a series of...",
"source": "pubmed_search"
}
]
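One way to produce objects of this shape from an esummary response is sketched below. The function names are hypothetical, and the esummary field names used (`result`, `uids`, `uid`, `title`, `authors[].name`, `fulljournalname`, `source`, `pubdate`, `articleids`) are assumptions about the JSON layout that should be checked against a real response. The abstract is passed in separately because it comes from efetch.

```python
from typing import Dict, List, Optional


def to_paper(record: dict, abstract: str = "") -> dict:
    """Convert one esummary record (assumed field names) into a paper object."""
    doi = next((a["value"] for a in record.get("articleids", [])
                if a.get("idtype") == "doi" and a.get("value")), None)
    return {
        "pmid": record.get("uid", ""),
        "doi": doi,
        "title": record.get("title", ""),
        "authors": [a["name"] for a in record.get("authors", []) if "name" in a],
        "journal": record.get("fulljournalname") or record.get("source", ""),
        # pubdate is assumed to start with a four-digit year, e.g. "2023 Jan 15"
        "year": record.get("pubdate", "")[:4],
        "abstract": abstract,
        "source": "pubmed_search",
    }


def papers_from_esummary(payload: dict,
                         abstracts: Optional[Dict[str, str]] = None) -> List[dict]:
    """Build the paper list in the order given by the payload's uids array."""
    result = payload.get("result", {})
    abstracts = abstracts or {}
    return [to_paper(result[uid], abstracts.get(uid, ""))
            for uid in result.get("uids", []) if uid in result]
```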
Error Handling
Rate limits (CRITICAL - shared across all processes/subagents):
- No API key: 3 requests/second (official limit)
- With API key: 10 requests/second
- Single agent/script: Use 500ms delays (2 req/sec, safe margin)
  - 350ms is theoretically sufficient but causes ~20% HTTP 429 errors in practice
- Multiple parallel subagents: Use longer delays to share capacity
  - 2 parallel: 1 second each (2 total req/sec)
  - 3 parallel: 1.5 seconds each (2 total req/sec)
  - 5 parallel: 2.5 seconds each (2 total req/sec)
- Formula: delay_seconds = (num_parallel / rate_limit) + safety_margin
  - The delays above correspond to rate_limit = 2 req/sec (a conservative target below the official 3 req/sec) with safety_margin = 0
- If you get HTTP 429 errors: Wait 5 seconds, resume with doubled delays
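The delay rules above can be written as a small calculation. This sketch reads `rate_limit` as a conservative shared target of 2 requests/second, which is the value that reproduces the listed delays; the function names and the default arguments are assumptions for illustration.

```python
from typing import Tuple


def delay_seconds(num_parallel: int, target_rate: float = 2.0,
                  safety_margin: float = 0.0) -> float:
    """Per-agent delay so that all agents together stay at or below target_rate."""
    if num_parallel < 1:
        raise ValueError("num_parallel must be at least 1")
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    return num_parallel / target_rate + safety_margin


def on_http_429(current_delay: float) -> Tuple[float, float]:
    """After an HTTP 429: (seconds to pause now, doubled delay to resume with)."""
    return 5.0, current_delay * 2


if __name__ == "__main__":
    for agents in (1, 2, 3, 5):
        print(agents, "agent(s):", delay_seconds(agents), "s between requests")
```

In a request loop, each agent would call `time.sleep(delay)` between requests and, on a 429, sleep for the pause returned by `on_http_429` before continuing with the doubled delay.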
Empty results:
- Try broader terms
- Remove field tags
- Check for typos
- Use OR to add synonyms
Too many results (>500):
- Add more specific terms
- Use field tags to narrow
- Add date constraints
- Consider splitting into sub-queries
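The two cases above can be turned into a simple check on the esearch `count`. This is only an illustrative sketch: `suggest_adjustments` is a hypothetical helper, and the 500-result threshold is taken from the heading above.

```python
from typing import List

BROADEN = ["Try broader terms", "Remove field tags",
           "Check for typos", "Use OR to add synonyms"]
NARROW = ["Add more specific terms", "Use field tags to narrow",
          "Add date constraints", "Consider splitting into sub-queries"]


def suggest_adjustments(count: int, too_many: int = 500) -> List[str]:
    """Return query adjustments for empty or oversized result sets, else an empty list."""
    if count == 0:
        return list(BROADEN)
    if count > too_many:
        return list(NARROW)
    return []


if __name__ == "__main__":
    print(suggest_adjustments(0))
    print(suggest_adjustments(5000))
    print(suggest_adjustments(156))
```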
Integration with Other Skills
After search completes:
- Save results to research folder as initial-search-results.json
- For each paper, call the evaluating-paper-relevance skill
- Track in papers-reviewed.json (use DOI as key, fallback to PMID)
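A minimal sketch of the save-and-track step follows. The file names come from the list above; everything else is an assumption, in particular the helper names and the shape of each `papers-reviewed.json` entry (a `pmid` plus a `status` field), which the actual skills may define differently.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_search_results(papers: List[dict], folder: str) -> Path:
    """Write the paper list to initial-search-results.json in the research folder."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    path = target / "initial-search-results.json"
    path.write_text(json.dumps(papers, indent=2), encoding="utf-8")
    return path


def track_papers(papers: List[dict], folder: str, status: str = "pending") -> Dict[str, dict]:
    """Add papers to papers-reviewed.json, keyed by DOI with PMID as fallback."""
    path = Path(folder) / "papers-reviewed.json"
    reviewed = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    for paper in papers:
        key = paper.get("doi") or paper["pmid"]
        # setdefault keeps any existing entry, so re-running a search does not
        # overwrite papers that were already evaluated
        reviewed.setdefault(key, {"pmid": paper["pmid"], "status": status})
    path.write_text(json.dumps(reviewed, indent=2), encoding="utf-8")
    return reviewed
```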
Quick Reference
| Task | Command |
|---|---|
| Search PubMed | curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=QUERY&retmax=100&retmode=json" |
| Get metadata | curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=PMID1,PMID2&retmode=json" |
| URL encode query | Replace spaces with +, special chars with %XX |
| Narrow results | Use AND, add field tags, more specific terms |
| Broaden results | Use OR, remove field tags, add synonyms |
Common Mistakes
- Too narrow: Only 5 results → Use OR, remove constraints
- Too broad: 5000 results → Add AND terms, use field tags
- Missing abstracts: Use efetch instead of esummary for full abstract text
- DOI not found: Many older papers lack DOI - use PMID as fallback
- Rate limiting: Add 500ms delays (single agent) or longer (parallel subagents sharing rate limit)
Next Steps
After completing search:
- Announce: "Found N papers matching query"
- Begin evaluation using skills/research/evaluating-paper-relevance
- Update user with progress as papers are screened