Answering Research Questions
Main orchestration workflow for systematic literature research - search, evaluate, traverse, synthesize
Install this agent skill to your Project
npx add-skill https://github.com/kthorn/research-superpower/tree/main/skills/research/answering-research-questions
Overview
Orchestrate the complete research workflow from query to findings.
Core principle: Systematic, trackable, comprehensive. Search → Evaluate → Traverse → Synthesize.
Announce at start: "I'm using the Answering Research Questions skill to find [specific data] about [topic]."
The Process
Phase 1: Parse Query
Extract from user's request:
Keywords:
- Main concepts (e.g., "BTK inhibitor", "selectivity")
- Synonyms and alternatives (e.g., "Bruton tyrosine kinase")
- Related terms (e.g., "off-target", "kinase panel")
Data types needed:
- Specific measurements (IC50, KD, EC50, etc.)
- Methods or protocols
- Structures or sequences
- Results or conclusions
Constraints:
- Date ranges
- Specific compounds/targets
- Organisms or systems
- Publication types
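The three groups above can be held in one small structure so later phases read from a single place. This is only a sketch: `ParsedQuery` and its field names are hypothetical and are not defined by any of the skills.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Hypothetical container for what Phase 1 extracts from the request."""
    keywords: list[str]      # main concepts, synonyms, related terms
    data_types: list[str]    # e.g. "IC50", "selectivity data"
    constraints: dict[str, str] = field(default_factory=dict)  # e.g. {"date_range": "2015-2025"}

    def open_questions(self) -> list[str]:
        """Name the required parts that are still empty and may warrant a clarifying question."""
        missing = []
        if not self.keywords:
            missing.append("keywords")
        if not self.data_types:
            missing.append("data_types")
        return missing


query = ParsedQuery(
    keywords=["BTK inhibitor", "Bruton tyrosine kinase", "selectivity", "off-target"],
    data_types=["IC50"],
)
print(query.open_questions())
```

Constraints default to empty because an open date range or unrestricted publication type is a valid query.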
Ask clarifying questions if needed:
- "Are you looking for in vitro or in vivo data?"
- "Any specific time frame?"
- "Which kinases are you most interested in?"
- "What email address should I use for Unpaywall API requests?" (Required for finding open access papers)
Phase 2: Initialize Research Session
Propose folder name:
research-sessions/YYYY-MM-DD-brief-description/
Example: research-sessions/2025-10-11-btk-inhibitor-selectivity/
Show proposal to user:
Creating research folder: research-sessions/2025-10-11-btk-inhibitor-selectivity/
Proceed? (y/n)
Create folder structure:
mkdir -p "research-sessions/YYYY-MM-DD-description"/{papers,citations}
Initialize files:
Core files (always create these):
papers-reviewed.json:
{}
citations/citation-graph.json:
{}
SUMMARY.md:
# Research Query: [User's question]
**Started:** YYYY-MM-DD HH:MM
**Keywords:** keyword1, keyword2, keyword3
**Data types sought:** IC50 values, selectivity data, synthesis methods
---
## Highly Relevant Papers (Score ≥ 8)
Papers scored using `evaluating-paper-relevance` skill:
- Score 0-10 based on: Keywords (0-3) + Data type (0-4) + Specificity (0-3)
- Score ≥ 8: Highly relevant with significant data
- Score 7: Relevant with useful data
- Score 5-6: Possibly relevant
- Score < 5: Not relevant
(Papers will be added here as found)
Example format:
### [Paper Title](https://doi.org/10.1234/example)
**DOI:** [10.1234/example](https://doi.org/10.1234/example) | **PMID:** [12345678](https://pubmed.ncbi.nlm.nih.gov/12345678/)
---
## Relevant Papers (Score 7)
(Papers will be added here as found)
---
## Possibly Relevant Papers (Score 5-6)
(Noted for potential follow-up)
---
## Search Progress
- Initial PubMed search: X results
- Papers reviewed: Y
- Papers with relevant data: Z
- Citations followed: N
---
## Key Findings
(Synthesized findings will be added as research progresses)
CRITICAL: Always use clickable markdown links for DOIs and PMIDs
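A minimal sketch of the link format used in the SUMMARY.md example entry above. `paper_heading` is a hypothetical helper name; the URL patterns are the ones shown in the example entry.

```python
def paper_heading(title: str, doi: str, pmid: str) -> str:
    """Render a SUMMARY.md entry with clickable DOI and PMID links."""
    doi_url = f"https://doi.org/{doi}"
    pmid_url = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
    return (
        f"### [{title}]({doi_url})\n"
        f"**DOI:** [{doi}]({doi_url}) | **PMID:** [{pmid}]({pmid_url})"
    )


entry = paper_heading("Paper Title", "10.1234/example", "12345678")
print(entry)
```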
Auxiliary files (for large searches >100 papers):
See evaluating-paper-relevance skill for guidance on when to create:
- README.md - Project overview, methodology, file inventory
- TOP_PRIORITY_PAPERS.md - Curated priority list organized by tier
- evaluated-papers.json - Rich structured data for programmatic access
For small searches (<50 papers), stick to core files only. For large searches (>100 papers), auxiliary files add significant organizational value.
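The folder and core tracking files described in this phase could be created along these lines. This is a sketch, not the skill's own tooling: `init_session` is a hypothetical name, and SUMMARY.md is left out because its template is filled in from the user's query.

```python
import json
from datetime import date
from pathlib import Path


def init_session(root: Path, description: str, today: date) -> Path:
    """Create the session folder and the empty core tracking files."""
    session = root / "research-sessions" / f"{today.isoformat()}-{description}"
    (session / "papers").mkdir(parents=True, exist_ok=True)
    (session / "citations").mkdir(parents=True, exist_ok=True)
    # Both tracking files start as empty JSON objects.
    for rel in ("papers-reviewed.json", "citations/citation-graph.json"):
        path = session / rel
        if not path.exists():  # never overwrite an existing session's history
            path.write_text(json.dumps({}) + "\n")
    return session
```

For example, `init_session(Path("."), "btk-inhibitor-selectivity", date(2025, 10, 11))` would produce the folder named in the proposal above.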
Phase 3: Search Literature
Use searching-literature skill:
- Construct PubMed query from keywords
- Execute search (start with 100 results)
- Save results to initial-search-results.json
- Report: "Found N papers matching query"
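One plausible way to turn the Phase 1 keywords into a boolean query is to OR the synonyms within each concept and AND the concepts together. The sketch below assumes that approach; `build_pubmed_query` is a hypothetical helper, and the searching-literature skill may construct its queries differently.

```python
def build_pubmed_query(concept_groups: list[list[str]]) -> str:
    """OR the synonyms inside each group, then AND the groups together."""
    clauses = []
    for group in concept_groups:
        quoted = [f'"{term}"' for term in group]  # quote every term so phrases stay intact
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


query = build_pubmed_query([
    ["BTK", "Bruton tyrosine kinase"],
    ["inhibitor", "kinase inhibitor"],
    ["selectivity", "off-target"],
])
print(query)
```

Quoting every term is a simplification. The example query in the Search Strategy template later in this document leaves single words such as inhibitor unquoted.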
Phase 4: Evaluate Papers
Use evaluating-paper-relevance skill:
For each paper:
- Check papers-reviewed.json (skip if already processed)
- Stage 1: Score abstract (0-10)
- If score ≥ 7: Stage 2 deep dive
- Extract findings to SUMMARY.md
- Download PDF and supplementary if available
- Update papers-reviewed.json (for ALL papers, even low-scoring ones)
- If score ≥ 7: proceed to Phase 5 for this paper
CRITICAL: Add every paper to papers-reviewed.json regardless of score. This prevents re-review and tracks complete search history.
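The skip-if-seen and record-everything rules can be sketched as below. The tier thresholds come from the scoring rubric in Phase 2; the labels `possibly_relevant` and `not_relevant` and both function names are assumptions made for illustration.

```python
import json
from pathlib import Path


def status_for_score(score: int) -> str:
    """Map a 0-10 relevance score to a tier using the rubric thresholds."""
    if score >= 8:
        return "highly_relevant"
    if score >= 7:
        return "relevant"
    if score >= 5:
        return "possibly_relevant"  # assumed label
    return "not_relevant"           # assumed label


def record_paper(tracking_file: Path, doi: str, title: str, score: int) -> bool:
    """Record a paper regardless of score; return False if it was already reviewed."""
    reviewed = json.loads(tracking_file.read_text()) if tracking_file.exists() else {}
    if doi in reviewed:
        return False  # already processed, so skip re-review
    reviewed[doi] = {"title": title, "score": score, "status": status_for_score(score)}
    tracking_file.write_text(json.dumps(reviewed, indent=2))
    return True
```

Low-scoring papers are written to the file like any other, which is what keeps them from being screened again when they reappear through citations.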
Report progress for EVERY paper:
[15/100] Screening: "Paper Title"
Abstract score: 8 → Fetching full text...
→ Found IC50 data for 8 compounds
→ Added to SUMMARY.md
[16/100] Screening: "Another Paper"
Abstract score: 3 → Skipping (not relevant)
[17/100] Screening: "Third Paper"
Abstract score: 7 → Relevant, adding to queue...
Every 10 papers, give a summary update.
Phase 5: Traverse Citations
Use traversing-citations skill:
For papers scoring ≥ 7:
- Get references (backward)
- Get citations (forward)
- Filter for relevance (score ≥ 5)
- Add to processing queue
- Evaluate queued papers (return to Phase 4)
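The steps above amount to a breadth-first walk that expands only relevant papers and queues only plausible neighbors. The sketch below assumes the lookups are supplied by the caller: `get_neighbors` and `score` are hypothetical callables standing in for the traversing-citations and evaluating-paper-relevance skills.

```python
from collections import deque
from typing import Callable, Iterable


def traverse_citations(
    seeds: Iterable[str],
    get_neighbors: Callable[[str], Iterable[str]],  # references + citing papers for one paper ID
    score: Callable[[str], int],                    # relevance score, 0-10
    follow_threshold: int = 7,
    queue_threshold: int = 5,
) -> dict[str, int]:
    """Score papers reachable via citations, expanding only from relevant ones."""
    scores: dict[str, int] = {}   # every paper scored so far (also the dedup record)
    expanded: set[str] = set()
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        if paper in expanded:
            continue
        expanded.add(paper)
        if paper not in scores:
            scores[paper] = score(paper)
        if scores[paper] < follow_threshold:
            continue              # do not follow citations of weaker papers
        for neighbor in get_neighbors(paper):
            if neighbor not in scores:
                scores[neighbor] = score(neighbor)
                if scores[neighbor] >= queue_threshold:
                    queue.append(neighbor)
    return scores
```

Each paper is scored at most once, and filtering before queueing is what keeps the walk from growing exponentially.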
Report progress:
Following citations from highly relevant paper
→ Found 12 relevant references
→ Found 8 relevant citing papers
→ Adding 20 papers to queue
Phase 6: Checkpoint
Check after:
- Every 50 papers reviewed
- Every 5 minutes of processing
- Queue exhausted
Ask user:
⏸️ Checkpoint: Reviewed 50 papers, found 12 relevant
Papers with data: 7
Continue searching? (y/n/summary)
Options:
- y - Continue processing
- n - Stop and finalize
- summary - Show current findings, then decide
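The three checkpoint triggers listed in this phase can be combined into a single check. This is a sketch under the assumption that the caller tracks its own counters; `should_checkpoint` and its parameter names are hypothetical.

```python
import time
from typing import Optional


def should_checkpoint(reviewed_since_last: int, last_checkpoint: float,
                      queue_size: int, now: Optional[float] = None) -> bool:
    """Return True when any of the three checkpoint triggers has fired."""
    now = time.monotonic() if now is None else now
    return (
        reviewed_since_last >= 50            # every 50 papers reviewed
        or now - last_checkpoint >= 5 * 60   # every 5 minutes of processing
        or queue_size == 0                   # queue exhausted
    )
```

Passing `now` explicitly makes the time-based trigger easy to exercise without waiting.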
Phase 7: Synthesize Findings
When stopping (user says no or queue empty):
Option A: Manual synthesis (small research sessions)
- Review SUMMARY.md - Organize by relevance and topic
- Extract key findings - Group by data type
- Add synthesis section:
## Key Findings Summary
### IC50 Values for BTK Inhibitors
- Compound A: 12 nM (Smith et al., 2023)
- Compound B: 45 nM (Doe et al., 2024)
- [More compounds...]
### Selectivity Data
- Compound A shows >80-fold selectivity vs other kinases
- Tested against panel of 50 kinases (Jones et al., 2023)
### Synthesis Methods
- Lead compounds synthesized via [method]
- Yields: 30-45%
- Full protocols in [papers]
### Gaps Identified
- No data on selectivity vs [specific kinase]
- Limited in vivo data
- Few papers on resistance mechanisms
- Update search progress stats
- List all files downloaded
Option B: Script-based synthesis (large research sessions >50 papers)
For large research sessions, consider creating a synthesis script:
Create generate_summary.py:
- Read evaluated-papers.json from helper scripts
- Aggregate findings by priority and scaffold type
- Generate comprehensive SUMMARY.md with:
- Executive summary with statistics
- Papers grouped by relevance score
- Priority recommendations for next steps
- Methodology documentation
- Include timestamps and reproducibility info
Benefits:
- Consistent formatting across sessions
- Easy to regenerate as more papers added
- Can customize grouping/filtering logic
- Documents complete methodology
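A sketch of the grouping step such a script might perform. The document does not specify the layout of evaluated-papers.json, so this assumes the same DOI-keyed shape as papers-reviewed.json with `title` and `score` fields; `render_summary` is a hypothetical name.

```python
from collections import defaultdict

TIERS = [  # (heading, inclusive lower bound), checked from highest to lowest
    ("Highly Relevant Papers (Score ≥ 8)", 8),
    ("Relevant Papers (Score 7)", 7),
    ("Possibly Relevant Papers (Score 5-6)", 5),
]


def render_summary(papers: dict[str, dict]) -> str:
    """Group papers by relevance tier and render a markdown summary."""
    grouped = defaultdict(list)
    for doi, data in papers.items():
        score = data.get("score", 0)
        for heading, minimum in TIERS:
            if score >= minimum:
                grouped[heading].append((score, data.get("title", "Untitled"), doi))
                break  # papers scoring below 5 match no tier and are only counted
    lines = [f"**Papers reviewed:** {len(papers)}", ""]
    for heading, _ in TIERS:
        lines.append(f"## {heading}")
        for score, title, doi in sorted(grouped[heading], reverse=True):
            lines.append(f"- [{title}](https://doi.org/{doi}) (score {score})")
        lines.append("")
    return "\n".join(lines)
```

Because the output is derived entirely from the JSON, rerunning the script after more papers are added regenerates a consistent summary.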
Final report:
✅ Research complete!
Summary:
- Papers reviewed: 127
- Relevant papers: 18
- Highly relevant: 7
- Data extracted: IC50 values for 45 compounds, selectivity data, synthesis methods
All findings in: research-sessions/2025-10-11-btk-inhibitor-selectivity/
- SUMMARY.md (organized findings)
- papers/ (14 PDFs + supplementary data)
- papers-reviewed.json (complete tracking)
Phase 8: Final Consolidation
CRITICAL: Always consolidate findings at the end
1. Create relevant-papers.json
Filter papers-reviewed.json to extract only relevant papers (score ≥ 7):
import json

# Read the complete tracking file (every paper screened, any score)
with open('papers-reviewed.json') as f:
    all_papers = json.load(f)

# Keep only relevant papers (score >= 7); a missing score counts as 0
relevant_papers = {
    doi: data for doi, data in all_papers.items()
    if data.get('score', 0) >= 7
}

# Save to relevant-papers.json
with open('relevant-papers.json', 'w') as f:
    json.dump(relevant_papers, f, indent=2)
Format:
{
"10.1234/example1.2023": {
"pmid": "12345678",
"title": "Paper title",
"status": "highly_relevant",
"score": 9,
"source": "pubmed_search",
"timestamp": "2025-10-11T16:00:00Z",
"found_data": ["IC50 values", "synthesis methods"],
"chembl_id": "CHEMBL1234567"
},
"10.1234/example2.2023": {
"pmid": "23456789",
"title": "Another paper",
"status": "relevant",
"score": 7,
"source": "forward_citation",
"timestamp": "2025-10-11T16:15:00Z",
"found_data": ["MIC data"]
}
}
2. Enhance SUMMARY.md with Methodology Section
Add these sections to the TOP of existing SUMMARY.md (before paper listings):
# Research Query: [User's question]
**Date:** 2025-10-11
**Duration:** 2h 15m
**Status:** Complete
---
## Search Strategy
**Keywords:** BTK, Bruton tyrosine kinase, inhibitor, selectivity, off-target, kinase panel, IC50
**Data types sought:** IC50 values, selectivity data, kinase panel screening
**Constraints:** None (open date range)
**PubMed Query:**
("BTK" OR "Bruton tyrosine kinase") AND (inhibitor OR "kinase inhibitor") AND (selectivity OR "off-target")
---
## Screening Methodology
**Rubric:** Abstract scoring (0-10)
- Key terms: +3 pts each (or Keywords 0-3, Data type 0-4, Specificity 0-3 if using old rubric)
- Relevant terms: +1 pt each
- Threshold: ≥7 = relevant
**Sources:**
- Initial PubMed search
- Forward/backward citations via Semantic Scholar
---
## Results Statistics
**Papers Screened:**
- Total reviewed: 127 papers
- Highly relevant (≥8): 12 papers
- Relevant (7): 18 papers
- Possibly relevant (5-6): 23 papers
- Not relevant (<5): 74 papers
**Data Extracted:**
- IC50 values: 45 compounds across 12 papers
- Selectivity data: 8 papers with kinase panel screening
- Full text obtained: 18/30 relevant papers (60%)
**Citation Traversal:**
- Papers with citations followed: 7
- References screened: 45 papers
- Citing papers screened: 38 papers
- Relevant papers found via citations: 8 papers
---
## Key Findings Summary
### IC50 Values for BTK Inhibitors
- Ibrutinib: 0.5 nM (Smith et al., 2023)
- Acalabrutinib: 3 nM (Doe et al., 2024)
- [Additional findings synthesized from papers below]
### Selectivity Patterns
- Most inhibitors show >50-fold selectivity vs other kinases
- Common off-targets: TEC, BMX (other TEC family kinases)
### Gaps Identified
- Limited data on selectivity vs JAK/SYK
- Few papers on resistance mechanisms
- No in vivo selectivity data found
---
## File Inventory
- `SUMMARY.md` - This file (methodology + findings)
- `relevant-papers.json` - 30 relevant papers (score ≥7)
- `papers-reviewed.json` - All 127 papers screened
- `papers/` - 18 PDFs + 5 supplementary files
- `citations/citation-graph.json` - Citation relationships
---
## Reproducibility
**To reproduce:**
1. Use PubMed query above
2. Apply screening rubric (threshold ≥7)
3. Follow citations from relevant papers (score ≥7)
4. Check Unpaywall for paywalled papers
**Software:** Research Superpowers skills v2025-10-11
---
[Existing paper listings follow below...]
## Highly Relevant Papers (Score ≥ 8)
### [Paper Title]...
Report to user:
✅ Research session complete!
Consolidation complete:
1. SUMMARY.md - Enhanced with methodology, statistics, and findings
2. relevant-papers.json - 30 relevant papers (score ≥7) in JSON format
All files in: research-sessions/2025-10-11-btk-inhibitor-selectivity/
- SUMMARY.md (complete: methodology + paper-by-paper findings)
- relevant-papers.json (30 relevant papers for programmatic access)
- papers-reviewed.json (127 total papers screened)
- papers/ (18 PDFs)
Quick access:
- Open SUMMARY.md for complete findings and methodology
- Use relevant-papers.json for programmatic access
💡 Optional: Clean up intermediate files?
→ Use cleaning-up-research-sessions skill to safely remove temporary files
Workflow Checklist
Use TodoWrite to track these steps:
- Parse user query (keywords, data types, constraints)
- Propose and create research folder
- Initialize tracking files (SUMMARY.md, papers-reviewed.json, citation-graph.json)
- Search PubMed using searching-literature skill
- For each paper: evaluate using evaluating-paper-relevance skill
- For relevant papers (≥7): traverse citations using traversing-citations skill
- Report progress regularly
- Checkpoint every 50 papers or 5 minutes
- When done: synthesize findings and enhance SUMMARY.md with methodology
- Create relevant-papers.json (filtered JSON for programmatic access)
- Final report with stats and file locations
Integration Points
Skills used:
- searching-literature - Initial PubMed search
- evaluating-paper-relevance - Score and extract from papers
- traversing-citations - Follow citation networks
All skills coordinate through:
- Shared papers-reviewed.json (deduplication)
- Shared SUMMARY.md (findings accumulation)
- Shared citation-graph.json (relationship tracking)
File organization:
- Small searches (<50 papers): Core files only (papers-reviewed.json, SUMMARY.md, citation-graph.json)
- All searches: Create relevant-papers.json at end; enhance SUMMARY.md with methodology
- Large searches (>100 papers): May add auxiliary files (README.md, TOP_PRIORITY_PAPERS.md, evaluated-papers.json) for better organization
Error Handling
No results found:
- Try broader keywords
- Remove constraints
- Check spelling
- Try different synonyms
API rate limiting:
- Report to user: "⏸️ Rate limited, waiting..."
- Wait required time
- Resume automatically
Full text unavailable:
- Note in SUMMARY.md
- Continue with abstract-only evaluation
- Flag for manual retrieval if highly relevant
Too many results (>500):
- Suggest narrowing query
- Process first 100, ask if continue
- Focus on most recent or most cited
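The rate-limiting behavior described in this section (report, wait, resume) can be sketched as a small retry wrapper. `RateLimited` is a hypothetical exception type invented for this example; a real client would signal rate limits in its own way, for instance through an HTTP 429 response.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by a search client when told to slow down."""
    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(fetch: Callable[[], T], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); on RateLimited, tell the user, wait, and resume."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except RateLimited as err:
            if attempt == max_attempts:
                raise  # give up and surface the blocker to the user
            print(f"Rate limited, waiting {err.retry_after:.0f}s...")
            sleep(err.retry_after)
    raise AssertionError("loop always returns or raises")
```

The `sleep` parameter is injectable so the wait can be replaced in tests or tied to a progress display.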
Quick Reference
| Phase | Skill | Output |
|---|---|---|
| Parse | (built-in) | Keywords, data types, constraints |
| Initialize | (built-in) | Folder, SUMMARY.md, tracking files |
| Search | searching-literature | List of papers with metadata |
| Evaluate | evaluating-paper-relevance | Scored papers, extracted findings |
| Traverse | traversing-citations | Additional papers from citations |
| Synthesize | (built-in) | Enhanced SUMMARY.md with methodology + findings |
| Consolidate | (built-in) | relevant-papers.json (filtered to score ≥7) |
Common Mistakes
- Not tracking all papers: Only adding relevant papers to papers-reviewed.json → Add EVERY paper to prevent re-review and track complete history
- Creating unnecessary auxiliary files for small searches: For <50 papers, stick to core files (papers-reviewed.json, SUMMARY.md, citation-graph.json). For large searches (>100 papers), auxiliary files like README.md and TOP_PRIORITY_PAPERS.md add value.
- Silent work: User can't see progress → Report EVERY paper, give updates every 10
- Non-clickable identifiers: Plain text DOIs/PMIDs → Always use markdown links
- Jumping to evaluation without good search: Too narrow results → Optimize search first
- Not tracking papers: Re-reviewing same papers → Always use papers-reviewed.json
- Following all citations: Exponential explosion → Filter before traversing
- No checkpoints: User loses context → Report and ask every 50 papers
- Poor synthesis: Just list papers → Group by data type, extract key findings
- Batch reporting: Reporting 20 papers at once → Report each one as you go
User Communication (CRITICAL)
NEVER work silently! User needs continuous feedback.
Report frequency:
- Every paper: Brief status as you screen ([N/Total] Title... Score: X)
- Every 5-10 papers: Progress summary with counts
- Every finding: Immediately report what data you found
- Every decision point: Ask before changing direction
Be specific in progress reports:
- ✅ "Found IC50 = 12 nM for compound 7 (Table 2)"
- ❌ "Found data"
- ✅ "Screening paper 25/127: Not relevant (score 3)"
- ❌ Silently skip papers
Ask for clarification when needed:
- ✅ "Are you looking for in vitro or in vivo IC50 values?"
- ❌ Assume and potentially waste time
Report blockers immediately:
- ✅ "⚠️ Paper behind paywall - evaluating from abstract only"
- ❌ Silently skip without mentioning
Periodic summaries (every 10-15 papers):
Progress update:
- Reviewed: 30/127 papers
- Highly relevant: 3 (scores 8-10)
- Relevant: 5 (score 7)
- Currently: Screening paper 31...
Why: User can course-correct early, knows work is happening, can stop if needed
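The per-paper status line and the periodic-summary cadence could be produced by two small helpers. Both names are hypothetical, and the exact wording of the line is an assumption modeled on the [N/Total] format described above.

```python
def screening_line(index: int, total: int, title: str, score: int) -> str:
    """Build one status line per paper in the [N/Total] style."""
    verdict = "Relevant" if score >= 7 else "Not relevant"
    return f'[{index}/{total}] Screening: "{title}" | Score: {score} ({verdict})'


def needs_summary(index: int, every: int = 10) -> bool:
    """Return True when a periodic progress summary is due."""
    return index % every == 0


print(screening_line(25, 127, "Paper Title", 3))
```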
Success Criteria
Research session successful when:
- All relevant papers found and evaluated
- Specific data extracted and organized
- Citations followed systematically
- No duplicate processing
- Clear SUMMARY.md with actionable findings
- User questions answered with evidence
Next Steps
After completing research:
- User reviews SUMMARY.md and relevant-papers.json
- Optional: Run cleaning-up-research-sessions skill to remove intermediate files
- May request deeper dive into specific papers
- May request follow-up searches with refined keywords
- May archive or share research session folder