Agent skill
Finding Open Access Papers
Use Unpaywall API to find free full-text versions of paywalled papers
Install this agent skill to your Project
npx add-skill https://github.com/kthorn/research-superpower/tree/main/skills/research/finding-open-access-papers
Overview
Use Unpaywall to find legally available open access versions of papers that appear to be behind paywalls.
Core principle: Many paywalled papers have free versions (preprints, author manuscripts, institutional repositories). Unpaywall finds them.
When to Use
Use this skill when:
- DOI resolution hits a paywall
- Paper not available in PubMed Central
- Publisher site requires subscription
- Need full text for highly relevant paper (score ≥7)
Use BEFORE giving up on full text access
Unpaywall API
Simple REST API - no authentication required for reasonable usage
Basic Request
curl "https://api.unpaywall.org/v2/DOI?email=YOUR_EMAIL"
Parameters:
- DOI - The paper's DOI (URL-encoded if needed)
- email - User's email (required, for courtesy/contact)
IMPORTANT: Ask user for their email at the start of research session. Do NOT use placeholder emails like claude@anthropic.com or researcher@example.com.
Example:
curl "https://api.unpaywall.org/v2/10.1038/nature12373?email=YOUR_EMAIL"
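The same request URL can be assembled in Python. This is a minimal sketch: `build_unpaywall_url` and `UNPAYWALL_BASE` are illustrative names, not part of Unpaywall, and the email check is only a sanity test, not real validation.

```python
from urllib.parse import quote, urlencode

UNPAYWALL_BASE = "https://api.unpaywall.org/v2"  # endpoint from the curl example


def build_unpaywall_url(doi: str, email: str) -> str:
    """Return the Unpaywall lookup URL for one DOI (hypothetical helper)."""
    if not email or "@" not in email:
        raise ValueError("a real contact email is required")
    # Keep "/" readable (every DOI contains one) and percent-encode anything
    # else that is unsafe in a URL path, such as "<", ">", "#" or spaces.
    encoded_doi = quote(doi.strip(), safe="/")
    return f"{UNPAYWALL_BASE}/{encoded_doi}?{urlencode({'email': email})}"
```

Older DOIs sometimes contain characters such as `<` or `#`, which is where the URL-encoding step matters.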
Response Format
{
  "doi": "10.1038/nature12373",
  "title": "Paper Title",
  "is_oa": true,
  "best_oa_location": {
    "url": "https://europepmc.org/articles/pmc3858213",
    "url_for_pdf": "https://europepmc.org/articles/pmc3858213?pdf=render",
    "version": "publishedVersion",
    "license": "cc-by",
    "host_type": "repository"
  },
  "oa_locations": [
    {
      "url": "https://europepmc.org/articles/pmc3858213",
      "version": "publishedVersion"
    },
    {
      "url": "https://arxiv.org/abs/1234.5678",
      "version": "submittedVersion"
    }
  ]
}
Key Response Fields
is_oa (boolean)
- true - Open access version available
- false - No free version found
best_oa_location (object or null)
- Unpaywall's recommended best open access source
- Prioritizes published versions over preprints
- Includes PDF URL when available
oa_locations (array)
- All known open access locations
- Includes repositories, preprint servers, institutional sites
- Ordered by quality/version
version types:
- publishedVersion - Final published version (best)
- acceptedVersion - Author's accepted manuscript (good)
- submittedVersion - Preprint before peer review (useful)
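One way to work with these fields is to load them into a small typed structure. The sketch below assumes only the fields shown in the sample response; `OALocation`, `parse_record`, and `VERSION_NOTES` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Descriptions taken from the version list above.
VERSION_NOTES = {
    "publishedVersion": "final published version",
    "acceptedVersion": "author's accepted manuscript",
    "submittedVersion": "preprint before peer review",
}


@dataclass
class OALocation:
    url: Optional[str]
    url_for_pdf: Optional[str]
    version: Optional[str]
    host_type: Optional[str]

    @classmethod
    def from_dict(cls, raw: dict) -> "OALocation":
        # .get() tolerates locations that omit a field (as in oa_locations above).
        return cls(
            url=raw.get("url"),
            url_for_pdf=raw.get("url_for_pdf"),
            version=raw.get("version"),
            host_type=raw.get("host_type"),
        )

    def describe(self) -> str:
        note = VERSION_NOTES.get(self.version or "", "unknown version")
        return f"{note} ({self.host_type or 'unknown host'})"


def parse_record(record: dict) -> Tuple[bool, Optional[OALocation], List[OALocation]]:
    """Split a response into (is_oa, best location or None, all locations)."""
    is_oa = bool(record.get("is_oa"))
    best_raw = record.get("best_oa_location")
    best = OALocation.from_dict(best_raw) if best_raw else None
    others = [OALocation.from_dict(loc) for loc in record.get("oa_locations") or []]
    return is_oa, best, others
```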
Implementation Pattern
1. Check Unpaywall After Paywall Hit
# Try DOI first
curl -L "https://doi.org/10.1234/example.2023"
# If paywall detected (403, subscription required, etc):
curl "https://api.unpaywall.org/v2/10.1234/example.2023?email=your@email.com"
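The text does not define "paywall detected" beyond a 403 or a subscription notice, so any check is a heuristic. The sketch below is one possible guess: `looks_paywalled` is a hypothetical helper, and the status codes and phrases are illustrative assumptions that will miss many publisher pages.

```python
# Assumption: these codes and phrases are examples, not an exhaustive list.
PAYWALL_STATUS_CODES = {401, 403}
PAYWALL_PHRASES = (
    "subscription required",
    "purchase this article",
    "log in via your institution",
)


def looks_paywalled(status_code: int, body: str) -> bool:
    """Guess whether a publisher response is a paywall page."""
    if status_code in PAYWALL_STATUS_CODES:
        return True
    text = body.lower()
    return any(phrase in text for phrase in PAYWALL_PHRASES)
```

A false positive only costs one extra Unpaywall lookup, so it is reasonable to err toward reporting a paywall.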
2. Extract Best URL
# Parse JSON response
response=$(curl -s "https://api.unpaywall.org/v2/DOI?email=EMAIL")
# Check if OA available
is_oa=$(echo "$response" | jq -r '.is_oa')
if [ "$is_oa" = "true" ]; then
  # Get best PDF URL (fall back to the landing page; empty if neither exists)
  pdf_url=$(echo "$response" | jq -r '.best_oa_location.url_for_pdf // .best_oa_location.url // empty')
  # Download only when a URL was actually found
  if [ -n "$pdf_url" ]; then
    mkdir -p papers
    curl -L -o "papers/paper.pdf" "$pdf_url"
  fi
fi
3. Report to User
When OA found:
⚠️ Paper behind paywall at publisher
✓ Found open access version via Unpaywall!
Source: Europe PMC (published version)
PDF: https://europepmc.org/articles/pmc3858213?pdf=render
→ Downloading...
When no OA found:
⚠️ Paper behind paywall at publisher
✗ No open access version found via Unpaywall
Options:
- Request via institutional access
- Contact authors for preprint
- Continue with abstract only
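Both report templates can be produced by one function. This sketch uses a hypothetical `format_report` helper. It takes the source name from the URL's hostname, which is an assumption, since the response carries no display name such as "Europe PMC".

```python
from typing import Optional
from urllib.parse import urlparse

VERSION_LABELS = {
    "publishedVersion": "published version",
    "acceptedVersion": "accepted version",
    "submittedVersion": "submitted version",
}


def format_report(location: Optional[dict]) -> str:
    """Build the user-facing message for a found or missing OA location."""
    lines = ["⚠️ Paper behind paywall at publisher"]
    if not location:
        lines += [
            "✗ No open access version found via Unpaywall",
            "Options:",
            "- Request via institutional access",
            "- Contact authors for preprint",
            "- Continue with abstract only",
        ]
        return "\n".join(lines)
    url = location.get("url_for_pdf") or location.get("url") or ""
    host = urlparse(url).netloc or "unknown source"
    version = VERSION_LABELS.get(location.get("version"), "unknown version")
    lines += [
        "✓ Found open access version via Unpaywall!",
        f"Source: {host} ({version})",
        f"PDF: {url}",
        "→ Downloading...",
    ]
    return "\n".join(lines)
```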
4. Prioritize by Version
If multiple locations available:
Priority order:
1. publishedVersion from publisher or PMC
2. acceptedVersion from institutional repository
3. submittedVersion from preprint server (arXiv, bioRxiv)
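The priority order can be applied with a stable sort over the `oa_locations` array. This is a sketch with hypothetical names (`VERSION_RANK`, `rank_locations`). It ranks by version only and keeps Unpaywall's original order within each version.

```python
VERSION_RANK = {"publishedVersion": 0, "acceptedVersion": 1, "submittedVersion": 2}


def rank_locations(oa_locations: list) -> list:
    """Return locations sorted published > accepted > submitted > unknown."""
    # sorted() is stable, so entries with the same version keep their order.
    return sorted(
        oa_locations,
        key=lambda loc: VERSION_RANK.get(loc.get("version"), len(VERSION_RANK)),
    )
```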
Integration with evaluating-paper-relevance
Add to full text fetching workflow:
Stage 2: Fetch Full Text
Try in order:
A. PubMed Central (free full text)
B. DOI resolution → If paywall, try Unpaywall
C. Unpaywall direct lookup
D. Preprints (bioRxiv, arXiv)
Updated workflow:
# Helper names (has_pmc_fulltext, fetch_pmc, is_paywall, has_oa,
# fetch_unpaywall_pdf, report_no_fulltext) are placeholders to implement.
# 1. Try PMC
pmc_result=$(curl -s "https://eutils.ncbi.nlm.nih.gov/...")
if has_pmc_fulltext; then
  fetch_pmc
  exit 0
fi
# 2. Try DOI
doi_result=$(curl -sL "https://doi.org/$doi")
if is_paywall; then
  # 3. Try Unpaywall
  unpaywall_result=$(curl -s "https://api.unpaywall.org/v2/$doi?email=$EMAIL")
  if has_oa; then
    fetch_unpaywall_pdf
    exit 0
  fi
fi
# 4. No full text available
report_no_fulltext
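The same try-in-order logic can be written in Python as a chain of fetchers. In this sketch, `fetch_full_text` and the `Fetcher` type are hypothetical. Each fetcher is assumed to return a saved file path on success and `None` otherwise, and the real PMC, DOI, and Unpaywall fetchers are left to the caller.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a DOI and returns a local file path, or None on a miss.
Fetcher = Callable[[str], Optional[str]]


def fetch_full_text(
    doi: str, sources: Sequence[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, fetcher) in order; return (name, path) of the first hit."""
    for name, fetch in sources:
        try:
            path = fetch(doi)
        except Exception as exc:  # one failing source should not stop the chain
            print(f"{name} failed for {doi}: {exc}")
            continue
        if path:
            return name, path
    return None, None  # caller reports "no full text available"
```

Passing the sources as a list keeps the order (PMC, DOI, Unpaywall, preprints) visible at the call site and easy to change.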
Rate Limiting
Free tier (with email):
- 100,000 requests per day
- No hard rate limit, but be respectful
- Include email in requests (required)
Best practices:
- Add 100ms delay between requests
- Cache responses (don't re-check same DOI)
- Only check for papers you actually need
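The delay and caching practices can be combined in a small wrapper around whatever function performs the lookup. This is a sketch: `UnpaywallCache` is a hypothetical class, the cache lives in memory only, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Dict, Optional


class UnpaywallCache:
    """Cache lookups per DOI and space out real requests."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[dict]],
        min_interval: float = 0.1,  # the 100ms delay suggested above
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, Optional[dict]] = {}
        self._last_request: Optional[float] = None

    def lookup(self, doi: str) -> Optional[dict]:
        key = doi.strip().lower()  # DOIs are case-insensitive
        if key in self._cache:
            return self._cache[key]  # no request, no delay
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        result = self._fetch(doi.strip())
        self._last_request = self._clock()
        self._cache[key] = result
        return result
```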
Python Helper Example
import os
import time

import requests


def find_open_access(doi, email):
    """
    Find open access version via Unpaywall
    Returns: (pdf_url, version, source) or (None, None, None)
    """
    url = f"https://api.unpaywall.org/v2/{doi}"
    params = {"email": email}
    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()
        if not data.get('is_oa'):
            return None, None, None
        best_loc = data.get('best_oa_location')
        if not best_loc:
            return None, None, None
        pdf_url = best_loc.get('url_for_pdf') or best_loc.get('url')
        version = best_loc.get('version', 'unknown')
        source = best_loc.get('host_type', 'unknown')
        return pdf_url, version, source
    except (requests.RequestException, ValueError) as e:
        # Network/HTTP errors, or a body that is not valid JSON
        print(f"Error checking Unpaywall for {doi}: {e}")
        return None, None, None


# Usage
doi = "10.1038/nature12373"
email = "USER_PROVIDED_EMAIL"  # Replace with the address the user gave you
pdf_url, version, source = find_open_access(doi, email)
if pdf_url:
    print(f"Found {version} at {source}")
    print(f"PDF: {pdf_url}")
    # Download PDF
    response = requests.get(pdf_url, timeout=30)
    response.raise_for_status()
    os.makedirs('papers', exist_ok=True)
    with open(f'papers/{doi.replace("/", "_")}.pdf', 'wb') as f:
        f.write(response.content)
else:
    print("No open access version found")
time.sleep(0.1)  # Rate limiting
Common Sources Found
Repositories:
- Europe PMC / PubMed Central
- Institutional repositories (university sites)
- PubMed Central international mirrors
Preprint servers:
- bioRxiv (biology)
- medRxiv (medicine)
- arXiv (physics, CS, math)
- ChemRxiv (chemistry)
Publisher sites:
- Open access journals
- Hybrid journals (OA articles in subscription journals)
- Delayed open access (embargo expired)
Error Handling
DOI not found:
{
  "error": true,
  "message": "DOI not found"
}
→ Check DOI format, try alternative identifiers
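A quick format check before retrying can catch the common case where a full DOI URL or a `doi:` prefix was passed instead of the bare DOI. This is a sketch: `normalize_doi` is a hypothetical helper, and the pattern is a heuristic for modern DOIs (prefix `10.`, a numeric registrant code, a slash, a suffix), not a complete validator.

```python
import re
from typing import Optional

# Heuristic shape of a modern DOI; some legacy DOIs may not match.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> Optional[str]:
    """Strip resolver prefixes; return the bare DOI or None if it looks wrong."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi if DOI_PATTERN.match(doi) else None
```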
Network errors:
- Retry with exponential backoff
- Maximum 3 attempts
- Report to user if all fail
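The retry policy above can be sketched as a generic wrapper. `retry_with_backoff` is a hypothetical helper, and the one-second base delay is an assumption. It retries on `OSError`, which covers Python's built-in connection and timeout errors as well as the `requests` exceptions, since those derive from it.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,  # assumed starting delay, doubled each retry
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call operation(); on failure wait 1s, 2s, ... and try again."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            if attempt < max_attempts - 1:  # no wait after the final attempt
                sleep(base_delay * 2 ** attempt)
    # All attempts failed: surface one error the caller can report to the user.
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```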
Malformed response:
- Check for is_oa field
- Fallback to oa_locations array if best_oa_location missing
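A defensive extractor for these malformed-response cases might look like the sketch below. `extract_oa_url` is a hypothetical name. It treats a record without `is_oa` as malformed and otherwise returns the first usable URL from `best_oa_location` and then from `oa_locations`.

```python
from typing import Optional


def extract_oa_url(record: object) -> Optional[str]:
    """Return a PDF or landing-page URL, or None if the record is unusable."""
    if not isinstance(record, dict) or "is_oa" not in record:
        return None  # malformed: required field missing
    if not record["is_oa"]:
        return None  # well-formed, but no open access copy
    candidates = []
    best = record.get("best_oa_location")
    if isinstance(best, dict):
        candidates.append(best)
    # Fall back to the full list when the best location is missing or has no URL.
    candidates.extend(
        loc for loc in (record.get("oa_locations") or []) if isinstance(loc, dict)
    )
    for loc in candidates:
        url = loc.get("url_for_pdf") or loc.get("url")
        if url:
            return url
    return None
```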
Quick Reference
| Task | Command |
|---|---|
| Check if OA available | curl "https://api.unpaywall.org/v2/DOI?email=EMAIL" |
| Get best PDF URL | Parse .best_oa_location.url_for_pdf |
| List all OA sources | Parse .oa_locations[] |
| Check version type | Look at .version field |
| Download PDF | curl -L -o paper.pdf "$pdf_url" |
Integration Points
Called by:
- evaluating-paper-relevance - When full text not in PMC
- answering-research-questions - For highly relevant papers
Updates:
- papers-reviewed.json - Note if OA found
- SUMMARY.md - Include OA source info
Common Mistakes
Using placeholder email: Using claude@anthropic.com or researcher@example.com → Ask user for their real email
Not including email: Required parameter, requests will fail
Checking every paper: Only check when needed (score ≥7, no PMC)
Ignoring version type: Published version better than preprint
Single source only: Check oa_locations array for alternatives
No rate limiting: Add delays even though no hard limit
Success Criteria
Successful when:
- Paywalled paper's OA version found and downloaded
- Version type recorded (published/accepted/submitted)
- User informed about source and version
- Fallback options provided if no OA available
Next Steps
After finding OA version:
- Download PDF to papers/ folder
- Note source and version in SUMMARY.md
- Continue with deep dive analysis
- If no OA: note in summary, continue with abstract only