Finding Open Access Papers

Use the Unpaywall API to find free full-text versions of paywalled papers

Install this agent skill to your Project

npx add-skill https://github.com/kthorn/research-superpower/tree/main/skills/research/finding-open-access-papers

Overview

Use Unpaywall to find legally available open access versions of papers that appear to be behind paywalls.

Core principle: Many paywalled papers have free versions (preprints, author manuscripts, institutional repositories). Unpaywall finds them.

When to Use

Use this skill when:

DOI resolution hits a paywall
Paper not available in PubMed Central
Publisher site requires subscription
Need full text for highly relevant paper (score ≥7)

Use BEFORE giving up on full text access

Unpaywall API

Simple REST API. No API key or authentication is required for reasonable usage; the email parameter identifies the caller.

Basic Request

bash

curl "https://api.unpaywall.org/v2/DOI?email=YOUR_EMAIL"

Parameters:

DOI - The paper's DOI (URL-encoded if needed)
email - User's email (required, for courtesy/contact)

IMPORTANT: Ask the user for their email at the start of the research session. Do NOT use placeholder emails like claude@anthropic.com or researcher@example.com.

Example:

bash

curl "https://api.unpaywall.org/v2/10.1038/nature12373?email=YOUR_EMAIL"
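The "URL-encoded if needed" note matters for DOIs that contain characters such as `<`, `>` or `#`. The sketch below shows one way to build the request URL in Python. The helper name `build_unpaywall_url` and the prefix-stripping behavior are assumptions made for illustration, not part of the Unpaywall API.

```python
from urllib.parse import quote, urlencode

UNPAYWALL_BASE = "https://api.unpaywall.org/v2"


def build_unpaywall_url(doi: str, email: str) -> str:
    """Return the Unpaywall lookup URL for a DOI (hypothetical helper)."""
    cleaned = doi.strip()
    # Accept DOIs pasted as links or with a "doi:" prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Keep "/" literal (it separates DOI prefix and suffix); percent-encode the rest.
    encoded_doi = quote(cleaned, safe="/")
    return f"{UNPAYWALL_BASE}/{encoded_doi}?{urlencode({'email': email})}"
```

Pass the email the user supplied as the second argument; the function only formats the URL and performs no network access.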

Response Format

json

{
  "doi": "10.1038/nature12373",
  "title": "Paper Title",
  "is_oa": true,
  "best_oa_location": {
    "url": "https://europepmc.org/articles/pmc3858213",
    "url_for_pdf": "https://europepmc.org/articles/pmc3858213?pdf=render",
    "version": "publishedVersion",
    "license": "cc-by",
    "host_type": "repository"
  },
  "oa_locations": [
    {
      "url": "https://europepmc.org/articles/pmc3858213",
      "version": "publishedVersion"
    },
    {
      "url": "https://arxiv.org/abs/1234.5678",
      "version": "submittedVersion"
    }
  ]
}

Key Response Fields

is_oa (boolean)

true - Open access version available
false - No free version found

best_oa_location (object or null)

Unpaywall's recommended best open access source
Prioritizes published versions over preprints
Includes PDF URL when available

oa_locations (array)

All known open access locations
Includes repositories, preprint servers, institutional sites
Ordered by quality/version

version types:

publishedVersion - Final published version (best)
acceptedVersion - Author's accepted manuscript (good)
submittedVersion - Preprint before peer review (useful)
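As a rough illustration of how these fields fit together, the following sketch condenses an already-parsed response into the few values a caller usually needs. The function name `summarize_response` and the shape of the returned dictionary are assumptions for this example only.

```python
def summarize_response(data: dict) -> dict:
    """Condense an Unpaywall response dict into a few key values (hypothetical helper)."""
    # best_oa_location can be null in the JSON, so fall back to an empty dict.
    best = data.get("best_oa_location") or {}
    locations = data.get("oa_locations") or []
    return {
        "is_oa": bool(data.get("is_oa")),
        # Prefer the direct PDF link, then the landing page URL.
        "best_url": best.get("url_for_pdf") or best.get("url"),
        "best_version": best.get("version"),
        "versions": [loc.get("version") for loc in locations],
    }
```

Applied to the sample response shown earlier, this would report `publishedVersion` as the best version and list both `publishedVersion` and `submittedVersion` under `versions`.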

Implementation Pattern

1. Check Unpaywall After Paywall Hit

bash

# Try DOI first
curl -L "https://doi.org/10.1234/example.2023"

# If paywall detected (403, subscription required, etc):
curl "https://api.unpaywall.org/v2/10.1234/example.2023?email=your@email.com"
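What counts as "paywall detected" is left open above. One possible heuristic, sketched here in Python, combines the HTTP status with a scan of the page text. Only the 403 status and the phrase "subscription required" come from this skill; the other status codes and phrases are illustrative assumptions and would need tuning per publisher.

```python
# 403 is named in this skill; 401 and 402 are assumed to signal restricted access too.
PAYWALL_STATUS_CODES = {401, 402, 403}

# "subscription required" is from this skill; the other phrases are illustrative guesses.
PAYWALL_PHRASES = (
    "subscription required",
    "purchase this article",
    "institutional login",
)


def looks_paywalled(status_code: int, body_text: str = "") -> bool:
    """Heuristically decide whether a publisher response is a paywall (hypothetical helper)."""
    if status_code in PAYWALL_STATUS_CODES:
        return True
    lowered = body_text.lower()
    return any(phrase in lowered for phrase in PAYWALL_PHRASES)
```

A heuristic like this will produce false negatives, so treating any response without usable full text as a reason to query Unpaywall is a reasonable fallback.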

2. Extract Best URL

bash

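# Fetch the JSON response (replace DOI and EMAIL; -s hides the progress meter)
response=$(curl -s "https://api.unpaywall.org/v2/DOI?email=EMAIL")

# Check if OA available (quote the variable so the JSON is not word-split)
is_oa=$(printf '%s\n' "$response" | jq -r '.is_oa')

if [ "$is_oa" = "true" ]; then
  # Get best PDF URL, falling back to the landing page URL
  pdf_url=$(printf '%s\n' "$response" | jq -r '.best_oa_location.url_for_pdf // .best_oa_location.url')

  # Download (create the target directory first; -f makes curl fail on HTTP errors)
  mkdir -p papers
  curl -fL -o "papers/paper.pdf" "$pdf_url"
fi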

3. Report to User

When OA found:

⚠️ Paper behind paywall at publisher
✓ Found open access version via Unpaywall!
   Source: Europe PMC (published version)
   PDF: https://europepmc.org/articles/pmc3858213?pdf=render
   → Downloading...

When no OA found:

⚠️ Paper behind paywall at publisher
✗ No open access version found via Unpaywall
   Options:
   - Request via institutional access
   - Contact authors for preprint
   - Continue with abstract only
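The two report templates above can be generated from the lookup result. The sketch below is one way to do that; `format_report` is a hypothetical helper, and its arguments mirror the `(pdf_url, version, source)` tuple used elsewhere in this skill.

```python
def format_report(pdf_url=None, version=None, source=None) -> str:
    """Build the user-facing report for a paywalled paper (hypothetical helper)."""
    lines = ["⚠️ Paper behind paywall at publisher"]
    if pdf_url:
        lines += [
            "✓ Found open access version via Unpaywall!",
            f"   Source: {source or 'unknown'} ({version or 'unknown version'})",
            f"   PDF: {pdf_url}",
            "   → Downloading...",
        ]
    else:
        lines += [
            "✗ No open access version found via Unpaywall",
            "   Options:",
            "   - Request via institutional access",
            "   - Contact authors for preprint",
            "   - Continue with abstract only",
        ]
    return "\n".join(lines)
```

Calling `format_report()` with no arguments yields the "no OA found" message, so a caller can pass the lookup tuple straight through with `format_report(*result)`.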

4. Prioritize by Version

If multiple locations available:

Priority order:

publishedVersion from publisher or PMC
acceptedVersion from institutional repository
submittedVersion from preprint server (arXiv, bioRxiv)
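The priority order above can be applied to the `oa_locations` array with a simple ranking. This is a minimal sketch: `VERSION_RANK` and `pick_best_location` are names invented for the example, and locations with a missing or unrecognized version are assumed to rank last.

```python
# Lower rank is better; the order follows the priority list in this skill.
VERSION_RANK = {"publishedVersion": 0, "acceptedVersion": 1, "submittedVersion": 2}


def pick_best_location(locations):
    """Return the location dict with the most preferred version, or None if there are none."""
    if not locations:
        return None
    # min() returns the first item with the lowest key, so ties keep Unpaywall's own order.
    return min(
        locations,
        key=lambda loc: VERSION_RANK.get(loc.get("version"), len(VERSION_RANK)),
    )
```

Because ties keep the original order, two `publishedVersion` entries resolve to whichever Unpaywall listed first.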

Integration with evaluating-paper-relevance

Add to full text fetching workflow:

Stage 2: Fetch Full Text

Try in order:
A. PubMed Central (free full text)
B. DOI resolution → If paywall, try Unpaywall
C. Unpaywall direct lookup
D. Preprints (bioRxiv, arXiv)

Updated workflow:

bash

# has_pmc_fulltext, fetch_pmc, is_paywall, has_oa, fetch_unpaywall_pdf and
# report_no_fulltext are placeholder functions to be supplied by the caller.

# 1. Try PMC
pmc_result=$(curl -s "https://eutils.ncbi.nlm.nih.gov/...")
if has_pmc_fulltext; then
  fetch_pmc
  exit 0
fi

# 2. Try DOI
doi_result=$(curl -sL "https://doi.org/$doi")
if is_paywall; then
  # 3. Try Unpaywall
  unpaywall_result=$(curl -s "https://api.unpaywall.org/v2/$doi?email=$EMAIL")
  if has_oa; then
    fetch_unpaywall_pdf
    exit 0
  fi
fi

# 4. No full text available
report_no_fulltext
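The same "try each source in order" idea can be expressed in Python as a list of fetchers that is walked until one succeeds. This is only a sketch of the pattern: `fetch_full_text` and the fetcher callables are hypothetical, and each fetcher is assumed to return a path or URL on success and `None` otherwise.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a DOI and returns a path/URL to the full text, or None if unavailable.
Fetcher = Callable[[str], Optional[str]]


def fetch_full_text(
    doi: str, fetchers: Iterable[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, fetcher) pair in order; return (name, result) for the first success."""
    for name, fetch in fetchers:
        try:
            result = fetch(doi)
        except Exception as exc:  # one failing source should not abort the whole chain
            print(f"{name} failed for {doi}: {exc}")
            continue
        if result:
            return name, result
    # Nothing worked: the caller reports that no full text is available.
    return None, None
```

Ordering the list as PMC, DOI resolution, Unpaywall, then preprint servers reproduces the sequence described above, and the returned name records which source supplied the text.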

Rate Limiting

Free tier (with email):

100,000 requests per day
No hard rate limit, but be respectful
Include email in requests (required)

Best practices:

Add 100ms delay between requests
Cache responses (don't re-check same DOI)
Only check for papers you actually need
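The first two practices (a delay between requests and a per-DOI cache) can be combined in a small wrapper. The sketch below is an assumption about how one might do this; `ThrottledCache` is a hypothetical class that wraps any lookup function, and the clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time


class ThrottledCache:
    """Wrap a lookup function with a per-DOI cache and a minimum delay between real calls."""

    def __init__(self, lookup, min_interval=0.1, clock=time.monotonic, sleep=time.sleep):
        self._lookup = lookup
        self._min_interval = min_interval  # seconds; 0.1 matches the 100ms suggestion
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def get(self, doi):
        key = doi.strip().lower()  # DOIs are case-insensitive, so normalize the cache key
        if key in self._cache:
            return self._cache[key]  # cache hit: no request and no delay
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        result = self._lookup(doi)
        self._cache[key] = result
        return result
```

The cache lives in memory only, so persisting results between sessions would need a separate store.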

Python Helper Example

python

import os
import time

import requests


def find_open_access(doi, email):
    """
    Find open access version via Unpaywall
    Returns: (pdf_url, version, source) or (None, None, None)
    """
    url = f"https://api.unpaywall.org/v2/{doi}"
    params = {"email": email}

    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()
    except (requests.RequestException, ValueError) as e:
        # Network failure, HTTP error status, or a body that is not valid JSON
        print(f"Error checking Unpaywall for {doi}: {e}")
        return None, None, None

    if not data.get('is_oa'):
        return None, None, None

    best_loc = data.get('best_oa_location')
    if not best_loc:
        return None, None, None

    # url_for_pdf can be null; fall back to the landing page URL
    pdf_url = best_loc.get('url_for_pdf') or best_loc.get('url')
    version = best_loc.get('version') or 'unknown'
    source = best_loc.get('host_type') or 'unknown'

    return pdf_url, version, source


# Usage
doi = "10.1038/nature12373"
email = "YOUR_EMAIL"  # replace with the address the user supplied, never a placeholder
pdf_url, version, source = find_open_access(doi, email)

if pdf_url:
    print(f"Found {version} at {source}")
    print(f"PDF: {pdf_url}")
    # Download PDF (create the target folder first and fail loudly on HTTP errors)
    os.makedirs("papers", exist_ok=True)
    response = requests.get(pdf_url, timeout=30)
    response.raise_for_status()
    with open(f'papers/{doi.replace("/", "_")}.pdf', 'wb') as f:
        f.write(response.content)
else:
    print("No open access version found")

time.sleep(0.1)  # Rate limiting: pause before the next lookup when processing several DOIs

Common Sources Found

Repositories:

Europe PMC / PubMed Central
Institutional repositories (university sites)
PubMed Central international mirrors

Preprint servers:

bioRxiv (biology)
medRxiv (medicine)
arXiv (physics, CS, math)
ChemRxiv (chemistry)

Publisher sites:

Open access journals
Hybrid journals (OA articles in subscription journals)
Delayed open access (embargo expired)

Error Handling

DOI not found:

json

{
  "error": true,
  "message": "DOI not found"
}

→ Check DOI format, try alternative identifiers

Network errors:

Retry with exponential backoff
Maximum 3 attempts
Report to user if all fail
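A retry loop with exponential backoff and a three-attempt cap might look like the sketch below. `with_retries` is a hypothetical helper; the 1-second base delay and the choice to retry on `OSError` (the base class of Python's built-in connection and timeout errors, and of `requests.RequestException`) are assumptions, not requirements of Unpaywall.

```python
import time


def with_retries(operation, max_attempts=3, base_delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call operation() until it succeeds, waiting base_delay * 2**attempt between failures."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            # Back off 1s, then 2s, ... but do not sleep after the final attempt.
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # All attempts failed: surface this so the caller can report it to the user.
    raise RuntimeError(f"All {max_attempts} attempts failed") from last_error
```

The operation is passed as a zero-argument callable, for example a `lambda` wrapping the lookup for one DOI, and the final `RuntimeError` is the signal to report the failure to the user.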

Malformed response:

Check for is_oa field
Fallback to oa_locations array if best_oa_location missing
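The two checks for malformed responses can be folded into a single defensive extraction step. In this sketch, `extract_pdf_url` is a hypothetical helper that returns `None` whenever the response lacks `is_oa` or contains no usable URL.

```python
def extract_pdf_url(data):
    """Return the best available URL from a response, tolerating missing fields."""
    # A response without is_oa is treated as malformed.
    if not isinstance(data, dict) or "is_oa" not in data:
        return None
    if not data["is_oa"]:
        return None

    # Try best_oa_location first, then every entry in oa_locations.
    candidates = []
    best = data.get("best_oa_location")
    if isinstance(best, dict):
        candidates.append(best)
    candidates.extend(loc for loc in data.get("oa_locations") or [] if isinstance(loc, dict))

    for loc in candidates:
        url = loc.get("url_for_pdf") or loc.get("url")
        if url:
            return url
    return None
```

Note that the fallback may return a landing page rather than a direct PDF link, so the downloaded content type is worth checking before saving it as a PDF.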

Quick Reference

| Task | Command |
|---|---|
| Check if OA available | `curl "https://api.unpaywall.org/v2/DOI?email=EMAIL"` |
| Get best PDF URL | Parse `.best_oa_location.url_for_pdf` |
| List all OA sources | Parse `.oa_locations[]` |
| Check version type | Look at `.version` field |
| Download PDF | `curl -L -o paper.pdf "$pdf_url"` |

Integration Points

Called by:

evaluating-paper-relevance - When full text not in PMC
answering-research-questions - For highly relevant papers

Updates:

papers-reviewed.json - Note if OA found
SUMMARY.md - Include OA source info

Common Mistakes

Using placeholder email: Using claude@anthropic.com or researcher@example.com → Ask user for their real email
Not including email: Required parameter, requests will fail
Checking every paper: Only check when needed (score ≥7, no PMC)
Ignoring version type: Published version better than preprint
Single source only: Check oa_locations array for alternatives
No rate limiting: Add delays even though no hard limit

Success Criteria

Successful when:

Paywalled paper's OA version found and downloaded
Version type recorded (published/accepted/submitted)
User informed about source and version
Fallback options provided if no OA available

Next Steps

After finding OA version:

Download PDF to papers/ folder
Note source and version in SUMMARY.md
Continue with deep dive analysis
If no OA: note in summary, continue with abstract only

Maintainer

kthorn Core maintainer

Source details

Full Name: kthorn/research-superpower
Branch: main
Path in repo: skills/research/finding-open-access-papers
License: MIT License
