Agent skill

arxiv

Search and retrieve academic papers from arXiv using their free REST API. No API key needed. Search by keyword, author, category, or ID. Combine with web_extract or the ocr-and-documents skill to read full paper content.

Install this agent skill to your Project

npx add-skill https://github.com/NousResearch/hermes-agent/tree/main/skills/research/arxiv

Metadata

Additional technical details for this skill

hermes: { "tags": [ "Research", "Arxiv", "Papers", "Academic", "Science", "API" ], "related_skills": [ "ocr-and-documents" ] }

SKILL.md

arXiv Research

Search and retrieve academic papers from arXiv via their free REST API. No API key, no dependencies — just curl.

Quick Reference

Action	Command
Search papers	`curl "https://export.arxiv.org/api/query?search_query=all:QUERY&max_results=5"`
Get specific paper	`curl "https://export.arxiv.org/api/query?id_list=2402.03300"`
Read abstract (web)	`web_extract(urls=["https://arxiv.org/abs/2402.03300"])`
Read full paper (PDF)	`web_extract(urls=["https://arxiv.org/pdf/2402.03300"])`

Searching Papers

The API returns Atom XML. Parse with grep/sed or pipe through python3 for clean output.

Basic search

bash

curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5"

Clean output (parse XML to readable format)

bash

curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5&sortBy=submittedDate&sortOrder=descending" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom'}
root = ET.parse(sys.stdin).getroot()
for i, entry in enumerate(root.findall('a:entry', ns)):
    title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
    arxiv_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
    published = entry.find('a:published', ns).text[:10]
    authors = ', '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
    summary = entry.find('a:summary', ns).text.strip()[:200]
    cats = ', '.join(c.get('term') for c in entry.findall('a:category', ns))
    print(f'{i+1}. [{arxiv_id}] {title}')
    print(f'   Authors: {authors}')
    print(f'   Published: {published} | Categories: {cats}')
    print(f'   Abstract: {summary}...')
    print(f'   PDF: https://arxiv.org/pdf/{arxiv_id}')
    print()
"

Search Query Syntax

Prefix	Searches	Example
`all:`	All fields	`all:transformer+attention`
`ti:`	Title	`ti:large+language+models`
`au:`	Author	`au:vaswani`
`abs:`	Abstract	`abs:reinforcement+learning`
`cat:`	Category	`cat:cs.AI`
`co:`	Comment	`co:accepted+NeurIPS`

Boolean operators

# AND (default when using +)
search_query=all:transformer+attention

# OR
search_query=all:GPT+OR+all:BERT

# AND NOT
search_query=all:language+model+ANDNOT+all:vision

# Exact phrase (encode the double quotes as %22 so they survive shell quoting and URL parsing)
search_query=ti:%22chain+of+thought%22

# Combined
search_query=au:hinton+AND+cat:cs.LG
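
Hand-encoding query strings is easy to get wrong, so here is a minimal sketch that builds the request URL with Python's standard library instead. The function name `build_search_url` and its defaults are illustrative, not part of the skill; it only assumes the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters described in this document.

```python
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"


def build_search_url(query, max_results=5, start=0, sort_by=None, sort_order=None):
    """Return an arXiv API URL for a query written with plain spaces and quotes."""
    params = {"search_query": query, "start": start, "max_results": max_results}
    if sort_by:
        params["sortBy"] = sort_by
    if sort_order:
        params["sortOrder"] = sort_order
    # urlencode turns spaces into '+' and percent-encodes quotes and colons.
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_search_url('ti:"chain of thought" AND cat:cs.CL', max_results=3)
print(url)
```

The resulting URL can be passed to `curl -s` exactly like the hand-written examples.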

Sort and Pagination

Parameter	Options
`sortBy`	`relevance`, `lastUpdatedDate`, `submittedDate`
`sortOrder`	`ascending`, `descending`
`start`	Result offset (0-based)
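`max_results`	Number of results per request (default 10; the arXiv API manual caps a query at 30000 results, fetched in slices of at most 2000 using `start`)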

bash

# Latest 10 papers in cs.AI
curl -s "https://export.arxiv.org/api/query?search_query=cat:cs.AI&sortBy=submittedDate&sortOrder=descending&max_results=10"
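
To walk through more results than one request returns, step `start` forward by the page size. The sketch below is one way to do that; `page_starts` and `fetch_pages` are hypothetical helper names, and the 3-second pause is an assumption taken from the rate-limit guidance in this document. `fetch_pages` needs network access and is defined but not called here.

```python
import time
import urllib.request
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"


def page_starts(total, page_size):
    """Return the 0-based `start` offsets needed to cover `total` results."""
    return list(range(0, total, page_size))


def fetch_pages(query, total=30, page_size=10, delay=3.0):
    """Yield raw Atom XML pages, pausing `delay` seconds between requests."""
    for i, start in enumerate(page_starts(total, page_size)):
        if i:
            time.sleep(delay)  # be polite: no pause needed before the first call
        params = urlencode({
            "search_query": query,
            "start": start,
            "max_results": min(page_size, total - start),
            "sortBy": "submittedDate",
            "sortOrder": "descending",
        })
        with urllib.request.urlopen(f"{ARXIV_API}?{params}", timeout=30) as resp:
            yield resp.read().decode("utf-8")


print(page_starts(25, 10))
```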

Fetching Specific Papers

bash

# By arXiv ID
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300"

# Multiple papers
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300,2401.12345,2403.00001"

BibTeX Generation

After fetching metadata for a paper, generate a BibTeX entry:

bash

curl -s "https://export.arxiv.org/api/query?id_list=1706.03762" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'}
root = ET.parse(sys.stdin).getroot()
entry = root.find('a:entry', ns)
if entry is None: sys.exit('Paper not found')
title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
authors = ' and '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
year = entry.find('a:published', ns).text[:4]
raw_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
cat = entry.find('arxiv:primary_category', ns)
primary = cat.get('term') if cat is not None else 'cs.LG'
last_name = entry.find('a:author', ns).find('a:name', ns).text.split()[-1]
print(f'@article{{{last_name}{year}_{raw_id.replace(\".\", \"\")},')
print(f'  title     = {{{title}}},')
print(f'  author    = {{{authors}}},')
print(f'  year      = {{{year}}},')
print(f'  eprint    = {{{raw_id}}},')
print(f'  archivePrefix = {{arXiv}},')
print(f'  primaryClass  = {{{primary}}},')
print(f'  url       = {{https://arxiv.org/abs/{raw_id}}}')
print('}')
"

Reading Paper Content

After finding a paper, read it:

# Abstract page (fast, metadata + abstract)
web_extract(urls=["https://arxiv.org/abs/2402.03300"])

# Full paper (PDF → markdown via Firecrawl)
web_extract(urls=["https://arxiv.org/pdf/2402.03300"])

For local PDF processing, see the ocr-and-documents skill.

Common Categories

Category	Field
`cs.AI`	Artificial Intelligence
`cs.CL`	Computation and Language (NLP)
`cs.CV`	Computer Vision
`cs.LG`	Machine Learning
`cs.CR`	Cryptography and Security
`stat.ML`	Machine Learning (Statistics)
`math.OC`	Optimization and Control
`physics.comp-ph`	Computational Physics

Full list: https://arxiv.org/category_taxonomy

Helper Script

The scripts/search_arxiv.py script handles XML parsing and provides clean output:

bash

python scripts/search_arxiv.py "GRPO reinforcement learning"
python scripts/search_arxiv.py "transformer attention" --max 10 --sort date
python scripts/search_arxiv.py --author "Yann LeCun" --max 5
python scripts/search_arxiv.py --category cs.AI --sort date
python scripts/search_arxiv.py --id 2402.03300
python scripts/search_arxiv.py --id 2402.03300,2401.12345

No dependencies — uses only Python stdlib.

Semantic Scholar (Citations, Related Papers, Author Profiles)

arXiv doesn't provide citation data or recommendations. Use the Semantic Scholar API for that — free, no key needed for basic use (1 req/sec), returns JSON.

Get paper details + citations

bash

# By arXiv ID
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300?fields=title,authors,citationCount,referenceCount,influentialCitationCount,year,abstract" | python3 -m json.tool

# By Semantic Scholar paper ID or DOI
curl -s "https://api.semanticscholar.org/graph/v1/paper/DOI:10.1234/example?fields=title,citationCount"

Get citations OF a paper (who cited it)

bash

curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/citations?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool

Get references FROM a paper (what it cites)

bash

curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/references?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool
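
The citations and references endpoints wrap each paper in an outer object, so a little unpacking helps. This sketch assumes the response shape `{"data": [{"citingPaper": {...}}]}` for citations and `{"data": [{"citedPaper": {...}}]}` for references; check that against a live response before relying on it. `linked_papers` is a hypothetical helper, and the sample payload is made up for illustration.

```python
import json


def linked_papers(payload, direction):
    """Extract paper records from a /citations or /references response.

    direction: 'citations' reads 'citingPaper'; 'references' reads 'citedPaper'.
    Results are sorted by citationCount, highest first.
    """
    key = {"citations": "citingPaper", "references": "citedPaper"}[direction]
    papers = [item[key] for item in payload.get("data", []) if item.get(key)]
    return sorted(papers, key=lambda p: p.get("citationCount") or 0, reverse=True)


# Made-up sample payload, not real API output.
sample = json.loads("""{"data": [
  {"citingPaper": {"title": "Paper A", "year": 2024, "citationCount": 3}},
  {"citingPaper": {"title": "Paper B", "year": 2025, "citationCount": 12}}
]}""")

for p in linked_papers(sample, "citations"):
    print(f"{p['citationCount']:>5}  {p['year']}  {p['title']}")
```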

Search papers (alternative to arXiv search, returns JSON)

bash

curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=GRPO+reinforcement+learning&limit=5&fields=title,authors,year,citationCount,externalIds" | python3 -m json.tool

Get paper recommendations

bash

curl -s -X POST "https://api.semanticscholar.org/recommendations/v1/papers/" \
  -H "Content-Type: application/json" \
  -d '{"positivePaperIds": ["arXiv:2402.03300"], "negativePaperIds": []}' | python3 -m json.tool
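
The same POST can be prepared from Python with only the standard library. This is a sketch mirroring the curl call above; `build_recommendation_request` is an illustrative name, and the request is built but not sent so the snippet runs offline.

```python
import json
import urllib.request

RECS_URL = "https://api.semanticscholar.org/recommendations/v1/papers/"


def build_recommendation_request(positive_ids, negative_ids=()):
    """Build (but do not send) the POST request for paper recommendations."""
    body = json.dumps({
        "positivePaperIds": list(positive_ids),
        "negativePaperIds": list(negative_ids),
    }).encode("utf-8")
    return urllib.request.Request(
        RECS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_recommendation_request(["arXiv:2402.03300"])
print(req.get_method(), req.full_url)

# To send it (network access required):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     recommendations = json.load(resp)
```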

Author profile

bash

curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=Yann+LeCun&fields=name,hIndex,citationCount,paperCount" | python3 -m json.tool

Useful Semantic Scholar fields

title, authors, year, abstract, citationCount, referenceCount, influentialCitationCount, isOpenAccess, openAccessPdf, fieldsOfStudy, publicationVenue, externalIds (contains arXiv ID, DOI, etc.)
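
Requesting only the fields you need keeps responses small. A minimal sketch of composing the paper-details URL from a field list follows; `paper_url` is a hypothetical helper, and the field names are the ones listed above.

```python
from urllib.parse import quote

S2_GRAPH = "https://api.semanticscholar.org/graph/v1"


def paper_url(paper_id, fields):
    """Build a paper-details URL that requests only the given fields."""
    # Keep ':' and '/' so prefixed IDs such as 'arXiv:...' or 'DOI:10.x/y' stay readable.
    return f"{S2_GRAPH}/paper/{quote(paper_id, safe=':/')}?fields={','.join(fields)}"


print(paper_url("arXiv:2402.03300", ["title", "year", "citationCount", "externalIds"]))
```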

Complete Research Workflow

1. Discover: python scripts/search_arxiv.py "your topic" --sort date --max 10
2. Assess impact: curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID?fields=citationCount,influentialCitationCount"
3. Read abstract: web_extract(urls=["https://arxiv.org/abs/ID"])
4. Read full paper: web_extract(urls=["https://arxiv.org/pdf/ID"])
5. Find related work: curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID/references?fields=title,citationCount&limit=20"
6. Get recommendations: POST to Semantic Scholar recommendations endpoint
7. Track authors: curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=NAME"

Rate Limits

API	Rate	Auth
arXiv	~1 req / 3 seconds	None needed
Semantic Scholar	1 req / second	None (100/sec with API key)
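
When scripting several calls in a row, a small limiter can enforce these intervals. The class below is a sketch, not part of the skill: `MinIntervalLimiter` is a hypothetical name, and the 3-second and 1-second intervals are taken from the table above. The clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time


class MinIntervalLimiter:
    """Ensure at least `min_interval` seconds pass between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until the interval since the previous call has elapsed."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


arxiv_limiter = MinIntervalLimiter(3.0)  # call arxiv_limiter.wait() before each arXiv request
s2_limiter = MinIntervalLimiter(1.0)     # call s2_limiter.wait() before each Semantic Scholar request
```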

Notes

arXiv returns Atom XML — use the helper script or parsing snippet for clean output
Semantic Scholar returns JSON — pipe through python3 -m json.tool for readability
arXiv IDs: old format (hep-th/0601001) vs new (2402.03300)
PDF: https://arxiv.org/pdf/{id} — Abstract: https://arxiv.org/abs/{id}
HTML (when available): https://arxiv.org/html/{id}
For local PDF processing, see the ocr-and-documents skill
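
The URL patterns and the two ID formats in these notes can be captured in a few lines. This is a sketch with hypothetical helper names (`id_style`, `arxiv_links`); the regular expressions are approximations of the two ID formats, not an official grammar.

```python
import re

# Approximate patterns: new-style 'YYMM.NNNNN', old-style 'archive/YYMMNNN', optional 'vN'.
NEW_STYLE = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")
OLD_STYLE = re.compile(r"[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?")


def id_style(arxiv_id):
    """Return 'new', 'old', or None if the string matches neither pattern."""
    if NEW_STYLE.fullmatch(arxiv_id):
        return "new"
    if OLD_STYLE.fullmatch(arxiv_id):
        return "old"
    return None


def arxiv_links(arxiv_id):
    """Return the abstract, PDF, and HTML URLs for an arXiv ID."""
    return {
        "abs": f"https://arxiv.org/abs/{arxiv_id}",
        "pdf": f"https://arxiv.org/pdf/{arxiv_id}",
        "html": f"https://arxiv.org/html/{arxiv_id}",  # only exists for some papers
    }


print(id_style("2402.03300"), id_style("hep-th/0601001"))
print(arxiv_links("2402.03300")["pdf"])
```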

ID Versioning

arxiv.org/abs/1706.03762 always resolves to the latest version
arxiv.org/abs/1706.03762v1 points to a specific immutable version
When generating citations, preserve the version suffix you actually read to prevent citation drift (a later version may substantially change content)
The API <id> field returns the versioned URL (e.g., http://arxiv.org/abs/1706.03762v7)
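
To keep the version you actually read, it helps to separate the base ID from its version suffix. A minimal sketch, assuming IDs or `/abs/` URLs shaped like the examples above; `split_version` is an illustrative name.

```python
import re


def split_version(id_or_url):
    """Split '1706.03762v7' (or an /abs/ URL) into ('1706.03762', 7).

    Returns (id, None) when no version suffix is present.
    """
    raw = id_or_url.strip().split("/abs/")[-1]
    match = re.fullmatch(r"(.+?)v(\d+)", raw)
    if match:
        return match.group(1), int(match.group(2))
    return raw, None


print(split_version("http://arxiv.org/abs/1706.03762v7"))
print(split_version("1706.03762"))
```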

Withdrawn Papers

Papers can be withdrawn after submission. When this happens:

The <summary> field contains a withdrawal notice (look for "withdrawn" or "retracted")
Metadata fields may be incomplete
Always check the summary before treating a result as a valid paper
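
The check above can be automated with a keyword scan of each entry's summary. This is a heuristic sketch: the helper names are hypothetical, the sample feed is made up, and a keyword match can misfire (for example on a paper that studies retractions), so treat a hit as a prompt to look closer rather than proof.

```python
import xml.etree.ElementTree as ET

ATOM = {"a": "http://www.w3.org/2005/Atom"}
WITHDRAWAL_HINTS = ("withdrawn", "retracted")


def looks_withdrawn(summary):
    """Return True if the summary text contains a withdrawal keyword."""
    text = (summary or "").lower()
    return any(hint in text for hint in WITHDRAWAL_HINTS)


def usable_entries(atom_xml):
    """Return (title, summary) pairs for entries with no withdrawal notice."""
    root = ET.fromstring(atom_xml)
    kept = []
    for entry in root.findall("a:entry", ATOM):
        # findtext with a default tolerates missing metadata fields.
        title = " ".join(entry.findtext("a:title", "", ATOM).split())
        summary = entry.findtext("a:summary", "", ATOM).strip()
        if not looks_withdrawn(summary):
            kept.append((title, summary))
    return kept


# Made-up sample feed, not real API output.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Valid Paper</title><summary>We study X.</summary></entry>
  <entry><title>Gone Paper</title><summary>This paper has been withdrawn by the author.</summary></entry>
  <entry><title>No Summary</title></entry>
</feed>"""

for title, _ in usable_entries(sample):
    print(title)
```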

Maintainer

NousResearch Core maintainer

Source details

Full Name: NousResearch/hermes-agent
Branch: main
Path in repo: skills/research/arxiv
License: MIT License
Topics: ai claude-code anthropic claude ai-agents clawdbot llm openclaw ai-agent codex chatgpt moltbot openai hermes hermes-agent nous-research
