Agent skill

bio-entrez-fetch

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-entrez-fetch

SKILL.md

name: bio-entrez-fetch description: Retrieve records from NCBI databases using Biopython Bio.Entrez. Use when downloading sequences, fetching GenBank records, getting document summaries, or parsing NCBI data into Biopython objects. tool_type: python primary_tool: Bio.Entrez measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

read_file
run_shell_command

Entrez Fetch

Retrieve records from NCBI databases using Biopython's Entrez module (EFetch, ESummary utilities).

Required Setup

python

from Bio import Entrez

Entrez.email = 'your.email@example.com'  # Required by NCBI
Entrez.api_key = 'your_api_key'          # Optional, raises rate limit 3->10 req/sec

Core Functions

Entrez.efetch() - Retrieve Full Records

Fetch complete records in various formats from any NCBI database.

python

# Fetch GenBank record by ID
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='gb', retmode='text')
genbank_text = handle.read()
handle.close()

# Fetch FASTA sequence
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='fasta', retmode='text')
fasta_text = handle.read()
handle.close()

# Fetch multiple records
handle = Entrez.efetch(db='nucleotide', id='NM_007294,NM_000059', rettype='fasta', retmode='text')

Key Parameters:

Parameter	Description	Example
`db`	Database name	`'nucleotide'`, `'protein'`, `'pubmed'`
`id`	Record ID(s)	`'NM_007294'` or `'123,456,789'`
`rettype`	Return type	`'fasta'`, `'gb'`, `'abstract'`
`retmode`	Return mode	`'text'`, `'xml'`
`retstart`	Start index	`0`
`retmax`	Max records	`20`
`WebEnv`	History server session	From esearch
`query_key`	History server query	From esearch

Common Return Types by Database

Nucleotide/Protein:

rettype	retmode	Description
`'fasta'`	`'text'`	FASTA sequence
`'gb'`	`'text'`	GenBank flat file
`'gp'`	`'text'`	GenPept flat file (protein)
`'gbwithparts'`	`'text'`	GenBank with contig sequences
`'seqid'`	`'text'`	Seq-id only
`'acc'`	`'text'`	Accession only

PubMed:

rettype	retmode	Description
`'abstract'`	`'text'`	Abstract text
`'medline'`	`'text'`	MEDLINE format
`'xml'`	`'xml'`	Full PubMed XML

Gene:

rettype	retmode	Description
`'gene_table'`	`'text'`	Gene table format
`'xml'`	`'xml'`	Full gene XML

Entrez.esummary() - Document Summaries

Get brief summaries without downloading full records. Faster than efetch.

python

# Get summary for nucleotide record
handle = Entrez.esummary(db='nucleotide', id='NM_007294')
record = Entrez.read(handle)
handle.close()

summary = record[0]  # First (only) record
print(f"Title: {summary['Title']}")
print(f"Length: {summary['Length']}")
print(f"Organism: {summary['Organism']}")

Common Summary Fields:

python

# Nucleotide/Protein
summary['Title']          # Record title/description
summary['Caption']        # Short identifier
summary['Length']         # Sequence length
summary['Organism']       # Source organism
summary['TaxId']          # Taxonomy ID
summary['AccessionVersion']  # Full accession.version

# PubMed
summary['Title']          # Article title
summary['AuthorList']     # Authors
summary['Source']         # Journal
summary['PubDate']        # Publication date
summary['DOI']            # Digital Object Identifier

Parsing with Biopython

Parse into SeqRecord Objects

python

from Bio import Entrez, SeqIO

Entrez.email = 'your.email@example.com'

# Parse GenBank into SeqRecord
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='gb', retmode='text')
record = SeqIO.read(handle, 'genbank')
handle.close()

print(f"ID: {record.id}")
print(f"Length: {len(record.seq)}")
print(f"Features: {len(record.features)}")

# Parse FASTA into SeqRecord
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='fasta', retmode='text')
record = SeqIO.read(handle, 'fasta')
handle.close()

Parse Multiple Records

python

# Fetch multiple as FASTA
handle = Entrez.efetch(db='nucleotide', id='NM_007294,NM_000059,NM_000546', rettype='fasta', retmode='text')
records = list(SeqIO.parse(handle, 'fasta'))
handle.close()

for record in records:
    print(f"{record.id}: {len(record.seq)} bp")

Parse XML with Entrez.read()

python

# For structured data, use XML mode
handle = Entrez.efetch(db='gene', id='672', retmode='xml')
records = Entrez.read(handle)
handle.close()

# Navigate nested structure
gene = records[0]
print(f"Gene: {gene['Entrezgene_gene']['Gene-ref']['Gene-ref_locus']}")

Code Patterns

Fetch Sequence by Accession

python

from Bio import Entrez, SeqIO

Entrez.email = 'your.email@example.com'

def fetch_sequence(accession, db='nucleotide'):
    handle = Entrez.efetch(db=db, id=accession, rettype='fasta', retmode='text')
    record = SeqIO.read(handle, 'fasta')
    handle.close()
    return record

seq = fetch_sequence('NM_007294')
print(f"{seq.id}: {seq.seq[:50]}...")

Fetch GenBank with Features

python

def fetch_genbank(accession):
    handle = Entrez.efetch(db='nucleotide', id=accession, rettype='gb', retmode='text')
    record = SeqIO.read(handle, 'genbank')
    handle.close()
    return record

gb = fetch_genbank('NM_007294')
for feature in gb.features:
    if feature.type == 'CDS':
        print(f"CDS: {feature.location}")
        print(f"Product: {feature.qualifiers.get('product', ['?'])[0]}")

Fetch PubMed Abstract

python

def fetch_abstract(pmid):
    handle = Entrez.efetch(db='pubmed', id=pmid, rettype='abstract', retmode='text')
    abstract = handle.read()
    handle.close()
    return abstract

abstract = fetch_abstract('35412348')
print(abstract)

Get Record Summaries

python

def get_summaries(db, ids):
    if isinstance(ids, list):
        ids = ','.join(ids)
    handle = Entrez.esummary(db=db, id=ids)
    records = Entrez.read(handle)
    handle.close()
    return records

summaries = get_summaries('nucleotide', ['NM_007294', 'NM_000059'])
for s in summaries:
    print(f"{s['Caption']}: {s['Title'][:50]}... ({s['Length']} bp)")

Search Then Fetch

python

# Search for records
handle = Entrez.esearch(db='nucleotide', term='human[orgn] AND insulin[gene] AND mRNA[fkey]', retmax=5)
search_results = Entrez.read(handle)
handle.close()

ids = search_results['IdList']

# Fetch the sequences
handle = Entrez.efetch(db='nucleotide', id=','.join(ids), rettype='fasta', retmode='text')
records = list(SeqIO.parse(handle, 'fasta'))
handle.close()

for record in records:
    print(f"{record.id}: {len(record.seq)} bp")

Fetch Protein by Gene ID

python

# Search gene database
handle = Entrez.esearch(db='gene', term='BRCA1[sym] AND human[orgn]')
result = Entrez.read(handle)
handle.close()
gene_id = result['IdList'][0]

# Get linked protein IDs
handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id)
links = Entrez.read(handle)
handle.close()

protein_ids = [link['Id'] for link in links[0]['LinkSetDb'][0]['Link'][:3]]

# Fetch proteins
handle = Entrez.efetch(db='protein', id=','.join(protein_ids), rettype='fasta', retmode='text')
proteins = list(SeqIO.parse(handle, 'fasta'))
handle.close()

Save Fetched Records to File

python

def download_sequences(ids, output_file, db='nucleotide', format='fasta'):
    handle = Entrez.efetch(db=db, id=','.join(ids), rettype=format, retmode='text')
    with open(output_file, 'w') as out:
        out.write(handle.read())
    handle.close()

download_sequences(['NM_007294', 'NM_000059'], 'brca_genes.fasta')

Common Errors

Error	Cause	Solution
`HTTPError 400`	Invalid ID or parameters	Verify ID exists, check rettype
`HTTPError 429`	Rate limit exceeded	Add delays or use API key
Empty result	Record doesn't exist	Verify accession in web browser
`ValueError` in SeqIO	Wrong format specified	Match rettype with SeqIO format
`ExpatError`	XML parsing error	Use `retmode='text'` instead

Decision Tree

Need to retrieve NCBI records?
├── Need full sequence?
│   └── Use efetch with rettype='fasta'
├── Need sequence + annotations?
│   └── Use efetch with rettype='gb' (GenBank)
├── Just need metadata (length, organism)?
│   └── Use esummary (faster)
├── Need PubMed abstract?
│   └── Use efetch with rettype='abstract'
├── Need structured data for parsing?
│   └── Use efetch with retmode='xml' + Entrez.read()
├── Downloading many records?
│   └── See batch-downloads skill
└── Need records from multiple databases?
    └── See entrez-link skill first

Related Skills

entrez-search - Find record IDs before fetching
entrez-link - Find related records in other databases
batch-downloads - Download large numbers of records efficiently
sequence-io/read-sequences - Parse downloaded sequences with SeqIO

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-entrez-fetch
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Entrez Fetch

Required Setup

Core Functions

Entrez.efetch() - Retrieve Full Records

Common Return Types by Database

Entrez.esummary() - Document Summaries

Parsing with Biopython

Parse into SeqRecord Objects

Parse Multiple Records

Parse XML with Entrez.read()

Code Patterns

Fetch Sequence by Accession

Fetch GenBank with Features

Fetch PubMed Abstract

Get Record Summaries

Search Then Fetch

Fetch Protein by Gene ID

Save Fetched Records to File

Common Errors

Decision Tree

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations