Agent skill

bio-entrez-fetch

Stars 2,009
Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-entrez-fetch

SKILL.md


name: bio-entrez-fetch description: Retrieve records from NCBI databases using Biopython Bio.Entrez. Use when downloading sequences, fetching GenBank records, getting document summaries, or parsing NCBI data into Biopython objects. tool_type: python primary_tool: Bio.Entrez measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

Entrez Fetch

Retrieve records from NCBI databases using Biopython's Entrez module (EFetch, ESummary utilities).

Required Setup

python
from Bio import Entrez

Entrez.email = 'your.email@example.com'  # Required by NCBI
Entrez.api_key = 'your_api_key'          # Optional, raises rate limit 3->10 req/sec

Core Functions

Entrez.efetch() - Retrieve Full Records

Fetch complete records in various formats from any NCBI database.

python
# Fetch GenBank record by ID
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='gb', retmode='text')
genbank_text = handle.read()
handle.close()

# Fetch FASTA sequence
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='fasta', retmode='text')
fasta_text = handle.read()
handle.close()

# Fetch multiple records
handle = Entrez.efetch(db='nucleotide', id='NM_007294,NM_000059', rettype='fasta', retmode='text')

Key Parameters:

Parameter Description Example
db Database name 'nucleotide', 'protein', 'pubmed'
id Record ID(s) 'NM_007294' or '123,456,789'
rettype Return type 'fasta', 'gb', 'abstract'
retmode Return mode 'text', 'xml'
retstart Start index 0
retmax Max records 20
WebEnv History server session From esearch
query_key History server query From esearch

Common Return Types by Database

Nucleotide/Protein:

rettype retmode Description
'fasta' 'text' FASTA sequence
'gb' 'text' GenBank flat file
'gp' 'text' GenPept flat file (protein)
'gbwithparts' 'text' GenBank with contig sequences
'seqid' 'text' Seq-id only
'acc' 'text' Accession only

PubMed:

rettype retmode Description
'abstract' 'text' Abstract text
'medline' 'text' MEDLINE format
'xml' 'xml' Full PubMed XML

Gene:

rettype retmode Description
'gene_table' 'text' Gene table format
'xml' 'xml' Full gene XML

Entrez.esummary() - Document Summaries

Get brief summaries without downloading full records. Faster than efetch.

python
# Get summary for nucleotide record
handle = Entrez.esummary(db='nucleotide', id='NM_007294')
record = Entrez.read(handle)
handle.close()

summary = record[0]  # First (only) record
print(f"Title: {summary['Title']}")
print(f"Length: {summary['Length']}")
print(f"Organism: {summary['Organism']}")

Common Summary Fields:

python
# Nucleotide/Protein
summary['Title']          # Record title/description
summary['Caption']        # Short identifier
summary['Length']         # Sequence length
summary['Organism']       # Source organism
summary['TaxId']          # Taxonomy ID
summary['AccessionVersion']  # Full accession.version

# PubMed
summary['Title']          # Article title
summary['AuthorList']     # Authors
summary['Source']         # Journal
summary['PubDate']        # Publication date
summary['DOI']            # Digital Object Identifier

Parsing with Biopython

Parse into SeqRecord Objects

python
from Bio import Entrez, SeqIO

Entrez.email = 'your.email@example.com'

# Parse GenBank into SeqRecord
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='gb', retmode='text')
record = SeqIO.read(handle, 'genbank')
handle.close()

print(f"ID: {record.id}")
print(f"Length: {len(record.seq)}")
print(f"Features: {len(record.features)}")

# Parse FASTA into SeqRecord
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='fasta', retmode='text')
record = SeqIO.read(handle, 'fasta')
handle.close()

Parse Multiple Records

python
# Fetch multiple as FASTA
handle = Entrez.efetch(db='nucleotide', id='NM_007294,NM_000059,NM_000546', rettype='fasta', retmode='text')
records = list(SeqIO.parse(handle, 'fasta'))
handle.close()

for record in records:
    print(f"{record.id}: {len(record.seq)} bp")

Parse XML with Entrez.read()

python
# For structured data, use XML mode
handle = Entrez.efetch(db='gene', id='672', retmode='xml')
records = Entrez.read(handle)
handle.close()

# Navigate nested structure
gene = records[0]
print(f"Gene: {gene['Entrezgene_gene']['Gene-ref']['Gene-ref_locus']}")

Code Patterns

Fetch Sequence by Accession

python
from Bio import Entrez, SeqIO

Entrez.email = 'your.email@example.com'

def fetch_sequence(accession, db='nucleotide'):
    handle = Entrez.efetch(db=db, id=accession, rettype='fasta', retmode='text')
    record = SeqIO.read(handle, 'fasta')
    handle.close()
    return record

seq = fetch_sequence('NM_007294')
print(f"{seq.id}: {seq.seq[:50]}...")

Fetch GenBank with Features

python
def fetch_genbank(accession):
    handle = Entrez.efetch(db='nucleotide', id=accession, rettype='gb', retmode='text')
    record = SeqIO.read(handle, 'genbank')
    handle.close()
    return record

gb = fetch_genbank('NM_007294')
for feature in gb.features:
    if feature.type == 'CDS':
        print(f"CDS: {feature.location}")
        print(f"Product: {feature.qualifiers.get('product', ['?'])[0]}")

Fetch PubMed Abstract

python
def fetch_abstract(pmid):
    handle = Entrez.efetch(db='pubmed', id=pmid, rettype='abstract', retmode='text')
    abstract = handle.read()
    handle.close()
    return abstract

abstract = fetch_abstract('35412348')
print(abstract)

Get Record Summaries

python
def get_summaries(db, ids):
    if isinstance(ids, list):
        ids = ','.join(ids)
    handle = Entrez.esummary(db=db, id=ids)
    records = Entrez.read(handle)
    handle.close()
    return records

summaries = get_summaries('nucleotide', ['NM_007294', 'NM_000059'])
for s in summaries:
    print(f"{s['Caption']}: {s['Title'][:50]}... ({s['Length']} bp)")

Search Then Fetch

python
# Search for records
handle = Entrez.esearch(db='nucleotide', term='human[orgn] AND insulin[gene] AND mRNA[fkey]', retmax=5)
search_results = Entrez.read(handle)
handle.close()

ids = search_results['IdList']

# Fetch the sequences
handle = Entrez.efetch(db='nucleotide', id=','.join(ids), rettype='fasta', retmode='text')
records = list(SeqIO.parse(handle, 'fasta'))
handle.close()

for record in records:
    print(f"{record.id}: {len(record.seq)} bp")

Fetch Protein by Gene ID

python
# Search gene database
handle = Entrez.esearch(db='gene', term='BRCA1[sym] AND human[orgn]')
result = Entrez.read(handle)
handle.close()
gene_id = result['IdList'][0]

# Get linked protein IDs
handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id)
links = Entrez.read(handle)
handle.close()

protein_ids = [link['Id'] for link in links[0]['LinkSetDb'][0]['Link'][:3]]

# Fetch proteins
handle = Entrez.efetch(db='protein', id=','.join(protein_ids), rettype='fasta', retmode='text')
proteins = list(SeqIO.parse(handle, 'fasta'))
handle.close()

Save Fetched Records to File

python
def download_sequences(ids, output_file, db='nucleotide', format='fasta'):
    handle = Entrez.efetch(db=db, id=','.join(ids), rettype=format, retmode='text')
    with open(output_file, 'w') as out:
        out.write(handle.read())
    handle.close()

download_sequences(['NM_007294', 'NM_000059'], 'brca_genes.fasta')

Common Errors

Error Cause Solution
HTTPError 400 Invalid ID or parameters Verify ID exists, check rettype
HTTPError 429 Rate limit exceeded Add delays or use API key
Empty result Record doesn't exist Verify accession in web browser
ValueError in SeqIO Wrong format specified Match rettype with SeqIO format
ExpatError XML parsing error Use retmode='text' instead

Decision Tree

Need to retrieve NCBI records?
├── Need full sequence?
│   └── Use efetch with rettype='fasta'
├── Need sequence + annotations?
│   └── Use efetch with rettype='gb' (GenBank)
├── Just need metadata (length, organism)?
│   └── Use esummary (faster)
├── Need PubMed abstract?
│   └── Use efetch with rettype='abstract'
├── Need structured data for parsing?
│   └── Use efetch with retmode='xml' + Entrez.read()
├── Downloading many records?
│   └── See batch-downloads skill
└── Need records from multiple databases?
    └── See entrez-link skill first

Related Skills

  • entrez-search - Find record IDs before fetching
  • entrez-link - Find related records in other databases
  • batch-downloads - Download large numbers of records efficiently
  • sequence-io/read-sequences - Parse downloaded sequences with SeqIO

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results