Agent skill

bio-entrez-search


Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-entrez-search

SKILL.md


name: bio-entrez-search
description: Search NCBI databases using Biopython Bio.Entrez. Use when finding records by keyword, building complex search queries, discovering database structure, or getting global query counts across databases.
tool_type: python
primary_tool: Bio.Entrez
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:

  • read_file
  • run_shell_command

Entrez Search

Search NCBI databases using Biopython's Entrez module (ESearch, EInfo, EGQuery utilities).

Required Setup

python
from Bio import Entrez

Entrez.email = 'your.email@example.com'  # Required by NCBI
Entrez.api_key = 'your_api_key'          # Optional, raises rate limit 3->10 req/sec
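
Hard-coded credentials are easy to leak into version control. A minimal sketch, assuming the hypothetical environment variable names `NCBI_EMAIL` and `NCBI_API_KEY` (chosen for this example; Biopython does not read them on its own), loads them at startup instead:

```python
import os


def load_entrez_credentials(env=None):
    """Return (email, api_key) from an environment-style mapping.

    NCBI_EMAIL and NCBI_API_KEY are hypothetical names used only in
    this sketch.
    """
    env = os.environ if env is None else env
    email = env.get("NCBI_EMAIL")
    if not email:
        raise ValueError("Set NCBI_EMAIL; NCBI asks for a contact address")
    api_key = env.get("NCBI_API_KEY") or None  # the key is optional
    return email, api_key


def configure_entrez(env=None):
    """Apply the credentials to Bio.Entrez (imported lazily)."""
    from Bio import Entrez

    Entrez.email, Entrez.api_key = load_entrez_credentials(env)
    return Entrez
```

Calling `configure_entrez()` once before any search keeps the rest of the code free of literal addresses and keys.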

Core Functions

Entrez.esearch() - Search a Database

Search any NCBI database and get matching record IDs.

python
handle = Entrez.esearch(db='nucleotide', term='human[orgn] AND BRCA1[gene]')
record = Entrez.read(handle)
handle.close()

print(f"Found {record['Count']} records")
print(f"IDs: {record['IdList']}")  # First 20 IDs by default

Key Parameters:

| Parameter | Description | Default |
|---|---|---|
| db | Database to search | Required |
| term | Search query | Required |
| retmax | Max IDs to return | 20 |
| retstart | Starting index (pagination) | 0 |
| usehistory | Store results on server | 'n' |
| sort | Sort order | database-specific |
| datetype | Date field to search | 'pdat' |
| reldate | Records from last N days | None |
| mindate | Start date (YYYY/MM/DD) | None |
| maxdate | End date (YYYY/MM/DD) | None |

ESearch Result Fields:

python
record['Count']        # Total matching records (string)
record['IdList']       # List of record IDs
record['RetMax']       # Number of IDs returned
record['RetStart']     # Starting index
record['QueryKey']     # For history server (if usehistory='y')
record['WebEnv']       # For history server (if usehistory='y')
record['TranslationSet']  # Query translations applied
record['QueryTranslation']  # Final translated query
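
Because `Count` arrives as a string and `IdList` holds at most `retmax` IDs, it is easy to mistake a truncated list for the full result. A small sketch of a check (the helper name `summarize_esearch` is hypothetical, and it only needs dict-style access, so a plain dict stands in for a parsed record here):

```python
def summarize_esearch(record):
    """Summarize an ESearch result parsed by Entrez.read()."""
    total = int(record["Count"])        # Count arrives as a string
    returned = len(record["IdList"])
    return {
        "total": total,
        "returned": returned,
        "truncated": returned < total,  # more IDs exist than were returned
    }


example = {"Count": "1532", "IdList": ["101", "102", "103"]}
print(summarize_esearch(example))
# {'total': 1532, 'returned': 3, 'truncated': True}
```

When `truncated` is true, raise `retmax`, paginate with `retstart`, or use the history server.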

Entrez.einfo() - Database Information

Get information about available databases or specific database fields.

python
# List all available databases
handle = Entrez.einfo()
record = Entrez.read(handle)
handle.close()
print(record['DbList'])  # ['pubmed', 'protein', 'nucleotide', ...]

# Get info about specific database
handle = Entrez.einfo(db='nucleotide')
record = Entrez.read(handle)
handle.close()

print(f"Description: {record['DbInfo']['Description']}")
print(f"Record count: {record['DbInfo']['Count']}")

# List searchable fields
for field in record['DbInfo']['FieldList']:
    print(f"{field['Name']}: {field['Description']}")

Database Info Fields:

python
record['DbInfo']['DbName']       # Database name
record['DbInfo']['Description']  # Database description
record['DbInfo']['Count']        # Total records in database
record['DbInfo']['LastUpdate']   # Last update date
record['DbInfo']['FieldList']    # Searchable fields
record['DbInfo']['LinkList']     # Available links to other databases

Entrez.egquery() - Global Query

Search across all NCBI databases simultaneously.

python
handle = Entrez.egquery(term='CRISPR')
record = Entrez.read(handle)
handle.close()

for result in record['eGQueryResult']:
    if int(result['Count']) > 0:
        print(f"{result['DbName']}: {result['Count']} records")
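
EGQuery availability may differ between Biopython releases and NCBI service status, so it is worth checking that the call works in your environment. As a fallback sketch (not part of the skill), per-database counts can be gathered with one `esearch` call per database using `retmax=0`. The function name and the injectable `search` parameter are assumptions made for illustration and offline testing:

```python
def count_per_database(term, databases, search=None):
    """Return {db: count} by running ESearch with retmax=0 on each database.

    `search` defaults to a thin wrapper around Entrez.esearch; pass a
    stand-in callable that returns a parsed record to test offline.
    """
    if search is None:
        from Bio import Entrez  # requires Entrez.email to be set by the caller

        def search(db, term):
            handle = Entrez.esearch(db=db, term=term, retmax=0)
            try:
                return Entrez.read(handle)
            finally:
                handle.close()

    return {db: int(search(db, term)["Count"]) for db in databases}
```

For example, `count_per_database('CRISPR', ['pubmed', 'nucleotide', 'gene'])` issues three requests and returns a dictionary of integer counts.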

Search Query Syntax

NCBI uses a specific query syntax:

Field Tags

python
# Search specific fields using [field_name]
term = 'BRCA1[gene]'                    # Gene name field
term = 'human[orgn]'                    # Organism field
term = 'Homo sapiens[ORGN]'             # Full organism name
term = 'NM_007294[accn]'                # Accession number
term = 'Smith J[auth]'                  # Author (PubMed)
term = 'Nature[jour]'                   # Journal (PubMed)
term = '1000:5000[slen]'                # Sequence length range
term = 'mRNA[fkey]'                     # Feature key

Boolean Operators

python
term = 'BRCA1 AND human'                # Both terms
term = 'cancer OR tumor'                # Either term
term = 'human NOT mouse'                # Exclude term
term = '(BRCA1 OR BRCA2) AND human'     # Grouping
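
Longer queries are less error-prone when assembled from parts than when typed as one string. A minimal sketch with three hypothetical helpers (`field`, `any_of`, `all_of`) that only format strings and make no network calls:

```python
def field(value, tag):
    """Format one fielded term, e.g. field('BRCA1', 'gene') -> 'BRCA1[gene]'."""
    return f"{value}[{tag}]"


def any_of(*terms):
    """Join terms with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Join terms with AND."""
    return " AND ".join(terms)


query = all_of(
    any_of(field("BRCA1", "gene"), field("BRCA2", "gene")),
    field("human", "orgn"),
)
print(query)
# (BRCA1[gene] OR BRCA2[gene]) AND human[orgn]
```

The resulting string can be passed directly as `term` to `Entrez.esearch()`.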

Date Ranges

python
# Using date parameters
handle = Entrez.esearch(
    db='pubmed',
    term='CRISPR',
    datetype='pdat',     # Publication date
    mindate='2023/01/01',
    maxdate='2024/12/31'
)

# Or in query string
term = 'CRISPR AND 2024[pdat]'
term = 'CRISPR AND 2023:2024[pdat]'
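
When the bounds come from `datetime.date` objects, the range term can be generated rather than typed. This sketch assumes the target database accepts full `YYYY/MM/DD:YYYY/MM/DD` ranges in the same way as the year ranges above (the case for PubMed as far as I know; check other databases with `QueryTranslation`). The helper name is hypothetical:

```python
from datetime import date


def date_range_term(start, end, tag="pdat"):
    """Build a 'YYYY/MM/DD:YYYY/MM/DD[tag]' range term from two dates."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"{start:%Y/%m/%d}:{end:%Y/%m/%d}[{tag}]"


term = "CRISPR AND " + date_range_term(date(2023, 1, 1), date(2024, 12, 31))
print(term)
# CRISPR AND 2023/01/01:2024/12/31[pdat]
```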

Wildcards and Phrases

python
term = 'immun*'                         # Wildcard
term = '"breast cancer"[title]'         # Exact phrase

Common Databases

| Database | db value | Common Fields |
|---|---|---|
| PubMed | pubmed | [auth], [title], [jour], [pdat] |
| Nucleotide | nucleotide | [orgn], [gene], [accn], [slen] |
| Protein | protein | [orgn], [gene], [accn], [molwt] |
| Gene | gene | [orgn], [sym], [chr] |
| SRA | sra | [orgn], [platform], [strategy] |
| Taxonomy | taxonomy | [scin], [comn], [rank] |
| Assembly | assembly | [orgn], [level], [refseq] |

Code Patterns

Basic Search with Pagination

python
from Bio import Entrez

Entrez.email = 'your.email@example.com'

def search_ncbi(db, term, max_results=100):
    handle = Entrez.esearch(db=db, term=term, retmax=max_results)
    record = Entrez.read(handle)
    handle.close()
    return record['IdList'], int(record['Count'])

ids, total = search_ncbi('nucleotide', 'human[orgn] AND insulin[gene]')
print(f'Retrieved {len(ids)} of {total} total records')

Paginated Search for Large Results

python
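def search_all_ids(db, term, batch_size=10000):
    """Collect all matching IDs by paging through ESearch results.

    ESearch returns at most 10,000 IDs per request, so batch_size should
    not exceed 10,000. For PubMed, only the first 10,000 matches can be
    reached with retstart; other databases allow paging beyond that.
    """
    # First request asks only for the total count (no IDs)
    handle = Entrez.esearch(db=db, term=term, retmax=0)
    record = Entrez.read(handle)
    handle.close()
    total = int(record['Count'])

    all_ids = []
    for start in range(0, total, batch_size):
        # Each request returns the next window of up to batch_size IDs
        handle = Entrez.esearch(db=db, term=term, retstart=start, retmax=batch_size)
        record = Entrez.read(handle)
        handle.close()
        all_ids.extend(record['IdList'])

    return all_ids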

Search with History Server (for Large Results)

python
# Store results on NCBI server for subsequent fetching
handle = Entrez.esearch(db='nucleotide', term='human[orgn] AND mRNA[fkey]', usehistory='y')
record = Entrez.read(handle)
handle.close()

webenv = record['WebEnv']
query_key = record['QueryKey']
total = int(record['Count'])

# Use webenv and query_key with efetch for batch downloads
# See batch-downloads skill for details
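
As a brief sketch of that follow-up step, the stored result can be read in fixed-size windows by passing `webenv` and `query_key` to `Entrez.efetch()`. The function names, the batch size of 500, and the `fasta`/`text` defaults are assumptions for illustration; the batch-downloads skill remains the reference:

```python
def batch_windows(total, batch_size=500):
    """Yield (retstart, retmax) pairs that cover `total` records."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def fetch_in_batches(db, webenv, query_key, total, batch_size=500,
                     rettype="fasta", retmode="text"):
    """Yield raw text chunks fetched from a stored history result."""
    from Bio import Entrez  # imported lazily so batch_windows works offline

    for start, size in batch_windows(total, batch_size):
        handle = Entrez.efetch(db=db, webenv=webenv, query_key=query_key,
                               retstart=start, retmax=size,
                               rettype=rettype, retmode=retmode)
        try:
            yield handle.read()
        finally:
            handle.close()


print(list(batch_windows(1200, 500)))
# [(0, 500), (500, 500), (1000, 200)]
```

Keeping the window arithmetic in its own generator makes it easy to verify without contacting NCBI.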

Recent Records Only

python
# Records from last 30 days
handle = Entrez.esearch(db='pubmed', term='CRISPR', reldate=30, datetype='pdat')
record = Entrez.read(handle)
handle.close()

Get Available Fields for a Database

python
def get_search_fields(db):
    handle = Entrez.einfo(db=db)
    record = Entrez.read(handle)
    handle.close()
    return [(f['Name'], f['Description']) for f in record['DbInfo']['FieldList']]

fields = get_search_fields('nucleotide')
for name, desc in fields[:10]:
    print(f'{name}: {desc}')

Check Query Translation

python
handle = Entrez.esearch(db='nucleotide', term='human BRCA1')
record = Entrez.read(handle)
handle.close()

# See how NCBI interpreted your query
print(f"Your query was translated to: {record['QueryTranslation']}")
# e.g., '"homo sapiens"[Organism] AND BRCA1[All Fields]'

Common Errors

| Error | Cause | Solution |
|---|---|---|
| HTTPError 429 | Rate limit exceeded | Add delays or use API key |
| HTTPError 400 | Invalid query syntax | Check field names and operators |
| Empty IdList | No matches or typo | Check QueryTranslation field |
| UserWarning (email not specified) | Entrez.email not set | Set Entrez.email before the first request |
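
For the 429 case, a retry wrapper with exponential backoff is one common approach. The sketch below assumes the failure surfaces as `urllib.error.HTTPError`, which is how Bio.Entrez reports HTTP errors as far as I know; `with_retry` and its parameters are hypothetical names:

```python
import time
import urllib.error


def with_retry(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run `call()`; on HTTP 429, wait with exponential backoff and retry.

    Other HTTP errors, and a 429 on the final attempt, are re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

A search would then be wrapped as `with_retry(lambda: Entrez.read(Entrez.esearch(db='pubmed', term='CRISPR')))`. The `sleep` parameter exists so the waiting can be replaced in tests.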

Decision Tree

Need to search NCBI?
├── Finding records in one database?
│   └── Use Entrez.esearch()
├── Search across all databases?
│   └── Use Entrez.egquery()
├── Need database field names?
│   └── Use Entrez.einfo(db='database')
├── List all available databases?
│   └── Use Entrez.einfo() (no db argument)
├── Results > 10,000 records?
│   └── Use usehistory='y', then batch fetch
└── Need to fetch actual records?
    └── See entrez-fetch skill

Related Skills

  • entrez-fetch - Retrieve full records after searching
  • entrez-link - Find related records in other databases
  • batch-downloads - Download large result sets efficiently
  • geo-data - Search GEO expression datasets (specialized search)
  • blast-searches - Search by sequence similarity instead of keywords
