tooluniverse-expression-data-retrieval

Retrieves gene expression and omics datasets from ArrayExpress and BioStudies with gene disambiguation, experiment quality assessment, and structured reports. Creates comprehensive dataset profiles with metadata, sample information, and download links. Use when users need expression data, omics datasets, or mention ArrayExpress (E-MTAB, E-GEOD) or BioStudies (S-BSST) accessions.

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/tooluniverse-expression-data-retrieval

SKILL.md

Gene Expression & Omics Data Retrieval

Retrieve gene expression experiments and multi-omics datasets with proper disambiguation and quality assessment.

IMPORTANT: Always use English terms in tool calls (gene names, tissue names, condition descriptions), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.

Workflow Overview

Phase 0: Clarify Query (if ambiguous)
    ↓
Phase 1: Disambiguate Gene/Condition
    ↓
Phase 2: Search & Retrieve (Internal)
    ↓
Phase 3: Report Dataset Profile

Phase 0: Clarification (When Needed)

Ask the user ONLY if:

Gene name is ambiguous (e.g., "p53" → studies of TP53 itself, or of its regulator MDM2?)
Tissue/condition unclear for comparative studies
Organism not specified for non-human research

Skip clarification for:

Specific accession numbers (E-MTAB-*, E-GEOD-*, S-BSST*)
Clear disease/tissue + organism combinations
Explicit platform requests (RNA-seq, microarray)
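
The accession rule above can be checked mechanically before deciding whether to ask the user anything. The following is a minimal sketch, not part of the skill itself: the function name `find_accessions` is hypothetical, and the patterns assume each prefix is followed by one or more digits.

```python
import re

# Prefixes come from the list above; the "one or more digits" suffix is an
# assumption about accession format, not something the skill specifies.
ACCESSION_PATTERN = re.compile(
    r"\b(E-MTAB-\d+|E-GEOD-\d+|S-BSST\d+)\b", re.IGNORECASE
)


def find_accessions(query: str) -> list[str]:
    """Return accession-like tokens found in a user query, upper-cased."""
    return [token.upper() for token in ACCESSION_PATTERN.findall(query)]
```

A query for which `find_accessions` returns a non-empty list can go straight to retrieval; an empty list only means the other clarification rules still need to be considered.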

Phase 1: Query Disambiguation

1.1 Gene Name Resolution

If searching by gene, first resolve official identifiers:

python

from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()

# For gene-focused searches, resolve official symbol first
# This helps construct better search queries
# Example: "p53" → "TP53" (official HGNC symbol)

Gene Disambiguation Checklist:

Official gene symbol identified (HGNC for human, MGI for mouse)
Common aliases noted for search expansion
Species confirmed
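
As a rough illustration of the checklist, the sketch below resolves an alias to an official symbol per species and returns the terms to search with. The `ALIASES` table and `resolve_symbol` are hypothetical stand-ins; a real workflow would look symbols up in HGNC or MGI rather than in a hard-coded dictionary.

```python
# Illustrative alias table only (assumption): keys are (lower-cased name, species).
ALIASES = {
    ("p53", "Homo sapiens"): "TP53",
    ("tp53", "Homo sapiens"): "TP53",
    ("p53", "Mus musculus"): "Trp53",
    ("trp53", "Mus musculus"): "Trp53",
}


def resolve_symbol(name, species):
    """Return (official_symbol, search_terms) for a gene name and species.

    official_symbol is None when the name is not in the table; in that case
    the name is searched exactly as the user gave it.
    """
    official = ALIASES.get((name.lower(), species))
    if official is None:
        return None, [name]
    terms = [official]
    if name.lower() != official.lower():
        terms.append(name)  # keep the user's alias for search expansion
    return official, terms
```

Keeping the species in the lookup key matters because the same alias maps to different official symbols across organisms (TP53 in human, Trp53 in mouse).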

1.2 Construct Search Strategy

User Query Type	Search Strategy
Specific accession	Direct retrieval
Gene + condition	"[gene] [condition]" + species filter
Disease only	"[disease]" + species filter
Technology-specific	Add platform keywords (RNA-seq, microarray)
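
One way to turn the strategy table into code is a small helper that maps a parsed query onto search parameters. This is a sketch under assumptions: `build_search` and the `mode` key are hypothetical, and only `keywords` and `species` correspond to parameters documented in this skill.

```python
def build_search(gene=None, condition=None, disease=None, species=None,
                 platform=None, accession=None):
    """Map a parsed user query onto the strategy table above."""
    if accession:
        # Specific accession: skip searching and retrieve directly.
        return {"mode": "direct", "accession": accession}
    # Gene/disease first, then condition, then platform keywords.
    terms = [term for term in (gene, disease, condition, platform) if term]
    params = {"mode": "search", "keywords": " ".join(terms)}
    if species:
        params["species"] = species
    return params
```

For example, `build_search(disease="breast cancer", platform="RNA-seq", species="Homo sapiens")` yields the keywords `"breast cancer RNA-seq"` with a species filter.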

Phase 2: Data Retrieval (Internal)

Search silently. Do NOT narrate the process.

2.1 Search Experiments

python

# ArrayExpress search
result = tu.tools.arrayexpress_search_experiments(
    keywords="[gene/disease] [condition]",
    species="[species]",
    limit=20
)

# BioStudies for multi-omics
biostudies_result = tu.tools.biostudies_search_studies(
    query="[keywords]",
    limit=10
)

2.2 Get Experiment Details

For top results, retrieve full metadata:

python

# Get details for each relevant experiment
details = tu.tools.arrayexpress_get_experiment_details(
    accession=accession
)

# Get sample information
samples = tu.tools.arrayexpress_get_experiment_samples(
    accession=accession
)

# Get available files
files = tu.tools.arrayexpress_get_experiment_files(
    accession=accession
)

2.3 BioStudies Retrieval

python

# Multi-omics study details
study_details = tu.tools.biostudies_get_study_details(
    accession=study_accession
)

# Study structure
sections = tu.tools.biostudies_get_study_sections(
    accession=study_accession
)

# Available files
files = tu.tools.biostudies_get_study_files(
    accession=study_accession
)

Fallback Chains

Primary	Fallback	Notes
ArrayExpress search	BioStudies search	ArrayExpress empty
arrayexpress_get_experiment_details	biostudies_get_study_details	E-GEOD may have BioStudies mirror
arrayexpress_get_experiment_files	Note "Files unavailable"	Some studies restrict downloads
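
The fallback chains can be expressed as one generic helper. The sketch below is an assumption about how to wire this up, not ToolUniverse behaviour: `with_fallback` is a hypothetical name, and it treats any exception or falsy result from the primary call as "empty".

```python
def with_fallback(primary, fallback):
    """Call primary(); if it raises or returns nothing, call fallback().

    Both arguments are zero-argument callables, so each can wrap a tool
    call with its own parameters. Returns (result, source), where source
    is "primary" or "fallback".
    """
    try:
        result = primary()
    except Exception:
        result = None  # treat a failed primary call like an empty result
    if result:
        return result, "primary"
    return fallback(), "fallback"
```

Wrapping the ArrayExpress search in one lambda and the BioStudies search in another keeps their differing parameter names (`keywords` versus `query`) out of the helper. Returning the source lets the report state which database actually supplied the data.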

Phase 3: Report Dataset Profile

Output Structure

Present as a Dataset Search Report. Hide search process.

markdown

# Expression Data: [Query Topic]

**Search Summary**
- Query: [gene/disease] in [species]
- Databases: ArrayExpress, BioStudies
- Results: [N] relevant experiments found

**Data Quality Overview**: [assessment based on criteria below]

---

## Top Experiments

### 1. [E-MTAB-XXXX]: [Title]

| Attribute | Value |
|-----------|-------|
| **Accession** | [accession with link] |
| **Organism** | [species] |
| **Experiment Type** | RNA-seq / Microarray |
| **Platform** | [specific platform] |
| **Samples** | [N] samples |
| **Release Date** | [date] |

**Description**: [Brief description from metadata]

**Experimental Design**:
- Conditions: [treatment vs control, etc.]
- Replicates: [N biological, M technical]
- Tissue/Cell type: [if specified]

**Sample Groups**:
| Group | Samples | Description |
|-------|---------|-------------|
| Control | [N] | [description] |
| Treatment | [N] | [description] |

**Data Files Available**:
| File | Type | Size |
|------|------|------|
| [filename] | Processed data | [size] |
| [filename] | Raw data | [size] |
| [filename] | Sample metadata | [size] |

**Quality Assessment**: ●●● High / ●●○ Medium / ●○○ Low / ○○○ Use with Caution
- Sample size: [adequate/limited]
- Replication: [yes/no]
- Metadata completeness: [complete/partial]

---

### 2. [E-GEOD-XXXXX]: [Title]
[Same structure as above]

---

## Multi-Omics Studies (from BioStudies)

### [S-BSSTXXXXX]: [Title]

| Attribute | Value |
|-----------|-------|
| **Accession** | [accession] |
| **Study Type** | [proteomics/metabolomics/integrated] |
| **Organism** | [species] |
| **Samples** | [N] |

**Data Types Included**:
- [ ] Transcriptomics
- [ ] Proteomics
- [ ] Metabolomics
- [ ] Other: [specify]

---

## Summary Table

| Accession | Type | Samples | Platform | Quality |
|-----------|------|---------|----------|---------|
| [E-MTAB-X] | RNA-seq | [N] | Illumina | ●●● |
| [E-GEOD-X] | Microarray | [N] | Affymetrix | ●●○ |

---

## Recommendations

**For [specific analysis type]**:
- Best experiment: [accession] - [reason]
- Alternative: [accession] - [reason]

**Data Integration Notes**:
- Platform compatibility: [notes on combining datasets]
- Batch considerations: [if applicable]

---

## Data Access

### Direct Download Links
- [E-MTAB-XXXX processed data](link)
- [E-MTAB-XXXX raw data](link)

### Database Links
- ArrayExpress: https://www.ebi.ac.uk/arrayexpress/experiments/[accession]
- BioStudies: https://www.ebi.ac.uk/biostudies/studies/[accession]

Retrieved: [date]
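
The Summary Table section of the template above can be generated from collected metadata rather than written by hand. A minimal sketch follows; the dictionary keys (`accession`, `type`, `samples`, `platform`, `quality`) are assumed names, not fields returned by any tool.

```python
def summary_table(experiments):
    """Render the Summary Table section from a list of experiment dicts."""
    lines = [
        "| Accession | Type | Samples | Platform | Quality |",
        "|-----------|------|---------|----------|---------|",
    ]
    for exp in experiments:
        lines.append(
            "| {accession} | {type} | {samples} | {platform} | {quality} |".format(**exp)
        )
    return "\n".join(lines)
```

Extra keys in each dictionary are ignored by `str.format`, so the same records can carry fields used by other report sections.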

Data Quality Tiers

Assessment criteria for expression experiments:

Tier	Symbol	Criteria
High Quality	●●●	≥3 biological replicates per condition, complete metadata, processed data available
Medium Quality	●●○	2-3 replicates OR some metadata gaps; data accessible
Low Quality	●○○	No replicates, sparse metadata, or data access issues
Use with Caution	○○○	Single sample, no replication, outdated platform

Include assessment rationale:

markdown

**Quality**: ●●● High
- ✓ 4 biological replicates per condition
- ✓ Complete sample annotations
- ✓ Processed and raw data available
- ✓ Recent RNA-seq platform
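
The tier criteria can be approximated with a small decision function. This is one possible reading of the table, offered as a sketch: the function name, argument names, and the order in which rules are checked are assumptions, and the "outdated platform" criterion is not modelled.

```python
SYMBOLS = {
    "High": "●●●",
    "Medium": "●●○",
    "Low": "●○○",
    "Use with Caution": "○○○",
}


def quality_tier(samples, replicates, metadata, processed_data, accessible=True):
    """Assign a tier name from the criteria table.

    samples: total sample count
    replicates: biological replicates per condition
    metadata: "complete", "partial", or "sparse"
    processed_data: True if processed data files are available
    accessible: False if downloads are restricted or failing
    """
    if samples <= 1:
        return "Use with Caution"
    if replicates < 2 or metadata == "sparse" or not accessible:
        return "Low"
    if replicates >= 3 and metadata == "complete" and processed_data:
        return "High"
    return "Medium"
```

`SYMBOLS[quality_tier(...)]` gives the marker to print in the report; the rationale bullets should still be written from the underlying metadata.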

Completeness Checklist

Every dataset report MUST include:

Per Experiment (Required)

Accession number with database link
Organism
Experiment type (RNA-seq/microarray/etc.)
Sample count
Brief description
Quality assessment

Search Summary (Required)

Query parameters stated
Number of results
Databases searched

Recommendations (Required)

Best dataset for user's purpose (or "No suitable data found")
Data access notes

Include Even If Empty

Multi-omics studies section (or "No multi-omics studies found")
Data integration notes (or "Single-platform data, no integration needed")
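
A report can be checked against this checklist before it is presented. The sketch below assumes the report is held as a nested dictionary; every key name in it is hypothetical and chosen only to mirror the checklist items.

```python
REQUIRED_SUMMARY_FIELDS = ("query", "result_count", "databases")
REQUIRED_EXPERIMENT_FIELDS = (
    "accession", "link", "organism", "experiment_type",
    "sample_count", "description", "quality",
)
# Recommendations plus the sections that must appear even when empty.
REQUIRED_SECTIONS = (
    "recommendation", "data_access_notes", "multi_omics", "integration_notes",
)


def missing_fields(report):
    """Return the checklist items that are absent from a report dict."""
    missing = []
    summary = report.get("summary", {})
    # Compare against None so that a legitimate result_count of 0 passes.
    missing += [f"summary.{key}" for key in REQUIRED_SUMMARY_FIELDS
                if summary.get(key) is None]
    for index, exp in enumerate(report.get("experiments", []), start=1):
        missing += [f"experiment {index}.{key}"
                    for key in REQUIRED_EXPERIMENT_FIELDS
                    if exp.get(key) is None]
    missing += [key for key in REQUIRED_SECTIONS if report.get(key) is None]
    return missing
```

An empty list means the report satisfies the checklist. Placeholder strings such as "No multi-omics studies found" count as present, which matches the "Include Even If Empty" rule.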

Common Use Cases

Disease Gene Expression

User: "Find breast cancer RNA-seq data"

python

result = tu.tools.arrayexpress_search_experiments(
    keywords="breast cancer RNA-seq",
    species="Homo sapiens",
    limit=20
)

→ Report top experiments with quality assessment

Gene-Specific Studies

User: "Find TP53 expression experiments in mouse"

python

result = tu.tools.arrayexpress_search_experiments(
    keywords="Trp53 TP53 p53",  # Official mouse symbol (MGI) plus aliases
    species="Mus musculus",
    limit=15
)

→ Report experiments studying this gene

Specific Accession Lookup

User: "Get details for E-MTAB-5214"

→ Single experiment profile with all details and files

Multi-Omics Integration

User: "Find proteomics and transcriptomics studies for liver disease"

→ Search both ArrayExpress and BioStudies, note integration potential

Error Handling

Error	Response
"No experiments found"	Broaden keywords, remove species filter, try synonyms
"Accession not found"	Verify format (E-MTAB-*, E-GEOD-*, S-BSST*), check if withdrawn
"Files not available"	Note in report: "Data files restricted by submitter"
"API timeout"	Retry once, then note: "(metadata retrieval incomplete)"

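The "retry once, then note" rule for timeouts in the table above can be sketched as a small wrapper. It assumes a timeout surfaces as Python's built-in `TimeoutError`; the actual exception raised by a given client may differ, and `call_with_retry` is a hypothetical helper name.

```python
TIMEOUT_NOTE = "(metadata retrieval incomplete)"


def call_with_retry(tool_call, retries=1):
    """Run tool_call(); on TimeoutError retry, then give up with a note.

    tool_call is a zero-argument callable. Returns (result, note): note is
    None on success, or TIMEOUT_NOTE once all attempts have timed out.
    """
    for _ in range(retries + 1):
        try:
            return tool_call(), None
        except TimeoutError:
            continue  # assumption: timeouts raise the built-in TimeoutError
    return None, TIMEOUT_NOTE
```

Returning the note instead of raising lets the report be completed with the remaining metadata and the gap marked explicitly.
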
Tool Reference

ArrayExpress (Gene Expression)

Tool	Purpose
`arrayexpress_search_experiments`	Keyword/species search
`arrayexpress_get_experiment_details`	Full metadata
`arrayexpress_get_experiment_files`	Download links
`arrayexpress_get_experiment_samples`	Sample annotations

BioStudies (Multi-Omics)

Tool	Purpose
`biostudies_search_studies`	Multi-omics search
`biostudies_get_study_details`	Study metadata
`biostudies_get_study_files`	Data files
`biostudies_get_study_sections`	Study structure

Search Parameters Reference

ArrayExpress

Parameter	Description	Example
`keywords`	Free text search	"breast cancer RNA-seq"
`species`	Scientific name	"Homo sapiens"
`array`	Platform filter	"Illumina"
`limit`	Max results	20

BioStudies

Parameter	Description	Example
`query`	Free text	"proteomics liver"
`limit`	Max results	10

Maintainer

FreedomIntelligence (Core maintainer)

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/tooluniverse-expression-data-retrieval
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw
