gsea-enrichment-analysis

Gene set enrichment analysis with correct geneset format handling. Critical guidance for loading pathway databases and running enrichment in OmicVerse.

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/gsea-enrichment

SKILL.md

GSEA and Pathway Enrichment Analysis

Overview

This skill covers gene set enrichment analysis (GSEA) and pathway enrichment workflows in OmicVerse. It provides critical guidance on the correct data formats and API usage patterns to avoid common errors.

Critical API Reference - Geneset Format

IMPORTANT: Use Dictionary Format, NOT File Path!

The ov.bulk.geneset_enrichment() function requires a dictionary of gene sets, NOT a file path string. You must first load the geneset file using ov.utils.geneset_prepare().

CORRECT usage:

python

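import omicverse as ov

# deg_genes is assumed to be a list of gene symbols from an earlier DEG step

# Step 1: Download pathway database (if not already available)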
ov.utils.download_pathway_database()

# Step 2: Load geneset file into dictionary format - REQUIRED!
pathways_dict = ov.utils.geneset_prepare(
    'genesets/GO_Biological_Process_2021.txt',  # or .gmt file
    organism='Human'  # or 'Mouse'
)

# Step 3: Now run enrichment with the DICTIONARY
enr = ov.bulk.geneset_enrichment(
    gene_list=deg_genes,
    pathways_dict=pathways_dict,  # Pass the DICTIONARY, not file path!
    pvalue_type='auto',
    organism='Human'
)

WRONG - DO NOT USE:

python

# WRONG! Don't pass file path directly to geneset_enrichment!
# enr = ov.bulk.geneset_enrichment(
#     gene_list=deg_genes,
#     pathways_dict='genesets/GO_Biological_Process_2021.gmt'  # ERROR! String path doesn't work!
# )

# WRONG! geneset_enrichment expects dict, not file path
# enr = ov.bulk.geneset_enrichment(
#     gene_list=deg_genes,
#     pathways_dict='GO_Biological_Process_2021'  # ERROR!
# )
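
One way to catch this mistake early is a small guard that runs before the enrichment call. The sketch below is plain Python and does not call OmicVerse. `ensure_geneset_dict` is a hypothetical helper name, and the layout it checks (pathway name mapped to a list of gene symbols) is an assumption about what ov.utils.geneset_prepare() returns.

```python
from collections.abc import Mapping


def ensure_geneset_dict(pathways):
    """Return `pathways` unchanged if it looks like a gene set dictionary.

    Hypothetical helper: assumes the layout {pathway_name: [gene, ...]}.
    """
    if isinstance(pathways, (str, bytes)):
        raise TypeError(
            "Got a string (file path or database name?). "
            "Load it with ov.utils.geneset_prepare() first."
        )
    if not isinstance(pathways, Mapping):
        raise TypeError(
            f"Expected a dict of gene sets, got {type(pathways).__name__}"
        )
    if not pathways:
        raise ValueError("Gene set dictionary is empty")
    for name, genes in pathways.items():
        # A bare string here usually means the file was parsed incorrectly.
        if isinstance(genes, (str, bytes)):
            raise TypeError(
                f"Gene set {name!r} is a string, expected a list of gene symbols"
            )
    return pathways


# Toy dictionary standing in for the output of geneset_prepare().
toy_sets = {"Apoptosis": ["TP53", "BAX", "CASP3"]}
checked = ensure_geneset_dict(toy_sets)
```

Calling the guard on the value you are about to pass as pathways_dict turns a confusing downstream failure into an immediate, readable error.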

File Format Support

File Extension	Load Method	Notes
`.txt`	`ov.utils.geneset_prepare()`	OmicVerse format
`.gmt`	`ov.utils.geneset_prepare()`	Standard GMT format
`.json`	`json.load()` then convert	Custom handling needed
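
For the `.json` row, the conversion depends on how the file is laid out. The sketch below assumes the simplest case: one JSON object whose keys are pathway names and whose values are lists of gene symbols. `load_json_genesets` is a hypothetical helper, not part of OmicVerse, and other layouts would need their own conversion.

```python
import json
import os
import tempfile


def load_json_genesets(path):
    """Load a JSON file into {pathway_name: [gene, ...]}.

    Assumes the file holds a single JSON object whose values are lists of
    gene symbols.
    """
    with open(path, encoding="utf-8") as handle:
        raw = json.load(handle)
    if not isinstance(raw, dict):
        raise ValueError("Expected a JSON object mapping pathway names to gene lists")
    genesets = {}
    for name, genes in raw.items():
        if not isinstance(genes, list):
            raise ValueError(f"Gene set {name!r} is not a list")
        # Strip whitespace, drop empty entries, and remove duplicates
        # while keeping the original order.
        cleaned = list(
            dict.fromkeys(str(g).strip() for g in genes if str(g).strip())
        )
        genesets[str(name)] = cleaned
    return genesets


# Demo on a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "custom_sets.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump({"Apoptosis": ["TP53", "BAX", "TP53", ""]}, handle)
    demo_sets = load_json_genesets(demo_path)
```

The resulting dictionary can then be passed as pathways_dict in the same way as one loaded with ov.utils.geneset_prepare().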

Complete Enrichment Workflow

python

import omicverse as ov

# 1. Setup
ov.plot_set()

# 2. Ensure pathway database is available
ov.utils.download_pathway_database()

# 3. Load gene sets - ALWAYS use geneset_prepare first!
go_bp = ov.utils.geneset_prepare('genesets/GO_Biological_Process_2021.txt', organism='Human')
go_mf = ov.utils.geneset_prepare('genesets/GO_Molecular_Function_2021.txt', organism='Human')
kegg = ov.utils.geneset_prepare('genesets/KEGG_2021_Human.txt', organism='Human')

# 4. Prepare gene list (e.g., from DEG analysis)
# Assuming dds is a pyDEG object with results
deg_genes = dds.result.loc[dds.result['sig'] != 'normal'].index.tolist()

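# 5. Run enrichment with dictionary (one call per database)
enr_go_bp = ov.bulk.geneset_enrichment(
    gene_list=deg_genes,
    pathways_dict=go_bp,  # Dictionary, NOT file path!
    pvalue_type='auto',
    organism='Human'
)
enr_go_mf = ov.bulk.geneset_enrichment(
    gene_list=deg_genes,
    pathways_dict=go_mf,
    pvalue_type='auto',
    organism='Human'
)
enr_kegg = ov.bulk.geneset_enrichment(
    gene_list=deg_genes,
    pathways_dict=kegg,
    pvalue_type='auto',
    organism='Human'
)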

# 6. Visualize results
ov.bulk.geneset_plot(enr_go_bp, figsize=(6, 8), num=10)

# 7. For multiple databases, combine into dict
enr_dict = {
    'GO_BP': enr_go_bp,
    'GO_MF': enr_go_mf,
    'KEGG': enr_kegg
}
colors_dict = {
    'GO_BP': '#1f77b4',
    'GO_MF': '#ff7f0e',
    'KEGG': '#2ca02c'
}
ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=5)
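
When several databases are involved, the per-database calls can be collected in a loop instead of being written out by hand. The sketch below shows one possible shape for that. `run_enrichment_for_all` and `count_hits` are hypothetical names, and `count_hits` is only a stand-in so the example runs without OmicVerse installed. It counts overlapping genes and is not an enrichment test.

```python
def run_enrichment_for_all(gene_list, genesets_by_name, enrich):
    """Call `enrich(gene_list, pathways_dict)` once per database.

    Returns {database_name: result}, the same shape as the enr_dict
    built for a multi-database plot.
    """
    results = {}
    for name, pathways_dict in genesets_by_name.items():
        if not isinstance(pathways_dict, dict):
            raise TypeError(f"{name}: expected a loaded gene set dictionary")
        results[name] = enrich(gene_list, pathways_dict)
    return results


def count_hits(gene_list, pathways_dict):
    """Stand-in for a real enrichment call: overlap size per term."""
    query = set(gene_list)
    return {term: len(query & set(genes)) for term, genes in pathways_dict.items()}


toy_databases = {
    "GO_BP": {"apoptotic process": ["TP53", "BAX"]},
    "KEGG": {"p53 signaling pathway": ["TP53", "MDM2"]},
}
toy_results = run_enrichment_for_all(["TP53", "BAX"], toy_databases, count_hits)
```

In a real session, `enrich` could be a small function that wraps ov.bulk.geneset_enrichment() with the keyword arguments used in the workflow (pvalue_type, organism), and the returned dictionary could be passed to ov.bulk.geneset_plot_multi().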

Common Errors and Solutions

Error: "FileNotFoundError" or "pathways_dict is not a dict"

Cause: Passing a file path string instead of a dictionary to geneset_enrichment()

Solution: First load with ov.utils.geneset_prepare(), then pass the returned dictionary

Error: "Missing file 'genesets/GO_Biological_Process_2021.gmt'"

Cause: Pathway database not downloaded

Solution: Run ov.utils.download_pathway_database() first
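
A quick existence check before loading can make this failure easier to diagnose. The sketch below assumes the files live in a `genesets/` directory, as in the paths used throughout this skill, and tries both the `.txt` and `.gmt` extensions. `find_geneset_file` is a hypothetical helper, not an OmicVerse function.

```python
import tempfile
from pathlib import Path


def find_geneset_file(name, directory="genesets"):
    """Return the path of a geneset file, trying .txt and .gmt."""
    base = Path(directory)
    # Accept names given with or without a known extension.
    stem = Path(name).stem if Path(name).suffix in {".txt", ".gmt"} else name
    for suffix in (".txt", ".gmt"):
        candidate = base / f"{stem}{suffix}"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"No geneset file for {name!r} in {base}/ (tried .txt and .gmt). "
        "Run ov.utils.download_pathway_database() first or check the path."
    )


# Demo in a temporary directory standing in for genesets/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "KEGG_2021_Human.txt").write_text("placeholder\n", encoding="utf-8")
    found_name = find_geneset_file("KEGG_2021_Human", directory=tmp).name
    try:
        find_geneset_file("GO_Biological_Process_2021.gmt", directory=tmp)
        missing_message = ""
    except FileNotFoundError as err:
        missing_message = str(err)
```

The returned path can be converted with str() and handed to ov.utils.geneset_prepare().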

Error: "No enriched pathways found"

Cause: Gene list doesn't overlap with pathway genes, or organism mismatch

Solution:

Verify gene symbol capitalization matches the database (uppercase for human, e.g. TP53; title case for mouse, e.g. Trp53)
Check that the organism parameter matches your data
Ensure the gene list has sufficient genes (>10 recommended)
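
These checks can be run directly on the gene list and the loaded dictionary. The sketch below counts how many query genes match the pathway genes exactly and how many would match only after ignoring capitalization, which is a common sign of a human/mouse mix-up. `overlap_report` is a hypothetical helper, and it assumes the dictionary maps pathway names to lists of gene symbols.

```python
def overlap_report(gene_list, pathways_dict):
    """Summarize how well a gene list overlaps with a gene set dictionary."""
    universe = {gene for genes in pathways_dict.values() for gene in genes}
    query = set(gene_list)
    exact = query & universe
    # Genes that miss on exact match but hit once case is ignored.
    upper_universe = {gene.upper() for gene in universe}
    case_only = {gene for gene in query - exact if gene.upper() in upper_universe}
    return {
        "n_query": len(query),
        "n_exact": len(exact),
        "n_case_only": len(case_only),
        "fraction_exact": len(exact) / len(query) if query else 0.0,
    }


# Mouse-style symbols queried against human-style pathway genes.
toy_pathways = {"Apoptosis": ["TP53", "BAX", "CASP3"]}
report = overlap_report(["Trp53", "Bax", "Casp3"], toy_pathways)
```

A low `n_exact` together with a high `n_case_only` suggests the symbols and the database come from different organisms or naming conventions, rather than a true absence of enrichment.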

Pathway Databases Available

After running ov.utils.download_pathway_database():

GO_Biological_Process_2021.txt
GO_Molecular_Function_2021.txt
GO_Cellular_Component_2021.txt
KEGG_2021_Human.txt
KEGG_2021_Mouse.txt
Reactome_2022.txt
WikiPathway_2023_Human.txt
And many more...

Best Practices

Always load genesets first: Never pass file paths directly to geneset_enrichment()
Check gene format: Ensure gene symbols match (CAPS for human, Title case for mouse)
Download once: Run download_pathway_database() once per environment
Specify organism: Always set organism='Human' or organism='Mouse'
Use background genes: For more accurate results, provide background parameter

Examples

"Run GO enrichment on my DEG results using the correct geneset_prepare workflow"
"Perform KEGG pathway analysis on upregulated genes with proper dictionary format"
"Compare GO BP, MF, and KEGG enrichment results using geneset_plot_multi"

References

Tutorial notebook: t_deg.ipynb (enrichment section)
Pathway download: ov.utils.download_pathway_database()
Quick reference: reference.md

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/gsea-enrichment
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw
