Agent skill
bulk-rna-seq-batch-correction-with-combat
Use omicverse's pyComBat wrapper to remove batch effects from merged bulk RNA-seq or microarray cohorts, export corrected matrices, and benchmark pre/post correction visualisations.
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bulk-combat-correction
SKILL.md
Bulk RNA-seq batch correction with ComBat
Overview
Apply this skill when a user has multiple bulk expression matrices measured across different batches and needs to harmonise them before downstream analysis. It follows t_bulk_combat.ipynb, which demonstrates the pyComBat workflow on ovarian cancer microarray cohorts.
Instructions
- Import core libraries
  - Load omicverse as ov, anndata, pandas as pd, and matplotlib.pyplot as plt.
  - Call ov.ov_plot_set() (aliased ov.plot_set() in some releases) to align figures with omicverse styling.
- Load each batch separately
  - Read the prepared pickled matrices (or user-provided expression tables) with pd.read_pickle(...) / pd.read_csv(...).
  - Transpose gene × sample tables to sample × gene before wrapping them in anndata.AnnData objects, so that rows are samples and adata.obs stores sample metadata.
  - Assign a batch column for every cohort (adata.obs['batch'] = '1', '2', ...). Encourage descriptive labels when available.
- Concatenate on shared genes
  - Use anndata.concat([adata1, adata2, adata3], merge='same') to retain the intersection of genes across batches. The intersection comes from the default join='inner'; merge='same' keeps only the gene annotations that agree across batches.
  - Confirm the combined adata reports balanced sample counts per batch; if not, prompt users to re-check inputs.
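Before concatenating, it can help to preview which genes the intersection will keep and how many samples each batch contributes. The following is a minimal pandas-only sketch, not the tutorial's code: the toy tables, the gene names, and the helper name shared_gene_report are hypothetical, and the inputs are assumed to be gene × sample DataFrames as read from disk.

```python
import pandas as pd


def shared_gene_report(tables):
    """Summarise a dict of {batch_label: gene x sample DataFrame}.

    Returns the sorted genes present in every table and the number of
    samples (columns) contributed by each batch.
    """
    labels = list(tables)
    shared = set(tables[labels[0]].index)
    for label in labels[1:]:
        shared &= set(tables[label].index)
    counts = pd.Series({label: tables[label].shape[1] for label in labels})
    return sorted(shared), counts


# Hypothetical toy cohorts: rows are genes, columns are samples.
batch1 = pd.DataFrame(1.0, index=["TP53", "BRCA1", "MYC"], columns=["s1", "s2"])
batch2 = pd.DataFrame(2.0, index=["TP53", "MYC", "EGFR"], columns=["s3", "s4", "s5"])

shared_genes, samples_per_batch = shared_gene_report({"1": batch1, "2": batch2})
print(shared_genes)
print(samples_per_batch.to_dict())
```

A shared list much shorter than the per-cohort gene counts usually points to mismatched identifiers rather than genuinely different platforms.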
- Run ComBat batch correction
  - Execute ov.bulk.batch_correction(adata, batch_key='batch').
  - Explain that corrected values are stored in adata.layers['batch_correction'] while the original counts remain in adata.X.
- Export corrected and raw matrices
  - Obtain DataFrames via adata.to_df().T (raw) and adata.to_df(layer='batch_correction').T (corrected).
  - Encourage saving both tables (.to_csv(...)) plus the harmonised AnnData (adata.write_h5ad('adata_batch.h5ad', compression='gzip')).
- Benchmark the correction
  - For per-sample variance checks, draw before/after boxplots and recolour boxes using the ov.utils.red_color, blue_color, and green_color palettes to match batches.
  - Copy raw counts to a named layer with adata.layers['raw'] = adata.X.copy() before PCA.
  - Run ov.pp.pca(adata, layer='raw', n_pcs=50) and ov.pp.pca(adata, layer='batch_correction', n_pcs=50).
  - Visualise embeddings with ov.utils.embedding(..., basis='raw|original|X_pca', color='batch', frameon='small') and repeat for the corrected layer to verify mixing.
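The PCA plots can be complemented with a single number: the fraction of PC1 variance that lies between batches, which should fall after correction. The sketch below uses NumPy only. Per-batch mean centering is a deliberately crude stand-in for the ComBat-corrected layer (it is not ComBat's empirical Bayes adjustment), and the simulated data and both helper names are hypothetical.

```python
import numpy as np


def pc1_batch_fraction(matrix, batches):
    """Fraction of PC1 score variance explained by batch (0 = mixed, 1 = separated).

    matrix is samples x genes; batches holds one label per sample.
    """
    centered = matrix - matrix.mean(axis=0)
    # SVD of the centered matrix yields the principal components.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]
    total = np.sum((scores - scores.mean()) ** 2)
    between = sum(
        np.sum(batches == b) * (scores[batches == b].mean() - scores.mean()) ** 2
        for b in np.unique(batches)
    )
    return float(between / total)


def center_per_batch(matrix, batches):
    """Subtract each batch's per-gene mean (illustrative stand-in, not ComBat)."""
    out = matrix.astype(float).copy()
    for b in np.unique(batches):
        mask = batches == b
        out[mask] -= out[mask].mean(axis=0)
    return out


rng = np.random.default_rng(0)
batches = np.array(["1"] * 10 + ["2"] * 10)
raw = rng.normal(size=(20, 30))
raw[batches == "2"] += 5.0  # simulated additive batch shift

before = pc1_batch_fraction(raw, batches)
after = pc1_batch_fraction(center_per_batch(raw, batches), batches)
print(f"PC1 variance explained by batch: before={before:.3f}, after={after:.3f}")
```

With real data, the same function could be applied to the raw and batch_correction matrices exported from adata; a value that stays high after correction suggests the batches remain separated.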
- Troubleshooting tips
  - Mismatched gene identifiers cause dropped features; remind users to harmonise feature names (e.g., gene symbols) before concatenation.
  - pyComBat expects log-scale intensities or similarly distributed counts; recommend log-transforming strongly skewed matrices.
  - If the batch_correction layer is missing, ensure the batch_key matches the column name in adata.obs.
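A quick range check can flag matrices that still need a log transform before correction. This is a rough heuristic sketch, not part of omicverse or pyComBat: the helper name ensure_log_scale and the cut-off of 50 are assumptions (log2 intensities rarely exceed about 20, so much larger maxima suggest raw-scale values).

```python
import numpy as np


def ensure_log_scale(values, max_log_value=50.0):
    """Return (matrix, transformed_flag).

    If the largest value exceeds max_log_value (a hypothetical cut-off), the
    matrix is assumed to be on a raw scale and log2(x + 1) is applied.
    """
    values = np.asarray(values, dtype=float)
    if np.nanmax(values) > max_log_value:
        if np.nanmin(values) < 0:
            # Negative values on a raw scale are unexpected; refuse to transform.
            raise ValueError("negative values found; not applying log2(x + 1)")
        return np.log2(values + 1.0), True
    return values, False


# Hypothetical raw-scale matrix (genes x samples or samples x genes, either works).
raw_counts = np.array([[0.0, 15.0, 1023.0], [3.0, 255.0, 7.0]])
logged, was_transformed = ensure_log_scale(raw_counts)

# Running the check again leaves an already log-scaled matrix untouched.
already_logged, transformed_again = ensure_log_scale(logged)
print(was_transformed, transformed_again)
```

The threshold should be tuned to the platform; the point is only to catch obviously raw-scale inputs before they reach the correction step.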
Examples
- "Combine three GEO ovarian cohorts, run ComBat, and export both the raw and corrected CSV matrices."
- "Plot PCA embeddings before and after batch correction to confirm that batches 1–3 overlap."
- "Save the harmonised AnnData file so I can reload it later for downstream DEG analysis."
References
- Tutorial notebook: t_bulk_combat.ipynb
- Example inputs: omicverse_guide/docs/Tutorials-bulk/data/combat/
- Quick copy/paste commands: reference.md