Agent skill

single-cell-downstream-analysis

Checklist-style reference for OmicVerse downstream tutorials covering AUCell scoring, metacell DEG, and related exports.

Stars 2,009
Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/single-downstream-analysis

SKILL.md

Single-cell downstream analysis quick-reference

This skill sheet distills the OmicVerse single-cell downstream tutorials into an executable checklist. Each module highlights prerequisites, the core API entry points, interpretation checkpoints, resource planning notes, and any optional validation or export steps surfaced in the notebooks.

AUCell pathway scoring (t_aucell.ipynb)

  • Prerequisites
    • Download pathway collections (GO, KEGG, or custom) that match the organism under study before running the tutorial.
    • Ensure an AnnData object with clustering/embedding (adata.obsm['X_umap']) is prepared.
  • Core calls
    • ov.single.geneset_aucell for one pathway; ov.single.pathway_aucell for multiple pathways.
    • ov.single.pathway_aucell_enrichment to score all pathways in a library (set num_workers for parallelism).
  • Result checks
    • Interpret AUCell scores as expression-like values (0–1). Use sc.pl.embedding to confirm pathway activity patterns.
    • Run sc.tl.rank_genes_groups on the AUCell AnnData to find cluster-enriched pathways and visualize with sc.pl.rank_genes_groups_dotplot.
  • Resources
    • Library-wide scoring can be CPU-intensive; allocate workers (num_workers=8 in tutorial) and sufficient memory for the dense AUCell matrix.
  • Optional validation / exports
    • Persist scores with adata_aucs.write_h5ad('...') for reuse.
    • Plot enriched pathways via ov.single.pathway_enrichment and ov.single.pathway_enrichment_plot heatmaps.

scRNA-seq DEG (bulk-style meta cell) (t_scdeg.ipynb)

  • Prerequisites
    • Run quality control and preprocessing (ov.pp.qc, ov.pp.preprocess, ov.pp.scale, ov.pp.pca).
    • Retain raw counts in adata.raw before HVG filtering.
  • Core calls
    • Construct differential objects with ov.bulk.pyDEG(test_adata.to_df(...).T) for full-cell and metacell views.
    • Build metacells via ov.single.MetaCell(..., use_gpu=True) when GPU is available for acceleration.
  • Result checks
    • Inspect volcano plots (dds.plot_volcano) and targeted boxplots (dds.plot_boxplot) for top DEGs.
    • Map DEG markers back to UMAP embeddings using ov.utils.embedding to confirm localization.
  • Resources
    • Metacell construction benefits from GPU but can fall back to CPU; ensure enough memory for transposed dense matrices passed to pyDEG.
  • Optional validation / exports
    • Save metacell embeddings with matplotlib figures; adjust legend_* settings for publication-ready visuals.

scRNA-seq DEG (cell-type & composition) (t_deg_single.ipynb)

  • Prerequisites
    • Annotated adata with condition, cell_label, and optional batch metadata.
    • Initialize mixed CPU/GPU resources when using graph-based DA methods (ov.settings.cpu_gpu_mixed_init()).
  • Core calls
    • ov.single.DEG(..., method='wilcoxon'|'t-test'|'memento-de') with deg_obj.run(...) to target cell types.
    • ov.single.DCT(..., method='sccoda'|'milo') for differential composition testing.
    • Graph setup for Milo: ov.pp.preprocess, ov.single.batch_correction, ov.pp.neighbors, ov.pp.umap.
  • Result checks
    • Review DEG tables from deg_obj (Wilcoxon / memento) and adjust capture rate / bootstraps for stability.
    • For scCODA, tune FDR via sim_results.set_fdr(); interpret boxplots with condition-level shifts.
    • Milo diagnostics: histogram of P-values, logFC vs –log10 FDR scatter, beeswarm of differential abundance.
  • Resources
    • Memento and Milo require multiple CPUs (num_cpus, num_boot, high k); ensure adequate compute time.
    • Harmony/scVI batch correction needs GPU memory when enabled; plan for VRAM usage.
  • Optional validation / exports
    • Visual diagnostics include UMAP overlays (ov.pl.embedding), Milo beeswarm plots, and custom color palettes.

scDrug response prediction (t_scdrug.ipynb)

  • Prerequisites
    • Fetch tumor-focused dataset (e.g., infercnvpy.datasets.maynard2020_3k).
    • Download reference assets before running predictions:
      • Gene annotations via ov.utils.get_gene_annotation (requires GTF from GENCODE or T2T-CHM13).
      • ov.utils.download_GDSC_data() and ov.utils.download_CaDRReS_model() for drug-response models.
      • Clone CaDRReS-Sc repo (git clone https://github.com/CSB5/CaDRReS-Sc).
  • Core calls
    • Tumor resolution detection: ov.single.autoResolution(adata, cpus=4).
    • Drug response runner: ov.single.Drug_Response(adata, scriptpath='CaDRReS-Sc', modelpath='models/', output='result').
  • Result checks
    • Inspect clustering and IC50 outputs stored under output; cross-reference with inferred CNV states.
  • Resources
    • Requires external CaDRReS-Sc environment (Python/R dependencies) and storage for model downloads.
    • Running inferCNV preprocessing may need multiple CPUs and substantial RAM.
  • Optional validation / exports
    • Persist intermediate AnnData (adata.write('scanpyobj.h5ad')) to reuse for downstream analyses or re-runs.

SCENIC regulon discovery (t_scenic.ipynb)

  • Prerequisites
    • Mouse hematopoiesis dataset loaded via ov.single.mouse_hsc_nestorowa16() (or provide preprocessed data with raw counts).
    • Download cisTarget ranking databases (*.feather) and motif annotations (motifs-*.tbl) for the species; allocate

      3 GB disk space and verify paths (db_glob, motif_path).

  • Core calls
    • Initialize analysis: ov.single.SCENIC(adata, db_glob=..., motif_path=..., n_jobs=12).
    • Run RegDiffusion-based GRN inference, regulon pruning, and AUCell scoring via the SCENIC object methods.
  • Result checks
    • Examine regulon activity matrices (scenic_obj.auc_mtx.head()), RSS scores, and embeddings colored by regulon activity.
    • Use RSS plots, dendrograms, and AUCell distributions to interpret TF specificity and activity thresholds.
  • Resources
    • Multi-core CPU recommended (n_jobs matches available cores); ensure enough RAM for motif enrichment.
    • Large downloads and intermediate objects (pickle/h5ad) require disk space.
  • Optional validation / exports
    • Save scenic_obj (ov.utils.save) and regulon AnnData (regulon_ad.write).
    • Optional plots: RSS per cell type, regulon embeddings, AUC histograms with threshold lines, GRN network visualizations.

cNMF program discovery (t_cnmf.ipynb)

  • Prerequisites
    • Preprocess with HVG selection (ov.pp.preprocess), scaling (ov.pp.scale), PCA, and have UMAP embeddings for inspection.
    • Select component range (e.g., np.arange(5, 11)) and iterations; ensure output directory exists.
  • Core calls
    • Instantiate analysis: ov.single.cNMF(..., output_dir='...', name='...').
    • Factorization workflow: cnmf_obj.factorize(...), cnmf_obj.combine(...), cnmf_obj.k_selection_plot(), cnmf_obj.consensus(...).
    • Extract results: cnmf_obj.load_results(...), cnmf_obj.get_results(...), optional RF classifier via get_results_rfc.
  • Result checks
    • Evaluate stability via K-selection plot and local density histogram; confirm chosen K with consensus heatmaps.
    • Inspect topic usage embeddings (ov.pl.embedding), cluster labels, and dotplots of top genes.
  • Resources
    • Multiple iterations and components are CPU-heavy; consider distributing workers (total_workers) and verifying disk space for intermediate factorization files.
  • Optional validation / exports
    • Visualizations include Euclidean distance heatmaps, density histograms, UMAP overlays for topics/clusters, and dotplots.

NOCD overlapping communities (t_nocd.ipynb)

  • Prerequisites
    • Prepare AnnData via ov.single.scanpy_lazy (automated preprocessing) before running NOCD.
    • Note: Tutorial warns NOCD implementation is under active development—expect variability.
  • Core calls
    • Pipeline wrapper: scbrca = ov.single.scnocd(adata) followed by chained methods (matrix_transform, matrix_normalize, GNN_configure, GNN_preprocess, GNN_model, GNN_result, GNN_plot, cal_nocd, calculate_nocd).
  • Result checks
    • Compare standard Leiden clusters versus NOCD outputs on UMAP embeddings to identify multi-fate cells.
  • Resources
    • Graph neural network stages can be GPU-accelerated; ensure CUDA availability or be prepared for longer CPU runtimes.
    • Track memory usage when constructing large adjacency matrices.
  • Optional validation / exports
    • Generate multiple UMAP overlays (sc.pl.umap) for nocd, nocd_n, and Leiden labels using shared color maps.

Lazy pipeline & reporting (t_lazy.ipynb)

  • Prerequisites
    • Install OmicVerse ≥1.7.0 with lazy utilities; supported species currently human/mouse.
    • Prepare batch metadata (sample_key) and optionally initialize hybrid compute (ov.settings.cpu_gpu_mixed_init()).
  • Core calls
    • Turnkey preprocessing: ov.single.lazy(adata, species='mouse', sample_key='batch', ...) with optional reforce_steps and module-specific kwargs.
    • Reporting: ov.single.generate_scRNA_report(...) to build HTML summary; ov.generate_reference_table(adata) for citation tracking.
  • Result checks
    • Inspect generated embeddings (ov.pl.embedding) for quality and annotation alignment.
    • Review HTML report for QC metrics, normalization, batch correction, and embeddings.
  • Resources
    • Steps like Harmony or scVI may invoke GPU; confirm hardware availability or adjust reforce_steps accordingly.
    • Report generation writes to disk; ensure output path is writable.
  • Optional validation / exports
    • Customize embeddings by color key; store HTML report and reference table alongside project documentation.

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results