Agent skill
bulktrajblend-trajectory-interpolation
Extend scRNA-seq developmental trajectories with BulkTrajBlend by generating intermediate cells from bulk RNA-seq, training beta-VAE and GNN models, and interpolating missing states.
Install this agent skill in your project:
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bulk-trajblend-interpolation
BulkTrajBlend trajectory interpolation
Overview
Invoke this skill when users need to bridge gaps in single-cell developmental trajectories using matched bulk RNA-seq. It follows the `t_bulktrajblend.ipynb` tutorial, which shows how BulkTrajBlend deconvolves dentate gyrus bulk samples into synthetic cells, identifies overlapping cell communities with a graph neural network (GNN), and interpolates "interrupted" cell states.
Instructions
- Prepare libraries and inputs
  - Import `omicverse as ov`, `scanpy as sc`, `scvelo as scv`, and helper functions such as `from omicverse.utils import mde`; run `ov.plot_set()`.
  - Load the reference scRNA-seq AnnData (`scv.datasets.dentategyrus()`) and the raw bulk counts with `ov.utils.read(...)`, followed by `ov.bulk.Matrix_ID_mapping(...)` for gene ID harmonisation.
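In the tutorial, `ov.bulk.Matrix_ID_mapping(...)` performs the gene ID conversion. The stdlib sketch below makes the idea concrete: it re-keys a toy genes-by-samples table from Ensembl-style IDs to gene symbols. The helper name `map_gene_ids`, the made-up IDs, and the choice to sum counts when several IDs collapse onto one symbol are all assumptions for illustration. This is not the omicverse implementation, which may resolve duplicates differently.

```python
from typing import Dict, List


def map_gene_ids(
    counts: Dict[str, List[int]], id_to_symbol: Dict[str, str]
) -> Dict[str, List[int]]:
    """Hypothetical helper: re-key a genes x samples count table by gene symbol.

    IDs without a mapping are dropped; IDs that map to the same symbol have
    their per-sample counts summed (an assumed policy, not omicverse's).
    """
    mapped: Dict[str, List[int]] = {}
    for gene_id, row in counts.items():
        symbol = id_to_symbol.get(gene_id)
        if symbol is None:
            continue  # unmapped ID: drop it
        if symbol in mapped:
            # Duplicate symbol: add counts sample by sample.
            mapped[symbol] = [a + b for a, b in zip(mapped[symbol], row)]
        else:
            mapped[symbol] = list(row)
    return mapped


# Toy table with made-up IDs: two IDs collapse onto "Sox2", one is unmapped.
counts = {"ENSMUSG_A": [1, 2, 3], "ENSMUSG_B": [4, 5, 6], "ENSMUSG_C": [7, 8, 9]}
print(map_gene_ids(counts, {"ENSMUSG_A": "Sox2", "ENSMUSG_B": "Sox2"}))  # {'Sox2': [5, 7, 9]}
```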
- Configure BulkTrajBlend
  - Instantiate `ov.bulk2single.BulkTrajBlend(bulk_seq=bulk_df, single_seq=adata, bulk_group=['dg_d_1','dg_d_2','dg_d_3'], celltype_key='clusters')`.
  - Explain that the `bulk_group` names correspond to columns of the raw bulk matrix and that the method expects unscaled counts.
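`bulk_group` must name raw bulk columns, and the method expects unscaled counts. A quick pre-flight check can therefore catch typos or already-normalised input before `BulkTrajBlend` is instantiated. The sketch below represents the bulk matrix as a plain dict mapping column name to counts. The helper `check_bulk_group` and its rule that values must be non-negative whole numbers are illustrative assumptions, not part of omicverse.

```python
from typing import Dict, List, Sequence


def check_bulk_group(
    bulk_columns: Dict[str, Sequence[float]], bulk_group: List[str]
) -> List[str]:
    """Hypothetical pre-flight check; returns a list of human-readable problems."""
    problems = []
    for name in bulk_group:
        if name not in bulk_columns:
            problems.append(f"missing column: {name}")
            continue
        values = bulk_columns[name]
        # Raw counts should be non-negative whole numbers; fractions or negatives
        # suggest the matrix was already normalised, log-transformed, or scaled.
        if any(v < 0 or v != int(v) for v in values):
            problems.append(f"non-count values in: {name}")
    return problems


bulk = {"dg_d_1": [10, 0, 3], "dg_d_2": [5, 1, 0], "dg_d_3": [2.5, 0.1, 7]}
print(check_bulk_group(bulk, ["dg_d_1", "dg_d_2", "dg_d_3", "dg_d_4"]))
# ['non-count values in: dg_d_3', 'missing column: dg_d_4']
```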
- Set beta-VAE expectations
  - Call `bulktb.vae_configure(cell_target_num=100)` (or pass a dictionary) to define the expected number of cells per cluster. Mention that omitting the argument triggers TAPE-based estimation.
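The sketch below shows one way the three forms of `cell_target_num` could resolve to per-cluster targets:

- an integer is applied to every cluster;
- a dictionary is used as given;
- when the argument is omitted, estimated proportions (standing in for the TAPE estimate) are multiplied by a total cell budget.

The function `resolve_cell_targets`, the integer-broadcasting behaviour, and the `proportions`/`total_cells` inputs are hypothetical. Check the omicverse source for how `vae_configure` actually interprets each form.

```python
from typing import Dict, List, Optional, Union


def resolve_cell_targets(
    clusters: List[str],
    cell_target_num: Union[int, Dict[str, int], None] = None,
    proportions: Optional[Dict[str, float]] = None,
    total_cells: int = 0,
) -> Dict[str, int]:
    """Hypothetical resolver for per-cluster synthetic cell targets."""
    if isinstance(cell_target_num, int):
        # Assumed behaviour: one number for every cluster.
        return {c: cell_target_num for c in clusters}
    if isinstance(cell_target_num, dict):
        # Explicit per-cluster targets; clusters not listed get 0 in this sketch.
        return {c: cell_target_num.get(c, 0) for c in clusters}
    if proportions is None:
        raise ValueError("need estimated proportions when cell_target_num is None")
    # Stand-in for an estimate: scale proportions by the total cell budget.
    return {c: round(proportions.get(c, 0.0) * total_cells) for c in clusters}


clusters = ["OPC", "nIPC"]
print(resolve_cell_targets(clusters, 100))           # {'OPC': 100, 'nIPC': 100}
print(resolve_cell_targets(clusters, {"OPC": 40}))   # {'OPC': 40, 'nIPC': 0}
print(resolve_cell_targets(clusters, proportions={"OPC": 0.25, "nIPC": 0.75}, total_cells=200))
# {'OPC': 50, 'nIPC': 150}
```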
- Train or load the beta-VAE
  - Use `bulktb.vae_train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_btb_vae', generate_save_dir='...', generate_save_name='dg_btb')`.
  - Highlight resuming with `bulktb.vae_load('.../dg_btb_vae.pth')` and the need to regenerate cells with consistent random seeds for reproducibility.
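Reproducible regeneration depends on fixing every random number generator involved. The stdlib sketch below illustrates the principle with a hypothetical `sample_latent` function that draws Gaussian latent vectors from a seeded generator: the same seed gives identical draws. In a real run you would also seed NumPy (`np.random.seed`) and PyTorch (`torch.manual_seed`). This sketch does not assume that any omicverse function accepts a seed argument.

```python
import random
from typing import List


def sample_latent(n_cells: int, latent_dim: int, seed: int) -> List[List[float]]:
    """Hypothetical: draw n_cells latent vectors from N(0, 1) with a fixed seed."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n_cells)]


# Same seed -> identical synthetic latent draws.
first = sample_latent(3, 4, seed=42)
again = sample_latent(3, 4, seed=42)
print(first == again)  # True
```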
- Generate synthetic cells
  - Produce a filtered AnnData via `bulktb.vae_generate(leiden_size=25)` and inspect cell-type compositions with `ov.bulk2single.bulk2single_plot_cellprop(...)`.
  - Save outputs to disk for reuse (`adata.write_h5ad`).
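In this workflow, `leiden_size` acts as a minimum cluster size for the generated cells: Leiden clusters smaller than the threshold are discarded as noise. Assuming that behaviour, the sketch below applies the same rule to plain label lists. It then computes the per-cell-type proportions that a composition plot such as `bulk2single_plot_cellprop` summarises. Both helper names and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def keep_large_clusters(leiden: List[str], min_size: int) -> List[int]:
    """Hypothetical: indices of cells whose Leiden cluster has >= min_size cells."""
    sizes = Counter(leiden)
    return [i for i, lab in enumerate(leiden) if sizes[lab] >= min_size]


def cell_proportions(celltypes: List[str]) -> Dict[str, float]:
    """Fraction of cells per cell type."""
    total = len(celltypes)
    return {ct: n / total for ct, n in Counter(celltypes).items()}


leiden = ["0", "0", "0", "1", "2", "2"]
celltypes = ["OPC", "OPC", "nIPC", "Astrocytes", "nIPC", "nIPC"]
kept = keep_large_clusters(leiden, min_size=2)  # drops the singleton cluster "1"
print(kept)  # [0, 1, 2, 4, 5]
print(cell_proportions([celltypes[i] for i in kept]))  # {'OPC': 0.4, 'nIPC': 0.6}
```

Lowering the threshold keeps small clusters, which is why the troubleshooting tips suggest a smaller `leiden_size` for rare populations.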
- Configure and train the GNN
  - Call `bulktb.gnn_configure(max_epochs=2000, use_rep='X', neighbor_rep='X_pca', gpu=0, ...)` to set hyperparameters.
  - Train with `bulktb.gnn_train()`; reload checkpoints with `bulktb.gnn_load('save_model/gnn.pth')`.
  - Generate overlapping community assignments with `bulktb.gnn_generate()`.
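The community step follows the NOCD (neural overlapping community detection) idea. The GNN outputs a non-negative affiliation score for every cell and community, and thresholding those scores lets a cell belong to several communities at once. The sketch below shows only that thresholding step on a toy matrix. The 0.5 threshold and the helper name are assumptions; the values BulkTrajBlend uses internally may differ.

```python
from typing import List


def overlapping_assignments(
    affiliation: List[List[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Hypothetical: for each cell, list the communities whose score exceeds threshold."""
    return [[k for k, score in enumerate(row) if score > threshold] for row in affiliation]


# Rows = cells, columns = communities (toy non-negative scores).
F = [
    [0.9, 0.1, 0.0],  # clearly community 0
    [0.7, 0.0, 0.6],  # overlaps communities 0 and 2
    [0.2, 0.3, 0.1],  # below threshold everywhere: unassigned
]
print(overlapping_assignments(F))  # [[0], [0, 2], []]
```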
- Visualise community structure
  - Create MDE embeddings: `bulktb.nocd_obj.adata.obsm['X_mde'] = mde(bulktb.nocd_obj.adata.obsm['X_pca'])`.
  - Plot clusters vs. discovered communities using `sc.pl.embedding(..., color=['clusters','nocd_n'], palette=ov.utils.pyomic_palette())`, and plot filtered subsets that exclude synthetic labels containing hyphens.
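Excluding hyphenated labels leaves only cells with a single, unambiguous label for plotting. For illustration, assume a cell assigned to several communities receives a hyphen-joined label such as `0-2`. The sketch below builds such labels and returns the indices of the cells to keep. In the tutorial's AnnData, this would presumably be a boolean mask over `.obs['nocd_n']`. The helper names and the label format are hypothetical.

```python
from typing import List


def community_label(communities: List[int]) -> str:
    """Hypothetical label format: memberships joined by '-', e.g. [0, 2] -> '0-2'."""
    return "-".join(str(c) for c in communities) if communities else "unassigned"


def single_label_cells(labels: List[str]) -> List[int]:
    """Indices of cells whose label contains no hyphen."""
    return [i for i, lab in enumerate(labels) if "-" not in lab]


labels = [community_label(c) for c in [[0], [0, 2], [1], [2]]]
print(labels)                      # ['0', '0-2', '1', '2']
print(single_label_cells(labels))  # [0, 2, 3]
```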
- Interpolate missing states
  - Run `bulktb.interpolation('OPC')` (replace with the target lineage) to synthesise continuity, then preprocess the interpolated AnnData (HVG selection, scaling, PCA).
  - Compute embeddings with `mde`, visualise with `ov.utils.embedding`, and compare to the original atlas.
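The scaling step in that preprocessing can be illustrated on a single gene. The sketch below z-scores expression across cells (zero mean, unit variance) and optionally clips the result, similar in spirit to `sc.pp.scale` with `max_value`. It uses the sample standard deviation and clips symmetrically; both are choices made here and may not match scanpy's exact behaviour.

```python
import statistics
from typing import List, Optional


def scale_gene(values: List[float], max_value: Optional[float] = None) -> List[float]:
    """Z-score one gene's expression across cells, optionally clipping to +/- max_value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator); an assumption
    if sd == 0:
        return [0.0 for _ in values]  # constant gene: nothing to scale
    scaled = [(v - mean) / sd for v in values]
    if max_value is not None:
        scaled = [max(-max_value, min(max_value, z)) for z in scaled]
    return scaled


print(scale_gene([1.0, 2.0, 3.0]))                 # [-1.0, 0.0, 1.0]
print(scale_gene([1.0, 2.0, 3.0], max_value=0.5))  # [-0.5, 0.0, 0.5]
```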
- Analyse trajectories
  - Initialise `ov.single.pyVIA` on both the original and interpolated data to derive pseudotime, followed by `get_pseudotime`, `sc.pp.neighbors`, `ov.utils.cal_paga`, and `ov.utils.plot_paga` for topology validation.
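Besides comparing PAGA graphs, a simple check is whether clusters appear in the same order along pseudotime in the original and interpolated data. The sketch below ranks clusters by the median pseudotime of their cells. `cluster_order` and the toy values are hypothetical; in practice the pseudotime values would come from each VIA run. For brevity, the toy reuses the same cluster labels for both runs.

```python
import statistics
from collections import defaultdict
from typing import Dict, List


def cluster_order(pseudotime: List[float], clusters: List[str]) -> List[str]:
    """Hypothetical: clusters sorted by the median pseudotime of their cells."""
    by_cluster: Dict[str, List[float]] = defaultdict(list)
    for t, c in zip(pseudotime, clusters):
        by_cluster[c].append(t)
    return sorted(by_cluster, key=lambda c: statistics.median(by_cluster[c]))


clusters = ["nIPC", "nIPC", "OPC", "OPC", "Astrocytes"]
original = cluster_order([0.1, 0.2, 0.8, 0.9, 0.5], clusters)
interpolated = cluster_order([0.1, 0.3, 0.6, 0.7, 0.9], clusters)
print(original)                  # ['nIPC', 'Astrocytes', 'OPC']
print(interpolated)              # ['nIPC', 'OPC', 'Astrocytes']
print(original == interpolated)  # False: the cluster order changed
```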
- Troubleshooting tips
  - If the VAE collapses (high reconstruction loss), lower `learning_rate` or reduce `hidden_size`.
  - Ensure the same generated dataset is used before calling `gnn_train`; regenerating cells changes the graph and can break checkpoint loading.
  - Sparse clusters may need adjusted `cell_target_num` thresholds or a smaller `leiden_size` filter to retain rare populations.
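For orientation when diagnosing training, the generic beta-VAE objective combines a reconstruction term with a KL divergence weighted by beta. The sketch below uses squared error for reconstruction and the closed-form KL between a diagonal Gaussian and a standard normal. The loss BulkTrajBlend actually optimises may use a different reconstruction term or weighting, so treat this as an illustration of the trade-off rather than its implementation.

```python
import math
from typing import List, Tuple


def gaussian_kl(mu: List[float], logvar: List[float]) -> float:
    """KL( N(mu, exp(logvar)) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def beta_vae_loss(
    x: List[float], x_hat: List[float], mu: List[float], logvar: List[float], beta: float = 1.0
) -> Tuple[float, float, float]:
    """Return (total, reconstruction, kl) for one sample."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))  # squared-error reconstruction
    kl = gaussian_kl(mu, logvar)
    return recon + beta * kl, recon, kl


total, recon, kl = beta_vae_loss([1.0, 0.0], [0.0, 0.0], mu=[1.0], logvar=[0.0], beta=2.0)
print(total, recon, kl)  # 2.0 1.0 0.5
```

A reconstruction term that stays large relative to the KL term corresponds to the "high reconstruction loss" symptom above. Raising beta pulls the latent space toward the prior at the cost of reconstruction quality.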
Examples
- "Train BulkTrajBlend on PDAC cohorts, then interpolate missing OPC states in the trajectory."
- "Load saved beta-VAE and GNN weights to regenerate overlapping communities and plot cluster vs. nocd labels."
- "Run VIA on interpolated cells and compare PAGA graphs with the original scRNA-seq trajectory."
References
- Tutorial notebook: `t_bulktrajblend.ipynb`
- Example datasets and checkpoints: `omicverse_guide/docs/Tutorials-bulk2single/data/`
- Quick copy/paste commands: `reference.md`