ClusterMarkersOfAllCells Process Configuration

Purpose

Finds marker genes for clusters of ALL cells before T/B cell selection. This process identifies differentially expressed genes across unsupervised clusters to help identify broad cell types (T cells, B cells, Myeloid cells, NK cells, etc.) in mixed immune cell populations.

When to Use

  • After SeuratClusteringOfAllCells: Runs on all cells before T/B selection
  • Before TOrBCellSelection: Provides markers to identify which clusters are T/B cells
  • Broad cell type identification: Distinguish major immune cell types from mixed populations
  • Mixed cell populations: When your data contains T, B, Myeloid, NK, and other cell types
  • Initial cell typing: First-pass identification before detailed annotation
  • Data quality check: Verify expected cell types are present in your data

Configuration Structure

Process Enablement

toml
[ClusterMarkersOfAllCells]
cache = true

Input Specification

toml
[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
# Accepts output from SeuratClusteringOfAllCells process

Environment Variables

All parameters are inherited from ClusterMarkers and MarkersFinder:

toml
[ClusterMarkersOfAllCells.envs]
# Parallel computing
ncores = 1

# Grouping (uses seurat_clusters by default)
# group_by = "seurat_clusters"  # Leave unset to use Seurat::Idents() (usually "seurat_clusters"); TOML has no null value

# Statistical test parameters (passed to Seurat::FindMarkers(); use - instead of . in key names,
# because a dotted key such as test.use would define a nested TOML table)
test-use = "wilcox"            # wilcox (Wilcoxon), bimod, roc, t, negbinom, poisson
min-pct = 0.1                  # Only test genes detected in >=10% of cells
logfc-threshold = 0.25         # Minimum log2 fold change

# Marker filtering
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"  # Filter for significant markers

# Enrichment analysis
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]
enrich_style = "enrichr"       # enrichr or clusterprofiler

# Error handling
error = false                  # Don't error out if no markers found

# Visualization
marker_plots_defaults = { order_by = "desc(avg_log2FC)" }
allmarker_plots = { "Top 10 markers of all clusters" = { plot_type = "heatmap" } }

External References

Seurat FindMarkers Parameters

  • Full reference: https://satijalab.org/seurat/reference/findmarkers
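  • Statistical tests: test.use parameter
    • "wilcox": Wilcoxon Rank Sum test (default, recommended)
    • "bimod": Likelihood-ratio test for single-cell gene expression
    • "roc": Receiver Operating Characteristic (classification power of each gene)
    • "t": Student's t-test
    • "negbinom": Negative binomial generalized linear model (UMI-based data only)
    • "poisson": Poisson generalized linear model (UMI-based data only)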
  • Common arguments (use - instead of . in TOML):
    • min-pct: Minimum detection percentage in either group
    • logfc-threshold: Minimum log2 fold change threshold
    • only-pos: Only return positive markers
    • min-diff-pct: Minimum difference in detection percentage
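
As a rough sketch, the hyphenated names listed above might appear in a config as follows. This assumes the arguments are accepted directly under envs, as the list implies; the only-pos and min-diff-pct values are illustrative, not recommendations.

```toml
[ClusterMarkersOfAllCells.envs]
# Hyphenated forms of Seurat::FindMarkers() arguments (placement under envs is an assumption)
min-pct = 0.1           # min.pct: gene detected in >=10% of cells in either group
logfc-threshold = 0.25  # logfc.threshold: minimum log2 fold change
only-pos = true         # only.pos: return only positive (upregulated) markers
min-diff-pct = 0.1      # min.diff.pct: illustrative value for the detection-rate difference
```

Hyphens keep each name a single TOML key; a dotted name such as min.pct would be read as a key pct nested inside a table named min.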

Enrichment Databases

Configuration Examples

Minimal Configuration

toml
[SeuratClusteringOfAllCells]
[ClusterMarkersOfAllCells]

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]

Standard Marker Finding

toml
[SeuratClusteringOfAllCells]
[ClusterMarkersOfAllCells]

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]

[ClusterMarkersOfAllCells.envs]
# Find markers for broad cell type identification
dbs = ["MSigDB_Hallmark_2020", "KEGG_2021_Human"]
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.25"

# Generate key visualizations
[ClusterMarkersOfAllCells.envs.marker_plots."Volcano Plot (log2FC)"]
plot_type = "volcano_log2fc"

[ClusterMarkersOfAllCells.envs.allmarker_plots."Top 10 markers of all clusters"]
plot_type = "heatmap"

[ClusterMarkersOfAllCells.envs.enrich_plots."Bar Plot"]
plot_type = "bar"
top_term = 10

Common Patterns

Pattern 1: Broad Cell Type Markers

toml
[ClusterMarkersOfAllCells.envs]
# Optimized for distinguishing T/B/Myeloid/NK cells
min-pct = 0.1              # Require detection in >=10% of cells
logfc-threshold = 0.25     # Minimum log2 fold change
test-use = "wilcox"        # Fast and robust
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"

# Visualize markers to identify cell types
[ClusterMarkersOfAllCells.envs.allmarker_plots."Top 20 markers per cluster"]
plot_type = "heatmap"

# Check for expected markers in outputs
# T cells: CD3D, CD3E, CD3G, CD4, CD8A
# B cells: CD19, MS4A1 (CD20), CD79A, CD79B
# Myeloid: CD14, LYZ, FCGR3A, CD68
# NK cells: NCAM1 (CD56), KLRD1 (CD94), NKG7

Pattern 2: Quick Wilcoxon for Large Datasets

toml
[ClusterMarkersOfAllCells.envs]
# Fast analysis for large datasets (>50k cells)
ncores = 8                  # Use multiple cores
test-use = "wilcox"
min-pct = 0.15              # More stringent to reduce noise
logfc-threshold = 0.3
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 0.5"

# Skip enrichment to save time
dbs = []

# Generate only essential plots
[ClusterMarkersOfAllCells.envs.allmarker_plots."Top markers heatmap"]
plot_type = "heatmap"

Pattern 3: Identify T/B Cell Clusters

toml
[ClusterMarkersOfAllCells.envs]
# Focus on finding T and B cell markers for selection
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 1"

# Will help identify which clusters express:
# T cell markers: CD3D, CD3E, CD3G
# B cell markers: CD19, MS4A1, CD79A

[ClusterMarkersOfAllCells.envs.allmarker_plots."All markers heatmap"]
plot_type = "heatmap"

Difference from ClusterMarkers

| Aspect | ClusterMarkersOfAllCells | ClusterMarkers |
|---|---|---|
| Timing | BEFORE TOrBCellSelection | AFTER TOrBCellSelection |
| Data Scope | ALL cells (mixed population) | SELECTED T/B cells only |
| Purpose | Identify broad cell types | Fine-grained sub-clusters |
| Typical markers | CD3, CD19, CD14, NK markers | Activation, differentiation markers |
| Use case | "Which clusters are T/B/Myeloid?" | "What subtypes exist within T cells?" |
| Upstream | SeuratClusteringOfAllCells | SeuratClustering (post-selection) |
| Downstream | TOrBCellSelection | Cell type annotation, downstream analysis |

Key insight: Use ClusterMarkersOfAllCells when you need to separate T/B cells from other cell types. Use ClusterMarkers when you want to analyze sub-clusters within already-purified T or B cell populations.
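
To make the contrast concrete, here is a minimal sketch of both processes configured in one file. It assumes ClusterMarkers accepts the same sigmarkers option, which follows from the parameters being inherited from it; the thresholds are illustrative only.

```toml
# Before selection: coarse markers that separate broad cell types across all cells
[ClusterMarkersOfAllCells.envs]
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 1"

# After selection: a looser cutoff to surface subtler differences within T/B cells
[ClusterMarkers.envs]
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.25"
```

A stricter fold-change cutoff on the all-cells pass tends to leave only lineage-defining genes, while the post-selection pass usually needs a lower cutoff because sub-clusters differ less from one another.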

Dependencies

Upstream Processes

  • SeuratClusteringOfAllCells: Required - provides clustered object with seurat_clusters metadata
  • SeuratPreparing: Indirect - provides normalized Seurat object
  • SampleInfo or LoadingRNAFromSeurat: Entry point for data

Downstream Processes

  • TOrBCellSelection: Primary consumer - uses marker results to select T/B cells
  • TopExpressingGenesOfAllCells: Optional complementary analysis

Validation Rules

Required Inputs

toml
[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]  # Must be specified

Process Enablement

  • Process automatically enabled when SeuratClusteringOfAllCells is in config
  • No need to explicitly set [ClusterMarkersOfAllCells] if SeuratClusteringOfAllCells is enabled

Parameter Constraints

  • test-use (Seurat's test.use): Must be one of "wilcox", "bimod", "roc", "t", "negbinom", "poisson"
  • min-pct: Should be between 0 and 1 (e.g., 0.1 = 10%)
  • logfc-threshold: Numeric value (log2 scale)
  • sigmarkers: Valid dplyr filter expression
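
As an illustration of the sigmarkers constraint, the sketch below combines several conditions in one dplyr filter expression. The columns p_val_adj, avg_log2FC, pct.1, and pct.2 are those returned by Seurat::FindMarkers(); whether pct.1 is still present when the filter is applied is an assumption, and the cutoffs are illustrative.

```toml
[ClusterMarkersOfAllCells.envs]
# Keep markers that are significant, clearly upregulated, and detected in
# more than 25% of cells in the cluster of interest (pct.1)
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.5 & pct.1 > 0.25"
```

The expression is a plain TOML string, so R operators such as & and < need no escaping.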

Common Errors

  • Missing clustering: Ensure SeuratClusteringOfAllCells runs first
  • No markers found: Adjust sigmarkers or logfc-threshold if too stringent
  • Memory issues: On large datasets, reduce ncores or subset the data

Troubleshooting

Issue: No significant markers found

Symptoms: Empty output directory or warning about no markers

Solutions:

toml
[ClusterMarkersOfAllCells.envs]
# Less stringent thresholds
logfc-threshold = 0.1           # Lower fold change requirement
min-pct = 0.05                 # Lower detection percentage
sigmarkers = "p_val_adj < 0.1"  # More relaxed p-value

# Or check data quality
# - Are cells properly clustered?
# - Is expression matrix normalized?
# - Are there enough cells per cluster (>30 recommended)?

Issue: Too many markers (slow enrichment)

Symptoms: Process takes very long, memory issues

Solutions:

toml
[ClusterMarkersOfAllCells.envs]
# More stringent filtering
logfc-threshold = 0.5
min-pct = 0.2
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"

# Reduce enrichment databases
dbs = ["MSigDB_Hallmark_2020"]

# Or skip enrichment entirely (use this instead of the line above; TOML forbids duplicate keys)
# dbs = []

Issue: Can't identify T/B cell clusters

Symptoms: Markers don't show clear T/B cell signatures

Solutions:

  1. Check marker gene presence:

    toml
    # Verify expected markers are in your data
    # Use SeuratClusterStats to visualize:
    [SeuratClusterStats.envs.features_defaults]
    features = ["CD3D", "CD3E", "CD19", "MS4A1", "CD14", "LYZ"]
    
  2. Adjust clustering parameters:

    toml
    [SeuratClusteringOfAllCells.envs]
    res = 0.5  # Try different resolutions (0.2-1.5)
    
  3. Check data quality:

    • Are genes properly normalized?
    • Are there enough cells per cluster?
    • Is species correct (human vs mouse gene symbols)?

Issue: Process not running

Symptoms: Process skipped in workflow

Solutions:

  • Verify SeuratClusteringOfAllCells is in config
  • Check dependencies are running correctly
  • Confirm the dataset actually needs T/B selection (that is, the cells are not all T cells already)

Typical Marker Genes for Identification

| Cell Type | Positive Markers | Negative Markers |
|---|---|---|
| T cells | CD3D, CD3E, CD3G, CD4, CD8A | CD19, MS4A1, CD14 |
| B cells | CD19, MS4A1 (CD20), CD79A, CD79B | CD3E, CD3D, CD14 |
| Monocytes | CD14, LYZ, FCGR3A, S100A8 | CD3E, CD19 |
| NK cells | NCAM1 (CD56), KLRD1 (CD94), NKG7 | CD3E, CD19, CD14 |
| Dendritic cells | FCER1A, CST3 | CD3E, CD19, CD14 |
| Megakaryocytes | PPBP, PF4 | CD3E, CD19, CD14 |

Use these marker lists to identify which clusters correspond to which cell types in your allmarker_plots heatmaps.
