
clonalstats



Install this agent skill into your project:

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/clonalstats

SKILL.md

ClonalStats Process Configuration

Purpose

Generate comprehensive clonality statistics and diversity visualizations for TCR/BCR repertoire analysis. Quantifies clonal expansion, measures diversity metrics (Shannon, Simpson, Gini), and creates publication-ready plots.

When to Use

  • To quantify clonal expansion patterns in TCR/BCR data
  • For diversity analysis comparing multiple samples or conditions
  • To identify hyperexpanded clones and their distribution
  • For rarefaction analysis to assess sampling depth
  • After ScRepCombiningExpression, to analyze integrated TCR/BCR + RNA data

Configuration Structure

Process Enablement

```toml
[ClonalStats]
cache = true
```

Input Specification

```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]
```

Core Environment Variables

```toml
[ClonalStats.envs]
# Clone definition: "gene" (VDJC genes), "aa" (CDR3 amino acid), "nt" (CDR3 nucleotide)
clone_call = "aa"
# Chain analysis: "both", "TRA", "TRB", "TRG", "IGH", "IGL"
chain = "both"
# Data transformations (dplyr::mutate syntax)
mutaters = {}
# Data filtering (dplyr::filter syntax). Unset by default. TOML has no null
# value, so leave the key out (or commented) instead of writing `subset = null`.
# subset = "..."
# Output device parameters
devpars = {width = 800, height = 600, res = 100}
# Save code and data (can produce large files; use with caution)
save_code = false
save_data = false
```
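The choice of clone_call determines which cells count as the same clone. The sketch below counts cells per clone under each definition. The records and their keys are hypothetical placeholders, not actual scRepertoire columns. The point is that coarser definitions merge cells into fewer, larger clones.

```python
from collections import Counter

# Hypothetical cells: keys mirror the clone_call options; values are placeholders.
CELLS = [
    {"gene": "TRBV20-1.TRBJ2-7", "aa": "CSARDRGYEQYF", "nt": "nt-seq-1"},
    {"gene": "TRBV20-1.TRBJ2-7", "aa": "CSARDRGYEQYF", "nt": "nt-seq-2"},  # same protein, different codons
    {"gene": "TRBV20-1.TRBJ2-7", "aa": "CSARDKGYEQYF", "nt": "nt-seq-3"},
]

def clone_counts(cells, clone_call):
    """Count cells per clone, where a clone is defined by the clone_call key."""
    if clone_call not in ("gene", "aa", "nt"):
        raise ValueError(f"unsupported clone_call: {clone_call!r}")
    return Counter(cell[clone_call] for cell in cells)

for call in ("gene", "aa", "nt"):
    print(call, sorted(clone_counts(CELLS, call).values(), reverse=True))
# gene [3]
# aa [2, 1]
# nt [1, 1, 1]
```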

Case-Based Plot Generation

```toml
[ClonalStats.envs.cases."Case Name"]
# One of: volume, abundance, length, residency, stat, composition, overlap,
# diversity, geneusage, positional, kmer, rarefaction
viz_type = "volume"
```

Diversity Metrics

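| Metric | Range | Interpretation | Best for |
|---|---|---|---|
| shannon | 0 to ∞ | Higher = more diversity | General comparison |
| inv.simpson | 1 to ∞ | Higher = more diversity | Emphasizing common clones |
| gini.coeff | 0 to 1 | 0 = perfectly even, 1 = maximally unequal | Clonal dominance |
| norm.entropy | 0 to 1 | Higher = more even | Evenness |
| chao1 | ≥ observed richness | Estimated total number of clones | Undersampled repertoires |
| d50 | ≥ 1 (clone count) | Number of largest clones covering 50% of cells; lower = more dominated | Practical dominance |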

Interpretation:

  • High diversity = many unique clones with an even size distribution, typical of a healthy, unstimulated repertoire
  • Low diversity = a few dominant clones, which can reflect an antigen-driven response (e.g. infection or cancer)
  • Gini ≈ 1 = very skewed; a few clones dominate
  • Gini ≈ 0 = clone sizes are evenly distributed
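The metrics in the table follow standard textbook definitions. The sketch below computes them from a list of per-clone cell counts, which makes it easier to see what each number responds to. It is not the ClonalStats implementation. The underlying R tooling may differ in details such as log base, bias correction or downsampling. The Chao1 line uses the common bias-corrected form.

```python
import math

def diversity_metrics(counts):
    """Textbook diversity indices from per-clone cell counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    n_clones = len(counts)
    props = [c / total for c in counts]

    shannon = -sum(p * math.log(p) for p in props)
    inv_simpson = 1.0 / sum(p * p for p in props)
    # Pielou evenness: Shannon divided by its maximum, ln(number of clones).
    norm_entropy = shannon / math.log(n_clones) if n_clones > 1 else 0.0

    # Gini coefficient of clone sizes (ascending order, 1-based ranks).
    ranked = sorted(counts)
    weighted = sum(rank * c for rank, c in enumerate(ranked, start=1))
    gini = 2 * weighted / (n_clones * total) - (n_clones + 1) / n_clones

    # d50: fewest of the largest clones that together hold >= 50% of cells.
    d50, running = 0, 0
    for c in sorted(counts, reverse=True):
        running += c
        d50 += 1
        if running >= total / 2:
            break

    # Bias-corrected Chao1 from singleton (f1) and doubleton (f2) clone counts.
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    chao1 = n_clones + f1 * (f1 - 1) / (2 * (f2 + 1))

    return {
        "shannon": shannon, "inv.simpson": inv_simpson, "norm.entropy": norm_entropy,
        "gini.coeff": gini, "d50": d50, "chao1": chao1,
    }

# One dominant clone plus two singletons versus a perfectly even repertoire.
print(diversity_metrics([5, 3, 1, 1]))
print(diversity_metrics([2, 2, 2, 2]))
```

The even repertoire gives a Gini coefficient of 0 and a normalized entropy of 1. The skewed one gives a positive Gini, and a d50 of 1 because its largest clone alone holds half the cells.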

Visualization Types

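viz_type options:

  • volume - Number of clones per sample/group
  • abundance - Clone abundance distribution (trend/histogram/density)
  • length - CDR3 sequence length distribution
  • residency - Clones shared across groups (venn/upset)
  • stat - Expanded clone analysis (pies/sankey)
  • composition - Share of the repertoire occupied by clones grouped by size or rank
  • overlap - Clone overlap between samples/groups
  • diversity - Diversity metrics (bar/box/violin)
  • geneusage - V/D/J gene usage frequency
  • positional - Per-position properties of CDR3 amino acids
  • kmer - CDR3 k-mer motif frequencies
  • rarefaction - Sampling depth assessment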

Configuration Examples

Minimal Configuration

```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]
```

Standard Diversity Analysis

```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
plot_type = "box"
group_by = "Diagnosis"
comparisons = true

[ClonalStats.envs.cases."Gini Coeff"]
viz_type = "diversity"
method = "gini.coeff"
plot_type = "violin"
group_by = "Diagnosis"
add_box = true
```

Expanded Clone Analysis

```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Expanded Clones"]
viz_type = "stat"
plot_type = "pies"
group_by = "Diagnosis"
subgroup_by = "seurat_clusters"
clones = {"Expanded (>2)" = "sel(Colitis > 2)"}
```
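Conceptually, `sel(Colitis > 2)` keeps clones with more than two cells in the Colitis group. Here Colitis is presumably a value of the group_by column (Diagnosis). The Python sketch below mimics that selection on hypothetical records. The helper names and the `CTaa` clone column are illustrative. The real selector is evaluated in R, so its exact semantics may differ.

```python
from collections import Counter, defaultdict

def counts_by_group(cells, group_col, clone_col):
    """Tally cells per clone within each level of group_col."""
    table = defaultdict(Counter)
    for cell in cells:
        table[cell[clone_col]][cell[group_col]] += 1
    return table

def select_clones(table, predicate):
    """Return clones whose per-group counts satisfy predicate (missing groups count as 0)."""
    return sorted(clone for clone, by_group in table.items() if predicate(by_group))

# Hypothetical cells: CASSA is expanded in Colitis; CASSB is expanded only in Control.
cells = (
    [{"Diagnosis": "Colitis", "CTaa": "CASSA"}] * 4
    + [{"Diagnosis": "Colitis", "CTaa": "CASSB"}] * 2
    + [{"Diagnosis": "Control", "CTaa": "CASSB"}] * 5
)
table = counts_by_group(cells, "Diagnosis", "CTaa")
expanded = select_clones(table, lambda g: g["Colitis"] > 2)  # analogue of sel(Colitis > 2)
print(expanded)  # ['CASSA']
```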

Rarefaction Analysis

```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"
group_by = "Patient"
q = 1  # Hill number order: 0 = richness, 1 = exp(Shannon), 2 = inverse Simpson
n_boots = 20
```
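The q parameter selects a Hill number, and n_boots the number of random resamples. The sketch below shows the idea with plain Monte Carlo subsampling. It is a simple stand-in, not necessarily the estimator ClonalStats uses. The clone counts are made up for illustration.

```python
import math
import random
from collections import Counter

def hill_number(counts, q):
    """Hill diversity of order q: 0 = richness, 1 = exp(Shannon), 2 = inverse Simpson."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 0:
        return float(len(props))
    if q == 1:
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1 / (1 - q))

def rarefy(counts, depth, q, n_boots, seed=0):
    """Mean Hill number over n_boots subsamples of `depth` cells drawn without replacement."""
    rng = random.Random(seed)
    # Expand clone counts into one clone label per cell so cells can be subsampled.
    cells = [clone for clone, c in enumerate(counts) for _ in range(c)]
    values = []
    for _ in range(n_boots):
        sub = Counter(rng.sample(cells, depth))
        values.append(hill_number(list(sub.values()), q))
    return sum(values) / n_boots

counts = [50, 20, 10, 5, 5, 3, 2, 1, 1, 1, 1, 1]  # 100 cells, 12 clones (made up)
for depth in (10, 25, 50, 100):
    print(depth, round(rarefy(counts, depth, q=0, n_boots=20), 2))
```

If the richness curve is still rising steeply at the full depth, the sample probably misses many clones. More bootstraps smooth the curve but do not fix insufficient sequencing depth.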

Complete Analysis Suite

```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Volume"]
viz_type = "volume"

[ClonalStats.envs.cases."Abundance"]
viz_type = "abundance"
plot_type = "density"

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"

[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"
```

Common Patterns

Disease vs Healthy

```toml
[ClonalStats.envs.cases."Comparison"]
viz_type = "diversity"
method = "gini.coeff"
plot_type = "box"
group_by = "Condition"
comparisons = true
```

Time Course

```toml
[ClonalStats.envs.cases."Timepoint"]
viz_type = "volume"
x = "Timepoint"

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
group_by = "Timepoint"
```

Treatment Response

```toml
[ClonalStats.envs.cases."Response"]
viz_type = "diversity"
method = "gini.coeff"
group_by = "Response"
plot_type = "box"
comparisons = true
```

Dependencies

  • Upstream: ScRepCombiningExpression (required)
  • Related: ScRepLoading, CDR3Clustering, TESSA (optional)

Validation Rules

  • The input must be a valid scRepertoire object
  • For viz_type = "diversity", method must be a supported metric (see Diversity Metrics)
  • For rarefaction, n_boots should be ≥ 10
  • Use sel() expressions in the clones parameter to select clones

Troubleshooting

Sample column not found: The input must have a Sample column; otherwise, specify the x parameter.

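Strange diversity values: Diversity estimates are biased when repertoires are small or differ greatly in size. Check the rarefaction curves, and use plot_type = "box" to show the spread across samples instead of a single summary value.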

Rarefaction curves noisy: Increase n_boots (try 50-100).

Too many clones in stat plots: Use subset, or stricter thresholds in clones.

Plot generation slow: Use clone_call = "gene", which merges cells into fewer clones, and apply subset to reduce the data.

Missing comparisons: Set comparisons = true to add significance tests.

Best Practices

  1. Start with default cases to see standard visualizations
  2. Use multiple diversity metrics: Shannon + Gini
  3. Check rarefaction curves to ensure sufficient sampling
  4. Document clone thresholds when defining expanded clones
  5. Use clone_call = "gene" for speed, "aa" for granularity
  6. Set save_data = true for debugging (watch disk space)
  7. Validate findings with complementary diversity indices
  8. Consider sample size: small samples underestimate richness
