ClonalStats Process Configuration

Purpose

Generate comprehensive clonality statistics and diversity visualizations for TCR/BCR repertoire analysis. Quantifies clonal expansion, measures diversity metrics (Shannon, Simpson, Gini), and creates publication-ready plots.

When to Use

To quantify clonal expansion patterns in TCR/BCR data
For diversity analysis comparing multiple samples or conditions
To identify hyperexpanded clones and their distribution
For rarefaction analysis to assess sampling depth
After ScRepCombiningExpression to analyze integrated TCR+RNA data

Configuration Structure

Process Enablement

toml

[ClonalStats]
cache = true

Input Specification

toml

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

Core Environment Variables

toml

[ClonalStats.envs]
# Clone definition: "gene" (VDJC), "aa" (CDR3 amino acid), "nt" (CDR3 nucleotide)
clone_call = "aa"
# Chain analysis: "both", "TRA", "TRB", "TRG", "IGH", "IGL"
chain = "both"
# Data transformations (dplyr::mutate syntax)
mutaters = {}
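# Data filtering (dplyr::filter syntax); unset by default.
# TOML has no null literal, so omit the key rather than writing `subset = null`.
# subset = "Diagnosis == 'Colitis'"  # illustrative expression only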
# Output device parameters
devpars = {width = 800, height = 600, res = 100}
# Save code and data (large files - use with caution)
save_code = false
save_data = false

Case-Based Plot Generation

toml

[ClonalStats.envs.cases."Case Name"]
viz_type = "volume"  # volume, abundance, length, residency, stat,
                    # composition, overlap, diversity, geneusage,
                    # positional, kmer, rarefaction

Diversity Metrics

Metric	Range	Interpretation	Best For
shannon	0 to log(richness)	Higher = more diversity	General comparison
inv.simpson	1 to richness	Higher = more diversity	Common clones
gini.coeff	0 to 1	0 = equality, 1 = inequality	Clonal dominance
norm.entropy	0 to 1	Higher = more even distribution	Evenness-focused
chao1	≥ observed richness	Estimates total richness	Small samples
d50	Count	Number of top clones making up 50% of the repertoire	Practical dominance

Interpretation:

High diversity = Many unique clones with an even size distribution (typical of a naive or unchallenged repertoire)
Low diversity = A few dominant clones (seen with antigen-specific expansion, e.g. infection or cancer)
Gini ≈ 1 = Very skewed, few clones dominate
Gini ≈ 0 = Even distribution
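
To make the interpretation above concrete, the sketch below computes several of the listed indices from raw per-clone counts. It uses textbook definitions (natural log, no bootstrapping or downsampling); the process may compute them differently, so treat the numbers as illustrative only. The function name and the two toy repertoires are hypothetical.

```python
import math


def diversity_metrics(counts):
    """Compute common diversity indices from per-clone cell counts."""
    counts = sorted(c for c in counts if c > 0)  # ascending order
    total = sum(counts)
    richness = len(counts)
    props = [c / total for c in counts]

    shannon = -sum(p * math.log(p) for p in props)
    inv_simpson = 1.0 / sum(p * p for p in props)
    # Normalized entropy: Shannon divided by its maximum, log(richness).
    norm_entropy = shannon / math.log(richness) if richness > 1 else 0.0
    # Gini coefficient from rank-weighted ascending counts.
    weighted = sum((i + 1) * c for i, c in enumerate(counts))
    gini = 2 * weighted / (richness * total) - (richness + 1) / richness
    # d50: smallest number of top clones covering at least half of all cells.
    running, d50 = 0, 0
    for c in reversed(counts):
        running += c
        d50 += 1
        if running >= total / 2:
            break
    return {
        "shannon": shannon,
        "inv.simpson": inv_simpson,
        "norm.entropy": norm_entropy,
        "gini.coeff": gini,
        "d50": d50,
    }


even = diversity_metrics([5, 5, 5, 5])      # toy repertoire, equal clone sizes
skewed = diversity_metrics([97, 1, 1, 1])   # toy repertoire, one dominant clone
print(even)
print(skewed)
```

The even repertoire reaches the maximum for four clones (inverse Simpson of 4, normalized entropy of 1, Gini of 0), while the skewed one has d50 = 1 and a Gini of about 0.72.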

Visualization Types

viz_type options:

volume - Number of clones per sample/group
abundance - Clone abundance distribution (trend/histogram/density)
length - CDR3 sequence length distribution
residency - Clones present across groups (venn/upset)
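stat - Expanded clone analysis (pies/sankey)
composition - Clonal composition by clone-size category
overlap - Clonal overlap between samples/groups
diversity - Diversity metrics (bar/box/violin)
geneusage - V/D/J gene usage frequency
positional - Position-wise properties along the CDR3 sequence
kmer - K-mer motif frequencies in CDR3 sequences
rarefaction - Sampling depth assessment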

Configuration Examples

Minimal Configuration

toml

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

Standard Diversity Analysis

toml

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
plot_type = "box"
group_by = "Diagnosis"
comparisons = true

[ClonalStats.envs.cases."Gini Coeff"]
viz_type = "diversity"
method = "gini.coeff"
plot_type = "violin"
group_by = "Diagnosis"
add_box = true

Expanded Clone Analysis

toml

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Expanded Clones"]
viz_type = "stat"
plot_type = "pies"
group_by = "Diagnosis"
subgroup_by = "seurat_clusters"
clones = {"Expanded (>2)" = "sel(Colitis > 2)"}
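
The `clones` entry above names a clone group ("Expanded (>2)") and gives an expression that selects its members. One reading, stated here as an assumption rather than documented behavior, is that `sel(Colitis > 2)` keeps clones whose size in the `Colitis` level of `group_by` exceeds 2. The sketch below mimics that reading on hypothetical counts; `select_clones` and the CDR3 sequences are made up for illustration.

```python
# Hypothetical per-clone cell counts, broken down by Diagnosis level.
clone_counts = {
    "CASSLGQGYEQYF": {"Colitis": 5, "Control": 1},
    "CASSPDRGNTEAFF": {"Colitis": 2, "Control": 4},
    "CASSQETQYF": {"Colitis": 3},
}


def select_clones(clone_counts, group, threshold):
    """Return clones whose count in `group` is strictly above `threshold`."""
    return sorted(
        clone
        for clone, per_group in clone_counts.items()
        if per_group.get(group, 0) > threshold
    )


# Assumed analogue of clones = {"Expanded (>2)" = "sel(Colitis > 2)"}
expanded = select_clones(clone_counts, "Colitis", 2)
print(expanded)
```

Under this reading, a clone with exactly 2 cells in Colitis is not counted as expanded, which is why the threshold belongs in the case name.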

Rarefaction Analysis

toml

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"
group_by = "Patient"
q = 1  # diversity order: 0 = richness, 1 = Shannon, 2 = Simpson
n_boots = 20
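
The `q` values in the case above match the usual orders of Hill numbers, where one formula yields richness (q = 0), the exponential of Shannon entropy (q = 1), and inverse Simpson (q = 2). Assuming that is what `q` means here, a minimal sketch of the quantity follows; how the process estimates and bootstraps it internally is not described in this document, and `hill_number` is a hypothetical helper.

```python
import math


def hill_number(counts, q):
    """Hill number (effective number of clones) of order q."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit of the general formula as q approaches 1: exp(Shannon entropy).
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


toy_even = [5, 5, 5, 5]      # hypothetical clone sizes
toy_skewed = [97, 1, 1, 1]

for q in (0, 1, 2):
    print(q, hill_number(toy_even, q), hill_number(toy_skewed, q))
```

For an even repertoire all three orders equal the number of clones; for a skewed one the value drops as q increases, because higher orders weight dominant clones more heavily.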

Complete Analysis Suite

toml

[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Volume"]
viz_type = "volume"

[ClonalStats.envs.cases."Abundance"]
viz_type = "abundance"
plot_type = "density"

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"

[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"

Common Patterns

Disease vs Healthy

toml

[ClonalStats.envs.cases."Comparison"]
viz_type = "diversity"
method = "gini.coeff"
plot_type = "box"
group_by = "Condition"
comparisons = true

Time Course

toml

[ClonalStats.envs.cases."Timepoint"]
viz_type = "volume"
x = "Timepoint"

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
group_by = "Timepoint"

Treatment Response

toml

[ClonalStats.envs.cases."Response"]
viz_type = "diversity"
method = "gini.coeff"
group_by = "Response"
plot_type = "box"
comparisons = true

Dependencies

Upstream: ScRepCombiningExpression (required)
Related: ScRepLoading, CDR3Clustering, TESSA (optional)

Validation Rules

Input must be a valid scRepertoire object
For viz_type = "diversity", method must be a supported metric (see Diversity Metrics)
For rarefaction, n_boots should be ≥ 10
Use sel() syntax in clones parameter for filtering

Troubleshooting

Sample column not found: The input must contain a Sample column; otherwise, set the x parameter to an existing metadata column.

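Strange diversity values: Small repertoires bias diversity estimates. Check sampling depth with a rarefaction case, and compare groups with plot_type = "box" rather than reading single-sample values.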

Rarefaction curves noisy: Increase n_boots (try 50-100).

Too many clones in stat plots: Use subset or stricter clones thresholds.

Plot generation slow: Use clone_call = "gene" for speed, apply subset.

Missing comparisons: Set comparisons = true to add significance tests.

Best Practices

Start with default cases to see standard visualizations
Use multiple diversity metrics: Shannon + Gini
Check rarefaction curves to ensure sufficient sampling
Document clone thresholds when defining expanded clones
Use clone_call = "gene" for speed, "aa" for granularity
Set save_data = true for debugging (watch disk space)
Validate findings with complementary diversity indices
Consider sample size: small samples underestimate richness
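
The last point is what richness estimators such as chao1 try to correct: they add an estimate of unseen clones based on how many clones were observed only once or twice. Below is a minimal sketch of the classic Chao1 formula, assuming the standard definition; the process's own implementation may differ, and the counts are hypothetical.

```python
def chao1(counts):
    """Classic Chao1 richness estimate from per-clone counts."""
    counts = [c for c in counts if c > 0]
    observed = len(counts)
    singletons = sum(1 for c in counts if c == 1)  # clones seen once
    doubletons = sum(1 for c in counts if c == 2)  # clones seen twice
    if doubletons > 0:
        return observed + singletons ** 2 / (2 * doubletons)
    # Bias-corrected form, used when there are no doubletons.
    return observed + singletons * (singletons - 1) / 2


sample = [10, 4, 2, 2, 1, 1, 1, 1]  # hypothetical clone sizes
print(len(sample), chao1(sample))
```

Here 8 clones are observed but the estimate is 12, since many singletons suggest that deeper sampling would reveal more clones.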
