Agent skill

cdr3aaphyschem

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/cdr3aaphyschem

SKILL.md

CDR3AAPhyschem Process Configuration

Purpose

Analyzes physicochemical properties of CDR3 amino acid sequences to understand biochemical characteristics of T-cell receptor repertoires. Performs regression analysis between two cell groups at different CDR3 lengths for each physicochemical feature (hydrophobicity, volume, isoelectric point, etc.).

When to Use

  • To analyze differences in CDR3 biochemical properties between cell groups (e.g., Treg vs Tconv)
  • For feature engineering in TCR machine learning models
  • To identify sequence features that distinguish cell subsets
  • After ScRepCombiningExpression (requires combined TCR + RNA data)
  • When investigating T cell fate determination (regulatory vs conventional T cells)

Configuration Structure

Process Enablement

toml
[CDR3AAPhyschem]
cache = true

Input Specification

toml
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]

  • scrfile: Output from ScRepCombiningExpression (RDS or qs/qs2 format)
    • Must contain both TRA and TRB chains
    • Generated by scRepertoire::combineExpression()

Environment Variables

toml
[CDR3AAPhyschem.envs]
# Group comparison specification
group = "CellType"
comparison = {Treg = ["CD4 CTL", "CD4 Naive", "CD4 TCM", "CD4 TEM"], Tconv = "Tconv"}
target = "Treg"
each = "Sample"

# Chain selection
chain = "TRB"

Key Parameters:

  • group: Column name in metadata defining groups to compare (e.g., CellType, seurat_clusters)
  • comparison: Two-group specification for regression analysis
    • Format 1 (dict): Group1 = ["cell1", "cell2"], Group2 = "cell3"
    • Format 2 (list): ["Group1", "Group2"] (when groups exist in column)
  • target: Which group to label as 1 in regression (default: first group in comparison)
  • each: Column(s) to split data for separate analyses
    • Single column: "Sample"
    • Multiple columns: ["Sample", "Patient"]
    • Comma-separated: "Sample,Patient"
    • If not provided, all cells used together
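
The accepted shapes of `comparison`, `target`, and `each` described above can be illustrated with a small normalization sketch. This is not the process's actual implementation; the helper names `normalize_each`, `normalize_comparison`, and `resolve_target` are hypothetical and exist only to show how the documented formats could be reduced to one canonical form.

```python
from typing import Dict, List, Optional, Sequence, Union


def normalize_each(each: Union[None, str, Sequence[str]]) -> List[str]:
    """Return the split columns as a list; an empty list means no splitting."""
    if each is None:
        return []
    if isinstance(each, str):
        # Handles "Sample" and "Sample,Patient"; an empty string yields [].
        return [col.strip() for col in each.split(",") if col.strip()]
    return [str(col) for col in each]


def normalize_comparison(
    comparison: Union[Dict[str, Union[str, Sequence[str]]], Sequence[str]],
) -> Dict[str, List[str]]:
    """Map each group name to the list of labels it covers in the group column."""
    if isinstance(comparison, dict):
        # Format 1: {"Group1": ["cell1", "cell2"], "Group2": "cell3"}
        groups = {
            name: [labels] if isinstance(labels, str) else list(labels)
            for name, labels in comparison.items()
        }
    else:
        # Format 2: ["Group1", "Group2"], where the names are the labels.
        groups = {name: [name] for name in comparison}
    if len(groups) != 2:
        raise ValueError("comparison must define exactly two groups")
    return groups


def resolve_target(groups: Dict[str, List[str]], target: Optional[str] = None) -> str:
    """Default to the first group in the comparison when no target is given."""
    if not target:
        return next(iter(groups))
    if target not in groups:
        raise ValueError(f"target {target!r} is not one of {list(groups)}")
    return target
```

For example, `normalize_each("Sample,Patient")` and `normalize_each(["Sample", "Patient"])` give the same list, which is why the three `each` spellings can be treated as equivalent.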

Configuration Examples

Minimal Configuration

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]

Standard Treg vs Tconv Analysis

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Define cell type groups for comparison
group = "CellType"
comparison = {Treg = ["Treg"], Tconv = ["Tconv"]}
target = "Treg"
chain = "TRB"

Multi-Sample Analysis

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
# Run regression separately for each sample
each = "Sample"
chain = "TRB"

Custom Group Definition

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "Cluster"
# Define clusters to compare
comparison = {HighQuality = ["c1", "c2", "c5"], LowQuality = ["c3", "c4"]}
target = "HighQuality"
chain = "TRB"

Physicochemical Properties

Available Properties

The process calculates 8 key physicochemical properties from CDR3 amino acid sequences:

| Property | Description | Biological Significance |
|---|---|---|
| length | Total amino acid count in CDR3 | Influences binding loop size and flexibility |
| gravy | Grand Average of Hydrophobicity (Kyte-Doolittle scale) | Hydrophobic CDR3s associate with self-reactivity and Treg fate |
| bulkiness | Average bulkiness (Zimmerman scale) | Measures steric bulk of amino acids |
| polarity | Average polarity (Grantham scale) | Influences interactions with peptide-MHC |
| aliphatic | Normalized aliphatic index (Ikai scale) | Related to thermal stability |
| charge | Normalized net charge at physiological pH | Affects electrostatic interactions |
| acidic | Acidic side chain residue content (D, E proportion) | Contributes to negative charge |
| aromatic | Aromatic side chain content (F, W, Y proportion) | Important for π-π interactions |
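
As a rough illustration of how three of the properties above are defined, the sketch below computes `gravy`, `acidic`, and `aromatic` for a single CDR3 sequence. It is not the process's own code. The hydropathy values are the published Kyte-Doolittle scale, the residue sets follow the table (D, E and F, W, Y), and the function names `gravy` and `residue_fraction` are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

ACIDIC = frozenset("DE")      # residue set taken from the table above
AROMATIC = frozenset("FWY")   # residue set taken from the table above


def gravy(cdr3: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues of the sequence."""
    if not cdr3:
        raise ValueError("empty CDR3 sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in cdr3) / len(cdr3)


def residue_fraction(cdr3: str, residues: frozenset) -> float:
    """Proportion of positions occupied by any residue in `residues`."""
    if not cdr3:
        raise ValueError("empty CDR3 sequence")
    return sum(aa in residues for aa in cdr3) / len(cdr3)


if __name__ == "__main__":
    seq = "CASSLGQAYEQYF"  # an illustrative TRB-like CDR3, not real data
    print("length  ", len(seq))
    print("gravy   ", round(gravy(seq), 4))
    print("acidic  ", round(residue_fraction(seq, ACIDIC), 4))
    print("aromatic", round(residue_fraction(seq, AROMATIC), 4))
```

A positive `gravy` indicates a net hydrophobic sequence and a negative one a net hydrophilic sequence, which is the direction the interpretation guidelines rely on.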

Property Calculation Methods

  • Default scales: Standard biophysical scales from peer-reviewed literature
  • GRAVY: Kyte & Doolittle (1982) hydropathy scale
  • Bulkiness: Zimmerman et al. (1968) bulkiness parameters
  • Polarity: Grantham (1974) amino acid difference index
  • Aliphatic index: Ikai (1980) aliphatic index, a measure related to thermostability
  • Charge: Normalized based on pKa values (EMBOSS database)
  • Acidic/Aromatic: Proportion of matching residues, obtained by direct counting
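
The pKa-based charge calculation can be sketched with the Henderson-Hasselbalch relation. This is a minimal sketch under stated assumptions, not the process's implementation: the pKa values are the ones commonly attributed to the EMBOSS pK table and should be checked against your EMBOSS installation, physiological pH is taken as 7.4, terminal groups are ignored because a CDR3 is an internal segment, and "normalized" is assumed to mean divided by sequence length. The name `net_charge` is hypothetical.

```python
# Side-chain pKa values (assumption: as listed in the EMBOSS pK table).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(cdr3: str, ph: float = 7.4, normalize: bool = True) -> float:
    """Net side-chain charge at the given pH; divided by length if normalize."""
    if not cdr3:
        raise ValueError("empty CDR3 sequence")
    charge = 0.0
    for aa in cdr3:
        if aa in PKA_POSITIVE:
            # Fraction of the basic side chain that is protonated (+1).
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            # Fraction of the acidic side chain that is deprotonated (-1).
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge / len(cdr3) if normalize else charge
```

Dividing by length keeps values comparable across CDR3s of different lengths, which matters because the regression is run across the length distribution.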

Regression Analysis

  • Performed for each physicochemical property independently
  • Compares properties across CDR3 length distributions
  • Binary classification: target group (1) vs non-target (0)
  • Output: Statistical significance of property differences
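
The points above can be sketched as a per-length binary regression. The source does not state which regression model the process uses, so this sketch assumes a one-variable logistic regression fitted by Newton-Raphson in plain Python. The function names and the `(group, length, value)` input layout are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fit_logistic(x: List[float], y: List[int], iterations: int = 25) -> Tuple[float, float]:
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            z = max(-30.0, min(30.0, b0 + b1 * xi))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of the (negated) Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:          # singular system: stop updating
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def regress_by_length(
    cells: Iterable[Tuple[str, int, float]], target: str
) -> Dict[int, Tuple[float, float]]:
    """Fit one model per CDR3 length; cells are (group, length, property value)."""
    by_length = defaultdict(list)
    for group, length, value in cells:
        # Target group is labeled 1, every other cell 0.
        by_length[length].append((value, 1 if group == target else 0))
    results = {}
    for length, rows in sorted(by_length.items()):
        if len({label for _, label in rows}) < 2:
            continue  # both groups must be present at this length
        results[length] = fit_logistic([v for v, _ in rows], [l for _, l in rows])
    return results
```

A positive slope `b1` at a given length means higher values of the property are associated with membership in the target group at that length. Assessing significance would need standard errors, which this sketch does not compute.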

Common Patterns

Pattern 1: Treg vs Tconv (TRB Chain)

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Literature-based: hydrophobic CDR3β promotes Treg fate
group = "CellType"
comparison = {Treg = ["Treg", "CD4+Treg"], Tconv = ["Tconv", "CD4+Tconv"]}
target = "Treg"
chain = "TRB"
each = ""  # Analyze all samples together

Pattern 2: Selected Properties Only

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Focus on hydrophobicity (key Treg feature)
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
chain = "TRB"
# No property filter is documented; review the gravy results in the output

Pattern 3: Multi-Chain Analysis

Run separate processes for different chains:

toml
# TRB analysis
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
chain = "TRB"
group = "CellType"
comparison = ["Treg", "Tconv"]

# Note: Create separate config for TRA analysis if needed

Pattern 4: Multi-Group Comparisons

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
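target = "Naive"
chain = "TRB"

# Written as a sub-table because TOML 1.0 inline tables must fit on one line.
# comparison is described above as a two-group specification; confirm that
# three groups are supported before relying on this pattern.
[CDR3AAPhyschem.envs.comparison]
Naive = ["CD4 Naive", "CD8 Naive"]
Memory = ["CD4 TEM", "CD4 TCM", "CD8 TEM", "CD8 TCM"]
Effector = ["CD4 CTL", "CD8 CTL"]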

Dependencies

  • Upstream: ScRepCombiningExpression (required)
  • Downstream: Feature analysis, ML model training, publication figures
  • Required data: Both TRA and TRB chains in combined object

Validation Rules

  • CDR3 sequence requirements: Must be valid amino acid sequences (no missing or ambiguous residues; note that N is asparagine here, not an unknown base)
  • Chain requirement: Data must contain specified chain (TRA or TRB)
  • Group specification: Groups must exist in metadata
  • Minimum cells: Sufficient cells per group for statistical regression
  • Length distribution: CDR3 length range must be adequate for regression
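
Some of these rules can be checked before running the process. The sketch below covers sequence validity, group presence, and a minimum cell count. It is an illustration only: the function name `validate_cells`, the `(cdr3, group_label)` input layout, and the default `min_cells=10` threshold are assumptions, since the source says only that "sufficient" cells are needed.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def validate_cells(
    cells: Sequence[Tuple[str, str]],
    group_labels: Dict[str, List[str]],
    min_cells: int = 10,  # hypothetical threshold, not taken from the source
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    # Rule: sequences must consist of standard amino acids only.
    invalid = [cdr3 for cdr3, _ in cells if not cdr3 or not set(cdr3) <= STANDARD_AA]
    if invalid:
        problems.append(f"{len(invalid)} CDR3 sequence(s) are empty or non-standard")

    # Rules: every group must exist in the metadata and have enough cells.
    counts = Counter(label for _, label in cells)
    for name, labels in group_labels.items():
        n = sum(counts[label] for label in labels)
        if n == 0:
            problems.append(f"group {name!r} not found in metadata")
        elif n < min_cells:
            problems.append(f"group {name!r} has only {n} cell(s); need {min_cells}")
    return problems
```

Running a check like this on the metadata exported from the ScRepCombiningExpression output can surface the "Group not found" and "Insufficient cells" conditions before the regression is attempted.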

Troubleshooting

Issue: "Missing chain in data"

Cause: The specified chain (TRA/TRB) was not found in the combined object.

Solution:

toml
# Change to available chain
[CDR3AAPhyschem.envs]
chain = "TRA"  # or "TRB"

Issue: "Group not found in metadata"

Cause: The group column or the comparison values do not exist in the metadata.

Solution:

  1. Check available metadata columns in ScRepCombiningExpression output
  2. Verify group names match exactly (case-sensitive)

toml
[CDR3AAPhyschem.envs]
group = "seurat_clusters"  # If CellType not available
comparison = ["0", "1"]  # Use cluster IDs

Issue: "Insufficient cells for regression"

Cause: Too few cells in one or more groups.

Solution:

  1. Use each to analyze samples separately if pooled analysis fails
  2. Combine similar cell types in comparison

toml
[CDR3AAPhyschem.envs]
# Combine rare subtypes
comparison = {HighExpander = ["Treg", "Tconv"], LowExpander = ["Tfh"]}

Issue: "No significant property differences"

Cause: The groups may not differ in physicochemical properties.

Solution:

  1. Check if comparison groups are biologically distinct
  2. Consider different group column (e.g., gene expression clusters)
  3. Verify CDR3 sequences are high-quality

Scientific Context

Key Publications

  1. Stadinski et al. (2016): "Hydrophobic CDR3 residues promote the development of self-reactive T cells" - Nature Immunology
  2. Lagattuta et al. (2022): "Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate" - Nature Immunology
  3. Ostmeyer et al. (2019): "Biophysicochemical motifs in T-cell receptor sequences distinguish repertoires from tumor-infiltrating lymphocyte and adjacent healthy tissue" - Cancer Research

Interpretation Guidelines

  • High GRAVY: More hydrophobic CDR3 (associated with self-reactivity, Treg)
  • High charge: Electrostatic potential may affect binding affinity
  • High aromaticity: Increased π-π interactions, structural stability
  • Length distribution: Longer CDR3s may provide broader specificity

Feature Engineering Applications

Use properties as features for:

  • TCR specificity prediction models
  • T cell fate classification (Treg vs Tconv)
  • Antigen binding affinity estimation
  • Cross-reactivity assessment

Output Format

  • Directory: {{in.scrfile | stem}}.cdr3aaphyschem/
  • Files:
    • Regression plots per property (hydrophobicity, volume, pI)
    • Statistical tables comparing groups
    • CDR3 length distributions
    • Property correlation matrices
  • Visualizations:
    • Property vs length scatter plots
    • Group-wise property boxplots
    • Regression curves with confidence intervals

Advanced Usage

Custom Property Scales

If using non-default scales (requires modifying underlying R script):

toml
# Note: Advanced usage - may require script modification
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Specify alternative hydrophobicity scale
hydro_scale = "Wimley"
pK_source = "Murray"

Length-Based Stratification

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Analyze by CDR3 length bins
group = "CellType"
comparison = ["Treg", "Tconv"]
# Use metadata column with length information
each = "CDR3_Length_Bin"
chain = "TRB"

Publication-Ready Plots

toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = {Treg = "Treg", Tconv = "Tconv"}
target = "Treg"
chain = "TRB"
# Publication parameters
plot_theme = "nature"
fig_dpi = 300
fig_format = "pdf"
