Agent skill
screploading
Load single-cell TCR-seq or scBCR-seq data from various formats into a scRepertoire-compatible object. This process reads VDJ (variable, diversity, joining) receptor contig data from multiple single-cell sequencing platforms and prepares it for integration with scRNA-seq data.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/screploading
SKILL.md
ScRepLoading Process Configuration
Purpose
Load single-cell TCR-seq or scBCR-seq data from various formats into a scRepertoire-compatible object. This process reads VDJ (variable, diversity, joining) receptor contig data from multiple single-cell sequencing platforms and prepares it for integration with scRNA-seq data.
When to Use
- When analyzing scTCR-seq or scBCR-seq data alongside scRNA-seq
- Required for TCR/BCR clonotype analysis (CDR3 clustering, clone expansion, TESSA analysis)
- Enables integration of immune receptor information with single-cell expression data
- Supports multiple sequencing platforms: 10x Genomics, AIRR, BD, Dandelion, Immcantation, MiXCR, ParseBio, TRUST4, WAT3R, Omniscope
Important: This process is automatically enabled when your sample info file contains TCRData or BCRData columns.
Configuration Structure
Process Enablement
[ScRepLoading]
cache = true # Enable caching (default: true)
Input Specification
[ScRepLoading.in]
# Type: file
# Required: yes
# Description: Sample metadata file (tab-delimited) with TCR/BCR data paths
metafile = "path/to/sample_info.txt"
Required input file columns:
Sample: Unique identifier for each sample (required)TCRData(for TCR analysis): Directory path to scTCR-seq dataBCRData(for BCR analysis): Directory path to scBCR-seq data- Additional columns: Treated as sample metadata (optional)
Data format requirements:
- 10x Genomics: Directory containing
filtered_contig_annotations.csvorall_contig_annotations.csv - AIRR format: Directory containing
airr_rearrangement.tsv - BD platform: Directory containing
Contigs_AIRR.tsv - Dandelion: Directory containing
all_contig_dandelion.tsv - Immcantation: Directory containing
_data.tsvor similar - JSON: File with
.jsonextension - MiXCR: Directory containing
clones.tsv - ParseBio: Directory containing
barcode_report.tsv - TRUST4: Directory containing
barcode_report.tsv - WAT3R: Directory containing
barcode_results.csv - Omniscope: Directory containing
.csvfiles
Path handling:
- If
TCRData/BCRDataspecifies a directory: Process usesscRepertoire::loadContigs()directly - If
TCRData/BCRDataspecifies a file: Creates symbolic link to temp directory for processing - When filename is not recognized by scRepertoire: Set
envs.formatexplicitly
Environment Variables
[ScRepLoading.envs]
# type: choice - Data type to load (default: "auto")
# Options:
# "TCR" - T cell receptor data
# "BCR" - B cell receptor data
# "auto" - Auto-detect from column names in sample info
# Note: If both TCRData and BCRData present, TCR selected by default
type = "auto"
# format: choice - Format of TCR/BCR data files (optional)
# Options: auto, 10X, AIRR, BD, Dandelion, Immcantation,
# JSON, MiXCR, Omniscope, ParseBio, TRUST4, WAT3R
# If not provided, scRepertoire guesses from filename
format = "auto"
# combineTCR: json - Extra arguments for scRepertoire::combineTCR()
# See: https://rdrr.io/github/ncborcherding/scRepertoire/man/combinetcr
combineTCR = {"samples": true}
# combineBCR: json - Extra arguments for scRepertoire::combineBCR()
# See: https://rdrr.io/github/ncborcherding/scRepertoire/man/combinebcr
combineBCR = {"samples": true}
# exclude: auto or list - Columns to exclude from metadata (default: auto)
# auto = ["BCRData", "TCRData", "RNAData"]
# Can also be comma-separated string: "BCRData,TCRData,RNAData"
exclude = "auto"
# tmpdir: str - Temporary directory for symbolic links (default: "/tmp")
tmpdir = "/tmp"
Detailed combineTCR Parameters
[ScRepLoading.envs.combineTCR]
# samples: bool or list - Sample labels (default: true)
# true = use Sample column from metadata
# false = no sample grouping
# list = explicit sample labels
samples = true
# ID: str or null - Additional sample labeling (optional)
# Adds prefix to barcodes to prevent duplicate issues
ID = null
# removeNA: bool - Remove cells with missing chain values (default: false)
# true = filter out cells with NA in any chain
# false = include cells with 1 NA value (default)
removeNA = false
# removeMulti: bool - Remove cells with >2 chains (default: false)
# true = filter out multi-chain cells (>2 chains)
# false = include multi-chain cells (default)
removeMulti = false
# filterMulti: bool - Select highest-expression chain for multi-chain (TCR default: false)
# true = keep highest UMI count chain if multiple chains present
# false = keep all chains (default)
filterMulti = false
# filterNonproductive: bool - Remove non-productive rearrangements (default: true)
# true = filter out non-functional receptors
# false = include all rearrangements
filterNonproductive = true
Detailed combineBCR Parameters
[ScRepLoading.envs.combineBCR]
# samples: bool or list - Sample labels (default: true)
samples = true
# ID: str or null - Additional sample labeling (optional)
ID = null
# call.related.clones: bool - Cluster related BCR clones (default: true)
# Uses nucleotide sequence + V gene with Levenshtein distance
# false = uses V gene + amino acid sequence for CTstrict
call.related.clones = true
# threshold: num - Normalized edit distance for clustering (default: 0.85)
# Higher = more permissive clustering (more sequences grouped)
# Range: 0.0 - 1.0
threshold = 0.85
# removeNA: bool - Remove cells with missing chain values (default: false)
removeNA = false
# removeMulti: bool - Remove cells with >2 chains (default: false)
removeMulti = false
# filterMulti: bool - Select highest-expression chain (default: true)
# true = keep highest UMI count chain
# false = keep all chains
filterMulti = true
# filterNonproductive: bool - Remove non-productive rearrangements (default: true)
filterNonproductive = true
Configuration Examples
Minimal Configuration (10x TCR Data)
[SampleInfo.in]
infile = "sample_info.txt"
# Sample info file contents:
# Sample Age Sex Diagnosis RNAData TCRData
# C1 62 F Colitis /data/C1/rna /data/C1/tcr
# C2 71 F Colitis /data/C2/rna /data/C2/tcr
# ScRepLoading auto-enables when TCRData column present
# No explicit ScRepLoading section needed
Single Sample with Format Specification
[ScRepLoading]
cache = true
[ScRepLoading.in]
metafile = "metadata/single_sample.txt"
[ScRepLoading.envs]
type = "TCR"
format = "10X"
[ScRepLoading.envs.combineTCR]
removeNA = true
filterNonproductive = true
Multi-Sample BCR Analysis with Clustering
[ScRepLoading]
cache = true
[ScRepLoading.in]
metafile = "metadata/bcr_samples.txt"
[ScRepLoading.envs]
type = "BCR"
[ScRepLoading.envs.combineBCR]
call.related.clones = true
threshold = 0.85 # Higher threshold for more permissive clustering
filterMulti = true
removeMulti = false
Non-10x Format (AIRR)
[ScRepLoading]
[ScRepLoading.in]
metafile = "metadata/airr_samples.txt"
[ScRepLoading.envs]
format = "AIRR"
type = "auto"
[ScRepLoading.envs.combineTCR]
removeNA = false
removeMulti = false
TRUST4 Format
[ScRepLoading]
[ScRepLoading.in]
metafile = "metadata/trust4_samples.txt"
[ScRepLoading.envs]
format = "TRUST4"
[ScRepLoading.envs.combineTCR]
removeNA = true
filterNonproductive = true
Common Patterns
Pattern 1: 10x Genomics TCR Data (Most Common)
# sample_info.txt
# Sample RNAData TCRData
# Sample1 /data/Sample1/rna /data/Sample1/vdj
# Sample2 /data/Sample2/rna /data/Sample2/vdj
[SampleInfo.in]
infile = "sample_info.txt"
# TCR directories must contain filtered_contig_annotations.csv
# No ScRepLoading configuration needed - auto-detected
Pattern 2: Both TCR and BCR Data (Auto-Detect TCR)
# sample_info.txt
# Sample RNAData TCRData BCRData
# Sample1 /data/Sample1/rna /data/Sample1/tcr /data/Sample1/bcr
[SampleInfo.in]
infile = "sample_info.txt"
# TCR selected by default when both columns present
# To explicitly analyze BCR instead:
[ScRepLoading.envs]
type = "BCR"
Pattern 3: Filtered TCR Data (Remove NA and Multi-Chain)
[ScRepLoading]
[ScRepLoading.in]
metafile = "metadata/tcr_filtered.txt"
[ScRepLoading.envs.combineTCR]
removeNA = true # Remove cells with missing chains
removeMulti = true # Remove cells with >2 chains
filterNonproductive = true # Remove non-functional receptors
Pattern 4: Relaxed Filtering for Exploratory Analysis
[ScRepLoading]
[ScRepLoading.in]
metafile = "metadata/tcr_exploratory.txt"
[ScRepLoading.envs.combineTCR]
removeNA = false # Keep cells with single chain
removeMulti = false # Include multi-chain cells for inspection
filterNonproductive = false # Include non-productive rearrangements
Pattern 5: BCR Clone Clustering with Custom Threshold
[ScRepLoading]
[ScRepLoading.in]
metafile = "metadata/bcr_clustering.txt"
[ScRepLoading.envs.combineBCR]
call.related.clones = true
threshold = 0.90 # More stringent clustering (lower = more permissive)
Pattern 6: Sample-Specific Labeling
[ScRepLoading]
[ScRepLoading.in]
metafile = "metadata/longitudinal.txt"
[ScRepLoading.envs.combineTCR]
samples = true # Use Sample column from metadata
ID = "Timepoint" # Add Timepoint as additional label prefix
# Creates barcodes like: "Sample1_Timepoint1_AAACCC..."
# Prevents duplicate barcode issues across timepoints
Pattern 7: Custom Metadata Exclusion
[ScRepLoading]
[ScRepLoading.in]
metafile = "metadata/custom_columns.txt"
[ScRepLoading.envs]
exclude = ["RNAData", "TCRData", "BCRData", "ExperimentID", "Batch"]
# These columns excluded from scRepertoire object metadata
# Helps reduce metadata clutter in downstream analysis
Pattern 8: Paired Chain Analysis (TRA+TRB for TCR)
# Default behavior - ScRepLoading automatically pairs chains
# at cell barcode level when both TRA and TRB present
[ScRepLoading]
[ScRepLoading.in]
metafile = "metadata/tcr_paired.txt"
[ScRepLoading.envs.combineTCR]
removeNA = false # Keep single-chain cells for inspection
filterMulti = false # Don't filter multi-chain cells
# Later analysis can filter for true paired chains
# Using downstream processes like CDR3Clustering
Dependencies
Upstream Processes
- SampleInfo (required): Provides sample metadata with
TCRData/BCRDatacolumns - LoadingRNAFromSeurat (alternative): When loading RNA from Seurat instead of SampleInfo
Downstream Processes
- ScRepCombiningExpression: Integrates TCR/BCR data with scRNA-seq expression
- CDR3Clustering: Clones cells by CDR3 sequence similarity
- TESSA: TCR-specific analysis (epitope specificity prediction)
- CDR3AAPhyschem: Physicochemical properties of CDR3 sequences
- ClonalStats: Clonality statistics and diversity metrics
Validation Rules
Common Configuration Errors
-
Missing TCRData/BCRData column:
- Error: Process not enabled, no TCR/BCR analysis
- Fix: Add
TCRDataorBCRDatacolumn to sample info file
-
Invalid format specified:
- Error: scRepertoire fails to recognize file format
- Fix: Set
envs.formatto one of:10X,AIRR,BD,Dandelion,Immcantation,JSON,MiXCR,ParseBio,TRUST4,WAT3R,Omniscope
-
Directory path not found:
- Error: Cannot access TCR/BCR data directory
- Fix: Verify paths in
TCRData/BCRDatacolumns exist and are readable
-
Missing required files in directory:
- Error: Expected contig file not found (e.g.,
filtered_contig_annotations.csv) - Fix: Ensure directory contains appropriate file for specified format
- Error: Expected contig file not found (e.g.,
-
Both TCR and BCR specified without type selection:
- Warning: TCR selected by default
- Fix: Set
envs.type = "BCR"if BCR analysis intended
File Format Requirements
- 10x Genomics: Must have
filtered_contig_annotations.csvin directory - AIRR: Must have
airr_rearrangement.tsvin directory - BD: Must have
Contigs_AIRR.tsvin directory - Dandelion: Must have
all_contig_dandelion.tsvin directory - MiXCR: Must have
clones.tsvin directory - TRUST4: Must have
barcode_report.tsvin directory - ParseBio: Must have
barcode_report.tsvin directory - WAT3R: Must have
barcode_results.csvin directory
Chain Compatibility
- TCR chains: Supports TRA, TRB, TRG, TRD (auto-detected from data)
- BCR chains: Supports IGH, IGL, IGK (auto-detected from data)
- Paired analysis: Automatically pairs TRA+TRB or IGH+IGL/IGK when both present
- Single-chain: Keeps single-chain cells when
removeNA = false
Troubleshooting
Issue: ScRepLoading not running
Cause: No TCRData or BCRData column in sample info file
Solution:
- Add
TCRDataorBCRDatacolumn to sample info - Verify column name exactly matches (case-sensitive)
- Check that
SampleInfo.in.infileis correctly specified
Issue: "File format not recognized"
Cause: Filename doesn't match expected pattern for auto-detection Solution:
- Set
envs.formatexplicitly to your format type - Example:
format = "TRUST4"for TRUST4 output - Verify directory contains expected file for that format
Issue: "No cells loaded" or empty output
Cause: Too aggressive filtering or mismatched barcodes Solution:
- Set
removeNA = falseandremoveMulti = falsetemporarily - Check that TCR/BCR barcodes match RNA barcodes
- Verify filterMulti is appropriate for your data type
Issue: Duplicate barcode errors
Cause: Multiple samples have identical cell barcodes Solution:
- Set
ID = "Sample"or use explicit sample labels - This adds sample prefix to barcodes:
Sample1_AAACCC... - Required when merging samples from same run
Issue: BCR clustering too strict/too permissive
Cause: Default threshold (0.85) not optimal for data Solution:
- Adjust
envs.combineBCR.threshold - Higher (0.90+): More stringent, fewer clusters
- Lower (0.80-): More permissive, more sequences clustered together
Issue: Single-chain cells lost
Cause: filterNonproductive = true or removeNA = true
Solution:
- For exploratory analysis, set
removeNA = false - For developmental studies, consider
filterNonproductive = false - Use
filterMulti = trueonly when confident in data quality
Issue: Metadata columns missing from output
Cause: Excluded by default (exclude = "auto")
Solution:
- Set
exclude = []to keep all metadata columns - Or specify custom list:
exclude = ["RNAData"] - Default excludes:
RNAData,TCRData,BCRData
Issue: Cannot load from specific directory path
Cause: Path not accessible or permission issues Solution:
- Verify directory exists and is readable
- Check file permissions:
ls -la path/to/tcr/ - Use absolute paths if relative paths fail
Issue: Combining TCR and BCR data separately
Cause: Need to analyze both receptor types Solution:
- Run pipeline twice with different
typesettings - First run:
[ScRepLoading.envs] type = "TCR" - Second run:
[ScRepLoading.envs] type = "BCR" - Use different output directories to avoid conflicts
Issue: Integration with ScRepCombiningExpression fails
Cause: Barcodes don't match between RNA and VDJ data Solution:
- Ensure same samples used in both RNA and VDJ data
- Check that SampleInfo has correct paths for both data types
- Verify barcode prefixes match (if using
IDparameter)
Didn't find tool you were looking for?