Agent skill
seuratsubclustering
SeuratSubClustering Process Configuration
Purpose
Performs fine-grained re-clustering on specific subsets of cells (e.g., individual clusters, cell types, or custom subsets). Unlike Seurat::FindSubCluster which only finds subclusters within a single cluster, this process performs the complete clustering workflow (PCA, UMAP, FindNeighbors, FindClusters) on any subset of cells defined by metadata filters or cell barcode lists.
When to Use
- Cluster heterogeneity analysis: When initial clustering identifies mixed cell populations within a cluster
- Cell type sub-clustering: To resolve heterogeneity within annotated cell types (e.g., T cell subsets: CD4+, CD8+, naive, memory, effector)
- Lineage-specific analysis: To examine substructure within major cell lineages
- Differential sub-populations: When a cluster contains multiple biologically distinct populations (e.g., NK cells + CD4 T cells)
- Multi-resolution exploration: To test different clustering granularities on specific cell subsets
- Downstream marker discovery: When you need markers for sub-populations within larger clusters
Configuration Structure
Process Enablement
[SeuratSubClustering]
cache = true # Cache intermediate results for faster re-runs
Input Specification
[SeuratSubClustering.in]
srtobj = ["SeuratClustering"] # Path or reference to Seurat object
Environment Variables
Core Parameters
[SeuratSubClustering.envs]
# Number of cores for parallelization
ncores = 1 # int; Higher values speed up computation
# Metadata mutaters to define subset cells
# Applied BEFORE subsetting to create temporary columns
mutaters = {} # json; Dictionary of dplyr-like mutations
# Expression to subset cells (dplyr::filter syntax)
# Applied to metadata using tidyseurat::filter()
subset = "seurat_clusters == 'c3'" # str; Filter expression
# Cache location for intermediate results
cache = "/tmp" # Path; Set to false to disable caching
Sub-clustering Cases (Multiple Subsets)
[SeuratSubClustering.envs.cases]
# Keys are case names (prefixes for outputs)
# Values inherit envs parameters (except mutaters, cache)
# If empty, default case "subcluster" is created
Case Naming Rules:
- Case name becomes prefix for reductions: <CASENAME>PC_, <CASENAME>UMAP_
- Case name becomes prefix for cluster columns: <CASENAME>.<resolution>
- Case name becomes final cluster column: <CASENAME>
- Non-alphanumeric characters in case names are removed
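The naming rules above can be sketched in a few lines of Python. This is only an illustration of the rules as listed, not the process's actual implementation: the helper name `derive_names` is hypothetical, the ASCII-only character class is an assumption, and upper-casing the reduction prefix is inferred from the `TEFFECTORPC_` example later in this document.

```python
import re


def derive_names(case_name: str, resolutions: list[float]) -> dict:
    """Derive output names for one case (hypothetical helper, for illustration)."""
    # Drop every character that is not an ASCII letter or digit.
    clean = re.sub(r"[^A-Za-z0-9]", "", case_name)
    return {
        "case": clean,
        # Assumption: reduction prefixes use the upper-cased case name.
        "reduction_prefixes": [f"{clean.upper()}PC_", f"{clean.upper()}UMAP_"],
        # One cluster column per resolution: <CASENAME>.<resolution>
        "cluster_columns": [f"{clean}.{r}" for r in resolutions],
        # The final cluster column is the bare case name.
        "final_column": clean,
    }


names = derive_names("T-Effector", [0.4, 0.8])
print(names)
```

Running a planned case name through a check like this before launching the pipeline shows which prefix to look for in the resulting object.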
Metadata Output:
- Each case adds new metadata columns to original Seurat object
- Reductions saved: <CASENAME>.pc, <CASENAME>.umap
- Clusters saved: <CASENAME>.<resolution> for each resolution
- Final clusters: <CASENAME> column
RunPCA Parameters
[SeuratSubClustering.envs.RunPCA]
# See https://satijalab.org/seurat/reference/runpca
# object specified internally as subset object
npcs = 30 # int; Number of PCs to compute
RunUMAP Parameters
[SeuratSubClustering.envs.RunUMAP]
# See https://satijalab.org/seurat/reference/runumap
# object specified internally as subset object
# dims=N expanded to dims=1:N (min(N, ncol-1))
dims = 30 # int; Number of PCs to use
# Use specific features instead of dimensions
# Can be list: {order = "desc(abs(avg_log2FC))", n = 30}
# Or numeric (treated as n with default order)
features = 30 # int or list; Top markers for UMAP
# Reduction to use for UMAP
reduction = "pca" # str; Uses sobj@misc$integrated_new_reduction if omitted
n.neighbors = 30 # int; Neighborhood size
min.dist = 0.3 # float; Cluster tightness (0.001-0.5)
spread = 1 # float; Embedding scale
seed.use = 42 # int; Random seed
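The `dims` expansion described in the RunUMAP comments (`dims=N` becomes `1:N`, capped at one less than the number of available components) can be sketched as follows. The function name `expand_dims` and the parameter `n_components` (standing in for `ncol` of the reduction) are assumptions made for this example.

```python
def expand_dims(n: int, n_components: int) -> list[int]:
    """Expand dims=N to [1, ..., N], capped at n_components - 1."""
    upper = min(n, n_components - 1)
    return list(range(1, upper + 1))


# A subset whose reduction has only 20 components cannot use 30 dimensions.
print(expand_dims(30, 20))
# A request below the cap is passed through unchanged.
print(expand_dims(10, 50))
```

This is why a small subset can end up using fewer dimensions than configured.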
FindNeighbors Parameters
[SeuratSubClustering.envs.FindNeighbors]
# See https://satijalab.org/seurat/reference/findneighbors
# object specified internally
reduction = "pca" # str; Uses sobj@misc$integrated_new_reduction if omitted
dims = 30 # int; Dimensions to use
k.param = 20 # int; K-nearest neighbors
prune.SNN = 0.067 # float; SNN pruning threshold (default: 1/15)
nn.method = "annoy" # str; "annoy" or "rann"
FindClusters Parameters
[SeuratSubClustering.envs.FindClusters]
# See https://satijalab.org/seurat/reference/findclusters
# object specified internally
# Resolution: Higher = more clusters, Lower = fewer clusters
# Multiple resolutions supported: [0.4, 0.6, 0.8, 1.0]
# Range syntax: "0.1:0.5:0.1" -> [0.1, 0.2, 0.3, 0.4, 0.5]
resolution = 0.8 # float or list; Default: 0.8
# Cluster labels are prefixed with "s" and numbered from 1 (s1, s2, ...), not from 0 (s0, s1, ...)
algorithm = 1 # int; 1=Louvain, 4=Leiden (recommended)
graph.name = "pca_snn" # str; Must match FindNeighbors SNN graph
random.seed = 0 # int; Reproducibility
Multi-resolution Output:
- Multiple resolutions create columns: <CASENAME>.0.4, <CASENAME>.0.6, <CASENAME>.0.8, <CASENAME>
- Final resolution uses last value in list
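To make the three accepted forms of `resolution` concrete (single value, list, or `"start:end:step"` range), here is a minimal Python sketch of how such a value could be normalised to a list. The function `expand_resolution` is hypothetical and not part of the pipeline. It uses `Decimal` so that stepping by 0.1 does not accumulate floating-point error, and it assumes the end point is inclusive, as in the range example in the FindClusters comments.

```python
from decimal import Decimal


def expand_resolution(spec) -> list[float]:
    """Normalise a resolution setting to a list of positive floats."""
    if isinstance(spec, (int, float)):
        values = [float(spec)]
    elif isinstance(spec, str):
        parts = spec.split(":")
        if len(parts) == 1:
            values = [float(parts[0])]
        elif len(parts) in (2, 3):
            start, end = Decimal(parts[0]), Decimal(parts[1])
            # Step defaults to 0.1 when omitted.
            step = Decimal(parts[2]) if len(parts) == 3 else Decimal("0.1")
            if step <= 0:
                raise ValueError("step must be positive")
            values = []
            current = start
            while current <= end:  # end point is inclusive
                values.append(float(current))
                current += step
        else:
            raise ValueError(f"cannot parse resolution range: {spec!r}")
    else:
        values = [float(v) for v in spec]
    if not values or any(v <= 0 for v in values):
        raise ValueError("resolutions must be positive")
    return values


print(expand_resolution("0.1:0.5:0.1"))
print(expand_resolution([0.4, 0.6, 0.8, 1.0]))
```

The last element of the returned list corresponds to the resolution used for the final cluster column.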
External References
Seurat Functions
- RunPCA(): https://satijalab.org/seurat/reference/runpca
- Principal component analysis on subset of cells
- RunUMAP(): https://satijalab.org/seurat/reference/runumap
- Non-linear dimensionality reduction for visualization
- FindNeighbors(): https://satijalab.org/seurat/reference/findneighbors
- K-nearest neighbor graph construction
- FindClusters(): https://satijalab.org/seurat/reference/findclusters
- Community detection (Louvain/Leiden algorithms)
tidyseurat::filter()
https://stemangiola.github.io/tidyseurat/reference/filter.html
- Subset Seurat objects using dplyr-like filter syntax
- Supports logical expressions: seurat_clusters == 'c3', celltype %in% c('CD4', 'CD8')
- Can use any metadata column created by mutaters
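As a rough analogy for readers more familiar with Python, the sketch below mimics the "mutate, then filter" order on a toy metadata table. It is not how tidyseurat works internally; the barcodes, values, and the `is_t_cell` column are invented for the example.

```python
# Toy per-cell metadata (hypothetical barcodes and annotations).
cells = [
    {"barcode": "AAAC", "seurat_clusters": "c3", "celltype": "CD4"},
    {"barcode": "AAAG", "seurat_clusters": "c3", "celltype": "NK"},
    {"barcode": "AACT", "seurat_clusters": "c1", "celltype": "CD8"},
]

# Mutater step: derived columns are added BEFORE any subsetting.
for cell in cells:
    cell["is_t_cell"] = cell["celltype"] in {"CD4", "CD8"}

# Analogue of: seurat_clusters == 'c3'
in_c3 = [c["barcode"] for c in cells if c["seurat_clusters"] == "c3"]

# Analogue of: celltype %in% c('CD4', 'CD8')
t_cells = [c["barcode"] for c in cells if c["celltype"] in {"CD4", "CD8"}]

# Analogue of: seurat_clusters == 'c3' & is_t_cell (uses the mutated column)
c3_t_cells = [
    c["barcode"] for c in cells if c["seurat_clusters"] == "c3" and c["is_t_cell"]
]

print(in_c3, t_cells, c3_t_cells)
```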
Configuration Examples
Minimal Configuration (Default Case)
[SeuratSubClustering]
[SeuratSubClustering.in]
srtobj = ["SeuratClustering"]
Result: Creates default case "subcluster" with all cells
Single Cluster Sub-clustering
[SeuratSubClustering]
[SeuratSubClustering.envs]
subset = "seurat_clusters == 'c3'"
Result: Re-clusters only cells in cluster c3
Metadata-Based Sub-clustering (Cell Type)
[SeuratSubClustering]
[SeuratSubClustering.envs]
# Flag CD4 T cells via mutaters (applied before subsetting), then filter on the flag
mutaters = {is_cd4 = "if_else(celltype == 'CD4 T cell', TRUE, FALSE)"}
subset = "is_cd4"
[SeuratSubClustering.envs.RunPCA]
npcs = 50
[SeuratSubClustering.envs.FindClusters]
resolution = 1.2
algorithm = 4 # Leiden
Result: Creates subcluster case for CD4+ cells only
Multiple Sub-clustering Cases
[SeuratSubClustering]
[SeuratSubClustering.envs]
# Define multiple sub-clustering cases
[SeuratSubClustering.envs.cases.TEffector]
subset = "celltype == 'CD8 T cell' & state == 'Effector'"
FindClusters = {resolution = 1.0}
[SeuratSubClustering.envs.cases.TNaive]
subset = "celltype == 'CD8 T cell' & state == 'Naive'"
FindClusters = {resolution = 0.8}
[SeuratSubClustering.envs.cases.CD4Memory]
subset = "celltype == 'CD4 T cell' & state == 'Memory'"
FindClusters = {resolution = 1.5}
Result: Three sub-clustering analyses with different resolutions
- Metadata columns: TEffector, TNaive, CD4Memory
- Reductions: TEFFECTORPC_, TNAIVEPC_, CD4MEMORYPC_, etc.
- Clusters: TEffector, TNaive, CD4Memory
Multi-resolution Sub-clustering
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.Cluster3]
subset = "seurat_clusters == 'c3'"
[SeuratSubClustering.envs.cases.Cluster3.FindClusters]
# Test multiple resolutions
resolution = "0.4:1.2:0.2" # [0.4, 0.6, 0.8, 1.0, 1.2]
algorithm = 4 # Leiden
Result: Cluster3 has columns Cluster3.0.4, Cluster3.0.6, Cluster3.0.8, Cluster3.1.0, Cluster3 (the final column holds the last resolution, 1.2)
Using Top Markers for UMAP
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.MixedCluster]
subset = "seurat_clusters == 'c5'"
[SeuratSubClustering.envs.cases.MixedCluster.RunUMAP]
# Use top 30 DEGs for UMAP instead of PCs
features = {order = "desc(abs(avg_log2FC))", n = 30}
Result: Sub-cluster based on top DEGs for better separation
Leiden Algorithm with Custom Parameters
[SeuratSubClustering]
[SeuratSubClustering.envs]
ncores = 4
[SeuratSubClustering.envs.FindNeighbors]
k.param = 30
prune.SNN = 0.05
[SeuratSubClustering.envs.FindClusters]
algorithm = 4 # Leiden
resolution = 1.0
random.seed = 42
Complex Subset with Multiple Conditions
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.ActivatedT]
subset = "celltype %in% c('CD4 T cell', 'CD8 T cell') & activation == 'Activated'"
[SeuratSubClustering.envs.cases.ActivatedT.RunPCA]
npcs = 40
[SeuratSubClustering.envs.cases.ActivatedT.RunUMAP]
dims = 40
n.neighbors = 20
min.dist = 0.2
Common Patterns
Pattern 1: Single Cluster Deep Dive
[SeuratSubClustering]
[SeuratSubClustering.envs]
# Re-cluster cluster 3 to resolve heterogeneity
subset = "seurat_clusters == 'c3'"
Pattern 2: Multiple Lineage Sub-clustering
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.TCD4]
subset = "celltype == 'CD4 T cell'"
[SeuratSubClustering.envs.cases.TCD8]
subset = "celltype == 'CD8 T cell'"
[SeuratSubClustering.envs.cases.TGD]
subset = "celltype == 'Gamma delta T cell'"
[SeuratSubClustering.envs.cases.NK]
subset = "celltype == 'NK cell'"
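When the list of lineages is long, a config fragment like the one above can be generated rather than typed by hand. The following Python sketch writes one case per cell type. The helper `render_cases` is hypothetical, and it assumes that labels contain no quote characters and that case names are already alphanumeric.

```python
def render_cases(cases: dict[str, str], column: str = "celltype") -> str:
    """Render one sub-clustering case per (case name, metadata label) pair."""
    lines = ["[SeuratSubClustering]"]
    for case_name, label in cases.items():
        lines.append(f"[SeuratSubClustering.envs.cases.{case_name}]")
        # Double quotes delimit the TOML string; single quotes delimit the R string.
        lines.append(f"subset = \"{column} == '{label}'\"")
    return "\n".join(lines)


config = render_cases(
    {
        "TCD4": "CD4 T cell",
        "TCD8": "CD8 T cell",
        "NK": "NK cell",
    }
)
print(config)
```

The printed text can be pasted into the pipeline configuration file and edited further, for example to add per-case FindClusters settings.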
Pattern 3: Functional State Sub-clustering
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.Effector]
subset = "state == 'Effector'"
[SeuratSubClustering.envs.cases.Effector.FindClusters]
resolution = 1.5 # Higher resolution for more sub-states
[SeuratSubClustering.envs.cases.Memory]
subset = "state == 'Memory'"
[SeuratSubClustering.envs.cases.Naive]
subset = "state == 'Naive'"
Pattern 4: Re-clustering Based on Clonality (TCR+)
[SeuratSubClustering]
[SeuratSubClustering.envs]
# After ScRepCombiningExpression adds clonality metadata
[SeuratSubClustering.envs.cases.ExpandedClones]
subset = "clone_size >= 5" # Large clones
[SeuratSubClustering.envs.cases.ExpandedClones.FindClusters]
resolution = 0.6 # Lower resolution for broader groups
[SeuratSubClustering.envs.cases.RareClones]
subset = "clone_size == 1" # Unique clones
[SeuratSubClustering.envs.cases.RareClones.FindClusters]
resolution = 1.2 # Higher resolution to capture diversity
Pattern 5: Multi-resolution Exploration
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.TumorCluster]
subset = "seurat_clusters == 'c8'"
[SeuratSubClustering.envs.cases.TumorCluster.FindClusters]
resolution = "0.2:2.0:0.2" # Sweep: [0.2, 0.4, ..., 2.0]
algorithm = 4 # Leiden
Dependencies
Upstream Processes
- Required: SeuratClustering (or SeuratClusteringOfAllCells if TOrBCellSelection used)
- Optional: ScRepCombiningExpression (if TCR/BCR data present, adds clonality metadata for subsetting)
- Optional: CellTypeAnnotation (if using annotated cell types for subsetting)
Downstream Processes
- SeuratClusterStats: Statistics for sub-clusters
- ClusterMarkers: Differential expression between sub-clusters
- MarkersFinder: Flexible marker finding with enrichment analysis
- ScFGSEA: Pathway analysis on sub-cluster markers
- ModuleScoreCalculator: Module scoring within sub-clusters
Validation Rules
Subset Expression Validation
- Must be valid dplyr::filter() expression
- Can reference any metadata column in Seurat object
- Complex expressions supported: & (AND), | (OR), %in% (in operator)
- Example: seurat_clusters == 'c3' & percent.mt < 5
- Example: celltype %in% c('CD4 T cell', 'CD8 T cell')
Case Name Validation
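- Should contain only alphanumeric characters
- Any non-alphanumeric character (space, hyphen, underscore, etc.) is removed automatically
- Used as the prefix for reductions and cluster column names
- Avoid spaces and special characters in case names so the configured name matches the output name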
Resolution Constraints
- Must be positive (resolution > 0)
- Single value, list, or range syntax allowed
- Range: "start:end:step" (step defaults to 0.1 if omitted)
- Multi-resolution creates multiple metadata columns
Dimension Requirements
- RunPCA.npcs must not exceed the number of cells in the subset
- RunUMAP.dims is automatically truncated to min(dims, ncol(reduction) - 1)
- Use fewer dimensions for small subsets (< 100 cells)
Graph Name Consistency
- FindClusters.graph.name must match the FindNeighbors output
- Default: pca_snn when not specified
- When using integrated reductions, ensure consistency
Troubleshooting
Issue: Subset Returns Zero Cells
Symptoms: Sub-clustering produces empty subset
Solutions:
- Verify subset expression syntax
# Check that the column exists and the values are correct
[SeuratSubClustering.envs]
# Quote string values with single quotes
subset = "seurat_clusters == 'c3'" # Correct
# subset = "seurat_clusters == c3" # Wrong (c3 is treated as a variable)
- Verify column names exist in metadata
# Use existing columns only
subset = "seurat_clusters == 'c3'" # seurat_clusters exists
# subset = "cluster_id == 'c3'" # cluster_id may not exist
- Check for exact string matching
# Matching is case-sensitive
subset = "celltype == 'CD4 T cell'" # Exact match
# subset = "celltype == 'CD4 T Cell'" # Wrong case
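A quick pre-flight check on exported metadata values can catch the case-sensitivity problem described above before the pipeline runs. The sketch below is a hypothetical helper, not part of the process; it assumes the values of one metadata column are available as a plain list of strings.

```python
def diagnose_value(metadata_values: list[str], wanted: str) -> str:
    """Explain why an exact-match filter on one column may return zero cells."""
    exact = sum(v == wanted for v in metadata_values)
    if exact:
        return f"{exact} cells match '{wanted}'"
    # Look for values that differ only in case or surrounding whitespace.
    near = sorted(
        {v for v in metadata_values if v.strip().lower() == wanted.strip().lower()}
    )
    if near:
        return f"no exact match for '{wanted}'; did you mean {near}?"
    return f"no match for '{wanted}'; available values: {sorted(set(metadata_values))}"


celltypes = ["CD4 T cell", "CD8 T cell", "CD4 T cell"]
print(diagnose_value(celltypes, "CD4 T cell"))
print(diagnose_value(celltypes, "CD4 T Cell"))
print(diagnose_value(celltypes, "B cell"))
```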
Issue: Too Many Small Sub-clusters
Symptoms: Hundreds of tiny sub-clusters, many singletons
Solutions:
[SeuratSubClustering.envs.FindClusters]
resolution = 0.4 # Lower resolution
algorithm = 4 # Leiden handles singletons better
Issue: Sub-clusters Overlapping in UMAP
Symptoms: Poor separation in sub-cluster visualization
Solutions:
[SeuratSubClustering.envs.RunUMAP]
min.dist = 0.1 # Tighter clusters
n.neighbors = 15 # More local detail
spread = 1.2 # More separation
Issue: Sub-clustering Uses Wrong Reduction
Symptoms: Clustering on raw RNA instead of integrated data
Solutions:
[SeuratSubClustering.envs.FindNeighbors]
reduction = "integrated.cca" # Use integrated reduction
[SeuratSubClustering.envs.RunUMAP]
reduction = "integrated.cca"
Issue: Multi-resolution Columns Not Created
Symptoms: Only final resolution column appears
Solutions:
[SeuratSubClustering.envs.FindClusters]
# Use a list or the range syntax rather than a single value
resolution = [0.4, 0.6, 0.8, 1.0] # Correct
# resolution = "0.4:1.0:0.2" # Also correct (range syntax)
Issue: Case Names Too Similar
Symptoms: Confusion between multiple cases
Solutions:
# Use descriptive, unique case names (alphanumeric only; underscores would be removed)
[SeuratSubClustering.envs.cases]
TCD4Effector = {subset = "..."}
TCD4Naive = {subset = "..."}
BMemory = {subset = "..."}
Issue: Sub-clustering on All Cells (Not Subset)
Symptoms: Default case runs on entire object
Solutions:
# Always specify subset or use cases
[SeuratSubClustering.envs]
subset = "seurat_clusters == 'c3'" # Explicit subset
# Or define specific cases
[SeuratSubClustering.envs.cases.MyCase]
subset = "seurat_clusters == 'c3'"
Issue: Reductions Not Saved
Symptoms: Cannot find <CASENAME>PC_ or <CASENAME>UMAP_
Solutions:
# Ensure case name is alphanumeric only
[SeuratSubClustering.envs.cases]
MySubCluster1 = {subset = "..."} # Correct
Sub-Cluster = {subset = "..."} # Hyphen removed -> SubCluster
# Check the object's reductions (not the metadata) for the actual reduction names
# Reductions are saved as <CASENAME>.pc and <CASENAME>.umap; <CASENAME>PC_ and <CASENAME>UMAP_ are the prefixes
Best Practices
- Define explicit subsets: Always specify subset or define cases to avoid running the default case on all cells
- Use descriptive case names: Make case names clear and unique (e.g., TEffector, not case1)
- Test multiple resolutions: Sweep resolution range to find optimal granularity for each subset
- Use Leiden algorithm: Prefer algorithm = 4 for better community detection
- Leverage metadata columns: Use CellTypeAnnotation results, TCR clonality, or custom mutaters for subsetting
- Set random seeds: Ensure reproducible sub-clustering results with random.seed
- Parallelize large subsets: Use ncores > 1 for subsets > 10k cells
- Adjust UMAP parameters: Smaller subsets may need different n.neighbors and min.dist
- Document sub-clustering strategy: Comment on biological rationale for each case in config
- Use multi-resolution: Test [0.4, 0.6, 0.8, 1.0] to capture different granularities
Related Processes
- SeuratClustering: Initial clustering before sub-clustering
- SeuratClusteringOfAllCells: Clustering before T/B cell selection
- CellTypeAnnotation: Annotate clusters before sub-clustering by cell type
- ClusterMarkers: Find markers for sub-clusters
- MarkersFinder: Flexible marker finding with multiple comparison groups