using-dynamic-architectures
Use when building networks that grow, prune, or adapt topology during training. Routes to continual learning, gradient isolation, modular composition, and lifecycle orchestration skills.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/using-dynamic-architectures
Dynamic Architectures Meta-Skill
When to Use This Skill
Invoke this meta-skill when you encounter:
- Growing Networks: Adding capacity during training (new layers, neurons, modules)
- Pruning Networks: Removing capacity that isn't contributing
- Continual Learning: Training on new tasks without forgetting old ones
- Gradient Isolation: Training new modules without destabilizing existing weights
- Modular Composition: Building networks from graftable, composable components
- Lifecycle Management: State machines controlling when to grow, train, integrate, prune
- Progressive Training: Staged capability expansion with warmup and cooldown
This is the entry point for dynamic/morphogenetic neural network patterns. It routes to 7 specialized reference sheets.
How to Access Reference Sheets
IMPORTANT: All reference sheets are located in the SAME DIRECTORY as this SKILL.md file.
When this skill is loaded from:
skills/using-dynamic-architectures/SKILL.md
Reference sheets like continual-learning-foundations.md are at:
skills/using-dynamic-architectures/continual-learning-foundations.md
NOT at:
skills/continual-learning-foundations.md (WRONG PATH)
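The path rule above can be sketched in a few lines. This is only an illustration: the helper name `reference_sheet_path` is hypothetical, and it assumes the loader knows the path of the SKILL.md file it read.

```python
from pathlib import Path


def reference_sheet_path(skill_md: str, sheet_name: str) -> Path:
    """Resolve a reference sheet as a sibling of SKILL.md (hypothetical helper)."""
    # The sheet lives in the same directory as SKILL.md, not one level up.
    return Path(skill_md).parent / sheet_name


path = reference_sheet_path(
    "skills/using-dynamic-architectures/SKILL.md",
    "continual-learning-foundations.md",
)
print(path.as_posix())
```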
Core Principle
Dynamic architectures grow capability, not just tune weights.
Static networks are a guess about capacity. Dynamic networks let training signal drive structure. The challenge is growing without forgetting, integrating without destabilizing, and knowing when to act.
Key tensions:
- Stability vs. Plasticity: Preserve existing knowledge while adding new capacity
- Isolation vs. Integration: Train new modules separately, then merge carefully
- Exploration vs. Exploitation: When to add capacity vs. when to stabilize
The 7 Dynamic Architecture Skills
- continual-learning-foundations - EWC, PackNet, rehearsal strategies, catastrophic forgetting theory
- gradient-isolation-techniques - Freezing, gradient masking, stop_grad patterns, alpha blending
- peft-adapter-techniques - LoRA, QLoRA, DoRA, adapter placement, merging strategies
- dynamic-architecture-patterns - Grow/prune patterns, slot-based expansion, capacity scheduling
- modular-neural-composition - MoE, gating, grafting semantics, interface contracts
- ml-lifecycle-orchestration - State machines, quality gates, transition triggers, controllers
- progressive-training-strategies - Staged expansion, warmup/cooldown, knowledge transfer
Routing Decision Framework
Step 1: Identify the Core Problem
Diagnostic Questions:
- "Are you trying to prevent forgetting when training on new data/tasks?"
- "Are you trying to add new capacity to an existing trained network?"
- "Are you designing how multiple modules combine?"
- "Are you deciding WHEN to grow, prune, or integrate?"
Quick Routing:
| Problem | Primary Skill |
|---|---|
| "Model forgets old tasks when I train new ones" | continual-learning-foundations |
| "New module destabilizes existing weights" | gradient-isolation-techniques |
| "Fine-tune LLM efficiently without full training" | peft-adapter-techniques |
| "When should I add more capacity?" | dynamic-architecture-patterns |
| "How do module outputs combine?" | modular-neural-composition |
| "How do I manage the grow/train/integrate cycle?" | ml-lifecycle-orchestration |
| "How do I warm up new modules safely?" | progressive-training-strategies |
Step 2: Catastrophic Forgetting (Continual Learning)
Symptoms:
- Performance on old tasks drops when training on new tasks
- Model "forgets" previous capabilities
- Fine-tuning overwrites learned features
Route to: continual-learning-foundations.md
Covers:
- Why SGD causes forgetting (loss landscape geometry)
- EWC, SI, MAS (regularization approaches)
- Progressive Neural Networks, PackNet (architectural approaches)
- Experience replay, generative replay (rehearsal approaches)
- Measuring forgetting (backward/forward transfer)
When to Use:
- Training sequentially on multiple tasks
- Fine-tuning without forgetting base capabilities
- Designing systems that accumulate knowledge over time
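As a rough illustration of the regularization family, the EWC penalty adds (λ/2) · Σ F_i (θ_i − θ*_i)² to the new-task loss, where θ* are the weights after the old task and F_i is a per-parameter importance estimate (diagonal Fisher information). The sketch below uses plain Python lists and made-up numbers; the function name and values are assumptions for illustration, and the reference sheet remains the place for a real implementation.

```python
def ewc_penalty(params, anchor_params, fisher, lam):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


anchor = [1.0, 1.0]   # weights after the old task (illustrative values)
fisher = [10.0, 0.5]  # first weight matters far more to the old task

# Moving the unimportant weight by 1.0 is cheap ...
cheap = ewc_penalty([1.0, 2.0], anchor, fisher, lam=2.0)
# ... while moving the important weight by the same amount is expensive.
costly = ewc_penalty([2.0, 1.0], anchor, fisher, lam=2.0)
print(cheap, costly)
```

The asymmetry between the two penalties is the point: the regularizer leaves unimportant weights free to change for the new task while anchoring the ones the old task depends on.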
Step 3: Gradient Isolation
Symptoms:
- New module training affects host network stability
- Want to train on host errors without backprop flowing to host
- Need gradual integration of new capacity
Route to: gradient-isolation-techniques.md
Covers:
- Freezing strategies (full, partial, scheduled)
- detach() vs no_grad() semantics
- Dual-path training (residual learning on errors)
- Alpha blending for gradual integration
- Hook-based gradient surgery
When to Use:
- Training "seed" modules that learn from host errors
- Preventing catastrophic interference during growth
- Implementing safe module grafting
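Two of the ideas above, freezing and alpha blending, can be shown without any framework. This is a minimal sketch under stated assumptions: parameters are a flat dict of floats, the names `host.w` and `seed.w` are invented, and blending is taken in its residual form (host + alpha * seed), which is one possible convention rather than the one the reference sheet necessarily uses. In an autograd framework the freezing effect would normally come from the framework's own mechanisms rather than a hand-written mask.

```python
def masked_sgd_step(params, grads, frozen, lr):
    """One SGD step that leaves every parameter named in `frozen` untouched."""
    return {
        name: value if name in frozen else value - lr * grads[name]
        for name, value in params.items()
    }


def alpha_blend(host_out, seed_out, alpha):
    """Residual blend: alpha = 0 reproduces the host, alpha = 1 adds the full seed."""
    return [h + alpha * s for h, s in zip(host_out, seed_out)]


params = {"host.w": 1.0, "seed.w": 0.0}
grads = {"host.w": 0.5, "seed.w": -2.0}

# Only the seed moves; the host weight is isolated from the update.
updated = masked_sgd_step(params, grads, frozen={"host.w"}, lr=0.1)
print(updated)

# At alpha = 0 the new module has no effect on the output.
print(alpha_blend([1.0, 2.0], [4.0, -4.0], alpha=0.0))
```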
Step 4: PEFT Adapters (LoRA, QLoRA)
Symptoms:
- Want to fine-tune large pretrained models efficiently
- Memory constraints prevent full fine-tuning
- Need task-specific adaptation without modifying base weights
Route to: peft-adapter-techniques.md
Covers:
- LoRA (low-rank adaptation) fundamentals
- QLoRA (quantized base + LoRA adapters)
- DoRA (weight-decomposed adaptation)
- Adapter placement strategies
- Merging adapters into base model
- Multiple adapter management
When to Use:
- Fine-tuning LLMs on limited compute
- Creating task-specific model variants
- Memory-efficient adaptation of large models
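To make the LoRA and merging bullets concrete, here is a small pure-Python sketch. It assumes the usual formulation y = W x + (alpha / r) · B A x with W of shape d_out × d_in, A of shape r × d_in and B of shape d_out × r; the helper names and toy matrices are illustrative and are not the API of any PEFT library.

```python
def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Matrix-matrix product on nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_forward(w, a, b, x, alpha):
    """Frozen base path plus scaled low-rank path: W x + (alpha / r) * B (A x)."""
    scale = alpha / len(a)  # r is the number of rows of A
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + scale * d for y, d in zip(base, delta)]


def merge_lora(w, a, b, alpha):
    """Fold the adapter into the base weights: W' = W + (alpha / r) * B A."""
    scale = alpha / len(a)
    ba = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, ba)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weights
A = [[1.0, 2.0]]              # rank r = 1
B = [[0.5], [-1.0]]
x = [3.0, 1.0]

adapted = lora_forward(W, A, B, x, alpha=2.0)
merged = matvec(merge_lora(W, A, B, alpha=2.0), x)
print(adapted, merged)
```

The adapted and merged outputs agree, which is why a merged adapter adds no inference cost. Setting B to zeros gives back the base output exactly, which is the common reason for zero-initializing B.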
Step 5: Dynamic Architecture Patterns
Symptoms:
- Need to add capacity during training (not just before)
- Want to prune underperforming components
- Deciding when/where to grow the network
Route to: dynamic-architecture-patterns.md
Covers:
- Growth patterns (slot-based, layer widening, depth extension)
- Pruning patterns (magnitude, gradient-based, lottery ticket)
- Trigger conditions (loss plateau, contribution metrics, budgets)
- Capacity scheduling (grow-as-needed vs overparameterize-then-prune)
When to Use:
- Building networks that expand during training
- Implementing a lightweight form of neural architecture search
- Managing parameter budgets with dynamic allocation
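A growth trigger that combines a loss plateau with a parameter budget might look like the sketch below. The window size, improvement threshold, and function names are assumptions chosen for illustration, and a real trigger would likely add a contribution metric as well.

```python
def loss_plateaued(losses, window, min_improvement):
    """True when the mean of the last `window` losses barely beats the window before it."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return previous - recent < min_improvement


def should_grow(losses, param_count, param_budget, window=3, min_improvement=0.01):
    """Grow only if there is budget left and training has stalled."""
    return param_count < param_budget and loss_plateaued(
        losses, window, min_improvement
    )


stalled = [1.0, 0.8, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
improving = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

print(should_grow(stalled, param_count=1000, param_budget=2000))
print(should_grow(improving, param_count=1000, param_budget=2000))
print(should_grow(stalled, param_count=2000, param_budget=2000))
```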
Step 6: Modular Composition
Symptoms:
- Combining outputs from multiple modules
- Designing gating/routing mechanisms
- Need graftable, replaceable components
Route to: modular-neural-composition.md
Covers:
- Combination mechanisms (additive, multiplicative, selective)
- Mixture of Experts (sparse gating, load balancing)
- Grafting semantics (input/output attachment points)
- Interface contracts (shape matching, normalization boundaries)
- Multi-module coordination (independent, competitive, cooperative)
When to Use:
- Building modular architectures with interchangeable parts
- Implementing MoE or gated architectures
- Designing residual streams as module communication
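Selective combination with sparse gating can be sketched as follows. This shows one common variant (keep the k largest gate logits, apply softmax over those, zero the rest); the function names and the toy expert outputs are illustrative assumptions, and load balancing is left out entirely.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits, k):
    """Sparse gate weights: softmax over the k largest logits, zero elsewhere."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    gates = [0.0] * len(logits)
    for i, w in zip(top, weights):
        gates[i] = w
    return gates


def combine(expert_outputs, gates):
    """Gate-weighted sum of expert output vectors."""
    dim = len(expert_outputs[0])
    return [
        sum(g * out[d] for g, out in zip(gates, expert_outputs)) for d in range(dim)
    ]


gates = top_k_gate([2.0, 0.0, 2.0, -1.0], k=2)
experts = [[1.0, 0.0], [9.0, 9.0], [3.0, 2.0], [9.0, 9.0]]
print(gates)
print(combine(experts, gates))
```

Experts with a zero gate contribute nothing to the output, so in a real system they would not need to be evaluated at all.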
Step 7: Lifecycle Orchestration
Symptoms:
- Need to decide WHEN to grow, train, integrate, prune
- Building state machines for module lifecycle
- Want quality gates before integration decisions
Route to: ml-lifecycle-orchestration.md
Covers:
- State machine fundamentals (states, transitions, terminals)
- Gate design patterns (structural, performance, stability, contribution)
- Transition triggers (metric-based, time-based, budget-based)
- Rollback and recovery (cooldown, hysteresis)
- Controller patterns (heuristic, learned/RL, hybrid)
When to Use:
- Designing grow/train/integrate/prune workflows
- Implementing quality gates for safe integration
- Building RL-controlled architecture decisions
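A module lifecycle with quality gates can be sketched as a small state machine. The four states, the contribution threshold, and the `stable` flag below are illustrative assumptions; the reference sheet may define more states and richer gates.

```python
from enum import Enum


class SeedState(Enum):
    TRAINING = "training"
    INTEGRATING = "integrating"
    PERMANENT = "permanent"  # terminal: module is kept
    REMOVED = "removed"      # terminal: module is pruned


TERMINAL = {SeedState.PERMANENT, SeedState.REMOVED}


def next_state(state, contribution, min_contribution, stable):
    """Advance one step; gates must pass before a module moves forward."""
    if state in TERMINAL:
        return state
    if contribution < min_contribution:
        return SeedState.REMOVED  # contribution gate failed: roll back
    if not stable:
        return state  # stability gate not met yet: hold position
    if state is SeedState.TRAINING:
        return SeedState.INTEGRATING
    return SeedState.PERMANENT


state = SeedState.TRAINING
state = next_state(state, contribution=0.05, min_contribution=0.01, stable=True)
print(state)
state = next_state(state, contribution=0.05, min_contribution=0.01, stable=False)
print(state)
state = next_state(state, contribution=0.05, min_contribution=0.01, stable=True)
print(state)
```

Keeping the transition logic in one pure function makes it easy to swap the heuristic gates for a learned controller later without touching the states themselves.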
Step 8: Progressive Training
Symptoms:
- New modules cause instability when integrated
- Need warmup/cooldown for safe capacity addition
- Planning multi-stage training schedules
Route to: progressive-training-strategies.md
Covers:
- Staged capacity expansion strategies
- Warmup patterns (zero-init, LR warmup, alpha ramp)
- Cooldown and stabilization (settling periods, consolidation)
- Multi-stage schedules (sequential, overlapping, budget-aware)
- Knowledge transfer between stages (inheritance, distillation)
When to Use:
- Ramping new modules safely into production
- Designing curriculum over architecture (not just data)
- Preventing stage transition shock
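An alpha ramp is the simplest of the warmup patterns listed above. The sketch below assumes a linear schedule and a hypothetical function name; other shapes (cosine, sigmoid) follow the same pattern.

```python
def alpha_ramp(step, warmup_steps):
    """Linear ramp from 0.0 to 1.0 over `warmup_steps`, then held at 1.0."""
    if warmup_steps <= 0:
        return 1.0  # no warmup requested: full amplitude immediately
    return min(1.0, max(0.0, step / warmup_steps))


for step in (0, 50, 100, 250):
    print(step, alpha_ramp(step, warmup_steps=100))
```

The returned value would scale the new module's output, so the module starts with no influence and reaches full amplitude only once the ramp completes.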
Common Multi-Skill Scenarios
Scenario: Building a Morphogenetic System
Need: Network that grows seeds, trains them in isolation, and grafts successful ones
Routing sequence:
- dynamic-architecture-patterns - Slot-based expansion, where seeds attach
- gradient-isolation-techniques - Train seeds on host errors without destabilizing host
- modular-neural-composition - How seed outputs blend into host stream
- ml-lifecycle-orchestration - State machine for seed lifecycle
- progressive-training-strategies - Warmup/cooldown for grafting
Scenario: Continual Learning Without Forgetting
Need: Train on sequence of tasks without catastrophic forgetting
Routing sequence:
- continual-learning-foundations - Understand forgetting, choose approach
- gradient-isolation-techniques - If using architectural approach (columns, modules)
- progressive-training-strategies - Staged training across tasks
Scenario: Neural Architecture Search (Lite)
Need: Grow/prune network based on training signal
Routing sequence:
- dynamic-architecture-patterns - Growth/pruning triggers and patterns
- ml-lifecycle-orchestration - Automation via heuristics or RL
- progressive-training-strategies - Stabilization between changes
Scenario: RL-Controlled Architecture
Need: RL agent deciding when to grow, prune, integrate
Routing sequence:
- ml-lifecycle-orchestration - Learned controller patterns
- dynamic-architecture-patterns - What actions the RL agent can take
- gradient-isolation-techniques - Safe exploration during training
Rationalization Resistance Table
| Rationalization | Reality | Counter-Guidance |
|---|---|---|
| "Just train a bigger model from scratch" | Transfer + growth often beats from-scratch | "Check continual-learning-foundations for why" |
| "I'll freeze everything except the new layer" | Full freeze may be too restrictive | "Check gradient-isolation-techniques for partial strategies" |
| "I'll add capacity whenever loss plateaus" | Need more than loss plateau (contribution check) | "Check ml-lifecycle-orchestration for proper gates" |
| "Modules can just sum their outputs" | Naive summation can cause interference | "Check modular-neural-composition for combination mechanisms" |
| "I'll integrate immediately when training finishes" | Need warmup/holding period | "Check progressive-training-strategies for safe integration" |
| "EWC solves all forgetting problems" | EWC has limitations, may need architectural approach | "Check continual-learning-foundations for trade-offs" |
Red Flags Checklist
Watch for these signs of incorrect approach:
- No Isolation: Training new modules without gradient isolation from host
- No Warmup: Integrating new capacity at full amplitude immediately
- No Gates: Integrating based only on time, not performance metrics
- Naive Combination: Summing module outputs without gating or blending
- Ignoring Forgetting: Adding new tasks without measuring old task performance
- No Rollback: No plan for what happens if integration fails
Relationship to Other Packs
| Request | Primary Pack | Why |
|---|---|---|
| "Implement PPO for architecture decisions" | yzmir-deep-rl | RL algorithm implementation |
| "Evaluate architecture changes without mutation" | yzmir-deep-rl/counterfactual-reasoning | Counterfactual simulation |
| "Debug PyTorch gradient flow" | yzmir-pytorch-engineering | Low-level PyTorch debugging |
| "Optimize training loop performance" | yzmir-training-optimization | General training optimization |
| "Design transformer architecture" | yzmir-neural-architectures | Static architecture design |
| "Deploy morphogenetic model" | yzmir-ml-production | Production deployment |
Intersection with deep-rl: If using RL to control architecture decisions (when to grow/prune), combine this pack's lifecycle orchestration with deep-rl's policy gradient or actor-critic methods.
Counterfactual evaluation: Before committing to a live mutation (grow/prune), use deep-rl's counterfactual-reasoning.md to simulate the change and evaluate outcomes without risk. This is critical for production morphogenetic systems.
Diagnostic Question Templates
Use these to route users:
Problem Classification
- "Are you training on multiple tasks sequentially, or growing a single-task network?"
- "Do you have an existing trained model you want to extend, or starting fresh?"
- "Is the issue forgetting (old performance drops) or instability (training explodes)?"
Architectural Questions
- "Where do new modules attach to the existing network?"
- "How should new module outputs combine with existing outputs?"
- "What triggers growth? Loss plateau, manual, or learned?"
Lifecycle Questions
- "What states can a module be in? (training, integrating, permanent, removed)"
- "What conditions must be met before integration?"
- "What happens if a module fails to improve performance?"
Summary: Routing Decision Tree
START: Dynamic architecture problem
├─ Forgetting old tasks?
│ └─ → continual-learning-foundations
├─ New module destabilizes existing?
│ └─ → gradient-isolation-techniques
├─ Fine-tuning LLM efficiently?
│ └─ → peft-adapter-techniques
├─ When/where to add capacity?
│ └─ → dynamic-architecture-patterns
├─ How modules combine?
│ └─ → modular-neural-composition
├─ Managing grow/train/integrate cycle?
│ └─ → ml-lifecycle-orchestration
├─ Warmup/cooldown for new capacity?
│ └─ → progressive-training-strategies
└─ Building complete morphogenetic system?
  └─ → Start with dynamic-architecture-patterns
     → Then gradient-isolation-techniques
     → Then ml-lifecycle-orchestration
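The single-skill branches of the tree above amount to a lookup table. A minimal sketch follows; the problem keys are invented labels for illustration, while the sheet names come from this document.

```python
# Hypothetical problem labels mapped to the reference sheets named in this skill.
ROUTES = {
    "forgetting": "continual-learning-foundations",
    "destabilization": "gradient-isolation-techniques",
    "efficient_finetuning": "peft-adapter-techniques",
    "capacity_timing": "dynamic-architecture-patterns",
    "module_combination": "modular-neural-composition",
    "lifecycle": "ml-lifecycle-orchestration",
    "warmup": "progressive-training-strategies",
}


def route(problem: str) -> str:
    """Return the reference sheet file name for a classified problem."""
    if problem not in ROUTES:
        raise ValueError(f"unknown problem type: {problem!r}")
    return ROUTES[problem] + ".md"


print(route("forgetting"))
print(route("lifecycle"))
```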
Reference Sheets
After routing, load the appropriate reference sheet:
- continual-learning-foundations.md - EWC, PackNet, rehearsal, forgetting theory
- gradient-isolation-techniques.md - Freezing, detach, alpha blending, hook surgery
- peft-adapter-techniques.md - LoRA, QLoRA, DoRA, adapter merging
- dynamic-architecture-patterns.md - Grow/prune patterns, triggers, scheduling
- modular-neural-composition.md - MoE, gating, grafting, interface contracts
- ml-lifecycle-orchestration.md - State machines, gates, controllers
- progressive-training-strategies.md - Staged expansion, warmup/cooldown