Agent skill
boltzgen
All-atom protein design using BoltzGen diffusion model. Use this skill when: (1) Need side-chain aware design from the start, (2) Designing around small molecules or ligands, (3) Want all-atom diffusion (not just backbone), (4) Require precise binding geometries, (5) Using YAML-based configuration. For backbone-only generation, use rfdiffusion. For sequence-only design, use proteinmpnn. For structure validation, use boltz.
Install this agent skill to your Project
npx add-skill https://github.com/adaptyvbio/protein-design-skills/tree/main/skills/boltzgen
SKILL.md
BoltzGen All-Atom Design
Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.10+ | 3.11 |
| CUDA | 12.0+ | 12.1+ |
| GPU VRAM | 24GB | 48GB (L40S) |
| RAM | 32GB | 64GB |
How to run
First time? See Installation Guide to set up Modal and biomodals.
Option 1: Modal (recommended)
# Clone biomodals
git clone https://github.com/hgbrian/biomodals && cd biomodals
# Run BoltzGen (requires YAML config file)
modal run modal_boltzgen.py \
--input-yaml binder_config.yaml \
--protocol protein-anything \
--num-designs 50
# With custom GPU
GPU=L40S modal run modal_boltzgen.py \
--input-yaml binder_config.yaml \
--protocol protein-anything \
--num-designs 100
GPU: L40S (48GB) recommended | Timeout: 120min default
Available protocols: protein-anything, peptide-anything, protein-small_molecule, nanobody-anything, antibody-anything
Option 2: Local installation
git clone https://github.com/HannesStark/boltzgen.git
cd boltzgen
pip install -e .
python sample.py config=config.yaml
Option 3: Python API
from boltzgen import BoltzGen
model = BoltzGen.load_pretrained()
designs = model.sample(
target_pdb="target.pdb",
num_samples=50,
binder_length=80
)
GPU: L40S (48GB) | Time: ~30-60s per design
Key parameters (CLI)
| Parameter | Default | Description |
|---|---|---|
--input-yaml |
required | Path to YAML design specification |
--protocol |
protein-anything |
Design protocol |
--num-designs |
10 | Number of designs to generate |
--steps |
all | Pipeline steps to run (e.g., design inverse_folding) |
YAML configuration
BoltzGen uses an entity-based YAML format where you specify designed proteins and target structures as entities.
Important notes:
- Residue indices use
label_seq_id(1-indexed), not author residue numbers - File paths are relative to the YAML file location
- Target files should be in CIF format (PDB also works but CIF preferred)
- Run
boltzgen check config.yamlto verify your specification before running
Basic Binder Config
entities:
# Designed protein (variable length 80-140 residues)
- protein:
id: B
sequence: 80..140
# Target from structure file
- file:
path: target.cif
include:
- chain:
id: A
# Specify binding site residues (optional but recommended)
binding_types:
- chain:
id: A
binding: 45,67,89
Binder with Specific Binding Site
entities:
- protein:
id: G
sequence: 60..100
- file:
path: 5cqg.cif
include:
- chain:
id: A
binding_types:
- chain:
id: A
binding: 343,344,251
structure_groups: "all"
Peptide Design (Cyclic)
entities:
- protein:
id: S
sequence: 10..14C6C3 # With cysteines for disulfide
- file:
path: target.cif
include:
- chain:
id: A
constraints:
- bond:
atom1: [S, 11, SG]
atom2: [S, 18, SG] # Disulfide bond
Design protocols
| Protocol | Use Case |
|---|---|
protein-anything |
Design proteins to bind proteins or peptides |
peptide-anything |
Design cyclic peptides to bind proteins |
protein-small_molecule |
Design proteins to bind small molecules |
nanobody-anything |
Design nanobody CDRs |
antibody-anything |
Design antibody CDRs |
Output format
output/
├── sample_0/
│ ├── design.cif # All-atom structure (CIF format)
│ ├── metrics.json # Confidence scores
│ └── sequence.fasta # Sequence
├── sample_1/
│ └── ...
└── summary.csv
Note: BoltzGen outputs CIF format. Convert to PDB if needed:
from Bio.PDB import MMCIFParser, PDBIO
parser = MMCIFParser()
structure = parser.get_structure("design", "design.cif")
io = PDBIO()
io.set_structure(structure)
io.save("design.pdb")
Sample output
Successful run
$ modal run modal_boltzgen.py --input-yaml binder.yaml --protocol protein-anything --num-designs 10
Running: boltzgen run binder.yaml --output /tmp/out --protocol protein-anything --num_designs 10
[INFO] Loading BoltzGen model...
[INFO] Generating designs...
[INFO] Running inverse folding...
[INFO] Running structure prediction...
[INFO] Filtering and ranking...
[INFO] Pipeline complete
Results saved to: ./out/boltzgen/2501161234/
Output directory structure:
out/boltzgen/2501161234/
├── intermediate_designs/ # Raw diffusion outputs
│ ├── design_0.cif
│ └── design_0.npz
├── intermediate_designs_inverse_folded/
│ ├── refold_cif/ # Refolded complexes
│ └── aggregate_metrics_analyze.csv
└── final_ranked_designs/
├── final_10_designs/ # Top designs
└── results_overview.pdf # Summary plots
What good output looks like:
- Refolding RMSD < 2.0A (design folds as predicted)
- ipTM > 0.5 (confident interface)
- All designs complete pipeline without errors
Decision tree
Should I use BoltzGen?
│
├─ What type of design?
│ ├─ All-atom precision needed → BoltzGen ✓
│ ├─ Ligand binding pocket → BoltzGen ✓
│ └─ Standard miniprotein → RFdiffusion (faster)
│
├─ What matters most?
│ ├─ Side-chain packing → BoltzGen ✓
│ ├─ Speed / diversity → RFdiffusion
│ ├─ Highest success rate → BindCraft
│ └─ AF2 optimization → ColabDesign
│
└─ Compute resources?
├─ Have L40S/A100 (48GB+) → BoltzGen ✓
└─ Only A10G (24GB) → Consider RFdiffusion
Typical performance
| Campaign Size | Time (L40S) | Cost (Modal) | Notes |
|---|---|---|---|
| 50 designs | 30-45 min | ~$8 | Quick exploration |
| 100 designs | 1-1.5h | ~$15 | Standard campaign |
| 500 designs | 5-8h | ~$70 | Large campaign |
Per-design: ~30-60s for typical binder.
Verify
find output -name "*.cif" | wc -l # Should match num_samples
Troubleshooting
Verify config first: Always run boltzgen check config.yaml before running the full pipeline
Slow generation: Use fewer designs for initial testing, then scale up
OOM errors: Use A100-80GB or reduce --num-designs
Wrong binding site: Residue indices use label_seq_id (1-indexed), check in Molstar viewer
Error interpretation
| Error | Cause | Fix |
|---|---|---|
RuntimeError: CUDA out of memory |
Large design or long protein | Use A100-80GB or reduce designs |
FileNotFoundError: *.cif |
Target file not found | File paths are relative to YAML location |
ValueError: invalid chain |
Chain not in target | Verify chain IDs with Molstar or PyMOL |
modal: command not found |
Modal CLI not installed | Run pip install modal && modal setup |
Next: Validate with boltz or chai → protein-qc for filtering.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
proteinmpnn
Design protein sequences using ProteinMPNN inverse folding. Use this skill when: (1) Designing sequences for RFdiffusion backbones, (2) Redesigning existing protein sequences, (3) Fixing specific residues while designing others, (4) Optimizing sequences for expression or stability, (5) Multi-state or negative design. For backbone generation, use rfdiffusion or bindcraft. For ligand-aware design, use ligandmpnn. For solubility optimization, use solublempnn.
campaign-manager
Goal-oriented binder design campaign planning and health assessment. Use this skill when: (1) Planning a complete binder design campaign, (2) Converting high-level goals into runnable pipelines, (3) Assessing campaign health and pass rates, (4) Diagnosing why designs are failing QC, (5) Estimating time, cost, and expected yields, (6) Selecting between design tools for a specific target. This skill orchestrates the other protein design tools. For individual tool parameters, use the specific tool skills.
esm
ESM2 protein language model for embeddings and sequence scoring. Use this skill when: (1) Computing pseudo-log-likelihood (PLL) scores, (2) Getting protein embeddings for clustering, (3) Filtering designs by sequence plausibility, (4) Zero-shot variant effect prediction, (5) Analyzing sequence-function relationships. For structure prediction, use chai or boltz. For QC thresholds, use protein-qc.
binding-characterization
Guidance for SPR and BLI binding characterization experiments. Use when: (1) Planning binding kinetics experiments, (2) Troubleshooting poor/no binding signal, (3) Interpreting kinetic data artifacts, (4) Choosing between SPR vs BLI platforms.
cell-free-expression
Guidance for cell-free protein synthesis (CFPS) optimization. Use when: (1) Planning CFPS experiments, (2) Troubleshooting low yield or aggregation, (3) Optimizing DNA template design for CFPS, (4) Expressing difficult proteins (disulfide-rich, toxic, membrane).
ligandmpnn
Ligand-aware protein sequence design using LigandMPNN. Use this skill when: (1) Designing sequences around small molecules, (2) Enzyme active site design, (3) Ligand binding pocket optimization, (4) Metal coordination site design, (5) Cofactor binding proteins. For standard protein design, use proteinmpnn. For solubility optimization, use solublempnn.
Didn't find tool you were looking for?