Agent skill

solublempnn

Solubility-optimized protein sequence design using SolubleMPNN. Use this skill when: (1) Designing for E. coli expression, (2) Optimizing solubility of designed proteins, (3) Reducing aggregation propensity, (4) Need high-yield expression, (5) Avoiding inclusion body formation. For standard design, use proteinmpnn. For ligand-aware design, use ligandmpnn.

Stars 125
Forks 14

Install this agent skill to your Project

npx add-skill https://github.com/adaptyvbio/protein-design-skills/tree/main/skills/solublempnn

SKILL.md

SolubleMPNN Solubility-Optimized Design

Prerequisites

Requirement Minimum Recommended
Python 3.8+ 3.10
CUDA 11.0+ 11.7+
GPU VRAM 8GB 16GB (T4)
RAM 8GB 16GB

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal (recommended)

SolubleMPNN uses the ProteinMPNN Modal wrapper with soluble model:

bash
cd biomodals
modal run modal_proteinmpnn.py \
  --pdb-path backbone.pdb \
  --num-seq-per-target 16 \
  --sampling-temp 0.1 \
  --model-name v_48_020

GPU: T4 (16GB) | Timeout: 600s default

Option 2: Local installation

bash
git clone https://github.com/dauparas/ProteinMPNN.git
cd ProteinMPNN

# Use soluble model weights
python protein_mpnn_run.py \
  --pdb_path backbone.pdb \
  --out_folder output/ \
  --num_seq_per_target 16 \
  --sampling_temp "0.1" \
  --model_name "v_48_020"  # Soluble model

Key parameters

Parameter Default Range Description
--pdb_path required path Input structure
--num_seq_per_target 1 1-1000 Sequences per structure
--sampling_temp "0.1" "0.0001-1.0" Temperature (string!)
--model_name v_48_020 string Soluble model variant

Model Variants

Model Description Use Case
v_48_002 Standard General design
v_48_020 Soluble-trained E. coli expression
v_48_030 High solubility Difficult targets

Output format

output/
├── seqs/backbone.fa
└── backbone_pdb/backbone_0001.pdb

Sample output

Successful run

$ python protein_mpnn_run.py --pdb_path backbone.pdb --model_name v_48_020 --num_seq_per_target 8
Loading soluble model weights (v_48_020)...
Designing sequences for backbone.pdb
Generated 8 sequences in 2.1 seconds

output/seqs/backbone.fa:
>backbone_0001, score=1.31, global_score=1.24, seq_recovery=0.78
MKTAYIAKQRQISFVKSHFSRQLE...
>backbone_0002, score=1.28, global_score=1.21, seq_recovery=0.81
MKTAYIAKQRQISFVKSQFSRQLD...

What good output looks like:

  • Score: 1.0-2.0 (lower = more confident)
  • Reduced hydrophobic patches compared to standard MPNN
  • Improved charge distribution

Decision tree

Should I use SolubleMPNN?
│
├─ What expression system?
│  ├─ E. coli → SolubleMPNN ✓
│  ├─ Mammalian → ProteinMPNN (PTMs matter more)
│  └─ Yeast → Either
│
├─ History of expression problems?
│  ├─ Yes, aggregation → SolubleMPNN ✓
│  ├─ Yes, low yield → SolubleMPNN ✓
│  └─ No → ProteinMPNN is fine
│
├─ What's in the binding site?
│  ├─ Small molecule / ligand → Use LigandMPNN
│  └─ Nothing / protein only → SolubleMPNN ✓
│
└─ Need highest solubility?
   ├─ Yes → Use v_48_030 model
   └─ Standard → Use v_48_020 model

Typical performance

Campaign Size Time (T4) Cost (Modal) Notes
100 backbones × 8 seq 15-20 min ~$2 Standard
500 backbones × 8 seq 1-1.5h ~$8 Large campaign

Expected improvement: +15-30% solubility score vs standard ProteinMPNN.


Verify

bash
grep -c "^>" output/seqs/*.fa  # Should match backbone_count × num_seq_per_target

Troubleshooting

Still insoluble: Try v_48_030 (higher solubility bias) Low diversity: Increase temperature to 0.2 Poor folding: Use standard ProteinMPNN and optimize later

Error interpretation

Error Cause Fix
RuntimeError: CUDA out of memory Long protein or large batch Reduce batch_size
FileNotFoundError: v_48_020 Missing model weights Download soluble weights

Next: Structure prediction for validation → protein-qc for filtering.

Expand your agent's capabilities with these related and highly-rated skills.

adaptyvbio/protein-design-skills

proteinmpnn

Design protein sequences using ProteinMPNN inverse folding. Use this skill when: (1) Designing sequences for RFdiffusion backbones, (2) Redesigning existing protein sequences, (3) Fixing specific residues while designing others, (4) Optimizing sequences for expression or stability, (5) Multi-state or negative design. For backbone generation, use rfdiffusion or bindcraft. For ligand-aware design, use ligandmpnn. For solubility optimization, use solublempnn.

125 14
Explore
adaptyvbio/protein-design-skills

campaign-manager

Goal-oriented binder design campaign planning and health assessment. Use this skill when: (1) Planning a complete binder design campaign, (2) Converting high-level goals into runnable pipelines, (3) Assessing campaign health and pass rates, (4) Diagnosing why designs are failing QC, (5) Estimating time, cost, and expected yields, (6) Selecting between design tools for a specific target. This skill orchestrates the other protein design tools. For individual tool parameters, use the specific tool skills.

125 14
Explore
adaptyvbio/protein-design-skills

esm

ESM2 protein language model for embeddings and sequence scoring. Use this skill when: (1) Computing pseudo-log-likelihood (PLL) scores, (2) Getting protein embeddings for clustering, (3) Filtering designs by sequence plausibility, (4) Zero-shot variant effect prediction, (5) Analyzing sequence-function relationships. For structure prediction, use chai or boltz. For QC thresholds, use protein-qc.

125 14
Explore
adaptyvbio/protein-design-skills

binding-characterization

Guidance for SPR and BLI binding characterization experiments. Use when: (1) Planning binding kinetics experiments, (2) Troubleshooting poor/no binding signal, (3) Interpreting kinetic data artifacts, (4) Choosing between SPR vs BLI platforms.

125 14
Explore
adaptyvbio/protein-design-skills

cell-free-expression

Guidance for cell-free protein synthesis (CFPS) optimization. Use when: (1) Planning CFPS experiments, (2) Troubleshooting low yield or aggregation, (3) Optimizing DNA template design for CFPS, (4) Expressing difficult proteins (disulfide-rich, toxic, membrane).

125 14
Explore
adaptyvbio/protein-design-skills

ligandmpnn

Ligand-aware protein sequence design using LigandMPNN. Use this skill when: (1) Designing sequences around small molecules, (2) Enzyme active site design, (3) Ligand binding pocket optimization, (4) Metal coordination site design, (5) Cofactor binding proteins. For standard protein design, use proteinmpnn. For solubility optimization, use solublempnn.

125 14
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results