Agent skill
uniprot
Access UniProt for protein sequence and annotation retrieval. Use this skill when: (1) Looking up protein sequences by accession, (2) Finding functional annotations, (3) Getting domain boundaries, (4) Finding homologs and variants, (5) Cross-referencing to PDB structures. For structure retrieval, use pdb. For sequence design, use proteinmpnn.
Install this agent skill to your Project
npx add-skill https://github.com/adaptyvbio/protein-design-skills/tree/main/skills/uniprot
UniProt Database Access
Note: This skill uses the UniProt REST API directly. No Modal deployment needed - all operations run locally via HTTP requests.
Fetching Sequences
By Accession
# FASTA format
curl "https://rest.uniprot.org/uniprotkb/P00533.fasta"
# JSON format with annotations
curl "https://rest.uniprot.org/uniprotkb/P00533.json"
Using Python
import requests

def get_uniprot_sequence(accession):
    """Fetch a sequence from UniProt; returns (header, sequence) or (None, None)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"
    response = requests.get(url, timeout=30)
    if response.ok:
        lines = response.text.strip().split('\n')
        header = lines[0]
        sequence = ''.join(lines[1:])  # join the wrapped sequence lines
        return header, sequence
    return None, None
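The header returned above packs several fields into one line. The sketch below splits it apart, assuming the usual UniProtKB layout `>db|ACCESSION|ENTRY_NAME Protein name OS=... OX=... GN=... PE=... SV=...`. The helper name `parse_fasta_header` is hypothetical and not part of this skill.

```python
import re

# Assumed header layout (UniProtKB FASTA):
# >db|ACCESSION|ENTRY_NAME Protein name OS=Organism OX=TaxID [GN=Gene] PE=n SV=n
_HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*)$")
_FIELD_KEYS = "OS|OX|GN|PE|SV"

def parse_fasta_header(header):
    """Split a UniProtKB FASTA header into a dict (hypothetical helper)."""
    match = _HEADER_RE.match(header.strip())
    if match is None:
        return None  # not a UniProtKB-style header
    db, accession, entry_name, rest = match.groups()
    # The protein name is everything before the first KEY= token
    name = re.split(rf"\s+(?:{_FIELD_KEYS})=", rest, maxsplit=1)[0]
    # Each KEY=value runs until the next KEY= token or the end of the line
    fields = dict(re.findall(
        rf"\b({_FIELD_KEYS})=(.*?)(?=\s+(?:{_FIELD_KEYS})=|$)", rest))
    return {
        "db": db,                      # 'sp' = Swiss-Prot, 'tr' = TrEMBL
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": name,
        "organism": fields.get("OS", ""),
        "taxonomy_id": fields.get("OX", ""),
        "gene": fields.get("GN", ""),
    }

example = (">sp|P00533|EGFR_HUMAN Epidermal growth factor receptor "
           "OS=Homo sapiens OX=9606 GN=EGFR PE=1 SV=2")
info = parse_fasta_header(example)
```

Headers that do not follow this layout return `None` rather than raising, so the caller can fall back to the raw header string.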
Getting Annotations
Full Entry
def get_uniprot_entry(accession):
    """Fetch the full UniProt entry as JSON, or None if the request fails."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.json"
    response = requests.get(url, timeout=30)
    return response.json() if response.ok else None

entry = get_uniprot_entry("P00533")
if entry:  # None when the accession is not found
    print(f"Protein: {entry['proteinDescription']['recommendedName']['fullName']['value']}")
Domain Boundaries
def get_domains(accession):
    """Extract domain annotations (1-based, inclusive residue positions)."""
    entry = get_uniprot_entry(accession) or {}  # tolerate a failed lookup
    domains = []
    for feature in entry.get('features', []):
        if feature['type'] == 'Domain':
            domains.append({
                'name': feature.get('description', ''),
                'start': feature['location']['start']['value'],
                'end': feature['location']['end']['value']
            })
    return domains

# Example: EGFR domains
domains = get_domains("P00533")
# e.g. [{'name': 'Protein kinase', 'start': 712, 'end': 979}, ...]
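UniProt feature positions are 1-based and inclusive, while Python slices are 0-based and end-exclusive, so cutting a domain out of a sequence needs an offset on the start only. A minimal sketch, using a hypothetical helper `extract_domain_sequence` and a toy sequence rather than real EGFR data:

```python
def extract_domain_sequence(sequence, start, end):
    """Return residues start..end (1-based, inclusive) from a sequence string."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError(
            f"Range {start}-{end} is outside a sequence of length {len(sequence)}")
    # Shift only the start: the inclusive end equals the exclusive slice stop
    return sequence[start - 1:end]

# Toy example; in practice, pass the sequence from get_uniprot_sequence
# and the 'start'/'end' values from get_domains.
toy_sequence = "MKTAYIAKQR"
toy_domain = {"name": "Toy domain", "start": 3, "end": 5}
fragment = extract_domain_sequence(
    toy_sequence, toy_domain["start"], toy_domain["end"])
```

Raising on out-of-range coordinates catches the common mistake of mixing positions from one isoform with the sequence of another.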
Searching UniProt
By Gene Name
def search_uniprot(query, organism=None, limit=10):
    """Search UniProtKB; organism is an NCBI taxonomy ID (e.g. 9606 for human)."""
    url = "https://rest.uniprot.org/uniprotkb/search"
    if organism:
        # Parenthesize so a query containing OR keeps its meaning
        query = f"({query}) AND organism_id:{organism}"
    params = {"query": query, "format": "json", "size": limit}
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of a KeyError
    return response.json().get('results', [])

# Search for human EGFR
results = search_uniprot("EGFR", organism=9606)
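A free-text query such as "EGFR" can also match entries that only mention the gene. The sketch below assembles a fielded query string instead, using the UniProt query fields `gene_exact`, `organism_id`, and `reviewed`. The helper `build_query` is hypothetical; its output is meant to be passed as the `query` argument of a search call.

```python
def build_query(gene=None, organism_id=None, reviewed=None, free_text=None):
    """Combine optional filters into one UniProt query string (hypothetical helper)."""
    terms = []
    if free_text:
        terms.append(f"({free_text})")  # keep free text grouped
    if gene:
        terms.append(f"gene_exact:{gene}")
    if organism_id is not None:
        terms.append(f"organism_id:{organism_id}")
    if reviewed is not None:
        # reviewed:true restricts results to Swiss-Prot entries
        terms.append(f"reviewed:{'true' if reviewed else 'false'}")
    if not terms:
        raise ValueError("At least one filter is required")
    return " AND ".join(terms)

query = build_query(gene="EGFR", organism_id=9606, reviewed=True)
```

Restricting to reviewed entries is often a sensible default for target selection, since it tends to return the single curated entry for a gene rather than many unreviewed fragments.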
By Sequence Similarity (BLAST)
# Use UniProt BLAST
# https://www.uniprot.org/blast
Cross-References
Get PDB Structures
def get_pdb_references(accession):
    """Get PDB structures for a UniProt entry."""
    entry = get_uniprot_entry(accession) or {}
    pdbs = []
    for xref in entry.get('uniProtKBCrossReferences', []):
        if xref['database'] == 'PDB':
            # Look properties up by key; list positions are not reliable
            props = {p.get('key'): p.get('value', '') for p in xref.get('properties', [])}
            pdbs.append({
                'pdb_id': xref['id'],
                'method': props.get('Method', ''),
                'resolution': props.get('Resolution', ''),
                'chains': props.get('Chains', '')
            })
    return pdbs

# Example: PDB structures for EGFR
pdbs = get_pdb_references("P00533")
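The chain information in a PDB cross-reference is a compact string. The sketch below assumes it looks like `A/B=25-645`, optionally with several comma-separated segments, and turns it into per-chain residue ranges in UniProt numbering. Both helper names are hypothetical.

```python
def parse_chains(chains):
    """Parse a string like 'A/B=25-645, C=1-50' into [(chain_id, start, end), ...]."""
    segments = []
    for part in chains.split(','):
        part = part.strip()
        if '=' not in part:
            continue  # skip empty or unexpected pieces
        ids, _, span = part.partition('=')
        start_text, _, end_text = span.partition('-')
        if not (start_text.isdigit() and end_text.isdigit()):
            continue  # skip spans that are not plain integer ranges
        for chain_id in ids.split('/'):
            segments.append((chain_id.strip(), int(start_text), int(end_text)))
    return segments

def covers_region(segments, start, end):
    """True if any single chain segment spans the whole start..end region."""
    return any(s <= start and e >= end for _, s, e in segments)

segments = parse_chains("A/B=25-645")
```

Combined with domain boundaries, `covers_region` gives a quick filter for structures that actually contain the domain of interest.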
Common Use Cases
Target Selection
# 1. Find protein by name
results = search_uniprot("insulin receptor", organism=9606)
# 2. Get accession
accession = results[0]['primaryAccession'] # e.g., P06213
# 3. Get domains
domains = get_domains(accession)
# 4. Find PDB structure
pdbs = get_pdb_references(accession)
# 5. Download best structure for design
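Step 5 leaves "best" undefined. One simple reading is "lowest reported resolution", sketched below. It assumes each structure dict carries a `resolution` string such as `"2.60 A"`. That key is an assumption here, as is the helper name `best_structure`. Entries with no numeric resolution (NMR models, for example) are skipped.

```python
def parse_resolution(text):
    """Convert a string like '2.60 A' to 2.6; return None if no number is present."""
    try:
        return float(text.split()[0])
    except (AttributeError, IndexError, ValueError):
        return None

def best_structure(pdbs):
    """Pick the entry with the lowest numeric resolution, or None if there is none."""
    scored = []
    for pdb in pdbs:
        resolution = parse_resolution(pdb.get('resolution', ''))
        if resolution is not None:
            scored.append((resolution, pdb))
    if not scored:
        return None
    return min(scored, key=lambda item: item[0])[1]

# Toy data with made-up IDs, for illustration only
toy_pdbs = [
    {'pdb_id': 'XXX1', 'resolution': '3.30 A'},
    {'pdb_id': 'XXX2', 'resolution': '1.90 A'},
    {'pdb_id': 'XXX3', 'resolution': '-'},
]
best = best_structure(toy_pdbs)
```

Resolution alone is a rough proxy. For design work it is usually worth also checking that the structure covers the target domain and is in the relevant bound or unbound state.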
Sequence Alignment Info
def get_sequence_variants(accession):
    """Get natural variants from UniProt."""
    entry = get_uniprot_entry(accession) or {}
    variants = []
    for feature in entry.get('features', []):
        if feature['type'] == 'Natural variant':
            alt = feature.get('alternativeSequence', {})
            # Deletions may carry an empty list of alternative sequences
            alternatives = alt.get('alternativeSequences') or ['']
            variants.append({
                'position': feature['location']['start']['value'],
                'original': alt.get('originalSequence', ''),
                'variant': alternatives[0],
                'description': feature.get('description', '')
            })
    return variants
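To turn a variant record into a mutated sequence, the original residues should first be checked against the reference, since a mismatch usually means the position refers to a different isoform. A minimal sketch with a hypothetical helper `apply_variant`, shown on a toy sequence:

```python
def apply_variant(sequence, position, original, variant):
    """Replace `original` at 1-based `position` with `variant`; '' means deletion."""
    if not original:
        raise ValueError("original residues are required")
    index = position - 1
    if index < 0 or sequence[index:index + len(original)] != original:
        raise ValueError(
            f"Expected {original!r} at position {position}, "
            f"found {sequence[max(index, 0):max(index, 0) + len(original)]!r}")
    return sequence[:index] + variant + sequence[index + len(original):]

toy_sequence = "MKTAYIAK"
substituted = apply_variant(toy_sequence, 3, "T", "S")
deleted = apply_variant(toy_sequence, 3, "T", "")
```

The `position`, `original`, and `variant` arguments correspond to the keys returned by `get_sequence_variants`.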
API Reference
| Endpoint | Description |
|---|---|
| /uniprotkb/{id}.fasta | FASTA sequence |
| /uniprotkb/{id}.json | Full entry JSON |
| /uniprotkb/search | Search entries |
| /uniprotkb/stream | Batch download |
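For the batch endpoint in the table, the request is an ordinary query plus an output format. The sketch below only builds the URL, leaving the download to the caller. The helper name `build_stream_url` is hypothetical, and the set of accepted formats is limited to three common ones for illustration.

```python
from urllib.parse import urlencode

STREAM_URL = "https://rest.uniprot.org/uniprotkb/stream"

def build_stream_url(query, fmt="fasta"):
    """Build a /uniprotkb/stream URL for a query (hypothetical helper)."""
    if fmt not in {"fasta", "json", "tsv"}:
        raise ValueError(f"Unsupported format in this sketch: {fmt}")
    # urlencode escapes spaces and colons in the query string
    return f"{STREAM_URL}?{urlencode({'query': query, 'format': fmt})}"

url = build_stream_url("organism_id:9606 AND reviewed:true", "fasta")
# Large results are best fetched with requests.get(url, stream=True)
# and written to disk chunk by chunk.
```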
Troubleshooting
Entry not found: Check the accession format (e.g., P00533).
Rate limits: Add a delay between requests.
Large downloads: Use the stream endpoint, or page through the search endpoint.
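The accession check can be done locally before any request is sent. The sketch below uses the accession pattern published in the UniProt documentation. The helper name `is_valid_accession` is hypothetical, and a match only means the string is well-formed, not that the entry exists.

```python
import re

# UniProtKB accession pattern: 6 characters (e.g. P00533)
# or 10 characters (e.g. A0A024R161)
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

def is_valid_accession(accession):
    """Return True if the string is shaped like a UniProtKB accession."""
    return bool(ACCESSION_RE.match(accession.strip()))

checks = {acc: is_valid_accession(acc)
          for acc in ["P00533", "A0A024R161", "EGFR_HUMAN", "P0053"]}
```

Entry names such as `EGFR_HUMAN` fail this check by design. They identify entries too, but follow a different format from accessions.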
Next: Use the sequence with esm for embeddings or colabfold for structure prediction.