bio-format-conversion
Convert between sequence file formats (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when changing file formats or preparing data for different tools.
Install this agent skill into your project:
```
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/format-conversion
```
Format Conversion
Convert sequence files between formats using Biopython's Bio.SeqIO module.
Required Import
```python
from Bio import SeqIO
```
Core Function
SeqIO.convert() - Direct Conversion
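Convert between formats in a single call. This can be faster than an equivalent SeqIO.parse() plus SeqIO.write() pair, because some format pairs (notably FASTQ conversions) take an optimized internal code path.
```python
count = SeqIO.convert('input.gb', 'genbank', 'output.fasta', 'fasta')
print(f'Converted {count} records')
```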
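Parameters:
- in_file - Input filename or handle
- in_format - Input format string
- out_file - Output filename or handle
- out_format - Output format string
- molecule_type - Optional 'DNA', 'RNA' or 'protein', applied to every record (Biopython 1.78+); needed when the output format requires it, such as GenBank

Returns: Number of records converted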
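Because in_file and out_file may be handles, a conversion can be tried entirely in memory. A minimal sketch, assuming a recent Biopython; the two FASTQ reads are made-up data:

```python
from io import StringIO

from Bio import SeqIO

# Two made-up FASTQ reads held in memory instead of on disk
fastq_text = (
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nGGCCAATT\n+\n!!!!IIII\n"
)

out_handle = StringIO()
# Handles work wherever a filename would; the return value is the record count
count = SeqIO.convert(StringIO(fastq_text), "fastq", out_handle, "fasta")
fasta_text = out_handle.getvalue()

print(f"Converted {count} records")  # Converted 2 records
print(fasta_text, end="")
```

Swapping the StringIO objects for filenames gives the ordinary file-based call.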
Common Conversions
| From | To | Notes |
|---|---|---|
| GenBank | FASTA | Loses annotations, keeps sequence |
| FASTA | GenBank | Need to add molecule_type |
| FASTQ | FASTA | Loses quality scores |
| FASTA | FASTQ | Need to add quality scores |
| GenBank | EMBL | Usually works directly |
| Stockholm | FASTA | Alignment to sequences |
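To see the first row of the table in action, the sketch below writes a made-up GenBank record with one CDS feature to memory, converts it to FASTA, and reads the result back. It assumes Biopython 1.80+ for SimpleLocation, with a fallback import for older releases:

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature
from Bio.SeqRecord import SeqRecord

try:
    from Bio.SeqFeature import SimpleLocation  # Biopython 1.80+
except ImportError:
    from Bio.SeqFeature import FeatureLocation as SimpleLocation  # older releases

# A made-up GenBank record carrying one CDS feature
record = SeqRecord(
    Seq("ATGAAATTTTAG"),
    id="demo1",
    name="demo1",
    description="toy record",
    annotations={"molecule_type": "DNA"},
    features=[SeqFeature(SimpleLocation(0, 12, strand=1), type="CDS")],
)
gb_handle = StringIO()
SeqIO.write(record, gb_handle, "genbank")
gb_handle.seek(0)

# GenBank -> FASTA: the sequence survives, the feature table does not
fasta_handle = StringIO()
SeqIO.convert(gb_handle, "genbank", fasta_handle, "fasta")
fasta_handle.seek(0)
round_trip = SeqIO.read(fasta_handle, "fasta")

print(len(record.features), len(round_trip.features))  # 1 0
print(str(round_trip.seq).upper() == str(record.seq))  # True
```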
Code Patterns
Simple Conversion
```python
SeqIO.convert('input.gb', 'genbank', 'output.fasta', 'fasta')
```
GenBank to FASTA
```python
SeqIO.convert('sequence.gb', 'genbank', 'sequence.fasta', 'fasta')
```
FASTQ to FASTA (drop quality)
```python
SeqIO.convert('reads.fastq', 'fastq', 'reads.fasta', 'fasta')
```
FASTA to GenBank (requires molecule_type)
```python
from Bio import SeqIO

def add_molecule_type(records):
    # The GenBank writer refuses records without this annotation
    for record in records:
        record.annotations['molecule_type'] = 'DNA'
        yield record

records = SeqIO.parse('input.fasta', 'fasta')
SeqIO.write(add_molecule_type(records), 'output.gb', 'genbank')
```
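When every record should get the same molecule type, SeqIO.convert() can apply it through its molecule_type argument (Biopython 1.78+), replacing the generator with a single call. A minimal in-memory sketch with made-up sequences:

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA input; FASTA itself has no field for molecule type
fasta_text = ">seq1 first example\nATGCATGC\n>seq2 second example\nGGGAAACCC\n"

gb_handle = StringIO()
# molecule_type is set on every record before the GenBank writer sees it
count = SeqIO.convert(
    StringIO(fasta_text), "fasta", gb_handle, "genbank", molecule_type="DNA"
)
genbank_text = gb_handle.getvalue()

print(count)                         # 2
print(genbank_text.splitlines()[0])  # the LOCUS line for seq1
```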
FASTA to FASTQ (add dummy quality)
```python
from Bio import SeqIO

def add_quality(records, quality=30):
    # FASTQ needs one Phred score per base; use a constant placeholder value
    for record in records:
        record.letter_annotations['phred_quality'] = [quality] * len(record.seq)
        yield record

records = SeqIO.parse('input.fasta', 'fasta')
SeqIO.write(add_quality(records), 'output.fastq', 'fastq')
```
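The sketch below runs the same pattern in memory to show how the dummy score is stored: in standard 'fastq' (Sanger) output, Phred 30 is written as chr(30 + 33), the character '?'. The reads are made up:

```python
from io import StringIO

from Bio import SeqIO


def add_quality(records, quality=30):
    # Same idea as the pattern above: one constant Phred score per base
    for record in records:
        record.letter_annotations["phred_quality"] = [quality] * len(record.seq)
        yield record


fasta_text = ">r1\nACGTAC\n>r2\nTTGA\n"
records = SeqIO.parse(StringIO(fasta_text), "fasta")

out_handle = StringIO()
count = SeqIO.write(add_quality(records), out_handle, "fastq")
lines = out_handle.getvalue().splitlines()

print(count)     # 2
print(lines[3])  # ??????
```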
Batch Convert Multiple Files
```python
from pathlib import Path
from Bio import SeqIO

# Write a .fasta file next to every .gb file in the current directory
for gb_file in Path('.').glob('*.gb'):
    fasta_file = gb_file.with_suffix('.fasta')
    count = SeqIO.convert(str(gb_file), 'genbank', str(fasta_file), 'fasta')
    print(f'{gb_file.name}: {count} records')
```
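The loop above writes each FASTA file next to its input. A variant that writes into a separate directory and returns a total might look like this; batch_convert is a hypothetical helper (not part of Biopython), and the demo uses made-up FASTQ files in a temporary directory:

```python
import tempfile
from pathlib import Path

from Bio import SeqIO


def batch_convert(src_dir, pattern, in_format, out_dir, out_format, out_suffix):
    """Hypothetical helper: convert every file matching pattern into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    total = 0
    for in_path in sorted(Path(src_dir).glob(pattern)):
        out_path = out_dir / in_path.with_suffix(out_suffix).name
        count = SeqIO.convert(str(in_path), in_format, str(out_path), out_format)
        print(f"{in_path.name}: {count} records")
        total += count
    return total


# Demo: two made-up FASTQ files converted to FASTA
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    (src / "a.fastq").write_text("@r1\nACGT\n+\nIIII\n")
    (src / "b.fastq").write_text("@r2\nGG\n+\nII\n@r3\nCC\n+\nII\n")
    total = batch_convert(src, "*.fastq", "fastq", src / "fasta", "fasta", ".fasta")
    written = sorted(p.name for p in (src / "fasta").iterdir())

print(total)    # 3
print(written)  # ['a.fasta', 'b.fasta']
```

Keeping outputs in their own directory means a rerun with the same pattern never picks up earlier outputs as inputs.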
Convert with Modifications
```python
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def uppercase_record(rec):
    # Build a new record; only the sequence, id and description are carried over
    return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)

records = SeqIO.parse('input.fasta', 'fasta')
modified = (uppercase_record(rec) for rec in records)
SeqIO.write(modified, 'output.fasta', 'fasta')
```
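Building a fresh SeqRecord keeps only the fields passed to it, which is harmless for FASTA output but would discard quality scores or features from richer formats. SeqRecord.upper() returns a copy with an upper-case sequence and its annotations intact; a sketch on a made-up lower-case FASTQ read:

```python
from io import StringIO

from Bio import SeqIO

# A made-up lower-case FASTQ read with per-base quality scores
fastq_text = "@read1\nacgtn\n+\nIIII#\n"
records = SeqIO.parse(StringIO(fastq_text), "fastq")

out_handle = StringIO()
# upper() returns a copy, so phred_quality travels with the new sequence
count = SeqIO.write((rec.upper() for rec in records), out_handle, "fastq")
result = out_handle.getvalue()

print(result, end="")
```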
Alignment Format Conversion
```python
from Bio import AlignIO

# Strict 'phylip' truncates IDs to 10 characters; 'phylip-relaxed' allows longer names
AlignIO.convert('alignment.sto', 'stockholm', 'alignment.phy', 'phylip')
```
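AlignIO.convert() also accepts handles, and it returns the number of alignments rather than sequences. A small in-memory sketch converting a made-up Stockholm alignment to aligned FASTA (the Stockholm to FASTA row of the Common Conversions table) and reading it back:

```python
from io import StringIO

from Bio import AlignIO, SeqIO

# A made-up two-sequence Stockholm alignment
stockholm_text = """# STOCKHOLM 1.0
seqA ACGT-A
seqB AC-TTA
//
"""

out_handle = StringIO()
# The return value counts alignments (one here), not sequences
n_alignments = AlignIO.convert(
    StringIO(stockholm_text), "stockholm", out_handle, "fasta"
)

out_handle.seek(0)
aligned = [(rec.id, str(rec.seq)) for rec in SeqIO.parse(out_handle, "fasta")]
print(n_alignments)  # 1
print(aligned)       # [('seqA', 'ACGT-A'), ('seqB', 'AC-TTA')]
```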
Format Compatibility Matrix
Can convert directly (no modifications needed):
- GenBank <-> EMBL
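- FASTA -> formats that hold nothing beyond the sequence (richer targets such as FASTQ and GenBank need data added, as listed under "Requires adding data")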
- Any format -> FASTA (always works, may lose data)
- FASTQ -> FASTA
Requires adding data:
- FASTA -> FASTQ (need quality scores)
- FASTA -> GenBank (need molecule_type)
May lose data:
- GenBank -> FASTA (loses features, annotations)
- FASTQ -> FASTA (loses quality scores)
- Any rich format -> FASTA
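The "Requires adding data" cases can be checked before writing. missing_requirements below is a hypothetical helper, not a Biopython API, and it only covers the GenBank and FASTQ requirements from this matrix:

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def missing_requirements(record, out_format):
    """Hypothetical pre-flight check (not a Biopython API).

    Returns the fields the target format needs but the record lacks.
    """
    missing = []
    if out_format in ("genbank", "gb") and "molecule_type" not in record.annotations:
        missing.append("annotations['molecule_type']")
    if out_format.startswith("fastq") and "phred_quality" not in record.letter_annotations:
        missing.append("letter_annotations['phred_quality']")
    return missing


plain = SeqRecord(Seq("ACGT"), id="x")  # roughly what a FASTA parse yields
print(missing_requirements(plain, "fasta"))    # []
print(missing_requirements(plain, "genbank"))  # ["annotations['molecule_type']"]
print(missing_requirements(plain, "fastq"))    # ["letter_annotations['phred_quality']"]
```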
Common Errors
| Error | Cause | Solution |
|---|---|---|
| ValueError: missing molecule_type | FASTA to GenBank | Add molecule_type annotation |
| ValueError: missing quality scores | FASTA to FASTQ | Add phred_quality to letter_annotations |
| KeyError: 'phred_quality' | Wrong FASTQ variant | Try 'fastq-sanger', 'fastq-illumina' |
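A sketch reproducing the first two rows of the table on a made-up FASTA record, then applying the listed fixes. It assumes Biopython 1.78+ (where GenBank output requires molecule_type); try_write is a hypothetical helper that returns the error text instead of raising:

```python
from io import StringIO

from Bio import SeqIO


def try_write(record, fmt):
    """Hypothetical helper: return 'ok', or the ValueError text if writing fails."""
    try:
        SeqIO.write(record, StringIO(), fmt)
        return "ok"
    except ValueError as err:
        return f"ValueError: {err}"


# A made-up FASTA record: no molecule_type, no quality scores
record = SeqIO.read(StringIO(">seq1\nATGC\n"), "fasta")
before_gb = try_write(record, "genbank")
before_fq = try_write(record, "fastq")

# Apply the fixes from the Solution column
record.annotations["molecule_type"] = "DNA"
record.letter_annotations["phred_quality"] = [30] * len(record)
after_gb = try_write(record, "genbank")
after_fq = try_write(record, "fastq")

print(before_gb)
print(before_fq)
print(after_gb, after_fq)  # ok ok
```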
Decision Tree
```
Converting formats?
├── Simple conversion (no data changes)?
│ └── Use SeqIO.convert() directly
├── Need to add annotations?
│ └── Parse, modify records, then write
├── Need to transform sequences?
│ └── Parse, apply transformation, then write
└── Multiple files?
└── Loop with SeqIO.convert() or batch generator
```
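The first three branches can be folded into one function. convert_file is a hypothetical helper, not part of Biopython: with no transform it delegates to SeqIO.convert(), otherwise it parses, transforms and writes. The demo uses made-up sequences in a temporary directory:

```python
import tempfile
from pathlib import Path

from Bio import SeqIO


def convert_file(in_path, in_format, out_path, out_format, transform=None):
    """Hypothetical helper following the decision tree.

    No transform: hand the whole job to SeqIO.convert().
    With a transform: parse, apply it record by record, then write.
    """
    if transform is None:
        return SeqIO.convert(str(in_path), in_format, str(out_path), out_format)
    records = (transform(rec) for rec in SeqIO.parse(str(in_path), in_format))
    return SeqIO.write(records, str(out_path), out_format)


def tag_dna(rec):
    # Example transform: add the annotation GenBank output needs
    rec.annotations["molecule_type"] = "DNA"
    return rec


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in.fasta"
    src.write_text(">s1\nACGT\n>s2\nGGTT\n")
    n_plain = convert_file(src, "fasta", Path(tmp) / "copy.fasta", "fasta")
    n_gb = convert_file(src, "fasta", Path(tmp) / "out.gb", "genbank", tag_dna)
    gb_head = (Path(tmp) / "out.gb").read_text().splitlines()[0]

print(n_plain, n_gb)  # 2 2
print(gb_head)
```

For the "Multiple files?" branch, the same function can be called inside a pathlib glob loop like the batch pattern.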
Related Skills
- read-sequences - Parse sequences for custom conversion logic
- write-sequences - Write converted sequences with modifications
- batch-processing - Convert multiple files at once
- compressed-files - Handle compressed input/output during conversion
- alignment-files - For SAM/BAM/CRAM conversion, use samtools view