bio-format-conversion
Convert between sequence file formats (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when changing file formats or preparing data for different tools.
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-format-conversion
Version Compatibility
Reference examples tested with: BioPython 1.83+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python: pip show <package>, then help(module.function) to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
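The check described above can be sketched in Python as follows. This is a minimal sketch, assuming Biopython is importable as Bio; the variable names are illustrative and not part of any API.

```python
import inspect

import Bio
from Bio import SeqIO

# Report the installed Biopython version (the reference examples assume 1.83+).
installed_version = Bio.__version__
print(f"Biopython {installed_version}")

# Inspect the real signature instead of guessing at parameter names.
convert_signature = inspect.signature(SeqIO.convert)
print(f"SeqIO.convert{convert_signature}")
```

If the printed signature differs from what an example expects, adapt the call to the installed API rather than retrying the same code.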
Format Conversion
"Convert this file to a different format" → Read records in one format, optionally add missing annotations, and write in the target format.
- Python: SeqIO.convert() for direct conversion, or SeqIO.parse() + SeqIO.write() when modifications are needed (BioPython)
- CLI: seqkit seq (SeqKit) for FASTA/FASTQ; samtools view for SAM/BAM/CRAM
Convert sequence files between formats using Biopython's Bio.SeqIO module.
Required Import
from Bio import SeqIO
Core Function
SeqIO.convert() - Direct Conversion
Convert between formats in a single call. This is usually the simplest option, and for some format pairs (such as FASTQ to FASTA) it can also be the fastest.
count = SeqIO.convert('input.gb', 'genbank', 'output.fasta', 'fasta')
print(f'Converted {count} records')
Parameters:
- in_file - Input filename or handle
- in_format - Input format string
- out_file - Output filename or handle
- out_format - Output format string
Returns: Number of records converted
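Because in_file and out_file may be handles as well as filenames, the call can be tried entirely in memory. The sketch below assumes two made-up FASTQ records and uses io.StringIO in place of real files.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up FASTQ records, used only for illustration.
fastq_text = (
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nGGCC\n+\n!!!!\n"
)

in_handle = StringIO(fastq_text)
out_handle = StringIO()

# Handles work in place of filenames; the return value is the record count.
count = SeqIO.convert(in_handle, "fastq", out_handle, "fasta")
fasta_text = out_handle.getvalue()

print(f"Converted {count} records")
print(fasta_text)
```

The same pattern is handy for unit tests, since no temporary files are needed.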
Common Conversions
| From | To | Notes |
|---|---|---|
| GenBank | FASTA | Loses annotations, keeps sequence |
| FASTA | GenBank | Need to add molecule_type |
| FASTQ | FASTA | Loses quality scores |
| FASTA | FASTQ | Need to add quality scores |
| GenBank | EMBL | Usually works directly |
| Stockholm | FASTA | Alignment to sequences |
Code Patterns
Simple Conversion
SeqIO.convert('input.gb', 'genbank', 'output.fasta', 'fasta')
GenBank to FASTA
SeqIO.convert('sequence.gb', 'genbank', 'sequence.fasta', 'fasta')
FASTQ to FASTA (drop quality)
SeqIO.convert('reads.fastq', 'fastq', 'reads.fasta', 'fasta')
FASTA to GenBank (requires molecule_type)
Goal: Convert FASTA to GenBank format, which requires molecule_type annotation.
Approach: Stream records through a generator that injects the missing annotation, then write.
Reference (BioPython 1.83+):
records = SeqIO.parse('input.fasta', 'fasta')

def add_molecule_type(records):
    for record in records:
        record.annotations['molecule_type'] = 'DNA'
        yield record

SeqIO.write(add_molecule_type(records), 'output.gb', 'genbank')
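As an alternative to the generator above, recent Biopython releases accept a molecule_type keyword on SeqIO.convert() itself. The sketch below assumes that keyword is available in the installed version (confirm with help(SeqIO.convert)); the record content is made up.

```python
from io import StringIO

from Bio import SeqIO

# A made-up FASTA record held in memory instead of a file.
fasta_handle = StringIO(">seq1 hypothetical example\nATGGCCATTGTAATGGGCCGC\n")
genbank_handle = StringIO()

# molecule_type is applied to every record before it is written.
# Assumption: the installed SeqIO.convert() supports this keyword.
count = SeqIO.convert(
    fasta_handle, "fasta", genbank_handle, "genbank", molecule_type="DNA"
)
genbank_text = genbank_handle.getvalue()

# The first line of a GenBank record is the LOCUS line.
print(genbank_text.splitlines()[0])
```

The generator approach remains the better fit when different records need different molecule types or other annotations.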
FASTA to FASTQ (add dummy quality)
Goal: Convert FASTA to FASTQ by assigning uniform placeholder quality scores.
Approach: Stream records through a generator that adds phred_quality to each, then write as FASTQ.
Reference (BioPython 1.83+):
def add_quality(records, quality=30):
    for record in records:
        record.letter_annotations['phred_quality'] = [quality] * len(record.seq)
        yield record

records = SeqIO.parse('input.fasta', 'fasta')
SeqIO.write(add_quality(records), 'output.fastq', 'fastq')
Batch Convert Multiple Files
Goal: Convert all files of one format in a directory to another format.
Approach: Glob for input files, apply SeqIO.convert() to each, and report per-file counts.
Reference (BioPython 1.83+):
from pathlib import Path

for gb_file in Path('.').glob('*.gb'):
    fasta_file = gb_file.with_suffix('.fasta')
    count = SeqIO.convert(str(gb_file), 'genbank', str(fasta_file), 'fasta')
    print(f'{gb_file.name}: {count} records')
Convert with Modifications
from Bio.SeqRecord import SeqRecord

def uppercase_record(rec):
    # Builds a new record; annotations and features are not carried over.
    return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)

records = SeqIO.parse('input.fasta', 'fasta')
modified = (uppercase_record(rec) for rec in records)
SeqIO.write(modified, 'output.fasta', 'fasta')
Alignment Format Conversion
from Bio import AlignIO
AlignIO.convert('alignment.sto', 'stockholm', 'alignment.phy', 'phylip')
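Strict PHYLIP truncates sequence names to 10 characters, which can matter for longer identifiers. The sketch below uses the 'phylip-relaxed' format name to keep full names; the two-sequence Stockholm alignment is made up for illustration.

```python
from io import StringIO

from Bio import AlignIO

# A tiny made-up Stockholm alignment with names longer than 10 characters.
stockholm_text = (
    "# STOCKHOLM 1.0\n"
    "sample_alpha_long_name ACGT-ACGT\n"
    "sample_beta_long_name  ACGTTACGT\n"
    "//\n"
)

out_handle = StringIO()

# AlignIO.convert() returns the number of alignments written.
count = AlignIO.convert(
    StringIO(stockholm_text), "stockholm", out_handle, "phylip-relaxed"
)
phylip_text = out_handle.getvalue()
print(phylip_text)
```

Whether a downstream tool accepts relaxed PHYLIP depends on the tool, so check its documentation before choosing between the two variants.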
Format Compatibility Matrix
Can convert directly (no modifications needed):
- GenBank <-> EMBL
- FASTA -> other sequence-only formats (richer targets such as GenBank or FASTQ need data added; see below)
- Any format -> FASTA (always works, may lose data)
- FASTQ -> FASTA
Requires adding data:
- FASTA -> FASTQ (need quality scores)
- FASTA -> GenBank (need molecule_type)
May lose data:
- GenBank -> FASTA (loses features, annotations)
- FASTQ -> FASTA (loses quality scores)
- Any rich format -> FASTA
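The data loss listed above can be observed directly by round-tripping a record. This is a minimal sketch with a made-up record and a single hypothetical CDS feature, held in memory rather than in files.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Build a made-up record with one feature and write it as GenBank.
sequence = Seq("ATGGCCATTGTAATGGGCCGCTGA")
source_record = SeqRecord(
    sequence,
    id="demo1",
    name="demo1",
    description="hypothetical demo record",
    annotations={"molecule_type": "DNA"},
)
source_record.features.append(
    SeqFeature(FeatureLocation(0, len(sequence), strand=1), type="CDS")
)

genbank_handle = StringIO()
SeqIO.write([source_record], genbank_handle, "genbank")

# Read the GenBank text back: the feature is still there.
genbank_handle.seek(0)
genbank_record = SeqIO.read(genbank_handle, "genbank")

# Convert GenBank to FASTA, then read the FASTA back: the feature is gone.
genbank_handle.seek(0)
fasta_handle = StringIO()
SeqIO.convert(genbank_handle, "genbank", fasta_handle, "fasta")
fasta_handle.seek(0)
fasta_record = SeqIO.read(fasta_handle, "fasta")

print(len(genbank_record.features), len(fasta_record.features))
```

Keeping the original rich-format file is the only way to recover features later, since FASTA has nowhere to store them.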
Common Errors
| Error | Cause | Solution |
|---|---|---|
| ValueError: missing molecule_type | FASTA to GenBank | Add molecule_type annotation |
| ValueError: missing quality scores | FASTA to FASTQ | Add phred_quality to letter_annotations |
| KeyError: 'phred_quality' | Wrong FASTQ variant | Try 'fastq-sanger', 'fastq-illumina' |
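The first two errors in the table can be reproduced and fixed in memory. The sketch below uses a hypothetical helper, try_write, and a made-up record; the exact error message wording may differ between Biopython versions.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def try_write(record, fmt):
    """Return None on success, or the ValueError message if writing fails."""
    try:
        SeqIO.write([record], StringIO(), fmt)
    except ValueError as err:
        return str(err)
    return None


# A bare record, like one parsed from FASTA: no molecule_type, no qualities.
record = SeqRecord(
    Seq("ACGTACGT"), id="demo1", name="demo1", description="hypothetical record"
)
errors_before = {fmt: try_write(record, fmt) for fmt in ("genbank", "fastq")}

# Apply the fixes from the Solution column, then try again.
record.annotations["molecule_type"] = "DNA"
record.letter_annotations["phred_quality"] = [30] * len(record.seq)
errors_after = {fmt: try_write(record, fmt) for fmt in ("genbank", "fastq")}

print(errors_before)
print(errors_after)
```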
Decision Tree
Converting formats?
├── Simple conversion (no data changes)?
│ └── Use SeqIO.convert() directly
├── Need to add annotations?
│ └── Parse, modify records, then write
├── Need to transform sequences?
│ └── Parse, apply transformation, then write
└── Multiple files?
└── Loop with SeqIO.convert() or batch generator
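The first three branches of the tree can be folded into one small helper. This is a sketch only: convert_records and its transform parameter are hypothetical names, not part of Biopython.

```python
from io import StringIO

from Bio import SeqIO


def convert_records(in_file, in_format, out_file, out_format, transform=None):
    """Use SeqIO.convert() when no changes are needed, else parse and write."""
    if transform is None:
        # Simple conversion: no data changes.
        return SeqIO.convert(in_file, in_format, out_file, out_format)
    # Annotation or sequence changes: stream each record through transform.
    records = (transform(record) for record in SeqIO.parse(in_file, in_format))
    return SeqIO.write(records, out_file, out_format)


# Made-up lower-case FASTA input, held in memory.
fasta_text = ">seq1\nacgtacgt\n>seq2\nggccaatt\n"

plain_out = StringIO()
plain_count = convert_records(StringIO(fasta_text), "fasta", plain_out, "fasta")

upper_out = StringIO()
upper_count = convert_records(
    StringIO(fasta_text),
    "fasta",
    upper_out,
    "fasta",
    transform=lambda record: record.upper(),
)

print(plain_count, upper_count)
```

For the multiple-files branch, the same helper can be called inside a loop over paths, as in the batch pattern earlier in this document.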
Related Skills
- read-sequences - Parse sequences for custom conversion logic
- write-sequences - Write converted sequences with modifications
- batch-processing - Convert multiple files at once
- compressed-files - Handle compressed input/output during conversion
- alignment-files - For SAM/BAM/CRAM conversion, use samtools view