Agent skill
bio-write-sequences
Write biological sequences to files (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when saving sequences, creating new sequence files, or outputting modified records.
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-write-sequences
SKILL.md
Version Compatibility
Reference examples tested with: BioPython 1.83+, pysam 0.22+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
pip show <package>thenhelp(module.function)to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Write Sequences
"Write sequences to a file" → Serialize SeqRecord objects into a formatted sequence file.
- Python:
SeqIO.write()(BioPython) - R:
writeXStringSet()(Biostrings)
Write SeqRecord objects to sequence files using Biopython's Bio.SeqIO module.
Required Import
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
Core Functions
SeqIO.write() - Write Records to File
Write one or more SeqRecord objects to a file.
SeqIO.write(records, 'output.fasta', 'fasta')
Parameters:
records- Single SeqRecord, list, or iterator of SeqRecordshandle- Filename (string) or file handleformat- Output format string
Returns: Number of records written (integer)
record.format() - Get Formatted String
Get a string representation without writing to file.
formatted = record.format('fasta')
print(formatted)
Creating SeqRecord Objects
Goal: Construct in-memory sequence records suitable for writing to any format.
Approach: Create SeqRecord with at minimum a Seq and id. Add letter_annotations for FASTQ, annotations['molecule_type'] for GenBank/EMBL.
"Create a sequence record from scratch" → Wrap a Seq string in a SeqRecord with metadata fields.
- Python:
SeqRecord(Seq(...), id=...)(BioPython)
Minimal SeqRecord
record = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1')
Full SeqRecord
record = SeqRecord(
Seq('ATGCGATCGATCG'),
id='seq1',
name='sequence_one',
description='Example sequence for demonstration'
)
With Annotations (for GenBank output)
from Bio.SeqFeature import SeqFeature, FeatureLocation
record = SeqRecord(
Seq('ATGCGATCGATCG'),
id='seq1',
annotations={'molecule_type': 'DNA'}
)
record.features.append(
SeqFeature(FeatureLocation(0, 9), type='gene', qualifiers={'gene': ['exampleGene']})
)
Common Formats
| Format | String | Notes |
|---|---|---|
| FASTA | 'fasta' |
Most universal, sequence + header only |
| FASTQ | 'fastq' |
Requires quality scores in letter_annotations |
| GenBank | 'genbank' |
Requires annotations and molecule_type |
| EMBL | 'embl' |
Similar requirements to GenBank |
| Tab | 'tab' |
Simple ID + sequence tabular format |
Code Patterns
Write Single Record
record = SeqRecord(Seq('ATGC'), id='my_seq', description='test sequence')
SeqIO.write(record, 'output.fasta', 'fasta')
Write Multiple Records
records = [
SeqRecord(Seq('ATGC'), id='seq1'),
SeqRecord(Seq('GCTA'), id='seq2'),
SeqRecord(Seq('TTAA'), id='seq3')
]
count = SeqIO.write(records, 'output.fasta', 'fasta')
print(f'Wrote {count} records')
Write to File Handle
with open('output.fasta', 'w') as handle:
SeqIO.write(records, handle, 'fasta')
Write Modified Records
Goal: Transform sequences in-memory and write the modified versions to a new file.
Approach: Parse input, apply transformation via generator, write output. Using a generator avoids loading all records into memory.
"Modify sequences and save" → Parse records, transform each, write to new file with SeqIO.write().
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
def uppercase_record(rec):
return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)
records = SeqIO.parse('input.fasta', 'fasta')
modified = (uppercase_record(rec) for rec in records)
SeqIO.write(modified, 'output.fasta', 'fasta')
Append to Existing File
with open('output.fasta', 'a') as handle:
SeqIO.write(new_records, handle, 'fasta')
Write FASTQ with Quality Scores
record = SeqRecord(Seq('ATGCGATCG'), id='read1')
record.letter_annotations['phred_quality'] = [30, 30, 28, 25, 30, 30, 28, 25, 30]
SeqIO.write(record, 'output.fastq', 'fastq')
Write GenBank Format
record = SeqRecord(Seq('ATGCGATCGATCG'), id='SEQ001', name='example')
record.annotations['molecule_type'] = 'DNA'
record.annotations['topology'] = 'linear'
record.annotations['organism'] = 'Example organism'
SeqIO.write(record, 'output.gb', 'genbank')
Common Errors
| Error | Cause | Solution |
|---|---|---|
TypeError: SeqRecord expected |
Passed raw string/Seq | Wrap in SeqRecord object |
ValueError: missing molecule_type |
GenBank without annotations | Add record.annotations['molecule_type'] = 'DNA' |
ValueError: missing quality scores |
FASTQ without phred_quality | Add quality scores to letter_annotations |
ValueError: Sequences must all be the same length |
PHYLIP with unequal lengths | Pad or trim sequences first |
Format-Specific Requirements
FASTQ
Must have quality scores:
record.letter_annotations['phred_quality'] = [30] * len(record.seq)
GenBank/EMBL
Must have molecule_type:
record.annotations['molecule_type'] = 'DNA' # or 'RNA', 'protein'
PHYLIP
All sequences must be same length. IDs truncated to 10 characters.
Related Skills
- read-sequences - Read sequences before modifying and writing
- format-conversion - Direct format conversion without intermediate processing
- filter-sequences - Filter sequences before writing subset
- sequence-manipulation/seq-objects - Create SeqRecord objects to write
- alignment-files - For SAM/BAM output, use samtools/pysam
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?