Agent skill
bio-genome-assembly-assembly-polishing
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-genome-assembly-assembly-polishing
SKILL.md
name: bio-genome-assembly-assembly-polishing description: Polish genome assemblies to reduce errors using short reads (Pilon), long reads (Racon), or ONT-specific tools (medaka). Essential for improving long-read assembly accuracy. Use when improving assembly accuracy with polishing tools. tool_type: cli primary_tool: Pilon measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Assembly Polishing
Improve assembly accuracy by correcting errors using additional sequencing data.
Polishing Strategies
| Tool | Input Reads | Best For |
|---|---|---|
| Pilon | Illumina | Final polishing |
| medaka | ONT | ONT assemblies |
| Racon | Long reads | Quick polishing |
| NextPolish | Both | Combined approach |
Recommended Workflows
ONT Assembly
- Racon (2-3 rounds with ONT)
- medaka (1 round)
- Pilon (2-3 rounds with Illumina)
PacBio CLR Assembly
- Racon (2-3 rounds)
- Pilon (2-3 rounds with Illumina)
PacBio HiFi Assembly
- Often no polishing needed (>99% accuracy)
- Optional Pilon if Illumina available
Pilon (Illumina Polishing)
Installation
conda install -c bioconda pilon
Basic Usage
# Map short reads to assembly
bwa index assembly.fasta
bwa mem -t 16 assembly.fasta R1.fq.gz R2.fq.gz | samtools sort -o aligned.bam
samtools index aligned.bam
# Run Pilon
pilon --genome assembly.fasta --frags aligned.bam --output polished
Key Options
| Option | Description |
|---|---|
--genome |
Input assembly |
--frags |
Paired-end BAM |
--output |
Output prefix |
--changes |
Write changes file |
--vcf |
Write VCF of changes |
--fix |
What to fix (snps, indels, gaps, all) |
--threads |
Threads for alignment |
--mindepth |
Min depth for correction |
Multiple Rounds
#!/bin/bash
ASSEMBLY=$1
R1=$2
R2=$3
ROUNDS=${4:-3}
current=$ASSEMBLY
for i in $(seq 1 $ROUNDS); do
echo "=== Pilon round $i ==="
bwa index $current
bwa mem -t 16 $current $R1 $R2 | samtools sort -o round${i}.bam
samtools index round${i}.bam
pilon --genome $current --frags round${i}.bam --output pilon_round${i} --changes
current=pilon_round${i}.fasta
changes=$(wc -l < pilon_round${i}.changes)
echo "Changes made: $changes"
if [ $changes -eq 0 ]; then
echo "No more changes, stopping"
break
fi
done
cp $current final_polished.fasta
Fix Specific Issues
# Only fix SNPs and small indels
pilon --genome assembly.fa --frags aligned.bam --output polished --fix snps,indels
# Only fill gaps
pilon --genome assembly.fa --frags aligned.bam --output polished --fix gaps
medaka (ONT Polishing)
Installation
conda install -c bioconda medaka
Basic Usage
medaka_consensus -i reads.fastq.gz -d assembly.fasta -o medaka_output -t 8
Key Options
| Option | Description |
|---|---|
-i |
Input reads |
-d |
Draft assembly |
-o |
Output directory |
-t |
Threads |
-m |
Model name |
Model Selection
# List available models
medaka tools list_models
# Use specific model (match your basecaller)
medaka_consensus -i reads.fq.gz -d assembly.fa -o output -m r1041_e82_400bps_sup_v5.1.0
Models for Common Chemistries
| Chemistry | Model |
|---|---|
| R10.4.1 + SUP | r1041_e82_400bps_sup_* |
| R10.4.1 + HAC | r1041_e82_400bps_hac_* |
| R9.4.1 + SUP | r941_sup_* |
Output
medaka_output/
├── consensus.fasta # Polished assembly
├── calls_to_draft.bam # Alignments
└── *.hdf # Intermediate files
Racon (Long-Read Polishing)
Installation
conda install -c bioconda racon
Basic Usage
# Map reads to assembly
minimap2 -ax map-ont assembly.fasta reads.fastq.gz > aligned.sam
# Polish
racon -t 16 reads.fastq.gz aligned.sam assembly.fasta > polished.fasta
Multiple Rounds
#!/bin/bash
ASSEMBLY=$1
READS=$2
ROUNDS=${3:-3}
current=$ASSEMBLY
for i in $(seq 1 $ROUNDS); do
echo "=== Racon round $i ==="
minimap2 -ax map-ont $current $READS > round${i}.sam
racon -t 16 $READS round${i}.sam $current > racon_round${i}.fasta
current=racon_round${i}.fasta
done
cp $current racon_polished.fasta
Key Options
| Option | Description |
|---|---|
-t |
Threads |
-m |
Match score (default: 3) |
-x |
Mismatch score (default: -5) |
-g |
Gap penalty (default: -4) |
-w |
Window size (default: 500) |
Complete Polishing Workflow
ONT Assembly Polishing
#!/bin/bash
set -euo pipefail
ASSEMBLY=$1 # Flye assembly
ONT_READS=$2 # ONT reads
ILLUMINA_R1=$3 # Illumina R1
ILLUMINA_R2=$4 # Illumina R2
OUTDIR=$5
mkdir -p $OUTDIR
# Step 1: Racon polishing (2 rounds)
echo "=== Racon Polishing ==="
current=$ASSEMBLY
for i in 1 2; do
minimap2 -ax map-ont $current $ONT_READS > ${OUTDIR}/racon_${i}.sam
racon -t 16 $ONT_READS ${OUTDIR}/racon_${i}.sam $current > ${OUTDIR}/racon_${i}.fasta
current=${OUTDIR}/racon_${i}.fasta
done
# Step 2: medaka polishing
echo "=== medaka Polishing ==="
medaka_consensus -i $ONT_READS -d $current -o ${OUTDIR}/medaka -t 8
current=${OUTDIR}/medaka/consensus.fasta
# Step 3: Pilon polishing (2 rounds)
echo "=== Pilon Polishing ==="
for i in 1 2; do
bwa index $current
bwa mem -t 16 $current $ILLUMINA_R1 $ILLUMINA_R2 | samtools sort -o ${OUTDIR}/pilon_${i}.bam
samtools index ${OUTDIR}/pilon_${i}.bam
pilon --genome $current --frags ${OUTDIR}/pilon_${i}.bam --output ${OUTDIR}/pilon_${i}
current=${OUTDIR}/pilon_${i}.fasta
done
cp $current ${OUTDIR}/final_polished.fasta
echo "Done: ${OUTDIR}/final_polished.fasta"
NextPolish (Combined Approach)
Installation
conda install -c bioconda nextpolish
Usage
# Create config file
cat > run.cfg << EOF
[General]
job_type = local
job_prefix = nextPolish
task = best
rewrite = yes
rerun = 3
parallel_jobs = 2
multithread_jobs = 8
genome = assembly.fasta
genome_size = auto
workdir = ./01_rundir
[lgs_option]
lgs_fofn = lgs.fofn
lgs_options = -min_read_len 1k -max_depth 100
lgs_minimap2_options = -x map-ont
[sgs_option]
sgs_fofn = sgs.fofn
sgs_options = -max_depth 100
EOF
# File of filenames
ls reads.fastq.gz > lgs.fofn
ls R1.fq.gz R2.fq.gz > sgs.fofn
# Run
nextPolish run.cfg
Quality Assessment
After polishing, assess improvement:
# Compare to reference (if available)
quast.py -r reference.fa original.fa polished.fa -o quast_comparison
# Check error rate
minimap2 -ax map-ont polished.fa reads.fq.gz | samtools stats | grep "error rate"
Related Skills
- long-read-assembly - Initial assembly
- short-read-assembly - Source of polishing reads
- assembly-qc - Assess polishing improvement
- long-read-sequencing - medaka variant calling
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?