Agent skill

bio-genome-assembly-short-read-assembly

Stars 2,009
Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-genome-assembly-short-read-assembly

SKILL.md


name: bio-genome-assembly-short-read-assembly description: De novo genome assembly from Illumina short reads using SPAdes. Covers bacterial, fungal, and small eukaryotic genome assembly, as well as metagenome and transcriptome assembly modes. Use when assembling genomes from Illumina reads. tool_type: cli primary_tool: SPAdes measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

Short-Read Assembly

Assemble genomes from Illumina paired-end or single-end reads using SPAdes.

SPAdes Overview

SPAdes (St. Petersburg genome Assembler) uses de Bruijn graph approach with multiple k-mer sizes for robust assembly.

Installation

bash
conda install -c bioconda spades

Basic Usage

Paired-End Assembly

bash
spades.py -1 R1.fastq.gz -2 R2.fastq.gz -o output_dir

Single-End Assembly

bash
spades.py -s reads.fastq.gz -o output_dir

With Unpaired Reads

bash
spades.py -1 R1.fastq.gz -2 R2.fastq.gz -s unpaired.fastq.gz -o output_dir

Assembly Modes

Isolate Mode (Default for Bacteria)

bash
spades.py --isolate -1 R1.fq.gz -2 R2.fq.gz -o isolate_assembly

Best for single-organism isolates with uniform coverage.

Careful Mode

bash
spades.py --careful -1 R1.fq.gz -2 R2.fq.gz -o careful_assembly

Reduces misassemblies at cost of speed. Recommended for small genomes.

Meta Mode (Metagenomes)

bash
spades.py --meta -1 R1.fq.gz -2 R2.fq.gz -o meta_assembly

For mixed microbial communities with varying coverage.

RNA Mode (Transcriptomes)

bash
spades.py --rna -1 R1.fq.gz -2 R2.fq.gz -o rna_assembly

Assembles transcripts from RNA-seq data.

Plasmid Mode

bash
spades.py --plasmid -1 R1.fq.gz -2 R2.fq.gz -o plasmid_assembly

Extracts plasmid sequences from bacterial isolates.

Key Options

Option Description
-o <dir> Output directory
-t <#> Number of threads (default: 16)
-m <#> Memory limit in GB (default: 250)
-k <#,#,...> K-mer sizes (auto by default)
--careful Reduce misassemblies
--isolate Isolate mode for uniform coverage
--meta Metagenome mode
--rna RNA-seq assembly
--cov-cutoff <#> Coverage cutoff (default: off)
--only-assembler Skip error correction
--continue Resume interrupted run

Multiple Libraries

Paired Libraries with Different Insert Sizes

bash
spades.py \
    --pe1-1 short_R1.fq.gz --pe1-2 short_R2.fq.gz \
    --pe2-1 long_R1.fq.gz --pe2-2 long_R2.fq.gz \
    -o output_dir

With Mate Pairs

bash
spades.py \
    --pe1-1 paired_R1.fq.gz --pe1-2 paired_R2.fq.gz \
    --mp1-1 mate_R1.fq.gz --mp1-2 mate_R2.fq.gz \
    -o output_dir

With PacBio/Nanopore (Hybrid)

bash
spades.py \
    -1 illumina_R1.fq.gz -2 illumina_R2.fq.gz \
    --pacbio pacbio.fq.gz \
    -o hybrid_assembly

# Or with Nanopore
spades.py \
    -1 illumina_R1.fq.gz -2 illumina_R2.fq.gz \
    --nanopore nanopore.fq.gz \
    -o hybrid_assembly

K-mer Selection

Auto Selection (Recommended)

SPAdes automatically selects appropriate k-mers based on read length.

Manual K-mer Specification

bash
# For 150bp reads
spades.py -k 21,33,55,77 -1 R1.fq.gz -2 R2.fq.gz -o output

# For 250bp reads
spades.py -k 21,33,55,77,99,127 -1 R1.fq.gz -2 R2.fq.gz -o output

Output Files

output_dir/
├── scaffolds.fasta     # Final scaffolds (use this)
├── contigs.fasta       # Contigs before scaffolding
├── assembly_graph.gfa  # Assembly graph
├── spades.log          # Log file
├── params.txt          # Parameters used
└── K*/                 # Intermediate k-mer assemblies

Scaffold FASTA Headers

>NODE_1_length_500000_cov_50.5
  • NODE_1 - Contig/scaffold ID
  • length_500000 - Sequence length
  • cov_50.5 - Average k-mer coverage

Memory and Performance

Reduce Memory Usage

bash
# Limit memory to 32GB
spades.py -m 32 -1 R1.fq.gz -2 R2.fq.gz -o output

# Use fewer threads
spades.py -t 8 -1 R1.fq.gz -2 R2.fq.gz -o output

Resume Interrupted Assembly

bash
spades.py --continue -o output_dir

Skip Error Correction

bash
# If reads already corrected
spades.py --only-assembler -1 R1.fq.gz -2 R2.fq.gz -o output

Complete Workflows

Bacterial Genome Assembly

bash
#!/bin/bash
set -euo pipefail

R1=$1
R2=$2
OUTDIR=$3
THREADS=${4:-16}

echo "=== Bacterial Genome Assembly ==="

# Run SPAdes in isolate mode
spades.py \
    --isolate \
    --careful \
    -t $THREADS \
    -1 $R1 -2 $R2 \
    -o $OUTDIR

# Basic stats
echo "Assembly statistics:"
grep -c "^>" ${OUTDIR}/scaffolds.fasta
seqkit stats ${OUTDIR}/scaffolds.fasta

Metagenome Assembly

bash
#!/bin/bash
set -euo pipefail

R1=$1
R2=$2
OUTDIR=$3

spades.py \
    --meta \
    -t 32 \
    -m 200 \
    -1 $R1 -2 $R2 \
    -o $OUTDIR

echo "Metagenome assembly complete: ${OUTDIR}/scaffolds.fasta"

Transcriptome Assembly

bash
spades.py \
    --rna \
    -t 16 \
    -1 rnaseq_R1.fq.gz -2 rnaseq_R2.fq.gz \
    -o transcriptome_assembly

Alternative Assemblers

Assembler Best For
SPAdes Small genomes, bacteria, fungi
MEGAHIT Metagenomes (memory efficient)
ABySS Large genomes
Velvet Legacy, small genomes
Trinity Transcriptomes

MEGAHIT (Alternative for Metagenomes)

bash
megahit -1 R1.fq.gz -2 R2.fq.gz -o megahit_output -t 16

Troubleshooting

Out of Memory

  • Reduce -m limit
  • Use --meta mode (more memory efficient)
  • Try MEGAHIT instead

Poor Assembly

  • Check read quality with FastQC
  • Trim adapters and low-quality bases
  • Increase coverage if possible
  • Try --careful mode

Long Runtime

  • Reduce k-mer values
  • Use --only-assembler if reads pre-corrected
  • Increase threads

Related Skills

  • read-qc - Preprocess reads before assembly
  • assembly-polishing - Polish assembly with Pilon
  • assembly-qc - Assess with QUAST/BUSCO
  • long-read-assembly - Long-read alternatives

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results