Agent skill
bio-workflows-longread-sv-pipeline
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-workflows-longread-sv-pipeline
SKILL.md
name: bio-workflows-longread-sv-pipeline description: End-to-end workflow for detecting structural variants from long-read sequencing data. Covers ONT/PacBio alignment with minimap2 and SV calling with Sniffles or cuteSV. Use when detecting structural variants from long reads. tool_type: cli primary_tool: Sniffles workflow: true depends_on:
- long-read-sequencing/long-read-alignment
- long-read-sequencing/long-read-qc
- long-read-sequencing/structural-variants qc_checkpoints:
- after_qc: "Read N50 >10kb, quality score >Q10"
- after_alignment: "Mapping rate >90%, coverage sufficient"
- after_calling: "SV count reasonable, genotypes concordant" measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Long-Read SV Pipeline
Complete workflow for detecting structural variants from ONT or PacBio long-read data.
Workflow Overview
Long reads (ONT/PacBio)
|
v
[1. QC] ----------------> NanoPlot
|
v
[2. Alignment] ---------> minimap2
|
v
[3. SV Calling] --------> Sniffles / cuteSV
|
v
[4. Filtering] ---------> bcftools
|
v
[5. Annotation] --------> AnnotSV (optional)
|
v
Filtered SV VCF
Primary Path: minimap2 + Sniffles
Step 1: Quality Control
# ONT reads QC
NanoPlot --fastq reads.fastq.gz \
--outdir nanoplot_output \
--threads 8
# Check key metrics
# - Read N50 should be >10kb
# - Mean quality >Q10
# - Total bases sufficient for coverage
Step 2: Alignment with minimap2
# ONT reads
minimap2 -ax map-ont \
-t 16 \
--MD \
-Y \
reference.fa \
reads.fastq.gz | \
samtools sort -@ 4 -o aligned.bam
samtools index aligned.bam
# PacBio HiFi
minimap2 -ax map-hifi \
-t 16 \
--MD \
-Y \
reference.fa \
reads.fastq.gz | \
samtools sort -@ 4 -o aligned.bam
# PacBio CLR
minimap2 -ax map-pb \
-t 16 \
--MD \
-Y \
reference.fa \
reads.fastq.gz | \
samtools sort -@ 4 -o aligned.bam
QC Checkpoint: Check alignment stats
samtools flagstat aligned.bam
samtools depth -a aligned.bam | awk '{sum+=$3} END {print "Average coverage:",sum/NR}'
- Mapping rate >90%
- Average coverage >10x for SV calling (>20x preferred)
Step 3: SV Calling with Sniffles
# Sniffles2 (recommended)
sniffles \
--input aligned.bam \
--vcf svs.vcf.gz \
--reference reference.fa \
--threads 8 \
--minsvlen 50
# With tandem repeat annotations (recommended)
sniffles \
--input aligned.bam \
--vcf svs.vcf.gz \
--reference reference.fa \
--tandem-repeats tandem_repeats.bed \
--threads 8
Alternative: cuteSV
# cuteSV (faster, good for ONT)
cuteSV \
aligned.bam \
reference.fa \
svs.vcf \
work_dir/ \
--threads 8 \
--min_size 50 \
--genotype
bgzip svs.vcf
tabix svs.vcf.gz
Step 4: Filtering
# Filter by quality and size
bcftools view -i 'QUAL>=20 && ABS(SVLEN)>=50' svs.vcf.gz -Oz -o svs.filtered.vcf.gz
# Filter by SV type
bcftools view -i 'SVTYPE="DEL" || SVTYPE="INS"' svs.filtered.vcf.gz -Oz -o del_ins.vcf.gz
# Filter by genotype
bcftools view -i 'GT="1/1" || GT="0/1"' svs.filtered.vcf.gz -Oz -o genotyped.vcf.gz
# Stats
bcftools stats svs.filtered.vcf.gz > sv_stats.txt
Step 5: Annotation (Optional)
# AnnotSV for gene/clinical annotations
AnnotSV -SVinputFile svs.filtered.vcf.gz \
-outputFile annotated_svs \
-genomeBuild GRCh38
Multi-Sample SV Calling
# Call SVs per sample
for sample in sample1 sample2 sample3; do
sniffles --input ${sample}.bam \
--snf ${sample}.snf \
--reference reference.fa
done
# Merge and joint genotype
sniffles --input sample1.snf sample2.snf sample3.snf \
--vcf merged_svs.vcf.gz \
--reference reference.fa
Parameter Recommendations
| Tool | Parameter | ONT | PacBio HiFi |
|---|---|---|---|
| minimap2 | -ax | map-ont | map-hifi |
| Sniffles | --minsvlen | 50 | 50 |
| Sniffles | --minsupport | auto | auto |
| cuteSV | --min_size | 50 | 50 |
| cuteSV | --min_support | 3 | 3 |
SV Types Detected
| Type | Abbreviation | Description |
|---|---|---|
| Deletion | DEL | Sequence removed |
| Insertion | INS | Sequence added |
| Duplication | DUP | Sequence copied |
| Inversion | INV | Sequence reversed |
| Translocation | BND | Breakend (interchromosomal) |
Troubleshooting
| Issue | Likely Cause | Solution |
|---|---|---|
| Few SVs | Low coverage | Increase sequencing depth |
| Many false positives | Low quality reads | Filter by QUAL, increase min support |
| Missing known SV | Repeat region | Use tandem repeat annotations |
| High breakend count | Mapping artifacts | Check alignment quality |
Complete Pipeline Script
#!/bin/bash
set -e
THREADS=16
READS="reads.fastq.gz"
REF="reference.fa"
SAMPLE="sample1"
OUTDIR="sv_results"
mkdir -p ${OUTDIR}/{qc,aligned,sv}
# Step 1: QC
echo "=== QC ==="
NanoPlot --fastq ${READS} --outdir ${OUTDIR}/qc -t ${THREADS}
# Step 2: Alignment
echo "=== Alignment ==="
minimap2 -ax map-ont -t ${THREADS} --MD -Y ${REF} ${READS} | \
samtools sort -@ 4 -o ${OUTDIR}/aligned/${SAMPLE}.bam
samtools index ${OUTDIR}/aligned/${SAMPLE}.bam
echo "Alignment stats:"
samtools flagstat ${OUTDIR}/aligned/${SAMPLE}.bam
# Step 3: SV calling
echo "=== SV Calling ==="
sniffles --input ${OUTDIR}/aligned/${SAMPLE}.bam \
--vcf ${OUTDIR}/sv/${SAMPLE}.vcf.gz \
--reference ${REF} \
--threads ${THREADS}
# Step 4: Filter
echo "=== Filtering ==="
bcftools view -i 'QUAL>=20' ${OUTDIR}/sv/${SAMPLE}.vcf.gz \
-Oz -o ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz
bcftools index ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz
# Stats
bcftools stats ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz > ${OUTDIR}/sv/stats.txt
echo "=== Complete ==="
echo "SVs: $(bcftools view -H ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz | wc -l)"
Related Skills
- long-read-sequencing/long-read-alignment - minimap2 details
- long-read-sequencing/structural-variants - Sniffles, cuteSV options
- long-read-sequencing/long-read-qc - NanoPlot metrics
- variant-calling/structural-variant-calling - Short-read SV methods
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?