Agent skill
long-read-sequencing-agent
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/long-read-sequencing-agent
SKILL.md
name: 'long-read-sequencing-agent'
description: 'AI-powered analysis of long-read sequencing data (PacBio, ONT) for structural variant detection, isoform discovery, epigenetic modifications, and de novo assembly.'
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
Long-Read Sequencing Agent
The Long-Read Sequencing Agent provides comprehensive AI-driven analysis of long-read sequencing data from PacBio (HiFi) and Oxford Nanopore (ONT) platforms. It enables structural variant detection, full-length isoform discovery, base modification calling, and de novo genome assembly.
When to Use This Skill
- When detecting structural variants (SVs) missed by short-read sequencing.
- To characterize full-length transcript isoforms and alternative splicing.
- For detecting DNA base modifications (5mC, 6mA) directly from sequencing.
- When performing de novo genome assembly for complex regions.
- To phase variants and generate fully-resolved haplotypes.
Core Capabilities
- Structural Variant Detection: AI-enhanced SV calling for deletions, insertions, inversions, translocations, and complex rearrangements.
- Isoform Discovery: Full-length transcript sequencing for novel isoform and fusion detection.
- Base Modification Calling: Direct detection of DNA methylation (5mC, 5hmC, 6mA) from native sequencing.
- Haplotype Phasing: Phase-resolved assemblies and variant calling.
- De Novo Assembly: Assemble complex genomic regions (centromeres, telomeres, HLA).
- Error Correction: AI-based error correction for long-read data.
Platform Comparison
| Feature | PacBio HiFi | ONT (R10+) |
|---|---|---|
| Read length | 15-25 kb | >100 kb possible |
| Accuracy | >99.9% (HiFi) | >99% (Q20+) |
| Base mods | 5mC, 6mA | 5mC, 5hmC, 6mA, more |
| Throughput | 20-40 Gb/run | 100+ Gb/run |
| Cost | Higher | Lower |
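The accuracy row mixes percentages and Phred scores (Q20+). As a quick reference, the sketch below converts between the two using the standard Phred definition, Q = -10 * log10(error rate). The function names are illustrative and not part of this skill.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy implied by a Phred quality score Q."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality score implied by a per-base accuracy in [0, 1)."""
    return -10 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    for q in (20, 30):
        print(f"Q{q} -> {phred_to_accuracy(q):.2%} per-base accuracy")
```

Under this definition Q20 corresponds to 99% and Q30 to 99.9% per-base accuracy, which matches the thresholds quoted in the table.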
Workflow
- Input: Long-read FASTQ/BAM files from PacBio or ONT sequencing.
- QC & Alignment: Filter reads by quality, align to reference genome.
- SV Calling: Detect structural variants using Sniffles, PBSV, or CuteSV.
- Isoform Analysis: Identify full-length isoforms with IsoSeq or FLAIR.
- Modification Calling: Extract base modifications from signal data.
- Phasing: Generate haplotype-resolved variant calls.
- Output: SV calls, isoform annotations, modification maps, phased assemblies.
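The alignment and SV-calling steps above could be sketched as the shell commands below, assembled in Python. This is a minimal sketch, not the skill's actual implementation: it assumes FASTQ input, minimap2 for alignment, samtools for sorting and indexing, and Sniffles2 for calling. The function name, file-naming scheme, and default thread count are hypothetical.

```python
import shlex

# minimap2 presets for the two platforms covered by this skill.
MINIMAP2_PRESETS = {"pacbio_hifi": "map-hifi", "ont": "map-ont"}


def build_sv_commands(reads, reference, platform, out_prefix, threads=8):
    """Return shell command strings for align -> sort -> index -> SV call.

    Assumption: `reads` is a FASTQ file; output names are derived from
    `out_prefix` purely for illustration.
    """
    preset = MINIMAP2_PRESETS[platform]
    bam = f"{out_prefix}.sorted.bam"
    vcf = f"{out_prefix}.sv.vcf"

    align = ["minimap2", "-a", "-x", preset, "-t", str(threads), reference, reads]
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, "-"]
    index = ["samtools", "index", bam]
    call = ["sniffles", "--input", bam, "--vcf", vcf,
            "--reference", reference, "--threads", str(threads)]

    return [
        f"{shlex.join(align)} | {shlex.join(sort)}",  # align and coordinate-sort
        shlex.join(index),                            # index the sorted BAM
        shlex.join(call),                             # call SVs with Sniffles2
    ]


if __name__ == "__main__":
    for cmd in build_sv_commands("reads.fastq.gz", "GRCh38.fa", "pacbio_hifi", "sample"):
        print(cmd)
```

The commands are only printed, so the sketch can be inspected without the tools installed. Read filtering, isoform analysis, modification calling, and phasing are left out.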
Example Usage
User: "Analyze this PacBio HiFi dataset for structural variants and DNA methylation in a cancer sample."
Agent Action:
python3 Skills/Genomics/Long_Read_Sequencing_Agent/longread_analyzer.py \
--input cancer_hifi.bam \
--platform pacbio_hifi \
--reference GRCh38.fa \
--sv_calling sniffles2 \
--methylation true \
--phasing true \
--output longread_results/
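The source of `longread_analyzer.py` is not shown here, so its real interface is unknown. As a hypothetical sketch, an argument parser accepting the flags in the command above might look like this. The accepted choices, the defaults, and the `str_to_bool` helper are all assumptions.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse the literal true/false values used by --methylation and --phasing."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical interface mirroring the flags in the example command.
    parser = argparse.ArgumentParser(description="Long-read analysis (sketch)")
    parser.add_argument("--input", required=True, help="FASTQ or BAM of long reads")
    parser.add_argument("--platform", required=True, choices=["pacbio_hifi", "ont"])
    parser.add_argument("--reference", required=True, help="Reference FASTA")
    parser.add_argument("--sv_calling", default="sniffles2",
                        choices=["sniffles2", "pbsv", "cutesv"])
    parser.add_argument("--methylation", type=str_to_bool, default=False)
    parser.add_argument("--phasing", type=str_to_bool, default=False)
    parser.add_argument("--output", required=True, help="Output directory")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

A custom boolean parser is used because the example passes `true` as an explicit value, which argparse's `store_true` action would not accept.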
Structural Variant Detection
| Tool | Platform | SV Types | Strengths |
|---|---|---|---|
| Sniffles2 | Both | All SV types | Speed, accuracy |
| PBSV | PacBio | All SV types | HiFi optimized |
| CuteSV | Both | All SV types | Sensitivity |
| SAVANA | Both | Somatic SVs | Cancer-specific |
| Jasmine | Both | Population SVs (merges existing calls) | Multi-sample comparison |
SV Size Spectrum:
- Small SVs: 50-500 bp (often missed by short-read sequencing)
- Medium SVs: 500 bp - 10 kb
- Large SVs: >10 kb
- Complex SVs: Multi-breakpoint events
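The length-based bins above can be expressed as a small helper. This is a sketch: the boundary handling (500 bp and 10 kb both counted as medium) and the label for events under 50 bp are assumptions, since the list does not say which bin the boundary values belong to. Complex SVs are defined by breakpoint structure rather than length and are not covered.

```python
def classify_sv_size(svlen: int) -> str:
    """Bin an SV by absolute length in bp, following the size spectrum above."""
    size = abs(svlen)  # VCF SVLEN is typically negative for deletions
    if size < 50:
        return "below_sv_threshold"  # usually treated as an indel, not an SV
    if size < 500:
        return "small"
    if size <= 10_000:
        return "medium"
    return "large"


if __name__ == "__main__":
    for length in (-120, 2_500, 48_000):
        print(length, classify_sv_size(length))
```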
Isoform Analysis
Full-Length Transcript Sequencing:
- Capture full gene structures (5' to 3')
- Detect novel exons and splice junctions
- Identify gene fusions
- Quantify isoform expression
Tools:
- IsoSeq3 (PacBio): Clustering and polishing
- FLAIR (Both): Isoform discovery and quantification
- StringTie2 (Both): Guided assembly
- SQANTI3: Isoform classification and QC
Base Modification Detection
| Modification | Detection | Biological Role |
|---|---|---|
| 5mC | Both platforms | Gene silencing |
| 5hmC | ONT primarily | Active demethylation |
| 6mA | Both platforms | Bacterial/mitochondrial |
| BrdU | ONT | Replication timing |
Resolution: Single-base, single-molecule, strand-specific
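Per-read modification calls are commonly stored in the `MM` and `ML` BAM tags defined in the SAM optional-fields specification. The sketch below decodes a single `MM` entry into read positions and converts `ML` values (0-255) into probabilities. It assumes a forward-strand read and one modification type per entry. The function names and the 0.8 threshold are illustrative, not taken from this skill.

```python
def decode_mm_entry(seq: str, entry: str) -> list[int]:
    """Decode one MM entry such as 'C+m,1,0;' into 0-based read positions.

    Each number is how many occurrences of the canonical base to skip
    before the next modified one. Forward-strand reads only (assumption).
    """
    header, *deltas = entry.rstrip(";").split(",")
    base = header[0]  # canonical base, e.g. 'C' in 'C+m'
    base_positions = [i for i, b in enumerate(seq) if b == base]
    positions = []
    idx = -1
    for delta in deltas:
        idx += int(delta) + 1
        positions.append(base_positions[idx])
    return positions


def ml_to_probability(ml_value: int) -> float:
    """Midpoint of the probability interval encoded by an ML value (0-255)."""
    return (ml_value + 0.5) / 256


def modified_positions(seq, mm_entry, ml_values, min_prob=0.8):
    """Return (position, probability) pairs at or above min_prob."""
    calls = zip(decode_mm_entry(seq, mm_entry), ml_values)
    return [(pos, ml_to_probability(ml)) for pos, ml in calls
            if ml_to_probability(ml) >= min_prob]


if __name__ == "__main__":
    print(modified_positions("ACGCCGTC", "C+m,1,0;", [250, 20]))
```

Reverse-strand reads need the sequence reverse-complemented before decoding. In practice a library such as pysam or a dedicated tool would usually handle this.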
AI/ML Components
Error Correction and Variant Calling:
- DeepConsensus (PacBio): Transformer model that improves HiFi consensus calling
- Medaka (ONT): Neural-network consensus polishing
- PEPPER-Margin-DeepVariant: Deep-learning, haplotype-aware variant calling
SV Classification:
- Deep learning for complex SV characterization
- ML filters for false positive reduction
- Multi-sample joint calling
Clinical Applications
- Cancer Genomics: Detect SVs driving oncogene activation
- Rare Disease: Resolve variants in complex regions
- Pharmacogenomics: Phase CYP450 star alleles
- HLA Typing: Full-resolution typing for transplant matching
- Repeat Expansions: Size tandem-repeat expansions in repeat-expansion disorders
Prerequisites
- Python 3.10+
- Sniffles2, PBSV, CuteSV for SV calling
- minimap2/pbmm2 for alignment
- High-memory system (64GB+ recommended)
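A quick way to verify the software prerequisites is to look for each tool on `PATH`. The sketch below assumes the usual executable names (`sniffles`, `pbsv`, `cuteSV`, `minimap2`, `pbmm2`); adjust them if your installation differs. It does not check available memory.

```python
import shutil
import sys

# Assumed executable names for the tools listed above.
REQUIRED_TOOLS = ("minimap2", "pbmm2", "sniffles", "pbsv", "cuteSV")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_ok(minimum=(3, 10)) -> bool:
    """True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```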
Related Skills
- Long_Read_SV_Caller - For specialized SV analysis
- Variant_Interpretation - For variant annotation
- Epigenomics_MethylGPT_Agent - For methylation analysis
Output Files
| Output | Format | Content |
|---|---|---|
| SVs | VCF | Structural variants |
| Methylation | BED/bigWig | Modification calls |
| Isoforms | GTF | Transcript annotations |
| Phased | VCF | Haplotype-resolved variants |
| Assembly | FASTA | Assembled contigs |
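To show how the SV output might be consumed downstream, the sketch below parses one VCF data line and extracts the `SVTYPE`, `SVLEN`, and `END` INFO keys that SV callers typically emit. The example record is made up for illustration, and the parser assumes a single-valued `SVLEN`. A library such as pysam or cyvcf2 would be a better fit for real files.

```python
def parse_sv_record(line: str) -> dict:
    """Parse a tab-separated VCF data line into a small SV summary dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, sv_id, _ref, _alt, _qual, flt, info = fields[:8]

    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value if value else True  # flag keys carry no value

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": sv_id,
        "filter": flt,
        "svtype": info_map.get("SVTYPE"),
        "svlen": int(info_map["SVLEN"]) if "SVLEN" in info_map else None,
        "end": int(info_map["END"]) if "END" in info_map else None,
    }


if __name__ == "__main__":
    # Illustrative record, not real data.
    example = "chr1\t10500\tsv1\tN\t<DEL>\t60\tPASS\tPRECISE;SVTYPE=DEL;SVLEN=-320;END=10820"
    print(parse_sv_record(example))
```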
Author
AI Group - Biomedical AI Platform