Agent skill

deepvariant

Stars 2,009
Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/variant-interpretation-acmg/bioSkills/deepvariant

SKILL.md


name: bio-variant-calling-deepvariant description: Deep learning-based variant calling with Google DeepVariant. Provides high accuracy for germline SNPs and indels from Illumina, PacBio, and ONT data. Use when calling variants with DeepVariant deep learning caller. tool_type: cli primary_tool: DeepVariant measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

DeepVariant Variant Calling

Installation

Docker (Recommended)

bash
docker pull google/deepvariant:1.6.1

# Or with GPU support
docker pull google/deepvariant:1.6.1-gpu

Singularity

bash
singularity pull docker://google/deepvariant:1.6.1

Basic Usage

One-Step Run (run_deepvariant)

bash
docker run -v "${PWD}:/input" -v "${PWD}/output:/output" \
    google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=WGS \
    --ref=/input/reference.fa \
    --reads=/input/sample.bam \
    --output_vcf=/output/sample.vcf.gz \
    --output_gvcf=/output/sample.g.vcf.gz \
    --num_shards=16

Model Types

Model Data Type Use Case
WGS Illumina WGS Whole genome sequencing
WES Illumina WES Whole exome/targeted
PACBIO PacBio HiFi Long-read HiFi
ONT_R104 ONT R10.4 Oxford Nanopore
HYBRID_PACBIO_ILLUMINA Mixed Hybrid assemblies

Step-by-Step Workflow

For more control, run each step separately:

Step 1: Make Examples

bash
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/make_examples \
    --mode calling \
    --ref /data/reference.fa \
    --reads /data/sample.bam \
    --examples /data/examples.tfrecord.gz \
    --gvcf /data/gvcf.tfrecord.gz

Step 2: Call Variants

bash
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/call_variants \
    --outfile /data/call_variants.tfrecord.gz \
    --examples /data/examples.tfrecord.gz \
    --checkpoint /opt/models/wgs/model.ckpt

Step 3: Postprocess Variants

bash
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/postprocess_variants \
    --ref /data/reference.fa \
    --infile /data/call_variants.tfrecord.gz \
    --outfile /data/output.vcf.gz \
    --gvcf_outfile /data/output.g.vcf.gz \
    --nonvariant_site_tfrecord_path /data/gvcf.tfrecord.gz

GPU Acceleration

bash
docker run --gpus all -v "${PWD}:/data" \
    google/deepvariant:1.6.1-gpu \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=WGS \
    --ref=/data/reference.fa \
    --reads=/data/sample.bam \
    --output_vcf=/data/output.vcf.gz \
    --num_shards=16

PacBio HiFi Calling

bash
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=PACBIO \
    --ref=/data/reference.fa \
    --reads=/data/hifi_aligned.bam \
    --output_vcf=/data/hifi_variants.vcf.gz \
    --num_shards=16

ONT Calling

bash
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=ONT_R104 \
    --ref=/data/reference.fa \
    --reads=/data/ont_aligned.bam \
    --output_vcf=/data/ont_variants.vcf.gz \
    --num_shards=16

Exome/Targeted Sequencing

bash
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=WES \
    --ref=/data/reference.fa \
    --reads=/data/exome.bam \
    --regions=/data/targets.bed \
    --output_vcf=/data/exome_variants.vcf.gz \
    --num_shards=8

Joint Calling with GLnexus

For multi-sample cohorts, use gVCFs with GLnexus:

bash
# Generate gVCFs for each sample
for bam in *.bam; do
    sample=$(basename $bam .bam)
    docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
        /opt/deepvariant/bin/run_deepvariant \
        --model_type=WGS \
        --ref=/data/reference.fa \
        --reads=/data/$bam \
        --output_vcf=/data/${sample}.vcf.gz \
        --output_gvcf=/data/${sample}.g.vcf.gz \
        --num_shards=16
done

# Joint genotyping with GLnexus
docker run -v "${PWD}:/data" quay.io/mlin/glnexus:v1.4.1 \
    /usr/local/bin/glnexus_cli \
    --config DeepVariantWGS \
    /data/*.g.vcf.gz \
    | bcftools view - -Oz -o cohort.vcf.gz

GLnexus Configurations

Config Use Case
DeepVariantWGS Illumina WGS
DeepVariantWES Illumina exome
DeepVariant_unfiltered Keep all variants

Output Quality Metrics

bash
# Variant statistics
bcftools stats output.vcf.gz > stats.txt

# Filter by quality
bcftools view -i 'QUAL>20 && FMT/GQ>20' output.vcf.gz -Oz -o filtered.vcf.gz

# Ti/Tv ratio (expect ~2.0-2.1 for WGS)
bcftools stats output.vcf.gz | grep TSTV

Benchmarking Against Truth Set

bash
# Using hap.py for GIAB benchmarking
docker run -v "${PWD}:/data" jmcdani20/hap.py:latest \
    /opt/hap.py/bin/hap.py \
    /data/HG002_GRCh38_truth.vcf.gz \
    /data/deepvariant_output.vcf.gz \
    -r /data/reference.fa \
    -o /data/benchmark \
    --threads 16

Complete Workflow Script

bash
#!/bin/bash
set -euo pipefail

BAM=$1
REFERENCE=$2
OUTPUT_PREFIX=$3
MODEL_TYPE=${4:-WGS}
THREADS=${5:-16}

echo "=== DeepVariant: ${MODEL_TYPE} mode ==="

docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=${MODEL_TYPE} \
    --ref=/data/${REFERENCE} \
    --reads=/data/${BAM} \
    --output_vcf=/data/${OUTPUT_PREFIX}.vcf.gz \
    --output_gvcf=/data/${OUTPUT_PREFIX}.g.vcf.gz \
    --intermediate_results_dir=/data/${OUTPUT_PREFIX}_tmp \
    --num_shards=${THREADS}

echo "=== Indexing ==="
bcftools index -t ${OUTPUT_PREFIX}.vcf.gz
bcftools index -t ${OUTPUT_PREFIX}.g.vcf.gz

echo "=== Statistics ==="
bcftools stats ${OUTPUT_PREFIX}.vcf.gz > ${OUTPUT_PREFIX}_stats.txt

echo "=== Complete ==="
echo "VCF: ${OUTPUT_PREFIX}.vcf.gz"
echo "gVCF: ${OUTPUT_PREFIX}.g.vcf.gz"

Comparison with Other Callers

Caller Speed Accuracy Best For
DeepVariant Moderate Highest Production, benchmarking
GATK HaplotypeCaller Moderate High GATK ecosystem
bcftools Fast Good Quick analysis
Clair3 Fast High Long reads

Resource Requirements

Data Type Memory CPU Time (30x WGS)
WGS 64 GB ~4-6 hours
WES 32 GB ~30 min
With GPU 32 GB ~1-2 hours (WGS)

Related Skills

  • variant-calling/gatk-variant-calling - GATK alternative
  • variant-calling/variant-calling - bcftools calling
  • long-read-sequencing/clair3-variants - Long-read alternative
  • variant-calling/filtering-best-practices - Post-calling filtering

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results