Agent skill
bio-variant-calling-deepvariant
Deep learning-based variant calling with Google DeepVariant. Provides high accuracy for germline SNPs and indels from Illumina, PacBio, and ONT data. Use when calling variants with DeepVariant deep learning caller.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/deepvariant
SKILL.md
DeepVariant Variant Calling
Installation
Docker (Recommended)
docker pull google/deepvariant:1.6.1
# Or with GPU support
docker pull google/deepvariant:1.6.1-gpu
Singularity
singularity pull docker://google/deepvariant:1.6.1
Basic Usage
One-Step Run (run_deepvariant)
docker run -v "${PWD}:/input" -v "${PWD}/output:/output" \
google/deepvariant:1.6.1 \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \
--ref=/input/reference.fa \
--reads=/input/sample.bam \
--output_vcf=/output/sample.vcf.gz \
--output_gvcf=/output/sample.g.vcf.gz \
--num_shards=16
Model Types
| Model | Data Type | Use Case |
|---|---|---|
WGS |
Illumina WGS | Whole genome sequencing |
WES |
Illumina WES | Whole exome/targeted |
PACBIO |
PacBio HiFi | Long-read HiFi |
ONT_R104 |
ONT R10.4 | Oxford Nanopore |
HYBRID_PACBIO_ILLUMINA |
Mixed | Hybrid assemblies |
Step-by-Step Workflow
For more control, run each step separately:
Step 1: Make Examples
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
/opt/deepvariant/bin/make_examples \
--mode calling \
--ref /data/reference.fa \
--reads /data/sample.bam \
--examples /data/examples.tfrecord.gz \
--gvcf /data/gvcf.tfrecord.gz
Step 2: Call Variants
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
/opt/deepvariant/bin/call_variants \
--outfile /data/call_variants.tfrecord.gz \
--examples /data/examples.tfrecord.gz \
--checkpoint /opt/models/wgs/model.ckpt
Step 3: Postprocess Variants
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
/opt/deepvariant/bin/postprocess_variants \
--ref /data/reference.fa \
--infile /data/call_variants.tfrecord.gz \
--outfile /data/output.vcf.gz \
--gvcf_outfile /data/output.g.vcf.gz \
--nonvariant_site_tfrecord_path /data/gvcf.tfrecord.gz
GPU Acceleration
docker run --gpus all -v "${PWD}:/data" \
google/deepvariant:1.6.1-gpu \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \
--ref=/data/reference.fa \
--reads=/data/sample.bam \
--output_vcf=/data/output.vcf.gz \
--num_shards=16
PacBio HiFi Calling
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
/opt/deepvariant/bin/run_deepvariant \
--model_type=PACBIO \
--ref=/data/reference.fa \
--reads=/data/hifi_aligned.bam \
--output_vcf=/data/hifi_variants.vcf.gz \
--num_shards=16
ONT Calling
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
/opt/deepvariant/bin/run_deepvariant \
--model_type=ONT_R104 \
--ref=/data/reference.fa \
--reads=/data/ont_aligned.bam \
--output_vcf=/data/ont_variants.vcf.gz \
--num_shards=16
Exome/Targeted Sequencing
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WES \
--ref=/data/reference.fa \
--reads=/data/exome.bam \
--regions=/data/targets.bed \
--output_vcf=/data/exome_variants.vcf.gz \
--num_shards=8
Joint Calling with GLnexus
For multi-sample cohorts, use gVCFs with GLnexus:
# Generate gVCFs for each sample
for bam in *.bam; do
sample=$(basename $bam .bam)
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \
--ref=/data/reference.fa \
--reads=/data/$bam \
--output_vcf=/data/${sample}.vcf.gz \
--output_gvcf=/data/${sample}.g.vcf.gz \
--num_shards=16
done
# Joint genotyping with GLnexus
docker run -v "${PWD}:/data" quay.io/mlin/glnexus:v1.4.1 \
/usr/local/bin/glnexus_cli \
--config DeepVariantWGS \
/data/*.g.vcf.gz \
| bcftools view - -Oz -o cohort.vcf.gz
GLnexus Configurations
| Config | Use Case |
|---|---|
DeepVariantWGS |
Illumina WGS |
DeepVariantWES |
Illumina exome |
DeepVariant_unfiltered |
Keep all variants |
Output Quality Metrics
# Variant statistics
bcftools stats output.vcf.gz > stats.txt
# Filter by quality
bcftools view -i 'QUAL>20 && FMT/GQ>20' output.vcf.gz -Oz -o filtered.vcf.gz
# Ti/Tv ratio (expect ~2.0-2.1 for WGS)
bcftools stats output.vcf.gz | grep TSTV
Benchmarking Against Truth Set
# Using hap.py for GIAB benchmarking
docker run -v "${PWD}:/data" jmcdani20/hap.py:latest \
/opt/hap.py/bin/hap.py \
/data/HG002_GRCh38_truth.vcf.gz \
/data/deepvariant_output.vcf.gz \
-r /data/reference.fa \
-o /data/benchmark \
--threads 16
Complete Workflow Script
#!/bin/bash
set -euo pipefail
BAM=$1
REFERENCE=$2
OUTPUT_PREFIX=$3
MODEL_TYPE=${4:-WGS}
THREADS=${5:-16}
echo "=== DeepVariant: ${MODEL_TYPE} mode ==="
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
/opt/deepvariant/bin/run_deepvariant \
--model_type=${MODEL_TYPE} \
--ref=/data/${REFERENCE} \
--reads=/data/${BAM} \
--output_vcf=/data/${OUTPUT_PREFIX}.vcf.gz \
--output_gvcf=/data/${OUTPUT_PREFIX}.g.vcf.gz \
--intermediate_results_dir=/data/${OUTPUT_PREFIX}_tmp \
--num_shards=${THREADS}
echo "=== Indexing ==="
bcftools index -t ${OUTPUT_PREFIX}.vcf.gz
bcftools index -t ${OUTPUT_PREFIX}.g.vcf.gz
echo "=== Statistics ==="
bcftools stats ${OUTPUT_PREFIX}.vcf.gz > ${OUTPUT_PREFIX}_stats.txt
echo "=== Complete ==="
echo "VCF: ${OUTPUT_PREFIX}.vcf.gz"
echo "gVCF: ${OUTPUT_PREFIX}.g.vcf.gz"
Comparison with Other Callers
| Caller | Speed | Accuracy | Best For |
|---|---|---|---|
| DeepVariant | Moderate | Highest | Production, benchmarking |
| GATK HaplotypeCaller | Moderate | High | GATK ecosystem |
| bcftools | Fast | Good | Quick analysis |
| Clair3 | Fast | High | Long reads |
Resource Requirements
| Data Type | Memory | CPU Time (30x WGS) |
|---|---|---|
| WGS | 64 GB | ~4-6 hours |
| WES | 32 GB | ~30 min |
| With GPU | 32 GB | ~1-2 hours (WGS) |
Related Skills
- variant-calling/gatk-variant-calling - GATK alternative
- variant-calling/variant-calling - bcftools calling
- long-read-sequencing/clair3-variants - Long-read alternative
- variant-calling/filtering-best-practices - Post-calling filtering
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
Didn't find tool you were looking for?