Agent skill

vcf-basics

Stars 2,009
Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/variant-interpretation-acmg/bioSkills/vcf-basics

SKILL.md


name: bio-vcf-basics description: View, query, and understand VCF/BCF variant files using bcftools and cyvcf2. Use when inspecting variants, extracting specific fields, or understanding VCF format structure. tool_type: cli primary_tool: bcftools measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

VCF/BCF Basics

View and query variant files using bcftools and cyvcf2.

Format Overview

Format Description Use Case
VCF Text format, human-readable Debugging, small files
VCF.gz Compressed VCF (bgzip) Standard distribution
BCF Binary VCF Fast processing, large files

VCF Format Structure

##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE1
chr1    1000    rs123   A       G       30      PASS    DP=50   GT:DP   0/1:25

Header Lines (##)

  • ##fileformat - VCF version
  • ##INFO - INFO field definitions
  • ##FORMAT - FORMAT field definitions
  • ##FILTER - Filter definitions
  • ##contig - Reference contigs
  • ##reference - Reference genome

Column Header (#CHROM)

Fixed columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT Followed by sample columns

Data Columns

Column Description
CHROM Chromosome
POS 1-based position
ID Variant identifier (e.g., rs number)
REF Reference allele
ALT Alternate allele(s), comma-separated
QUAL Phred-scaled quality score
FILTER PASS or filter name
INFO Semicolon-separated key=value pairs
FORMAT Colon-separated format keys
SAMPLE Colon-separated values matching FORMAT

bcftools view

View VCF

bash
bcftools view input.vcf.gz | head

View Header Only

bash
bcftools view -h input.vcf.gz

View Without Header

bash
bcftools view -H input.vcf.gz | head

View Specific Region

bash
bcftools view input.vcf.gz chr1:1000000-2000000

View Specific Samples

bash
bcftools view -s sample1,sample2 input.vcf.gz

Exclude Samples

bash
bcftools view -s ^sample3 input.vcf.gz

bcftools query

Extract specific fields in custom format.

Basic Query

bash
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' input.vcf.gz

Query with INFO Fields

bash
bcftools query -f '%CHROM\t%POS\t%INFO/DP\t%INFO/AF\n' input.vcf.gz

Query with Sample Fields

bash
bcftools query -f '%CHROM\t%POS[\t%GT]\n' input.vcf.gz

Query Specific Samples

bash
bcftools query -f '%CHROM\t%POS[\t%SAMPLE=%GT]\n' -s sample1,sample2 input.vcf.gz

Include Header

bash
bcftools query -H -f '%CHROM\t%POS\t%REF\t%ALT\n' input.vcf.gz

Common Format Specifiers

Specifier Description
%CHROM Chromosome
%POS Position
%ID Variant ID
%REF Reference allele
%ALT Alternate allele
%QUAL Quality score
%FILTER Filter status
%INFO/TAG INFO field value
%TYPE Variant type (snp, indel, etc.)
[%GT] Genotype (per sample)
[%DP] Depth (per sample)
[%SAMPLE] Sample name
\n Newline
\t Tab

Format Conversion

VCF to BCF

bash
bcftools view -Ob -o output.bcf input.vcf.gz

BCF to VCF

bash
bcftools view -Ov -o output.vcf input.bcf

Compress VCF (bgzip)

bash
bgzip input.vcf
# Creates input.vcf.gz

Index VCF/BCF

bash
bcftools index input.vcf.gz
# Creates input.vcf.gz.csi

bcftools index -t input.vcf.gz
# Creates input.vcf.gz.tbi (tabix index)

Output Format Options

Flag Format
-Ov Uncompressed VCF
-Oz Compressed VCF (bgzip)
-Ou Uncompressed BCF
-Ob Compressed BCF

Genotype Encoding

Genotype Meaning
0/0 Homozygous reference
0/1 Heterozygous
1/1 Homozygous alternate
1/2 Heterozygous (two different alts)
./. Missing
0|1 Phased heterozygous

cyvcf2 Python Alternative

Open and Iterate

python
from cyvcf2 import VCF

vcf = VCF('input.vcf.gz')
for variant in vcf:
    print(f'{variant.CHROM}:{variant.POS} {variant.REF}>{variant.ALT[0]}')

Access Variant Properties

python
from cyvcf2 import VCF

for variant in VCF('input.vcf.gz'):
    print(f'Chrom: {variant.CHROM}')
    print(f'Pos: {variant.POS}')
    print(f'ID: {variant.ID}')
    print(f'Ref: {variant.REF}')
    print(f'Alt: {variant.ALT}')  # List
    print(f'Qual: {variant.QUAL}')
    print(f'Filter: {variant.FILTER}')
    print(f'Type: {variant.var_type}')  # snp, indel, etc.
    break

Access INFO Fields

python
from cyvcf2 import VCF

for variant in VCF('input.vcf.gz'):
    dp = variant.INFO.get('DP')
    af = variant.INFO.get('AF')
    print(f'{variant.CHROM}:{variant.POS} DP={dp} AF={af}')

Access Genotypes

python
from cyvcf2 import VCF

vcf = VCF('input.vcf.gz')
samples = vcf.samples  # List of sample names

for variant in vcf:
    gts = variant.gt_types  # 0=HOM_REF, 1=HET, 2=UNKNOWN, 3=HOM_ALT
    for sample, gt in zip(samples, gts):
        gt_str = ['HOM_REF', 'HET', 'UNKNOWN', 'HOM_ALT'][gt]
        print(f'{sample}: {gt_str}')
    break

Access Sample Fields

python
from cyvcf2 import VCF

for variant in VCF('input.vcf.gz'):
    depths = variant.format('DP')  # numpy array
    gqs = variant.format('GQ')     # Genotype quality
    print(f'Depths: {depths}')

Fetch Region

python
from cyvcf2 import VCF

vcf = VCF('input.vcf.gz')
for variant in vcf('chr1:1000000-2000000'):
    print(f'{variant.CHROM}:{variant.POS}')

Get Header Info

python
from cyvcf2 import VCF

vcf = VCF('input.vcf.gz')
print(f'Samples: {vcf.samples}')
print(f'Contigs: {vcf.seqnames}')

# INFO field definitions
for info in vcf.header_iter():
    if info['HeaderType'] == 'INFO':
        print(f'{info["ID"]}: {info["Description"]}')

Write VCF

python
from cyvcf2 import VCF, Writer

vcf = VCF('input.vcf.gz')
writer = Writer('output.vcf', vcf)

for variant in vcf:
    if variant.QUAL > 30:
        writer.write_record(variant)

writer.close()
vcf.close()

Quick Reference

Task bcftools cyvcf2
View VCF bcftools view file.vcf.gz VCF('file.vcf.gz')
View header bcftools view -h file.vcf.gz vcf.header_iter()
Get region bcftools view file.vcf.gz chr1:1-1000 vcf('chr1:1-1000')
Query fields bcftools query -f '%CHROM\t%POS\n' Loop with properties
Count variants bcftools view -H file.vcf.gz | wc -l sum(1 for _ in vcf)
VCF to BCF bcftools view -Ob -o out.bcf in.vcf.gz Use Writer

Common Errors

Error Cause Solution
no BGZF EOF marker Not bgzipped Use bgzip not gzip
index required Missing index for region query Run bcftools index
sample not found Wrong sample name Check with bcftools query -l

Related Skills

  • variant-calling - Generate VCF from alignments
  • filtering-best-practices - Filter variants by quality/criteria
  • vcf-manipulation - Merge, concat, intersect VCFs
  • alignment-files/pileup-generation - Generate pileup for calling

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results