Agent skills
bio-phasing-imputation-genotyp...

Agent skill

bio-phasing-imputation-genotype-imputation

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-phasing-imputation-genotype-imputation

SKILL.md

name: bio-phasing-imputation-genotype-imputation description: Impute missing genotypes using reference panels with Beagle or Minimac4. Use when increasing variant density for GWAS, harmonizing data across genotyping platforms, or inferring variants not directly typed in array data. tool_type: cli primary_tool: beagle measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

read_file
run_shell_command

Genotype Imputation

Beagle Imputation

bash

# Basic imputation
java -jar beagle.jar \
    gt=study.vcf.gz \
    ref=reference_panel.vcf.gz \
    map=genetic_map.txt \
    out=imputed

# Output: imputed.vcf.gz with imputed genotypes

Beagle with Options

bash

java -Xmx32g -jar beagle.jar \
    gt=study.vcf.gz \
    ref=reference_panel.vcf.gz \
    map=genetic_map.txt \
    out=imputed \
    nthreads=8 \
    gp=true \              # Output genotype probabilities
    ap=true \              # Output allele probabilities
    impute=true \          # Perform imputation (default)
    ne=20000               # Effective population size

Impute Per Chromosome

bash

for chr in {1..22}; do
    java -Xmx32g -jar beagle.jar \
        gt=study.chr${chr}.vcf.gz \
        ref=ref.chr${chr}.vcf.gz \
        map=genetic_maps/plink.chr${chr}.GRCh38.map \
        out=imputed.chr${chr} \
        gp=true \
        nthreads=8
done

# Concatenate
bcftools concat imputed.chr*.vcf.gz -Oz -o imputed.all.vcf.gz
bcftools index imputed.all.vcf.gz

IMPUTE5 (Alternative)

bash

# Newer IMPUTE software
impute5 \
    --h reference.bcf \
    --m genetic_map.txt \
    --g study.vcf.gz \
    --r chr22 \
    --o imputed.chr22.vcf.gz \
    --threads 8

Minimac4 (Michigan Imputation Server)

bash

# Often used via web server, but can run locally
minimac4 \
    --refHaps reference.m3vcf.gz \
    --haps study.vcf.gz \
    --prefix imputed \
    --format GT,DS,GP \
    --cpus 8

Input Preparation

bash

# 1. Align to reference (strand, allele order)
bcftools +fixref study.vcf.gz -Oz -o fixed.vcf.gz -- \
    -f reference.fa -m flip

# 2. Filter to sites in reference
bcftools isec -n=2 -w1 fixed.vcf.gz reference_sites.vcf.gz \
    -Oz -o study_overlap.vcf.gz

# 3. Phase first (if not already phased)
java -jar beagle.jar gt=study_overlap.vcf.gz out=phased

# 4. Then impute
java -jar beagle.jar gt=phased.vcf.gz ref=reference.vcf.gz out=imputed

Extract Imputation Quality

bash

# INFO/DR2 or INFO/R2 contains imputation quality
bcftools query -f '%CHROM\t%POS\t%ID\t%INFO/DR2\n' imputed.vcf.gz > info_scores.txt

# Filter by quality
bcftools view -i 'INFO/DR2 > 0.3' imputed.vcf.gz -Oz -o imputed_filtered.vcf.gz

Output Formats

Format	Field	Description
GT	0\|0, 0\|1, 1\|1	Hard-called genotype
DS	0.0-2.0	Dosage (expected ALT allele count)
GP	0.0-1.0,0.0-1.0,0.0-1.0	Genotype probabilities (AA,AB,BB)
DR2/R2	0.0-1.0	Imputation quality score

Using Dosages for GWAS

python

import pandas as pd

# Extract dosages
# bcftools query -f '%CHROM\t%POS\t%ID[\t%DS]\n' imputed.vcf.gz > dosages.txt

dosages = pd.read_csv('dosages.txt', sep='\t')

# Dosage-based association (treats uncertainty)
# Use --dosage in PLINK2 or similar

bash

# PLINK2 with dosages
plink2 --vcf imputed.vcf.gz dosage=DS \
    --glm \
    --pheno phenotypes.txt \
    --out gwas_results

Quality Thresholds

Analysis	Minimum INFO/R2
GWAS discovery	0.3
GWAS fine-mapping	0.8
Meta-analysis	0.5
Polygenic scores	0.9

Key Parameters

Parameter	Beagle	Description
gt	input VCF	Study genotypes
ref	reference VCF	Reference panel
map	genetic map	Recombination map
gp	true/false	Output genotype probs
ne	20000	Effective population size
nthreads	N	CPU threads
window	40	Window size (cM)

Imputation Servers

For large-scale imputation, consider web-based servers:

Michigan Imputation Server: imputationserver.sph.umich.edu
TOPMed Imputation Server: imputation.biodatacatalyst.nhlbi.nih.gov
Sanger Imputation Server: imputation.sanger.ac.uk

bash

# Prepare input for server
# Most require VCF.GZ per chromosome
for chr in {1..22}; do
    bcftools view -r chr${chr} study.vcf.gz -Oz -o study.chr${chr}.vcf.gz
done

Related Skills

phasing-imputation/haplotype-phasing - Pre-phasing step
phasing-imputation/reference-panels - Reference panel setup
phasing-imputation/imputation-qc - Quality control
population-genetics/association-testing - GWAS with imputed data

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-phasing-imputation-genotype-imputation
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Genotype Imputation

Beagle Imputation

Beagle with Options

Impute Per Chromosome

IMPUTE5 (Alternative)

Minimac4 (Michigan Imputation Server)

Input Preparation

Extract Imputation Quality

Output Formats

Using Dosages for GWAS

Quality Thresholds

Key Parameters

Imputation Servers

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations