Agent skill

bio-phasing-imputation-reference-panels

Stars 2,009
Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-phasing-imputation-reference-panels

SKILL.md


name: bio-phasing-imputation-reference-panels description: Download, prepare, and manage reference panels for phasing and imputation. Covers 1000 Genomes, HRC, and TOPMed panels. Use when setting up imputation infrastructure or selecting appropriate reference panels for target populations. tool_type: cli primary_tool: bcftools measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

Reference Panels

1000 Genomes Phase 3 (GRCh38)

bash
# Download from IGSR
BASE_URL="http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased"

for chr in {1..22}; do
    wget ${BASE_URL}/CCDG_14151_B01_GRM_WGS_2020-08-05_chr${chr}.filtered.shapeit2-duohmm-phased.vcf.gz
    wget ${BASE_URL}/CCDG_14151_B01_GRM_WGS_2020-08-05_chr${chr}.filtered.shapeit2-duohmm-phased.vcf.gz.tbi
done

Subset by Population

bash
# Download sample info
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/1000G_2504_high_coverage.sequence.index

# Create population sample lists
grep "EUR" samples.txt | cut -f1 > european_samples.txt
grep "AFR" samples.txt | cut -f1 > african_samples.txt
grep "EAS" samples.txt | cut -f1 > east_asian_samples.txt

# Subset reference to specific population
bcftools view -S european_samples.txt \
    1000GP.chr22.vcf.gz \
    -Oz -o 1000GP_EUR.chr22.vcf.gz

Convert to Beagle Format

bash
# Beagle uses VCF directly, but ensure proper format
bcftools view -m2 -M2 -v snps reference.vcf.gz | \
    bcftools annotate --set-id '%CHROM:%POS:%REF:%ALT' | \
    bgzip > reference_beagle.vcf.gz
bcftools index reference_beagle.vcf.gz

Convert to IMPUTE5 Format

bash
# IMPUTE5 uses its own format
imp5Converter \
    --h reference.vcf.gz \
    --r chr22 \
    --o reference.chr22.imp5

HRC Reference Panel

bash
# HRC requires registration at EGA
# After access granted:

# Download from EGA using pyega3
pip install pyega3
pyega3 -cf credentials.json fetch EGAD00001002729

# HRC contains 32,470 samples (mostly European)

TOPMed Reference Panel

bash
# TOPMed available through imputation servers
# Or download from dbGaP with appropriate access

# Use via Michigan Imputation Server:
# 1. Upload study VCF
# 2. Select "TOPMed r2" as reference
# 3. Download imputed results

Genetic Maps

bash
# Beagle format (GRCh38) - from Browning lab
wget https://faculty.washington.edu/browning/beagle/genetic_maps/plink.GRCh38.map.zip
unzip plink.GRCh38.map.zip -d genetic_maps/

# SHAPEIT5 format (recommended for SHAPEIT5)
wget https://github.com/odelaneau/shapeit5/raw/main/maps/genetic_maps.b38.tar.gz
tar xzf genetic_maps.b38.tar.gz

Check Reference Panel

bash
# Basic stats
bcftools stats reference.vcf.gz | head -50

# Sample count
bcftools query -l reference.vcf.gz | wc -l

# Variant count
bcftools view -H reference.vcf.gz | wc -l

# Check chromosomes
bcftools index -s reference.vcf.gz

Lift Over Reference Panel

bash
# GRCh37 to GRCh38
# Using Picard
java -jar picard.jar LiftoverVcf \
    I=reference_hg19.vcf.gz \
    O=reference_hg38.vcf.gz \
    CHAIN=hg19ToHg38.over.chain.gz \
    REJECT=rejected.vcf \
    R=hg38.fa

# Or using CrossMap
CrossMap.py vcf hg19ToHg38.chain reference_hg19.vcf hg38.fa reference_hg38.vcf

Align Study to Reference

bash
# Check strand concordance
bcftools +fixref study.vcf.gz -Oz -o study_fixed.vcf.gz -- \
    -f reference.fa \
    -i reference_panel.vcf.gz \
    -m flip

# Statistics on fixes
bcftools +fixref study.vcf.gz -- -f reference.fa -m stats

Filter Reference Panel

bash
# Remove singletons (appear in only 1 sample)
bcftools view -c 2 reference.vcf.gz -Oz -o reference_no_singletons.vcf.gz

# Filter by MAF
bcftools view -q 0.001:minor reference.vcf.gz -Oz -o reference_maf001.vcf.gz

# Remove indels (SNPs only)
bcftools view -v snps reference.vcf.gz -Oz -o reference_snps.vcf.gz

Merge Custom Panel with 1000G

bash
# If you have additional reference samples
bcftools merge \
    1000GP.chr22.vcf.gz \
    custom_reference.chr22.vcf.gz \
    -Oz -o combined_reference.chr22.vcf.gz

# Ensure matching variants first
bcftools isec -n=2 \
    1000GP.chr22.vcf.gz \
    custom_reference.chr22.vcf.gz \
    -p isec_output

Reference Panel Comparison

Panel Samples Variants Populations
1000G Phase 3 2,504 88M 26 global
HRC r1.1 32,470 40M European-heavy
TOPMed r2 97,256 308M 60% European, diverse
UK10K 3,781 42M British

Related Skills

  • phasing-imputation/haplotype-phasing - Use panels for phasing
  • phasing-imputation/genotype-imputation - Use panels for imputation
  • variant-calling/vcf-manipulation - VCF file operations

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results