bio-phasing-imputation-reference-panels
name: bio-phasing-imputation-reference-panels
description: Download, prepare, and manage reference panels for phasing and imputation. Covers 1000 Genomes, HRC, and TOPMed panels. Use when setting up imputation infrastructure or selecting appropriate reference panels for target populations.
tool_type: cli
primary_tool: bcftools
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
Reference Panels
1000 Genomes High-Coverage Panel (GRCh38)
# Download the phased high-coverage call set (3,202 samples, per the directory name) from IGSR
BASE_URL="http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased"
for chr in {1..22}; do
wget ${BASE_URL}/CCDG_14151_B01_GRM_WGS_2020-08-05_chr${chr}.filtered.shapeit2-duohmm-phased.vcf.gz
wget ${BASE_URL}/CCDG_14151_B01_GRM_WGS_2020-08-05_chr${chr}.filtered.shapeit2-duohmm-phased.vcf.gz.tbi
done
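The loop above issues 44 separate requests, so an interrupted transfer is easier to resume if the URLs are written to a list first. A minimal sketch, assuming the same BASE_URL and file naming as above; the helper name `panel_urls` and the list file `panel_urls.txt` are hypothetical.

```shell
# Print the VCF and index URL for every autosome (hypothetical helper).
BASE_URL="http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased"

panel_urls() {
  for chr in $(seq 1 22); do
    vcf="${BASE_URL}/CCDG_14151_B01_GRM_WGS_2020-08-05_chr${chr}.filtered.shapeit2-duohmm-phased.vcf.gz"
    printf '%s\n%s.tbi\n' "$vcf" "$vcf"
  done
}

# Write the list once; nothing is downloaded at this point.
panel_urls > panel_urls.txt

# Fetch with resume support when ready:
#   wget -c -i panel_urls.txt
```

Re-running `wget -c -i panel_urls.txt` after a failure continues partial files instead of restarting them.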
Subset by Population
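# Download the sequence index (per-file metadata for the high-coverage data)
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/1000G_2504_high_coverage.sequence.index
# Create population sample lists
# Assumes samples.txt is a tab-delimited table with the sample ID in column 1 and a
# super-population code (EUR, AFR, EAS) on each row. The sequence index is not in that
# layout, so build samples.txt from a sample-level metadata table first.
grep -w "EUR" samples.txt | cut -f1 > european_samples.txt
grep -w "AFR" samples.txt | cut -f1 > african_samples.txt
grep -w "EAS" samples.txt | cut -f1 > east_asian_samples.txt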
# Subset reference to specific population
bcftools view -S european_samples.txt \
1000GP.chr22.vcf.gz \
-Oz -o 1000GP_EUR.chr22.vcf.gz
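The grep filters above match the code anywhere on a line, which can pick up unintended rows if another column happens to contain the same string. A sketch of an exact column match with awk, assuming a hypothetical three-column layout (sample ID, population, super-population) with a header row; the sample IDs and file names below are made up for illustration.

```shell
# Hypothetical demo table: sample ID, population, super-population.
printf 'sample\tpop\tsuper_pop\n'  > samples_demo.txt
printf 'S1\tGBR\tEUR\n'           >> samples_demo.txt
printf 'S2\tCHB\tEAS\n'           >> samples_demo.txt
printf 'S3\tLWK\tAFR\n'           >> samples_demo.txt
printf 'S4\tFIN\tEUR\n'           >> samples_demo.txt

# Print sample IDs whose third column equals the requested code exactly.
# $1 = super-population code, $2 = sample table (header on line 1 is skipped).
samples_for_superpop() {
  awk -F'\t' -v sp="$1" 'NR > 1 && $3 == sp { print $1 }' "$2"
}

samples_for_superpop EUR samples_demo.txt > european_samples_demo.txt
```

The resulting one-ID-per-line file has the shape that `bcftools view -S` expects.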
Convert to Beagle Format
# Beagle reads VCF directly; restrict to biallelic SNPs and assign CHROM:POS:REF:ALT variant IDs
bcftools view -m2 -M2 -v snps reference.vcf.gz | \
bcftools annotate --set-id '%CHROM:%POS:%REF:%ALT' | \
bgzip > reference_beagle.vcf.gz
bcftools index reference_beagle.vcf.gz
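IDs built from CHROM:POS:REF:ALT are only unique if no record is repeated, so it can be worth checking for duplicates after the rewrite. A minimal sketch; `duplicate_ids` is a hypothetical helper that reads one ID per line on stdin.

```shell
# Print each ID that occurs more than once (one line per duplicated ID).
duplicate_ids() {
  sort | uniq -d
}

# Intended use (requires bcftools, not run here):
#   bcftools query -f '%CHROM:%POS:%REF:%ALT\n' reference_beagle.vcf.gz | duplicate_ids
```

Empty output means every ID is unique.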
Convert to IMPUTE5 Format
# IMPUTE5 uses its own format
imp5Converter \
--h reference.vcf.gz \
--r chr22 \
--o reference.chr22.imp5
HRC Reference Panel
# HRC requires registration at EGA
# After access granted:
# Download from EGA using pyega3
pip install pyega3
pyega3 -cf credentials.json fetch EGAD00001002729
# HRC contains 32,470 samples (mostly European)
TOPMed Reference Panel
# TOPMed available through imputation servers
# Or download from dbGaP with appropriate access
# Use via the TOPMed Imputation Server (runs the Michigan Imputation Server software):
# 1. Upload study VCF
# 2. Select "TOPMed r2" as reference
# 3. Download imputed results
Genetic Maps
# Beagle format (GRCh38) - from Browning lab
wget https://faculty.washington.edu/browning/beagle/genetic_maps/plink.GRCh38.map.zip
unzip plink.GRCh38.map.zip -d genetic_maps/
# SHAPEIT5 format (recommended for SHAPEIT5)
wget https://github.com/odelaneau/shapeit5/raw/main/maps/genetic_maps.b38.tar.gz
tar xzf genetic_maps.b38.tar.gz
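A quick sanity check on a downloaded map can catch truncated or mis-sorted files before phasing. A sketch for PLINK-format maps, assuming four whitespace-separated columns (chromosome, marker ID, cM, bp) and one chromosome per file; `check_plink_map` is a hypothetical helper.

```shell
# Report the first structural problem in a PLINK-format map, or print "ok".
# Checks: exactly 4 columns, bp strictly increasing, cM non-decreasing.
check_plink_map() {
  awk '
    NF != 4 { printf "line %d: expected 4 columns, got %d\n", NR, NF; bad = 1; exit }
    NR > 1 && ($4 + 0 <= prev_bp || $3 + 0 < prev_cm) {
      printf "line %d: positions not increasing\n", NR; bad = 1; exit
    }
    { prev_bp = $4 + 0; prev_cm = $3 + 0 }
    END { if (!bad) print "ok" }
  ' "$1"
}

# Intended use after unzipping, with whichever per-chromosome file is present:
#   check_plink_map genetic_maps/<map file>
```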
Check Reference Panel
# Basic stats
bcftools stats reference.vcf.gz | head -50
# Sample count
bcftools query -l reference.vcf.gz | wc -l
# Variant count
bcftools view -H reference.vcf.gz | wc -l
# Check chromosomes
bcftools index -s reference.vcf.gz
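Contig naming ("chr22" versus "22") has to match between the panel, the study data, and the genetic map. A sketch that classifies the naming style from the tab-delimited output of `bcftools index -s` (contig name in column 1); `contig_style` is a hypothetical helper.

```shell
# Read contig lines on stdin and report the naming style.
contig_style() {
  awk -F'\t' '
    $1 ~ /^chr/ { pref++; next }
    { bare++ }
    END {
      if (pref && bare) print "mixed"
      else if (pref)    print "chr-prefixed"
      else if (bare)    print "unprefixed"
      else              print "empty"
    }'
}

# Intended use (requires bcftools and an indexed file, not run here):
#   bcftools index -s reference.vcf.gz | contig_style
```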
Lift Over Reference Panel
# GRCh37 to GRCh38
# Using Picard
java -jar picard.jar LiftoverVcf \
I=reference_hg19.vcf.gz \
O=reference_hg38.vcf.gz \
CHAIN=hg19ToHg38.over.chain.gz \
REJECT=rejected.vcf \
R=hg38.fa
# Or using CrossMap (arguments: chain file, input VCF, target reference FASTA, output VCF)
CrossMap.py vcf hg19ToHg38.over.chain.gz reference_hg19.vcf hg38.fa reference_hg38.vcf
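After a liftover it helps to see how many records were dropped and why. A sketch that tallies the Picard reject file by reason, assuming the reason is recorded in the FILTER column (column 7) of an uncompressed VCF such as rejected.vcf; `reject_summary` is a hypothetical helper.

```shell
# Count rejected records per FILTER value; output is "reason<TAB>count", sorted by reason.
reject_summary() {
  grep -v '^#' "$1" | cut -f7 | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Intended use:
#   reject_summary rejected.vcf
```

A large share of reference-mismatch rejections usually points at a wrong source build or FASTA rather than at the chain file.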
Align Study to Reference
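# Summarize REF/strand mismatches before changing anything
bcftools +fixref study.vcf.gz -- -f reference.fa -m stats
# Flip non-ambiguous SNPs onto the reference strand
# (-i applies to -m id and expects a dbSNP-style VCF, so it is omitted in flip mode)
bcftools +fixref study.vcf.gz -Oz -o study_fixed.vcf.gz -- \
-f reference.fa \
-m flip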
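Strand flipping cannot be resolved from the alleles alone for A/T and C/G SNPs, because the complement of such a pair is the same pair. A sketch that lists those sites from tab-delimited CHROM, POS, REF, ALT input; `ambiguous_snps` is a hypothetical helper.

```shell
# Print input lines whose REF/ALT pair is strand-ambiguous (A/T, T/A, C/G, G/C).
ambiguous_snps() {
  awk -F'\t' '
    { pair = $3 $4 }
    pair == "AT" || pair == "TA" || pair == "CG" || pair == "GC"
  '
}

# Intended use (requires bcftools, not run here):
#   bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' study.vcf.gz | ambiguous_snps
```

Sites on this list are candidates for exclusion or for frequency-based checks before imputation.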
Filter Reference Panel
# Remove singletons (sites with a non-reference allele count of 1; a single homozygous carrier counts as 2)
bcftools view -c 2 reference.vcf.gz -Oz -o reference_no_singletons.vcf.gz
# Filter by MAF
bcftools view -q 0.001:minor reference.vcf.gz -Oz -o reference_maf001.vcf.gz
# Remove indels (SNPs only)
bcftools view -v snps reference.vcf.gz -Oz -o reference_snps.vcf.gz
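A frequency threshold translates to a different minimum allele count in each panel. A sketch of that arithmetic, assuming diploid autosomal genotypes with no missing calls (so the allele number is twice the sample count); `min_minor_count` is a hypothetical helper.

```shell
# Smallest minor allele count that satisfies MAF >= threshold.
# $1 = MAF threshold, $2 = number of diploid samples.
min_minor_count() {
  awk -v maf="$1" -v n="$2" 'BEGIN {
    x = maf * 2 * n      # threshold expressed in allele copies
    c = int(x)
    if (c < x) c++       # round up to the next whole allele
    print c
  }'
}

min_minor_count 0.001 2504   # 5,008 haplotypes -> prints 6
```

For a 2,504-sample panel, a 0.001 threshold therefore removes every variant seen five times or fewer, which is much stricter than singleton removal.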
Merge Custom Panel with 1000G
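# If you have additional reference samples
# First restrict both panels to shared sites (records present in both files)
bcftools isec -n=2 -Oz -p isec_output \
1000GP.chr22.vcf.gz \
custom_reference.chr22.vcf.gz
# isec_output/0000.vcf.gz and 0001.vcf.gz hold the shared records from each input
bcftools index -f isec_output/0000.vcf.gz
bcftools index -f isec_output/0001.vcf.gz
# Then merge the shared-site files (sample names must not overlap between panels)
bcftools merge \
isec_output/0000.vcf.gz \
isec_output/0001.vcf.gz \
-Oz -o combined_reference.chr22.vcf.gz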
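Merging also fails or needs extra flags when the two panels share sample names, so it is worth comparing the sample lists beforehand. A sketch using plain text lists such as those produced by `bcftools query -l`; `shared_samples` and the `.sorted` scratch files are hypothetical.

```shell
# Print sample IDs present in both lists (one ID per line in each input file).
shared_samples() {
  sort "$1" > "$1.sorted"
  sort "$2" > "$2.sorted"
  comm -12 "$1.sorted" "$2.sorted"
}

# Intended use (requires bcftools, not run here):
#   bcftools query -l 1000GP.chr22.vcf.gz > panel_a_samples.txt
#   bcftools query -l custom_reference.chr22.vcf.gz > panel_b_samples.txt
#   shared_samples panel_a_samples.txt panel_b_samples.txt
```

Empty output means the sample sets are disjoint.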
Reference Panel Comparison
| Panel | Samples | Variants | Populations |
|---|---|---|---|
| 1000G Phase 3 | 2,504 | 88M | 26 global |
| HRC r1.1 | 32,470 | 40M | European-heavy |
| TOPMed r2 | 97,256 | 308M | 60% European, diverse |
| UK10K | 3,781 | 42M | British |
Related Skills
- phasing-imputation/haplotype-phasing - Use panels for phasing
- phasing-imputation/genotype-imputation - Use panels for imputation
- variant-calling/vcf-manipulation - VCF file operations