Agent skill
bio-causal-genomics-colocalization-analysis
Test whether two traits share a causal variant at a genomic locus using Bayesian colocalization with coloc. Computes posterior probabilities for shared vs distinct causal variants between GWAS and eQTL signals. Use when determining if a GWAS signal and an eQTL share the same causal variant.
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-causal-genomics-colocalization-analysis
SKILL.md
Version Compatibility
Reference examples tested with: ggplot2 3.5+
Before using code patterns, verify installed versions match. If versions differ:
- R:
packageVersion('<pkg>')then?function_nameto verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Colocalization Analysis
"Test whether my GWAS signal and eQTL share the same causal variant" → Compute Bayesian posterior probabilities for five colocalization hypotheses (no association, trait-1-only, trait-2-only, distinct causal variants, shared causal variant) to distinguish true causal overlap from LD-driven coincidence.
- R:
coloc::coloc.abf()for approximate Bayes factor colocalization
Overview
Colocalization tests whether two association signals at the same locus are driven by the same causal variant. This distinguishes shared causality from coincidental overlap due to LD.
Five hypotheses tested by coloc:
- H0: No association with either trait
- H1: Association with trait 1 only
- H2: Association with trait 2 only
- H3: Both associated, different causal variants
- H4: Both associated, shared causal variant
coloc.abf Analysis
Goal: Test whether two traits share a causal variant at a GWAS locus using Bayesian colocalization.
Approach: Format summary statistics for each trait as named lists, run coloc.abf to compute posterior probabilities for five hypotheses (H0-H4), and interpret PP.H4 as evidence for a shared causal variant.
library(coloc)
# --- Input format: named list with GWAS summary stats ---
# Required fields: beta, varbeta, snp, position, type, N
# type = 'quant' (continuous) or 'cc' (case-control)
gwas_data <- list(
beta = gwas_df$BETA,
varbeta = gwas_df$SE^2,
snp = gwas_df$SNP,
position = gwas_df$POS,
type = 'cc', # Case-control study
s = 0.3, # Proportion of cases (required for cc)
N = 50000 # Total sample size
)
eqtl_data <- list(
beta = eqtl_df$BETA,
varbeta = eqtl_df$SE^2,
snp = eqtl_df$SNP,
position = eqtl_df$POS,
type = 'quant', # Quantitative trait (expression)
N = 500, # eQTL sample size
sdY = 1 # SD of trait (1 if already normalized)
)
# --- Run colocalization ---
result <- coloc.abf(dataset1 = gwas_data, dataset2 = eqtl_data)
# Posterior probabilities
# PP.H4 > 0.8: Strong evidence for colocalization (shared variant)
# PP.H3 > 0.8: Distinct causal variants at the locus
# PP.H4 between 0.5-0.8: Suggestive but inconclusive
print(result$summary)
Prior Sensitivity
# Default priors: p1 = 1e-4, p2 = 1e-4, p12 = 1e-5
# p1: Prior probability a SNP is associated with trait 1
# p2: Prior probability a SNP is associated with trait 2
# p12: Prior probability a SNP is associated with both traits
#
# Ratio p12/p1 represents prior belief in colocalization
# Default: p12/p1 = 0.1 (10% of trait 1 SNPs also affect trait 2)
result_sensitive <- coloc.abf(
dataset1 = gwas_data,
dataset2 = eqtl_data,
p1 = 1e-4,
p2 = 1e-4,
p12 = 5e-6 # More conservative prior for shared association
)
# Sensitivity analysis across prior values
sensitivity(result, 'H4 > 0.8')
Using P-values (No Beta/SE)
# When only p-values are available, use MAF to approximate
gwas_pval <- list(
pvalues = gwas_df$P,
MAF = gwas_df$MAF,
snp = gwas_df$SNP,
position = gwas_df$POS,
type = 'cc',
s = 0.3,
N = 50000
)
result <- coloc.abf(dataset1 = gwas_pval, dataset2 = eqtl_data)
SuSiE-Coloc (Multiple Causal Variants)
Goal: Test colocalization at loci with multiple independent causal signals.
Approach: Run SuSiE fine-mapping on each dataset to identify credible sets, then test colocalization between all pairs of credible sets using coloc.susie.
library(coloc)
library(susieR)
# coloc.abf assumes a single causal variant per locus
# SuSiE-coloc handles multiple causal variants
# LD matrix required (correlation matrix from reference panel)
ld_matrix <- as.matrix(read.table('ld_matrix.txt'))
# Run SuSiE on each dataset
susie_gwas <- runsusie(
list(beta = gwas_df$BETA, varbeta = gwas_df$SE^2,
snp = gwas_df$SNP, position = gwas_df$POS,
type = 'cc', s = 0.3, N = 50000, LD = ld_matrix),
L = 10 # Max number of causal variants to search for
)
susie_eqtl <- runsusie(
list(beta = eqtl_df$BETA, varbeta = eqtl_df$SE^2,
snp = eqtl_df$SNP, position = eqtl_df$POS,
type = 'quant', N = 500, sdY = 1, LD = ld_matrix),
L = 10
)
# Coloc using SuSiE credible sets
result_susie <- coloc.susie(susie_gwas, susie_eqtl)
print(result_susie$summary)
# Each row tests colocalization between a pair of credible sets
# hit1, hit2: Credible set indices from dataset 1 and 2
HyPrColoc (Multi-Trait)
Goal: Test colocalization across three or more traits simultaneously to identify shared causal variant clusters.
Approach: Provide beta and SE matrices (SNPs x traits) to hyprcoloc, which clusters traits sharing a causal variant using a branch-and-bound algorithm.
# install.packages('remotes')
# remotes::install_github('jrs95/hyprcoloc')
library(hyprcoloc)
# Test colocalization across multiple traits simultaneously
# Input: matrices of betas and SEs (rows = SNPs, columns = traits)
betas <- cbind(gwas_df$BETA, eqtl1_df$BETA, eqtl2_df$BETA)
ses <- cbind(gwas_df$SE, eqtl1_df$SE, eqtl2_df$SE)
colnames(betas) <- colnames(ses) <- c('GWAS', 'eQTL_gene1', 'eQTL_gene2')
rownames(betas) <- rownames(ses) <- gwas_df$SNP
result_hypr <- hyprcoloc(
effect.est = betas,
effect.se = ses,
trait.names = colnames(betas),
snp.id = rownames(betas)
)
# Output: clusters of traits sharing a causal variant
print(result_hypr$results)
Input Preparation
# --- Extract a locus (1 Mb window around lead SNP) ---
extract_locus <- function(sumstats, lead_snp_pos, chr, window = 500000) {
locus <- sumstats[sumstats$CHR == chr &
sumstats$POS >= (lead_snp_pos - window) &
sumstats$POS <= (lead_snp_pos + window), ]
locus[order(locus$POS), ]
}
# --- Generate LD matrix from plink ---
# plink --bfile ref_panel --chr 6 --from-bp 30000000 --to-bp 31000000 \
# --r square --out ld_matrix
# Read into R:
ld <- as.matrix(read.table('ld_matrix.ld'))
Visualization
library(ggplot2)
plot_coloc_locus <- function(gwas_df, eqtl_df, result) {
pp4 <- round(result$summary['PP.H4.abf'], 3)
p1 <- ggplot(gwas_df, aes(x = POS / 1e6, y = -log10(P))) +
geom_point(alpha = 0.6) +
labs(x = 'Position (Mb)', y = '-log10(P)', title = paste('GWAS | PP.H4 =', pp4)) +
theme_minimal()
p2 <- ggplot(eqtl_df, aes(x = POS / 1e6, y = -log10(P))) +
geom_point(alpha = 0.6, color = 'steelblue') +
labs(x = 'Position (Mb)', y = '-log10(P)', title = 'eQTL') +
theme_minimal()
library(patchwork)
p1 / p2
}
LocusCompare Plot
# LocusCompare: scatter of -log10(P) for GWAS vs eQTL at shared SNPs
plot_locuscompare <- function(gwas_df, eqtl_df) {
merged <- merge(
gwas_df[, c('SNP', 'P')],
eqtl_df[, c('SNP', 'P')],
by = 'SNP', suffixes = c('.gwas', '.eqtl')
)
ggplot(merged, aes(x = -log10(P.gwas), y = -log10(P.eqtl))) +
geom_point(alpha = 0.5) +
geom_smooth(method = 'lm', se = FALSE, linetype = 'dashed', color = 'grey50') +
labs(x = '-log10(P) GWAS', y = '-log10(P) eQTL', title = 'LocusCompare') +
theme_minimal()
}
Decision Framework
PP.H4 > 0.8: Strong colocalization -- traits share a causal variant
PP.H3 > 0.8: Distinct causal variants -- LD-driven overlap, not shared causality
PP.H4 0.5-0.8: Suggestive -- increase sample size, try SuSiE-coloc
PP.H0/H1/H2 dominant: Insufficient signal at this locus
Common pitfalls:
- Small eQTL sample sizes reduce power (N < 200 is problematic)
- LD can inflate PP.H3 when two nearby causal variants exist -- use SuSiE-coloc
- Always check that both traits have significant signals at the locus before running coloc
Related Skills
- mendelian-randomization - Test causal effects using genetic instruments
- fine-mapping - Identify causal variants and credible sets
- population-genetics/linkage-disequilibrium - LD reference panels for SuSiE-coloc
- differential-expression/deseq2-basics - Generate eQTL data for colocalization
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?