Agent skill
bio-workflows-methylation-pipeline
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-workflows-methylation-pipeline
SKILL.md
name: bio-workflows-methylation-pipeline description: End-to-end bisulfite sequencing workflow from FASTQ to differentially methylated regions. Covers Bismark alignment, methylation calling, and DMR detection with methylKit. Use when analyzing bisulfite sequencing data. tool_type: mixed primary_tool: Bismark workflow: true depends_on:
- read-qc/fastp-workflow
- methylation-analysis/bismark-alignment
- methylation-analysis/methylation-calling
- methylation-analysis/methylkit-analysis
- methylation-analysis/dmr-detection qc_checkpoints:
- after_qc: "Q30 >80%, adapter content removed"
- after_alignment: "Mapping efficiency >50%, bisulfite conversion >99%"
- after_calling: "Coverage distribution reasonable, no biased positions" measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Methylation Pipeline
Complete workflow from bisulfite sequencing FASTQ to differentially methylated regions.
Workflow Overview
FASTQ files
|
v
[1. QC & Trimming] -----> fastp/Trim Galore
|
v
[2. Alignment] ---------> Bismark
|
v
[3. Deduplication] -----> deduplicate_bismark
|
v
[4. Methylation Calling] -> bismark_methylation_extractor
|
v
[5. Analysis] -----------> methylKit (R)
|
v
[6. DMR Detection] ------> methylKit/DSS
|
v
Differentially methylated regions
Primary Path: Bismark + methylKit
Step 1: Quality Control
# Trim Galore recommended for bisulfite data (handles adapter bias)
trim_galore --paired --fastqc \
-o trimmed/ \
sample_R1.fastq.gz sample_R2.fastq.gz
# Or fastp with conservative settings
fastp -i sample_R1.fastq.gz -I sample_R2.fastq.gz \
-o trimmed/sample_R1.fq.gz -O trimmed/sample_R2.fq.gz \
--detect_adapter_for_pe \
--qualified_quality_phred 20 \
--length_required 35 \
--html qc/sample_fastp.html
Step 2: Bismark Alignment
# Prepare genome (once)
bismark_genome_preparation --bowtie2 genome/
# Align
bismark --genome genome/ \
-1 trimmed/sample_R1_val_1.fq.gz \
-2 trimmed/sample_R2_val_2.fq.gz \
-o aligned/ \
--parallel 4 \
--temp_dir tmp/
# Output: sample_R1_val_1_bismark_bt2_pe.bam
QC Checkpoint: Check Bismark report
- Mapping efficiency >50% (BS-seq has lower rates)
- Bisulfite conversion rate >99%
Step 3: Deduplication
deduplicate_bismark \
--bam \
-p \
-o deduplicated/ \
aligned/sample_R1_val_1_bismark_bt2_pe.bam
Step 4: Methylation Calling
bismark_methylation_extractor \
--paired-end \
--comprehensive \
--bedGraph \
--cytosine_report \
--genome_folder genome/ \
-o methylation/ \
deduplicated/sample_R1_val_1_bismark_bt2_pe.deduplicated.bam
# Generate summary report
bismark2report
bismark2summary
Step 5: Analysis with methylKit
library(methylKit)
# Read methylation calls
files <- list(
'methylation/control_1.CpG_report.txt',
'methylation/control_2.CpG_report.txt',
'methylation/treated_1.CpG_report.txt',
'methylation/treated_2.CpG_report.txt'
)
sample_ids <- c('control_1', 'control_2', 'treated_1', 'treated_2')
treatment <- c(0, 0, 1, 1)
# Read cytosine reports
meth_obj <- methRead(
location = as.list(files),
sample.id = as.list(sample_ids),
assembly = 'hg38',
treatment = treatment,
context = 'CpG',
pipeline = 'bismarkCytosineReport'
)
# Filter by coverage
meth_filtered <- filterByCoverage(meth_obj, lo.count = 10, hi.perc = 99.9)
# Normalize coverage
meth_norm <- normalizeCoverage(meth_filtered)
# Merge samples (keep sites covered in all)
meth_merged <- unite(meth_norm, destrand = TRUE)
# Sample statistics
getMethylationStats(meth_obj[[1]], plot = TRUE)
getCoverageStats(meth_obj[[1]], plot = TRUE)
Step 6: DMR Detection
# Calculate differential methylation (per CpG)
diff_meth <- calculateDiffMeth(meth_merged)
# Get significant DMCs
dmc <- getMethylDiff(diff_meth, difference = 25, qvalue = 0.01)
# Tile into regions (DMRs)
tiles <- tileMethylCounts(meth_merged, win.size = 1000, step.size = 1000)
diff_tiles <- calculateDiffMeth(tiles)
dmr <- getMethylDiff(diff_tiles, difference = 25, qvalue = 0.01)
# Export
write.csv(as.data.frame(dmc), 'dmc_results.csv')
write.csv(as.data.frame(dmr), 'dmr_results.csv')
# Annotate with genomic features
library(genomation)
gene_obj <- readTranscriptFeatures('genes.bed')
annotateWithGeneParts(as(dmr, 'GRanges'), gene_obj)
Parameter Recommendations
| Step | Parameter | Value |
|---|---|---|
| Trim Galore | default | Recommended for BS-seq |
| Bismark | --parallel | 4 (per sample parallelization) |
| methylKit | lo.count | 10 (minimum coverage) |
| methylKit | difference | 25 (% methylation difference) |
| methylKit | qvalue | 0.01 |
| DMR tiles | win.size | 500-1000 bp |
Troubleshooting
| Issue | Likely Cause | Solution |
|---|---|---|
| Low mapping rate | Normal for BS-seq | Expect 40-70% |
| Low conversion | Failed bisulfite treatment | Check spike-in controls |
| Few DMRs | Low coverage, small differences | Increase sequencing, relax thresholds |
| Biased positions | M-bias | Trim 10bp from read ends |
Complete Pipeline Script
#!/bin/bash
set -e
THREADS=4
GENOME="genome/"
SAMPLES="control_1 control_2 treated_1 treated_2"
OUTDIR="methylation_results"
mkdir -p ${OUTDIR}/{trimmed,aligned,deduplicated,methylation,qc}
# Step 1: QC
for sample in $SAMPLES; do
trim_galore --paired --fastqc -o ${OUTDIR}/trimmed/ \
${sample}_R1.fastq.gz ${sample}_R2.fastq.gz
done
# Step 2: Alignment
for sample in $SAMPLES; do
bismark --genome ${GENOME} \
-1 ${OUTDIR}/trimmed/${sample}_R1_val_1.fq.gz \
-2 ${OUTDIR}/trimmed/${sample}_R2_val_2.fq.gz \
-o ${OUTDIR}/aligned/ \
--parallel ${THREADS} --temp_dir tmp/
done
# Step 3: Deduplication
for sample in $SAMPLES; do
deduplicate_bismark --bam -p \
-o ${OUTDIR}/deduplicated/ \
${OUTDIR}/aligned/${sample}_R1_val_1_bismark_bt2_pe.bam
done
# Step 4: Methylation calling
for sample in $SAMPLES; do
bismark_methylation_extractor --paired-end --comprehensive \
--bedGraph --cytosine_report \
--genome_folder ${GENOME} \
-o ${OUTDIR}/methylation/ \
${OUTDIR}/deduplicated/${sample}_R1_val_1_bismark_bt2_pe.deduplicated.bam
done
bismark2report
echo "Pipeline complete. Run R script for DMR analysis."
Related Skills
- methylation-analysis/bismark-alignment - Bismark parameters
- methylation-analysis/methylation-calling - Calling details
- methylation-analysis/methylkit-analysis - methylKit functions
- methylation-analysis/dmr-detection - DMR algorithms
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?