Agent skill
bio-clip-seq-clip-peak-calling
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-clip-seq-clip-peak-calling
SKILL.md
name: bio-clip-seq-clip-peak-calling description: Call protein-RNA binding site peaks from CLIP-seq data using CLIPper, PureCLIP, or Piranha. Use when identifying RBP binding sites from aligned CLIP reads. tool_type: cli primary_tool: CLIPper measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
CLIP-seq Peak Calling
CLIPper (Recommended)
# Basic peak calling
clipper \
-b deduped.bam \
-s hg38 \
-o peaks.bed \
--save-pickle
# With FDR threshold
clipper \
-b deduped.bam \
-s hg38 \
-o peaks.bed \
--FDR 0.05 \
--superlocal
# Specify gene annotations
clipper \
-b deduped.bam \
-s hg38 \
--gene genes.bed \
-o peaks.bed
CLIPper Options
| Option | Description |
|---|---|
| -b | Input BAM file |
| -s | Species (hg38, mm10) |
| -o | Output BED file |
| --FDR | FDR threshold (default 0.05) |
| --superlocal | Use superlocal background |
| --gene | Custom gene annotation BED |
| --save-pickle | Save intermediate data |
PureCLIP (HMM-Based)
PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.
# Installation
conda install -c bioconda pureclip
# Basic peak calling
pureclip \
-i deduped.bam \
-bai deduped.bam.bai \
-g genome.fa \
-o crosslink_sites.bed \
-or binding_regions.bed \
-nt 4
# -nt 4: Number of threads. Adjust based on CPU cores.
# -o: Single-nucleotide crosslink sites
# -or: Broader binding regions
PureCLIP Options
| Option | Description |
|---|---|
| -i | Input BAM file |
| -bai | BAM index file |
| -g | Reference genome FASTA |
| -o | Crosslink sites output |
| -or | Binding regions output |
| -nt | Number of threads |
| -iv | Interval file to restrict analysis |
| -dm | Min distance for merging |
PureCLIP with Input Control
# With SMInput control BAM
pureclip \
-i clip.bam \
-bai clip.bam.bai \
-g genome.fa \
-ibam sminput.bam \
-ibai sminput.bam.bai \
-o crosslinks.bed \
-or regions.bed \
-nt 8
# -ibam/-ibai: Input control BAM for background modeling
PureCLIP Output
# Crosslink sites BED contains:
# chr start end name score strand
# Score interpretation:
# Higher scores = more confident crosslink sites
# Filter by score
# score>=3: Medium confidence. Use 5+ for high confidence.
awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed
PureCLIP for Different CLIP Types
# eCLIP (recommended settings)
pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4 -dm 8
# iCLIP (single-nucleotide resolution)
pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4
# PAR-CLIP (T-to-C transitions)
pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4
Piranha
# Basic usage
Piranha -s deduped.bam -o peaks.bed
# With p-value threshold
Piranha -s deduped.bam -o peaks.bed -p 0.01
# Stranded analysis
Piranha -s deduped.bam -o peaks.bed -p 0.01 -u
# Zero-truncated negative binomial
Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial
PEAKachu (for PAR-CLIP)
# PAR-CLIP specific caller
peakachu adaptive \
-c control.bam \
-t treatment.bam \
-r reference.fa \
-o peakachu_peaks.gff
MACS3 for CLIP (Alternative)
# Use narrow peak calling mode
macs3 callpeak \
-t deduped.bam \
-f BAM \
-g hs \
-n clip_peaks \
--nomodel \
--extsize 50 \
-q 0.01
Strand-Specific Peak Calling
# Split BAM by strand
samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam
samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam
# Call peaks on each strand
clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed
clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed
# Combine
cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed
Filter Peaks
# By score
awk '$5 >= 10' peaks.bed > peaks_filtered.bed
# By size
awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed
# By read count (if in name field)
awk '$5 >= 5' peaks.bed > peaks_min5reads.bed
Merge Replicates
# Use bedtools to find consensus peaks
bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa | \
sort -u > consensus_peaks.bed
# Require overlap in N replicates
bedtools multiinter -i rep1.bed rep2.bed rep3.bed | \
awk '$4 >= 2' | \
bedtools merge > consensus_peaks.bed
Peak Metrics
import pandas as pd
def load_clip_peaks(bed_path):
peaks = pd.read_csv(bed_path, sep='\t', header=None,
names=['chrom', 'start', 'end', 'name', 'score', 'strand'])
return peaks
def peak_stats(peaks):
stats = {
'n_peaks': len(peaks),
'mean_width': (peaks['end'] - peaks['start']).mean(),
'median_score': peaks['score'].median(),
'peaks_per_chrom': peaks.groupby('chrom').size().to_dict()
}
return stats
peaks = load_clip_peaks('peaks.bed')
print(peak_stats(peaks))
Quality Metrics
| Metric | Good Value | Description |
|---|---|---|
| Peak count | 1,000-50,000 | Depends on RBP |
| Peak width | 20-100 nt | Typical for RBP footprint |
| FRiP | >0.1 | Fraction reads in peaks |
Calculate FRiP
# Reads in peaks
reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)
# Total reads
total_reads=$(samtools view -c deduped.bam)
# FRiP
frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc)
echo "FRiP: $frip"
Related Skills
- clip-alignment - Generate aligned BAM
- clip-preprocessing - UMI deduplication
- binding-site-annotation - Annotate peaks with gene features
- clip-motif-analysis - Find enriched motifs in peaks
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?