Agent skill

bio-genome-intervals-bed-file-basics

Stars 2,009
Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-genome-intervals-bed-file-basics

SKILL.md


name: bio-genome-intervals-bed-file-basics description: BED file format fundamentals, creation, validation, and basic operations. Covers BED3 through BED12 formats, coordinate systems, sorting, and format conversion using bedtools and pybedtools. Use when working with genomic coordinates or preparing interval files for downstream tools. tool_type: mixed primary_tool: bedtools measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

BED File Basics

BED (Browser Extensible Data) format stores genomic intervals. Uses 0-based, half-open coordinates.

BED Format Columns

BED3:  chr  start  end
BED4:  chr  start  end  name
BED5:  chr  start  end  name  score
BED6:  chr  start  end  name  score  strand
BED12: chr  start  end  name  score  strand  thickStart  thickEnd  rgb  blockCount  blockSizes  blockStarts

Coordinate System

BED uses 0-based, half-open coordinates:

  • Start: 0-based (first base is 0)
  • End: exclusive (not included)
  • Position 100-200 means bases at positions 100-199

Create BED Files

From Text (CLI)

bash
# Create simple BED3
echo -e "chr1\t100\t200\nchr1\t300\t400" > regions.bed

# Create BED6 with name and strand
echo -e "chr1\t100\t200\tpeak1\t100\t+" > peaks.bed

From Python

python
import pybedtools

# From string
bed_string = '''chr1\t100\t200\tpeak1\t100\t+
chr1\t300\t400\tpeak2\t200\t-'''
bed = pybedtools.BedTool(bed_string, from_string=True)

# From list of tuples
intervals = [
    ('chr1', 100, 200, 'peak1', 100, '+'),
    ('chr1', 300, 400, 'peak2', 200, '-'),
]
bed = pybedtools.BedTool(intervals)

# From pandas DataFrame
import pandas as pd
df = pd.DataFrame({
    'chrom': ['chr1', 'chr1'],
    'start': [100, 300],
    'end': [200, 400],
    'name': ['peak1', 'peak2'],
    'score': [100, 200],
    'strand': ['+', '-']
})
bed = pybedtools.BedTool.from_dataframe(df)

# Save to file
bed.saveas('output.bed')

Sort BED Files

CLI

bash
# Sort by chromosome and position
sort -k1,1 -k2,2n input.bed > sorted.bed

# Using bedtools
bedtools sort -i input.bed > sorted.bed

# Sort by chromosome, start, then end
sort -k1,1 -k2,2n -k3,3n input.bed > sorted.bed

Python

python
import pybedtools

bed = pybedtools.BedTool('input.bed')
sorted_bed = bed.sort()
sorted_bed.saveas('sorted.bed')

Validate BED Files

Check Format

bash
# Check column count
awk -F'\t' '{print NF}' input.bed | sort -u

# Check for invalid coordinates (start >= end)
awk '$2 >= $3' input.bed

# Check for negative coordinates
awk '$2 < 0 || $3 < 0' input.bed

# Validate with bedtools
bedtools sort -i input.bed > /dev/null 2>&1 && echo "Valid" || echo "Invalid"

Python Validation

python
import pybedtools

def validate_bed(filepath):
    try:
        bed = pybedtools.BedTool(filepath)
        for interval in bed:
            if interval.start < 0 or interval.end < 0:
                return False, 'Negative coordinates'
            if interval.start >= interval.end:
                return False, f'Invalid interval: {interval.start} >= {interval.end}'
        return True, 'Valid'
    except Exception as e:
        return False, str(e)

valid, msg = validate_bed('input.bed')
print(f'{msg}')

Read BED Files

Python

python
import pybedtools

# Load BED file
bed = pybedtools.BedTool('input.bed')

# Iterate over intervals
for interval in bed:
    print(f'{interval.chrom}:{interval.start}-{interval.end}')
    print(f'Name: {interval.name}, Score: {interval.score}, Strand: {interval.strand}')

# Count intervals
n_intervals = bed.count()

# Convert to pandas DataFrame
df = bed.to_dataframe()

# Access specific columns
df = bed.to_dataframe(names=['chrom', 'start', 'end', 'name', 'score', 'strand'])

Filter BED Files

By Chromosome

bash
# Single chromosome
grep "^chr1\t" input.bed > chr1.bed

# Multiple chromosomes
grep -E "^(chr1|chr2)\t" input.bed > chr1_2.bed

# Exclude chromosome
grep -v "^chrM\t" input.bed > no_chrM.bed

By Size

bash
# Intervals >= 100bp
awk '($3 - $2) >= 100' input.bed > large.bed

# Intervals between 100-500bp
awk '($3 - $2) >= 100 && ($3 - $2) <= 500' input.bed > medium.bed

Python Filtering

python
import pybedtools

bed = pybedtools.BedTool('input.bed')

# Filter by chromosome
chr1 = bed.filter(lambda x: x.chrom == 'chr1')

# Filter by size
large = bed.filter(lambda x: len(x) >= 100)

# Filter by strand
plus_strand = bed.filter(lambda x: x.strand == '+')

# Filter by score
high_score = bed.filter(lambda x: float(x.score) >= 500)

# Chain filters
result = bed.filter(lambda x: x.chrom == 'chr1' and len(x) >= 100)
result.saveas('filtered.bed')

Convert Formats

BED to Other Formats

bash
# BED to GFF
bedtools bed12togff -i input.bed > output.gff

# BED to FASTA (extract sequences)
bedtools getfasta -fi reference.fa -bed input.bed -fo output.fa

# BED to FASTA with names
bedtools getfasta -fi reference.fa -bed input.bed -name -fo output.fa

From VCF to BED

bash
# Extract variant positions
bcftools query -f '%CHROM\t%POS0\t%END\n' input.vcf > variants.bed

# Or using awk (simpler for SNPs)
grep -v "^#" input.vcf | awk '{print $1"\t"$2-1"\t"$2}' > snps.bed

From BAM to BED

bash
# Convert alignments to BED
bedtools bamtobed -i input.bam > alignments.bed

# BED12 for spliced alignments
bedtools bamtobed -i input.bam -split > spliced.bed

Interval Arithmetic Basics

python
import pybedtools

interval = pybedtools.create_interval_from_list(['chr1', '100', '200', 'peak1', '0', '+'])

# Access fields
print(interval.chrom)   # chr1
print(interval.start)   # 100 (int)
print(interval.end)     # 200 (int)
print(interval.name)    # peak1
print(interval.score)   # 0
print(interval.strand)  # +

# Get length
print(len(interval))    # 100

# Get fields list
print(interval.fields)  # ['chr1', '100', '200', 'peak1', '0', '+']

Make Windows - Create Genomic Intervals

CLI

bash
# Fixed-size windows across genome
bedtools makewindows -g genome.txt -w 10000 > windows_10kb.bed

# Windows with step size (sliding windows)
bedtools makewindows -g genome.txt -w 10000 -s 5000 > sliding_10kb.bed

# Fixed number of windows per chromosome
bedtools makewindows -g genome.txt -n 100 > 100_windows_per_chr.bed

# Windows within BED regions
bedtools makewindows -b regions.bed -w 1000 > windows_in_regions.bed

# Add window ID
bedtools makewindows -g genome.txt -w 10000 -i winnum > numbered_windows.bed

# Source chromosome in name
bedtools makewindows -g genome.txt -w 10000 -i srcwinnum > windows_with_source.bed

Python

python
import pybedtools

# From genome file
windows = pybedtools.BedTool().window_maker(g='genome.txt', w=10000)

# Sliding windows
windows = pybedtools.BedTool().window_maker(g='genome.txt', w=10000, s=5000)

# From BED regions
bed = pybedtools.BedTool('regions.bed')
windows = pybedtools.BedTool().window_maker(b=bed.fn, w=1000)

windows.saveas('windows.bed')

Common BED Extensions

Format Description Columns
narrowPeak ENCODE narrow peaks BED6 + signalValue, pValue, qValue, peak
broadPeak ENCODE broad peaks BED6 + signalValue, pValue, qValue
bedGraph Signal track chr, start, end, value
bedpe Paired intervals chr1, s1, e1, chr2, s2, e2, name, score, strand1, strand2

Cleanup Temp Files

python
import pybedtools

# At end of script
pybedtools.cleanup()

# Or use context manager (auto-cleanup)
pybedtools.set_tempdir('/tmp/pybedtools')

Related Skills

  • interval-arithmetic - intersect, subtract, merge operations
  • gtf-gff-handling - annotation file parsing
  • coverage-analysis - depth calculations
  • alignment-files/sam-bam-basics - BAM to BED conversion

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results