Agent skill
bio-bam
This skill should be used when the user asks to analyze BAM/SAM/CRAM alignment files from WGS/WES sequencing. Triggers include requests to extract reads from specific regions, identify insertions and deletions, calculate coverage statistics, or export read data as JSON for downstream analysis.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/testing/bio-bam
SKILL.md
BAM Toolkit
Toolkit for BAM/SAM/CRAM alignment file analysis: extract reads, identify indels, and calculate coverage. Designed for WGS/WES sequencing result inspection and quality control.
Quick Start
Install
uv pip install pysam typer
Basic Usage
# 1. 特定領域のリードを抽出
python scripts/extract_reads.py --bam alignment.bam --region chr1:1000-2000 --output reads.json --format json
# 2. Indel を抽出
python scripts/extract_indels.py --bam alignment.bam --region chr1:1000-2000 --output indels.json
# 3. カバレッジ統計を計算
python scripts/calculate_coverage.py --bam alignment.bam --region chr1:1000-2000 --output coverage.json
Scripts
extract_reads.py - Read Extraction
Extract reads from BAM/SAM/CRAM files for specific genomic regions and export as BAM or JSON format.
Required Arguments
--bam PATH- Input BAM/SAM/CRAM file path--region TEXT- Genomic region (e.g.,chr1:1000-2000)
Optional Arguments
Output:
--output PATH- Output file path (BAM or JSON based on extension)--format TEXT- Output format:bamorjson(default:bam)
Filters:
--min-mapq INT- Minimum mapping quality (default: 0)--proper-pairs- Only properly paired reads--no-duplicates- Exclude duplicate reads
Output Format (JSON)
{
"region": "chr1:1000-2000",
"total_reads": 150,
"filtered_from": 200,
"filters": {
"min_mapq": 30,
"proper_pairs_only": true,
"no_duplicates": true
},
"reads": [
{
"query_name": "read1",
"reference_start": 1050,
"reference_end": 1150,
"sequence": "ATCG...",
"quality": "IIII...",
"mapping_quality": 60,
"is_reverse": false,
"cigar": "100M",
"is_proper_pair": true,
"is_duplicate": false,
"is_paired": true,
"mate_is_reverse": true,
"mate_reference_name": "chr1",
"mate_reference_start": 1200,
"template_length": 250
}
]
}
Output Format (BAM)
When --format bam is specified, outputs a filtered BAM file containing only reads that pass the specified filters.
Usage Examples
# Extract reads as JSON
python scripts/extract_reads.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--output reads.json \
--format json
# Extract reads as BAM
python scripts/extract_reads.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--output reads.bam
# Extract high-quality properly-paired reads
python scripts/extract_reads.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-mapq 30 \
--proper-pairs \
--no-duplicates \
--output filtered.bam
extract_indels.py - Insertion/Deletion Extraction
Extract insertions and deletions from reads in specified genomic regions by parsing CIGAR strings.
Required Arguments
--bam PATH- Input BAM/SAM/CRAM file path--region TEXT- Genomic region (e.g.,chr1:1000-2000)
Optional Arguments
Output:
--output PATH- JSON output file path (default: stdout)
Filters:
--min-mapq INT- Minimum mapping quality (default: 20)--min-indel-size INT- Minimum indel size in bp (default: 1)
Output Format (JSON)
{
"region": "chr1:1000-2000",
"filters": {
"min_mapq": 20,
"min_indel_size": 1
},
"reads": {
"total": 500,
"processed": 450
},
"summary": {
"total_insertions": 15,
"total_deletions": 8,
"unique_insertions": 5,
"unique_deletions": 3
},
"insertions": [
{
"position": 1050,
"size": 3,
"sequence": "ATG",
"read_name": "read1",
"mapping_quality": 60,
"count": 3,
"supporting_reads": ["read1", "read2", "read3"]
}
],
"deletions": [
{
"position": 1100,
"size": 2,
"read_name": "read2",
"mapping_quality": 55,
"count": 2,
"supporting_reads": ["read2", "read5"]
}
]
}
Implementation Details
- Parses CIGAR strings to identify I (insertion) and D (deletion) operations
- Extracts sequences for insertions from read sequences
- Groups identical indels by position and sequence/size
- Counts supporting reads for each unique indel
- Filters by mapping quality and minimum indel size
Usage Examples
# Extract all indels
python scripts/extract_indels.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--output indels.json
# Extract large indels only (>= 5bp)
python scripts/extract_indels.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-indel-size 5 \
--output large_indels.json
# High-quality indels only
python scripts/extract_indels.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-mapq 30 \
--output hq_indels.json
calculate_coverage.py - Coverage Calculation
Calculate coverage statistics for specified genomic regions using pileup operations.
Required Arguments
--bam PATH- Input BAM/SAM/CRAM file path--region TEXT- Genomic region (e.g.,chr1:1000-2000)
Optional Arguments
Output:
--output PATH- JSON output file path (default: stdout)
Options:
--min-mapq INT- Minimum mapping quality (default: 0)--min-baseq INT- Minimum base quality (default: 0)
Output Format (JSON)
{
"region": "chr1:1000-2000",
"filters": {
"min_mapq": 0,
"min_baseq": 0
},
"statistics": {
"total_bases": 1000,
"mean_coverage": 45.3,
"median_coverage": 48,
"min_coverage": 0,
"max_coverage": 120,
"bases_with_coverage": 995,
"bases_without_coverage": 5,
"percent_covered": 99.5
}
}
Usage Examples
# Calculate coverage statistics
python scripts/calculate_coverage.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--output coverage.json
# High-quality bases only
python scripts/calculate_coverage.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-mapq 30 \
--min-baseq 20 \
--output hq_coverage.json
Workflow Examples
Example 1: Comprehensive Read Analysis Workflow
Combine all three scripts for complete BAM analysis:
# Step 1: Calculate coverage statistics
python scripts/calculate_coverage.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--output coverage.json
# Step 2: Extract indels
python scripts/extract_indels.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-mapq 30 \
--output indels.json
# Step 3: Extract reads as JSON for detailed inspection
python scripts/extract_reads.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-mapq 30 \
--format json \
--output reads.json
Example 2: High-Quality Indel Discovery
Identify large insertions and deletions with high-quality reads:
# Extract large indels (>= 10bp) with high mapping quality
python scripts/extract_indels.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-mapq 30 \
--min-indel-size 10 \
--output large_indels.json
# Extract supporting reads for manual validation
python scripts/extract_reads.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-mapq 30 \
--output supporting_reads.bam
Example 3: Coverage Quality Control
Assess sequencing coverage quality across target region:
# Calculate coverage statistics with quality filters
python scripts/calculate_coverage.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-mapq 30 \
--min-baseq 20 \
--output coverage_stats.json
# Extract reads from low-coverage regions for inspection
python scripts/extract_reads.py \
--bam alignment.bam \
--region chr1:1500-1600 \
--format json \
--output low_coverage_reads.json
Error Handling
Missing BAM Index
$ python scripts/extract_reads.py --bam alignment.bam --region chr1:1000-2000
Error fetching reads: random access requires an index
Make sure the region 'chr1:1000-2000' exists and BAM file is indexed.
Solution: Create BAM index with samtools:
samtools index alignment.bam
Invalid Region Format
$ python scripts/extract_reads.py --bam alignment.bam --region 1000-2000
Error: Invalid region format: 1000-2000. Expected format: chr1:1000-2000
Solution: Use correct region format chr:start-end:
python scripts/extract_reads.py --bam alignment.bam --region chr1:1000-2000
Best Practices
1. Always Index BAM Files
Create BAM index before using any scripts to enable efficient random access:
# ✅ Good: Index BAM file first
samtools index alignment.bam
# Then use bam-toolkit scripts
python scripts/extract_reads.py --bam alignment.bam --region chr1:1000-2000
2. Apply Quality Filters for Reliable Results
Use mapping quality and base quality filters to ensure reliable analysis:
# ✅ Good: Apply quality filters
python scripts/extract_indels.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-mapq 30
# ✅ Good: Use proper pairs for structural variant analysis
python scripts/extract_reads.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--proper-pairs \
--no-duplicates
3. Extract Reads as BAM for Further Processing
Use BAM output when planning to use extracted reads with other tools:
# ✅ Good: Extract as BAM for use with samtools/IGV
python scripts/extract_reads.py \
--bam alignment.bam \
--region chr1:1000-2000 \
--min-mapq 30 \
--output filtered.bam
# Then index and use with other tools
samtools index filtered.bam
samtools view filtered.bam
When to Use bam-toolkit vs samtools
| Task | bam-toolkit | samtools |
|---|---|---|
| Read extraction as JSON | ✅ extract_reads.py | - |
| Indel extraction with grouping | ✅ extract_indels.py | - |
| Coverage statistics as JSON | ✅ calculate_coverage.py | ✅ samtools depth |
| BAM-to-BAM filtering | ✅ extract_reads.py | ✅ samtools view |
| Read alignment statistics | - | ✅ samtools flagstat |
| BAM indexing | - | ✅ samtools index |
Recommended Workflow:
- Index BAM files with samtools
- Use bam-toolkit scripts for structured JSON output
- Use samtools for standard BAM operations (sorting, indexing, format conversion)
Related Skills
- vcf-toolkit - VCF/BCF variant file operations
- sequence-io - FASTA/FASTQ sequence file operations
- blast-search - BLAST homology search
- blat-api-searching - BLAT genome mapping
Troubleshooting
BAM File Not Readable
Ensure BAM file is properly formatted and readable:
# Check BAM file integrity
samtools quickcheck alignment.bam
# View BAM header
samtools view -H alignment.bam
Chromosome Name Mismatch
Chromosome names must match between BAM file and region specification:
# Check chromosome names in BAM
samtools idxstats alignment.bam | cut -f1
# Use correct chromosome name
# If BAM uses "1" instead of "chr1":
python scripts/extract_reads.py --bam alignment.bam --region 1:1000-2000
Empty Output
If output is empty, check:
- Region contains aligned reads:
samtools view alignment.bam chr1:1000-2000 | head - Filters are not too restrictive (try reducing
--min-mapq) - Chromosome name matches BAM file
Didn't find tool you were looking for?