bio-alignment-sorting

SKILL.md


name: bio-alignment-sorting
description: Sort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.
tool_type: cli
primary_tool: samtools
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:

  • read_file
  • run_shell_command

Alignment Sorting

Sort alignment files by coordinate or read name using samtools and pysam.

Sort Orders

Order       Flag     Use Case
Coordinate  default  Indexing, visualization, variant calling, markdup
Name        -n       Paired-end processing, fixmate
Tag         -t TAG   Sort by specific tag value

samtools sort

Sort by Coordinate (Default)

bash
samtools sort -o sorted.bam input.bam

Sort by Read Name

bash
samtools sort -n -o namesorted.bam input.bam

Multi-threaded Sorting

bash
samtools sort -@ 8 -o sorted.bam input.bam

Control Memory Usage

bash
samtools sort -m 4G -@ 4 -o sorted.bam input.bam

Set Temporary Directory

bash
samtools sort -T /tmp/sort_tmp -o sorted.bam input.bam

Specify Output Format

bash
# Output as BAM (default)
samtools sort -O bam -o sorted.bam input.bam

# Output as CRAM
samtools sort -O cram --reference ref.fa -o sorted.cram input.bam

Sort by Tag

bash
# Sort by cell barcode (10x Genomics)
samtools sort -t CB -o sorted_by_barcode.bam input.bam

Pipe from Aligner

bash
bwa mem ref.fa reads.fq | samtools sort -o aligned.bam

samtools collate

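Group reads with the same name together without producing a full ordering. Because collate only guarantees that mates are adjacent, it is typically faster than a name sort when that is all the next step needs (for example, FASTQ extraction):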

bash
# Collate paired reads
samtools collate -o collated.bam input.bam

# Write to stdout (-O); /tmp/collate is the prefix for temp files
samtools collate -O input.bam /tmp/collate > collated.bam

# Uncompressed output (-u) to stdout (-O), piped straight into fastq
samtools collate -u -O input.bam /tmp/collate | samtools fastq -1 R1.fq -2 R2.fq -

Check Sort Order

From Header

bash
samtools view -H input.bam | grep "^@HD"
# SO:coordinate = coordinate sorted
# SO:queryname = name sorted
# SO:unsorted = not sorted
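
The same header check can be scripted. The following is a minimal sketch, assuming the header text has already been captured (for example from `samtools view -H`); the function name `sort_order_from_header` is hypothetical and not part of samtools or pysam.

```python
def sort_order_from_header(header_text):
    """Return the SO value from the @HD line of SAM header text, or 'unknown'."""
    for line in header_text.splitlines():
        if line.startswith('@HD'):
            # Header fields are tab-separated TAG:VALUE pairs
            for field in line.split('\t')[1:]:
                tag, _, value = field.partition(':')
                if tag == 'SO':
                    return value
    return 'unknown'


# Illustrative header text, not taken from a real file
example = '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422'
print(sort_order_from_header(example))
```

A missing @HD line or a missing SO field is reported as 'unknown', which matches how the pysam check later in this document treats it.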

Verify Sorted

bash
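# Check if coordinate sorted (exit status 0 if sorted).
# The position counter resets at each new reference, so the drop in
# position between chromosomes is not reported as a sort error.
samtools view input.bam | awk '$3 != chr {chr = $3; prev = 0} $4 < prev {exit 1} {prev = $4}'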
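
A header can claim a sort order that the records do not follow, so checking the records themselves is sometimes useful. Here is a hedged Python sketch of that check; `is_coordinate_sorted` and `bam_is_coordinate_sorted` are hypothetical helper names, and the second one assumes pysam is installed.

```python
def is_coordinate_sorted(positions):
    """Check (reference_id, reference_start) pairs given in file order.

    Unmapped reads use reference_id -1 and are expected at the end.
    """
    prev = None
    for ref_id, start in positions:
        # The leading boolean makes unmapped reads sort after all mapped reads
        key = (ref_id < 0, ref_id, start)
        if prev is not None and key < prev:
            return False
        prev = key
    return True


def bam_is_coordinate_sorted(path):
    """Stream a BAM file with pysam and check its record order."""
    import pysam  # imported lazily so the pure-Python check works without it

    with pysam.AlignmentFile(path, 'rb') as bam:
        return is_coordinate_sorted(
            (r.reference_id, r.reference_start) for r in bam.fetch(until_eof=True)
        )
```

Comparing on the reference index rather than the reference name follows the order of the @SQ lines in the header, which is the order a coordinate sort uses.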

pysam Python Alternative

Sort with pysam

python
import pysam

pysam.sort('-o', 'sorted.bam', 'input.bam')

Sort by Name

python
pysam.sort('-n', '-o', 'namesorted.bam', 'input.bam')

Sort with Options

python
pysam.sort('-@', '4', '-m', '2G', '-o', 'sorted.bam', 'input.bam')

Manual Sorting in Python

python
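import pysam

with pysam.AlignmentFile('input.bam', 'rb') as infile:
    header = infile.header.to_dict()
    reads = list(infile)  # loads every read into memory; small files only

# Record the new sort order in the output header
header.setdefault('HD', {'VN': '1.6'})['SO'] = 'coordinate'

# Unmapped reads (reference_id == -1) go last, as samtools sort places them
reads.sort(key=lambda r: (r.reference_id < 0, r.reference_id, r.reference_start))

with pysam.AlignmentFile('sorted.bam', 'wb', header=header) as outfile:
    for read in reads:
        outfile.write(read)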

Check Sort Order in pysam

python
import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    hd = bam.header.get('HD', {})
    sort_order = hd.get('SO', 'unknown')
    print(f'Sort order: {sort_order}')

Stream Sort from Aligner

For streaming from aligners, use shell pipes (simpler and more reliable):

python
import subprocess

subprocess.run(
    'bwa mem ref.fa reads.fq | samtools sort -o aligned.bam',
    shell=True, check=True
)

Or use pysam with a named pipe:

python
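import os
import subprocess

import pysam

os.mkfifo('aligner.pipe')
try:
    # Let the shell open the FIFO for writing inside the child process.
    # Opening it in this process would block until a reader appears,
    # and the reader (pysam.sort) only starts on the next line.
    aligner = subprocess.Popen('bwa mem ref.fa reads.fq > aligner.pipe', shell=True)
    pysam.sort('-o', 'aligned.bam', 'aligner.pipe')
    if aligner.wait() != 0:
        raise RuntimeError('bwa mem exited with an error')
finally:
    os.unlink('aligner.pipe')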

samtools merge

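Combine multiple sorted BAM files into one. All inputs must share the same sort order: coordinate by default, or pass -n when the inputs are name-sorted.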

Basic Merge

bash
samtools merge merged.bam sample1.bam sample2.bam sample3.bam

Merge with Threads

bash
samtools merge -@ 4 merged.bam sample1.bam sample2.bam sample3.bam

Merge from File List

bash
# files.txt contains one BAM path per line
samtools merge -b files.txt merged.bam

Force Overwrite

bash
samtools merge -f merged.bam sample1.bam sample2.bam

Merge Specific Region

bash
# Inputs must be coordinate-sorted and indexed
samtools merge -R chr1:1000000-2000000 merged_region.bam sample1.bam sample2.bam

pysam Merge

python
import pysam

pysam.merge('-f', 'merged.bam', 'sample1.bam', 'sample2.bam', 'sample3.bam')

Common Workflows

Align and Sort

bash
bwa mem -t 8 ref.fa R1.fq R2.fq | samtools sort -@ 4 -o aligned.bam
samtools index aligned.bam

Re-sort by Name for Duplicate Marking

bash
# Full workflow: sort by name, fixmate, sort by coord, markdup
samtools sort -n -o namesorted.bam input.bam
samtools fixmate -m namesorted.bam fixmate.bam
samtools sort -o sorted.bam fixmate.bam
samtools markdup sorted.bam marked.bam
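
When this workflow runs from Python, it helps to stop at the first failing step so later steps never see a partial file. The sketch below builds the same four commands as argument lists; the function names and the intermediate file naming (`prefix`) are illustrative assumptions, not part of samtools.

```python
import subprocess


def markdup_commands(input_bam, output_bam, prefix='tmp'):
    """Return the four samtools steps as argument lists, in run order."""
    namesorted = f'{prefix}.namesorted.bam'
    fixmate = f'{prefix}.fixmate.bam'
    coordsorted = f'{prefix}.sorted.bam'
    return [
        ['samtools', 'sort', '-n', '-o', namesorted, input_bam],
        ['samtools', 'fixmate', '-m', namesorted, fixmate],
        ['samtools', 'sort', '-o', coordsorted, fixmate],
        ['samtools', 'markdup', coordsorted, output_bam],
    ]


def run_markdup(input_bam, output_bam, prefix='tmp'):
    """Run each step in order; requires samtools on PATH."""
    for cmd in markdup_commands(input_bam, output_bam, prefix):
        # check=True raises CalledProcessError and stops at the first failure
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the pipeline easy to inspect or log before anything is run.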

Convert Name-sorted to Coordinate-sorted

bash
samtools sort -o coord_sorted.bam name_sorted.bam
samtools index coord_sorted.bam

Extract FASTQ from Sorted BAM

bash
# Collate first to group pairs
samtools collate -u -O input.bam /tmp/collate | \
    samtools fastq -1 R1.fq -2 R2.fq -0 /dev/null -s /dev/null -

Performance Tips

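Parameter  Effect
-@ N       Use N additional threads
-m SIZE    Maximum memory per thread (e.g., 4G); total use is roughly threads × SIZE
-T PREFIX  Temp file prefix (use a fast disk with enough free space)
-l LEVEL   Compression level (0-9; 0 is uncompressed, 1 is fastest, 9 is smallest)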

Optimal Settings for Large Files

bash
# 8 threads × 4 GB per thread ≈ 32 GB of RAM in total; low compression for speed
samtools sort -@ 8 -m 4G -l 1 -o sorted.bam input.bam
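
Because -m applies per thread, it can be easier to start from a total memory budget and derive the per-thread value. This is a rough sketch under that assumption; the helper names are hypothetical, and actual memory use can exceed the estimate.

```python
def per_thread_memory_mb(budget_mb, threads):
    """Split a total memory budget across sort threads (whole megabytes)."""
    return budget_mb // max(threads, 1)


def sort_command(input_bam, output_bam, threads, budget_mb, level=1):
    """Build a samtools sort argument list that stays near the budget."""
    mem = per_thread_memory_mb(budget_mb, threads)
    return ['samtools', 'sort', '-@', str(threads), '-m', f'{mem}M',
            '-l', str(level), '-o', output_bam, input_bam]


# A 16 GB budget over 8 threads gives 2048 MB per thread
print(' '.join(sort_command('input.bam', 'sorted.bam', threads=8, budget_mb=16384)))
```

Leaving some headroom below the machine's physical RAM is a sensible precaution, since the budget here only covers the sort buffers.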

Quick Reference

Task                Command
Sort by coordinate  samtools sort -o out.bam in.bam
Sort by name        samtools sort -n -o out.bam in.bam
Sort with threads   samtools sort -@ 8 -o out.bam in.bam
Collate pairs       samtools collate -o out.bam in.bam
Merge BAMs          samtools merge out.bam in1.bam in2.bam
Check sort order    samtools view -H in.bam | grep "^@HD"
Sort + index        samtools sort -o out.bam in.bam && samtools index out.bam

Common Errors

Error           Cause                    Solution
out of memory   Insufficient RAM         Lower -m (per-thread memory) or reduce -@ threads
disk full       Temp files filling disk  Use -T to point temp files at a disk with more space
truncated file  Interrupted sort         Re-run sort from original
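
A truncated BAM can often be spotted before it reaches downstream tools: a complete BAM file ends with a fixed 28-byte empty BGZF block, which is what `samtools quickcheck` looks for. The sketch below performs only that end-of-file check in plain Python; `has_bgzf_eof` is a hypothetical helper name, and passing this check does not prove the rest of the file is intact.

```python
import os

# 28-byte empty BGZF block that terminates a complete BAM file
BGZF_EOF = bytes.fromhex(
    '1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 '
    '1b 00 03 00 00 00 00 00 00 00 00 00'
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, 'rb') as handle:
        # Read only the final 28 bytes of the file
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

If the marker is missing, the safest fix is the one in the table above: re-run the sort from the original input.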

Related Skills

  • sam-bam-basics - View and convert alignment files
  • alignment-indexing - Index after coordinate sorting
  • duplicate-handling - Requires name-sorted input for fixmate
  • alignment-filtering - Filter before or after sorting
