Agent skill
bio-read-qc-fastp-workflow
All-in-one read preprocessing with fastp including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data and wanting a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps.
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-read-qc-fastp-workflow
Version Compatibility
Reference examples tested with: FastQC 0.12+, fastp 0.23+
Before using code patterns, verify installed versions match. If versions differ:
- Python: pip show <package>, then help(module.function) to check signatures
- CLI: <tool> --version, then <tool> --help to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
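The CLI version check above can be scripted. The following is a minimal sketch, assuming the tool prints a banner such as "fastp 0.23.4" (fastp writes it to stderr, so both streams are captured). The helper names parse_version, meets_minimum, and installed_version_text are hypothetical and not part of fastp or FastQC.

```python
import re
import subprocess


def parse_version(text):
    """Extract the first dotted version number from a version banner."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(text, minimum):
    """Return True if the version found in `text` is at least `minimum` (a tuple)."""
    return parse_version(text) >= minimum


def installed_version_text(tool="fastp"):
    """Run `<tool> --version` and return everything it printed (stdout + stderr)."""
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    return result.stdout + result.stderr


# Example banner used for illustration; swap in installed_version_text() on a real system.
banner = "fastp 0.23.4"
print(parse_version(banner), meets_minimum(banner, (0, 23)))
```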
fastp Workflow
All-in-one preprocessing tool that handles adapter trimming, quality filtering, deduplication, and report generation in a single pass.
"Preprocess FASTQ reads with fastp" → Run adapter trimming, quality filtering, and QC reporting in a single pass.
- CLI: fastp -i R1.fq -I R2.fq -o clean_R1.fq -O clean_R2.fq --html report.html
Basic Usage
Single-End
fastp -i input.fastq.gz -o output.fastq.gz
Paired-End
fastp -i R1.fastq.gz -I R2.fastq.gz -o R1_clean.fastq.gz -O R2_clean.fastq.gz
With Custom HTML/JSON Reports
fastp -i R1.fq.gz -I R2.fq.gz \
-o R1_clean.fq.gz -O R2_clean.fq.gz \
-h sample_report.html \
-j sample_report.json
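When fastp is driven from a script, the single-end and paired-end invocations above can be assembled as an argument list. This is a sketch only: build_fastp_command is a hypothetical helper, and it uses just the -i/-I/-o/-O/-h/-j flags shown in this section.

```python
import shlex


def build_fastp_command(in1, out1, in2=None, out2=None, html=None, json_report=None):
    """Assemble a fastp argument list for single-end or paired-end input."""
    if (in2 is None) != (out2 is None):
        raise ValueError("paired-end runs need both in2 and out2")
    cmd = ["fastp", "-i", in1, "-o", out1]
    if in2 is not None:
        cmd += ["-I", in2, "-O", out2]
    if html is not None:
        cmd += ["-h", html]
    if json_report is not None:
        cmd += ["-j", json_report]
    return cmd


cmd = build_fastp_command(
    "R1.fq.gz", "R1_clean.fq.gz",
    in2="R2.fq.gz", out2="R2_clean.fq.gz",
    html="sample_report.html", json_report="sample_report.json",
)
# Print the shell-quoted command; run it with subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.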
Adapter Trimming
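For single-end data, fastp auto-detects adapters by default. For paired-end data, it trims adapters through read-overlap analysis by default; add --detect_adapter_for_pe to also enable sequence-based adapter detection.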
# Auto-detect (default)
fastp -i in.fq -o out.fq
# Specify adapters manually
fastp -i in.fq -o out.fq \
--adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
# Paired-end with manual adapters
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
--adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
--adapter_sequence_r2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
# Disable adapter trimming
fastp -i in.fq -o out.fq --disable_adapter_trimming
# Adapter FASTA file
fastp -i in.fq -o out.fq --adapter_fasta adapters.fa
Quality Filtering
# Per-base quality threshold (default Q15)
fastp -i in.fq -o out.fq -q 20
# Mean read quality threshold
fastp -i in.fq -o out.fq -e 25
# Max unqualified bases percent (default 40)
fastp -i in.fq -o out.fq -q 20 --unqualified_percent_limit 30
# Disable quality filtering
fastp -i in.fq -o out.fq --disable_quality_filtering
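To make the interaction of -q, --unqualified_percent_limit, and -e concrete, here is a simplified per-read sketch. It assumes Phred+33 quality strings; passes_quality_filter is a hypothetical function that illustrates the documented thresholds and is not fastp's actual code.

```python
def passes_quality_filter(quality_string, qualified_phred=15,
                          unqualified_percent_limit=40, average_qual=0):
    """Simplified per-read check mirroring -q, -u and -e (Phred+33 input)."""
    scores = [ord(ch) - 33 for ch in quality_string]
    if not scores:
        return False
    # A base is "unqualified" when its Phred score is below the -q threshold.
    unqualified = sum(1 for q in scores if q < qualified_phred)
    if unqualified * 100 > unqualified_percent_limit * len(scores):
        return False
    # average_qual == 0 means no mean-quality requirement.
    if average_qual > 0 and sum(scores) / len(scores) < average_qual:
        return False
    return True


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(passes_quality_filter("IIIIIIIIII"))
print(passes_quality_filter("IIIII#####"))
```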
Quality Trimming
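# Sliding window moving from the 5' end toward the 3' end (recommended):
# at the first window below the mean-quality threshold, that window and everything to its right are dropped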
fastp -i in.fq -o out.fq \
--cut_right \
--cut_right_window_size 4 \
--cut_right_mean_quality 20
# Sliding window from 5' end
fastp -i in.fq -o out.fq \
--cut_front \
--cut_front_window_size 4 \
--cut_front_mean_quality 20
# Both ends
fastp -i in.fq -o out.fq \
--cut_front --cut_tail \
--cut_front_window_size 4 \
--cut_front_mean_quality 20 \
--cut_tail_window_size 4 \
--cut_tail_mean_quality 20
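The idea behind --cut_right can be sketched in a few lines. This is a simplified illustration under the assumption of Phred+33 qualities: the hypothetical cut_right function cuts at the start of the first failing window, whereas fastp itself may treat bases inside that window differently.

```python
def cut_right(quality_string, window_size=4, mean_quality=20):
    """Return how many leading bases to keep after a simplified cut_right pass."""
    scores = [ord(ch) - 33 for ch in quality_string]
    # Slide the window from the 5' end toward the 3' end.
    for start in range(0, len(scores) - window_size + 1):
        window = scores[start:start + window_size]
        if sum(window) / window_size < mean_quality:
            # Drop this window and everything to its right.
            return start
    return len(scores)


quality = "IIIIIIII####"  # eight Q40 bases followed by four Q2 bases
keep = cut_right(quality)
print(keep, quality[:keep])
```

Reads shorter than the window are returned unchanged in this sketch.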
Length Filtering
# Minimum length (default 15)
fastp -i in.fq -o out.fq -l 36
# Maximum length
fastp -i in.fq -o out.fq --length_limit 150
# Exact length (discard reads shorter or longer than 100 bp)
fastp -i in.fq -o out.fq -l 100 --length_limit 100
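The two length options combine as a simple range check. The sketch below uses a hypothetical passes_length_filter function and treats a length_limit of 0 as "no upper limit", which matches the fastp default.

```python
def passes_length_filter(read_length, length_required=15, length_limit=0):
    """Mirror -l/--length_required and --length_limit (0 means no upper limit)."""
    if read_length < length_required:
        return False
    if length_limit > 0 and read_length > length_limit:
        return False
    return True


# With -l 100 --length_limit 100 only reads of exactly 100 bp are kept.
lengths = [99, 100, 101]
print([n for n in lengths if passes_length_filter(n, 100, 100)])
```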
Poly-X Trimming
# Trim poly-G (NovaSeq/NextSeq artifacts) - auto-enabled for these platforms
fastp -i in.fq -o out.fq --trim_poly_g
# Disable poly-G trimming
fastp -i in.fq -o out.fq --disable_trim_poly_g
# Trim poly-X (any homopolymer)
fastp -i in.fq -o out.fq --trim_poly_x
# Custom poly-G minimum length (default 10)
fastp -i in.fq -o out.fq --trim_poly_g --poly_g_min_len 5
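The effect of --poly_g_min_len is easier to see on a toy read. This sketch assumes exact G runs only; trim_poly_g_tail is a hypothetical helper, and fastp's own detection may tolerate mismatches within the tail.

```python
def trim_poly_g_tail(sequence, quality, min_len=10):
    """Remove a trailing run of G bases if it is at least min_len long."""
    run = len(sequence) - len(sequence.rstrip("G"))
    if run >= min_len:
        keep = len(sequence) - run
        return sequence[:keep], quality[:keep]
    return sequence, quality


seq = "ACGT" + "G" * 5
qual = "I" * len(seq)
print(trim_poly_g_tail(seq, qual))             # run of 5 is below the default of 10
print(trim_poly_g_tail(seq, qual, min_len=5))  # run of 5 now qualifies for trimming
```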
N Base Handling
# Max N bases (default 5)
fastp -i in.fq -o out.fq -n 3
# Relax N filtering by raising the limit (this loosens the filter rather than disabling it)
fastp -i in.fq -o out.fq --n_base_limit 50
Deduplication
# Enable deduplication
fastp -i in.fq -o out.fq --dedup
# Accuracy level (1-6, higher = more memory, default 3)
fastp -i in.fq -o out.fq --dedup --dup_calc_accuracy 4
Base Correction (Paired-End Only)
# Enable overlap-based correction
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq --correction
# Required overlap length (default 30)
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
--correction --overlap_len_require 20
Paired-End Merge
# Merge overlapping paired reads
fastp -i R1.fq -I R2.fq \
--merge --merged_out merged.fq \
-o R1_unmerged.fq -O R2_unmerged.fq
UMI Processing
# UMI in read (extract to header)
fastp -i in.fq -o out.fq \
--umi --umi_loc read1 --umi_len 8
# UMI in the index field of the read name (index1 = first index)
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
--umi --umi_loc index1
# UMI locations: index1, index2, read1, read2, per_index, per_read
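For --umi_loc read1, the UMI is taken from the start of the read and moved into the read name. The sketch below illustrates that idea with a hypothetical extract_umi_from_read helper; the exact name format fastp writes (for example with --umi_prefix) should be confirmed against real output.

```python
def extract_umi_from_read(name, sequence, quality, umi_len=8):
    """Move the first umi_len bases of a read into its name (simplified)."""
    if len(sequence) < umi_len:
        raise ValueError("read is shorter than the UMI length")
    umi = sequence[:umi_len]
    # Keep any comment after the first space (e.g. "1:N:0:ATCACG") in place.
    read_id, sep, comment = name.partition(" ")
    new_name = f"{read_id}:{umi}{sep}{comment}"
    return new_name, sequence[umi_len:], quality[umi_len:]


record = extract_umi_from_read("@read1 1:N:0:ATCACG", "AAAACCCCGGGGTTTT", "I" * 16)
print(record)
```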
Complete Workflow Example
Standard Illumina Pipeline
fastp \
-i raw_R1.fastq.gz -I raw_R2.fastq.gz \
-o clean_R1.fastq.gz -O clean_R2.fastq.gz \
--detect_adapter_for_pe \
--cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
-q 20 -l 36 \
--thread 8 \
-h sample_fastp.html -j sample_fastp.json
NovaSeq/NextSeq Pipeline
fastp \
-i raw_R1.fastq.gz -I raw_R2.fastq.gz \
-o clean_R1.fastq.gz -O clean_R2.fastq.gz \
--detect_adapter_for_pe \
--trim_poly_g \
--cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
-q 20 -l 36 \
--thread 8 \
-h sample_fastp.html -j sample_fastp.json
RNA-seq Pipeline
fastp \
-i raw_R1.fastq.gz -I raw_R2.fastq.gz \
-o clean_R1.fastq.gz -O clean_R2.fastq.gz \
--detect_adapter_for_pe \
--cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
-q 20 -l 50 \
--thread 8 \
-h sample_fastp.html -j sample_fastp.json
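The three pipelines above differ only in poly-G trimming and minimum length, so they can be expressed as presets over a shared option list. This is a sketch: pipeline_command, the preset names, and the <sample>_R1/_R2 file naming scheme are assumptions for illustration.

```python
# Options shared by all three pipelines in this section.
COMMON = [
    "--detect_adapter_for_pe",
    "--cut_right", "--cut_right_window_size", "4",
    "--cut_right_mean_quality", "20",
    "-q", "20",
    "--thread", "8",
]

# Per-pipeline differences (preset names are hypothetical).
PRESETS = {
    "standard": ["-l", "36"],
    "novaseq": ["--trim_poly_g", "-l", "36"],
    "rnaseq": ["-l", "50"],
}


def pipeline_command(sample, preset="standard"):
    """Build the fastp argument list for one paired-end sample."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        "fastp",
        "-i", f"{sample}_R1.fastq.gz", "-I", f"{sample}_R2.fastq.gz",
        "-o", f"{sample}_clean_R1.fastq.gz", "-O", f"{sample}_clean_R2.fastq.gz",
        *COMMON, *PRESETS[preset],
        "-h", f"{sample}_fastp.html", "-j", f"{sample}_fastp.json",
    ]


for name in ("sampleA", "sampleB"):
    print(" ".join(pipeline_command(name, "novaseq")))
```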
Output Files
| File | Description |
|---|---|
| *.html | Interactive HTML report |
| *.json | Machine-readable statistics |
| Output FASTQ | Processed reads |
JSON Report Structure
import json

# Load the JSON report written by fastp (-j sample_fastp.json)
with open('sample_fastp.json') as f:
    report = json.load(f)

# Read and quality statistics before and after filtering
summary = report['summary']
print(f"Total reads: {summary['before_filtering']['total_reads']}")
print(f"Passed reads: {summary['after_filtering']['total_reads']}")
print(f"Q20 rate: {summary['after_filtering']['q20_rate']:.2%}")
print(f"Q30 rate: {summary['after_filtering']['q30_rate']:.2%}")
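For batch work it can help to reduce each report to a few numbers. The sketch below uses only the summary keys shown above; summarize_fastp_report is a hypothetical helper, and the example dictionary contains made-up values purely to show the expected shape.

```python
def summarize_fastp_report(report):
    """Reduce a loaded fastp JSON report to a few headline numbers."""
    before = report["summary"]["before_filtering"]
    after = report["summary"]["after_filtering"]
    total = before["total_reads"]
    return {
        "reads_before": total,
        "reads_after": after["total_reads"],
        # Guard against empty inputs to avoid division by zero.
        "retained_fraction": after["total_reads"] / total if total else 0.0,
        "q30_rate_after": after["q30_rate"],
    }


# Made-up values for illustration; a real report comes from json.load().
example = {
    "summary": {
        "before_filtering": {"total_reads": 1000},
        "after_filtering": {"total_reads": 900, "q30_rate": 0.95},
    }
}
print(summarize_fastp_report(example))
```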
Performance
# Set threads (default 3)
fastp -i in.fq -o out.fq --thread 8
# Discard the HTML report by writing it to /dev/null (Unix-like systems)
fastp -i in.fq -o out.fq --html /dev/null
# Process from stdin
zcat in.fq.gz | fastp --stdin -o out.fq
Related Skills
- quality-reports - MultiQC can aggregate fastp JSON reports
- adapter-trimming - Cutadapt for complex adapter scenarios
- quality-filtering - Trimmomatic alternative