
bio-read-qc-fastp-workflow

All-in-one read preprocessing with fastp, including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data with a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps.


Install this agent skill into your project:

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-read-qc-fastp-workflow

SKILL.md

Version Compatibility

Reference examples tested with: FastQC 0.12+, fastp 0.23+

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

fastp Workflow

All-in-one preprocessing tool that handles adapter trimming, quality filtering, deduplication, and report generation in a single pass.

"Preprocess FASTQ reads with fastp" → Run adapter trimming, quality filtering, and QC reporting in a single pass.

CLI: fastp -i R1.fq -I R2.fq -o clean_R1.fq -O clean_R2.fq --html report.html

Basic Usage

Single-End

bash

fastp -i input.fastq.gz -o output.fastq.gz

Paired-End

bash

fastp -i R1.fastq.gz -I R2.fastq.gz -o R1_clean.fastq.gz -O R2_clean.fastq.gz

With Custom HTML/JSON Reports

bash

fastp -i R1.fq.gz -I R2.fq.gz \
      -o R1_clean.fq.gz -O R2_clean.fq.gz \
      -h sample_report.html \
      -j sample_report.json

Adapter Trimming

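For single-end data, fastp auto-detects adapter sequences by default. For paired-end data, it trims adapters by overlap analysis by default. Add `--detect_adapter_for_pe` to also enable sequence-based auto-detection.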

bash

# Auto-detect (default)
fastp -i in.fq -o out.fq

# Specify adapters manually
fastp -i in.fq -o out.fq \
      --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

# Paired-end with manual adapters
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
      --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
      --adapter_sequence_r2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

# Disable adapter trimming
fastp -i in.fq -o out.fq --disable_adapter_trimming

# Adapter FASTA file
fastp -i in.fq -o out.fq --adapter_fasta adapters.fa
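The following is a minimal Python sketch of what 3' adapter trimming does. It is not fastp's implementation. It cuts a read at the first position where the rest of the read matches the start of the adapter exactly, so partial adapters at the 3' end are also removed. fastp itself tolerates mismatches and, for paired-end data, uses overlap analysis. `trim_adapter` and its `min_overlap` parameter are hypothetical names.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 4) -> str:
    """Return `read` truncated at the first exact (possibly partial) adapter match.

    Simplified sketch: exact matching only; fastp itself allows mismatches.
    """
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:]
        n = min(len(tail), len(adapter))
        # Adapter read-through: the rest of the read equals the adapter's prefix
        if tail[:n] == adapter[:n]:
            return read[:start]
    return read


TRUSEQ_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
print(trim_adapter("ACGTACGTAGATCGGAAGAGC", TRUSEQ_R1))  # → ACGTACGT
```

A short `min_overlap` catches more partial adapters but also trims reads whose last few bases match the adapter start by chance.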

Quality Filtering

bash

# Phred score a base needs to count as qualified (default 15)
fastp -i in.fq -o out.fq -q 20

# Discard reads whose mean quality is below 25 (default 0 = disabled)
fastp -i in.fq -o out.fq -e 25

# Discard reads with more than 30% unqualified bases (default 40)
fastp -i in.fq -o out.fq -q 20 --unqualified_percent_limit 30

# Disable quality filtering
fastp -i in.fq -o out.fq --disable_quality_filtering
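The read-level checks behind `-q`, `-u`, and `-e` can be sketched in Python as follows. This is a simplified model written from the option descriptions, not fastp's source. It assumes Phred+33 qualities, and the function names are hypothetical.

```python
def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - offset for ch in qual]


def passes_quality_filter(qual: str,
                          qualified_phred: int = 15,              # like -q
                          unqualified_percent_limit: float = 40,  # like -u
                          average_qual: float = 0) -> bool:       # like -e (0 = off)
    scores = phred_scores(qual)
    if not scores:
        return False
    # A base is "unqualified" when its score is below the -q threshold
    unqualified = sum(1 for s in scores if s < qualified_phred)
    if unqualified > unqualified_percent_limit * len(scores) / 100:
        return False
    if average_qual > 0 and sum(scores) / len(scores) < average_qual:
        return False
    return True


qual = "IIIIII####"  # six Q40 bases, four Q2 bases
print(passes_quality_filter(qual))                                # 40% unqualified: kept
print(passes_quality_filter(qual, unqualified_percent_limit=30))  # 40% > 30%: discarded
```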

Quality Trimming

bash

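# Sliding window moving 5' to 3' (recommended): cuts the first window whose mean quality is below the threshold and all bases after it (similar to Trimmomatic SLIDINGWINDOW)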
fastp -i in.fq -o out.fq \
      --cut_right \
      --cut_right_window_size 4 \
      --cut_right_mean_quality 20

# Sliding window from the 5' end: drops leading windows below the threshold (similar to Trimmomatic LEADING)
fastp -i in.fq -o out.fq \
      --cut_front \
      --cut_front_window_size 4 \
      --cut_front_mean_quality 20

# Both ends (--cut_tail is the 3' counterpart of --cut_front, similar to Trimmomatic TRAILING)
fastp -i in.fq -o out.fq \
      --cut_front --cut_tail \
      --cut_front_window_size 4 \
      --cut_front_mean_quality 20 \
      --cut_tail_window_size 4 \
      --cut_tail_mean_quality 20
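The `--cut_right` behaviour can be sketched in Python as follows. The window moves from 5' to 3', and at the first window whose mean quality falls below the threshold, that window and everything to its right are dropped. This is a simplified model, not fastp's code. It assumes Phred+33 qualities and ignores fastp's handling of edge cases such as reads shorter than the window. `cut_right` is a hypothetical name.

```python
def cut_right(seq: str, qual: str, window_size: int = 4,
              mean_quality: float = 20, offset: int = 33) -> tuple[str, str]:
    """Simplified --cut_right: scan windows 5'->3' and cut at the first low-quality one."""
    scores = [ord(ch) - offset for ch in qual]
    for start in range(len(scores) - window_size + 1):
        window = scores[start:start + window_size]
        # Compare the window sum with threshold * size to avoid float division
        if sum(window) < mean_quality * window_size:
            # Drop the failing window and everything to its right
            return seq[:start], qual[:start]
    return seq, qual


print(cut_right("ACGTACGTAC", "IIIIII####"))  # → ('ACGTA', 'IIIII')
```

The Q40 base at position 5 is also lost because it sits inside the first failing window. This is why `--cut_right` is more aggressive than trimming from the tail.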

Length Filtering

bash

# Minimum length (default 15)
fastp -i in.fq -o out.fq -l 36

# Maximum length (default 0 = no limit)
fastp -i in.fq -o out.fq --length_limit 150

# Keep only reads of exactly 100 bp (discard shorter and longer reads)
fastp -i in.fq -o out.fq -l 100 --length_limit 100
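The two length options combine as in this Python sketch. Reads shorter than `-l` are discarded, and reads longer than `--length_limit` are discarded unless the limit is 0. The function name is hypothetical.

```python
def passes_length_filter(seq: str, length_required: int = 15,
                         length_limit: int = 0) -> bool:
    """Mirror -l/--length_required and --length_limit (0 = no upper limit)."""
    if len(seq) < length_required:
        return False
    if length_limit > 0 and len(seq) > length_limit:
        return False
    return True


# Equivalent of "-l 100 --length_limit 100": only 100 bp reads survive
for n in (99, 100, 101):
    print(n, passes_length_filter("A" * n, length_required=100, length_limit=100))
```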

Poly-X Trimming

bash

# Trim poly-G (NovaSeq/NextSeq artifacts) - auto-enabled for these platforms
fastp -i in.fq -o out.fq --trim_poly_g

# Disable poly-G trimming
fastp -i in.fq -o out.fq --disable_trim_poly_g

# Trim poly-X (any homopolymer)
fastp -i in.fq -o out.fq --trim_poly_x

# Custom poly-G minimum length (default 10)
fastp -i in.fq -o out.fq --trim_poly_g --poly_g_min_len 5
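Poly-G tails arise on two-colour instruments, where "no signal" is called as G. This Python sketch removes a 3' G run of at least `min_len` bases. It uses exact matching only, whereas fastp also tolerates occasional mismatches inside the run. `trim_poly_g` is a hypothetical name.

```python
def trim_poly_g(seq: str, min_len: int = 10) -> str:
    """Remove a 3' run of G bases if it is at least `min_len` long (exact match only)."""
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)
    return stripped if run_length >= min_len else seq


print(trim_poly_g("ACGT" + "G" * 12))            # 12 G >= 10: trimmed to ACGT
print(trim_poly_g("ACGT" + "G" * 5))             # 5 G < 10: unchanged
print(trim_poly_g("ACGT" + "G" * 5, min_len=5))  # like --poly_g_min_len 5
```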

N Base Handling

bash

# Discard reads with more than 3 N bases (default limit 5)
fastp -i in.fq -o out.fq -n 3

# Relax N filtering by raising the limit
fastp -i in.fq -o out.fq --n_base_limit 50
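The `-n` rule is a simple count: a read with more N bases than the limit is discarded, and for paired-end data the whole pair goes. The following is a sketch with a hypothetical function name.

```python
def passes_n_filter(seq: str, n_base_limit: int = 5) -> bool:
    """Keep a read unless it has more than `n_base_limit` N bases (like -n)."""
    return seq.upper().count("N") <= n_base_limit


print(passes_n_filter("ACGTNNNNN"))                  # 5 Ns, limit 5: kept
print(passes_n_filter("ACGTNNNNN", n_base_limit=3))  # 5 Ns, limit 3: discarded
```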

Deduplication

bash

# Enable deduplication
fastp -i in.fq -o out.fq --dedup

# Accuracy level (1-6, higher = more memory; default 3 when --dedup is on)
fastp -i in.fq -o out.fq --dedup --dup_calc_accuracy 4
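Conceptually, `--dedup` keeps the first copy of each identical read or pair and drops the rest. This Python sketch does that with an exact set of sequences. fastp instead uses hashing, with memory governed by `--dup_calc_accuracy`, so it can occasionally flag a non-duplicate due to a hash collision. `dedup_pairs` is a hypothetical name.

```python
from collections.abc import Iterable, Iterator


def dedup_pairs(pairs: Iterable[tuple[str, str]]) -> Iterator[tuple[str, str]]:
    """Yield each (R1, R2) sequence pair the first time it is seen."""
    seen: set[tuple[str, str]] = set()
    for pair in pairs:
        if pair in seen:
            continue  # identical to an earlier pair: drop it
        seen.add(pair)
        yield pair


reads = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "TTGC")]
print(list(dedup_pairs(reads)))  # → [('ACGT', 'TTGA'), ('ACGT', 'TTGC')]
```

Storing every sequence exactly, as here, is what makes naive deduplication memory-hungry on large runs. fastp's hashing trades a tiny error rate for bounded memory.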

Base Correction (Paired-End Only)

bash

# Enable overlap-based correction
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq --correction

# Minimum overlap length for detecting the overlapped region (default 30)
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
      --correction --overlap_len_require 20
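Overlap-based correction works because, in the overlapped region, R1 and the reverse complement of R2 read the same bases. This Python sketch first finds the overlap, allowing at most `max_diff` mismatches. At each mismatch where one base is high quality and the other low, it replaces the low-quality base. The thresholds `good_q=30` and `bad_q=15` are illustrative assumptions, not fastp's documented values. The sketch only handles R2 starting inside R1 and does not update quality strings. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1: str, r2rc: str, min_overlap: int = 30, max_diff: int = 5):
    """Offset in R1 where reverse-complemented R2 starts, or None if no overlap."""
    for offset in range(len(r1) - min_overlap + 1):
        overlap = min(len(r1) - offset, len(r2rc))
        if overlap < min_overlap:
            break
        diff = sum(a != b for a, b in zip(r1[offset:offset + overlap], r2rc[:overlap]))
        if diff <= max_diff:
            return offset
    return None


def correct_by_overlap(r1, q1, r2, q2, min_overlap=30, max_diff=5,
                       good_q=30, bad_q=15, offset_q=33):
    """Fix overlap mismatches where one base is high quality and the other low."""
    r2rc, q2rc = revcomp(r2), q2[::-1]
    start = find_overlap(r1, r2rc, min_overlap, max_diff)
    if start is None:
        return r1, r2  # no confident overlap: leave the pair untouched
    b1, b2 = list(r1), list(r2rc)
    for i in range(min(len(r1) - start, len(r2rc))):
        j = start + i
        if b1[j] == b2[i]:
            continue
        p1, p2 = ord(q1[j]) - offset_q, ord(q2rc[i]) - offset_q
        if p1 >= good_q and p2 <= bad_q:
            b2[i] = b1[j]  # trust R1's base
        elif p2 >= good_q and p1 <= bad_q:
            b1[j] = b2[i]  # trust R2's base
    return "".join(b1), revcomp("".join(b2))


# R1 has a low-quality T at position 7 where R2 confidently reads A
print(correct_by_overlap("ACGTACGTTT", "IIIIIII#II", "AATCGTACGT", "I" * 10,
                         min_overlap=8, max_diff=1))  # → ('ACGTACGATT', 'AATCGTACGT')
```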

Paired-End Merge

bash

# Merge overlapping paired reads
fastp -i R1.fq -I R2.fq \
      --merge --merged_out merged.fq \
      -o R1_unmerged.fq -O R2_unmerged.fq
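Merging stitches a pair into one longer read covering the whole insert. This Python sketch assumes the overlap length is already known (fastp detects it itself) and that the insert is at least as long as each read. In the overlap it keeps whichever base has the higher quality, which is an illustrative consensus rule rather than fastp's exact one. `merge_pair` is a hypothetical name, and `revcomp` is repeated so the block stands alone.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, q1: str, r2: str, q2: str, overlap_len: int) -> tuple[str, str]:
    """Merge a pair whose R1 3' end overlaps R2's reverse complement by overlap_len bases."""
    if not 0 <= overlap_len <= min(len(r1), len(r2)):
        raise ValueError("overlap_len must be between 0 and the shorter read length")
    r2rc, q2rc = revcomp(r2), q2[::-1]
    start = len(r1) - overlap_len  # where the overlap begins in R1
    seq, qual = list(r1[:start]), list(q1[:start])
    for i in range(overlap_len):
        c1, c2 = q1[start + i], q2rc[i]
        # Same Phred offset on both reads, so comparing characters compares scores
        if c1 >= c2:
            seq.append(r1[start + i])
            qual.append(c1)
        else:
            seq.append(r2rc[i])
            qual.append(c2)
    seq.extend(r2rc[overlap_len:])  # bases covered only by R2
    qual.extend(q2rc[overlap_len:])
    return "".join(seq), "".join(qual)


# 14 bp fragment ACGTACGTAAGGCC sequenced as 10 bp R1 + 10 bp R2 (6 bp overlap)
merged, merged_qual = merge_pair("ACGTACGTAA", "I" * 10, "GGCCTTACGT", "I" * 10, overlap_len=6)
print(merged)  # → ACGTACGTAAGGCC
```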

UMI Processing

bash

# UMI at the start of read 1 (moved from the sequence into the read header)
fastp -i in.fq -o out.fq \
      --umi --umi_loc read1 --umi_len 8

# UMI taken from the index 1 (i7) sequence recorded in the read header
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
      --umi --umi_loc index1

# UMI locations: index1, index2, read1, read2, per_index, per_read
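For `--umi_loc read1`, the UMI bases are cut from the start of the read and appended to the read name, so downstream tools can group reads by UMI. This Python sketch appends `:UMI` (or `:PREFIX_UMI` when a prefix is given, mirroring `--umi_prefix`) to the first header token. That header layout is an assumption, so check fastp's actual output before parsing it. `extract_umi` is a hypothetical name.

```python
def extract_umi(header: str, seq: str, qual: str, umi_len: int = 8,
                prefix: str = "") -> tuple[str, str, str]:
    """Move the first umi_len bases of a read into its name (cf. --umi_loc read1)."""
    umi = seq[:umi_len]
    tag = f"{prefix}_{umi}" if prefix else umi
    # Insert the tag before the first space so any header comment stays intact
    name, sep, comment = header.partition(" ")
    return f"{name}:{tag}{sep}{comment}", seq[umi_len:], qual[umi_len:]


print(extract_umi("@read1 1:N:0:1", "AACCGGTTACGT", "IIIIIIIIJJJJ"))
# → ('@read1:AACCGGTT 1:N:0:1', 'ACGT', 'JJJJ')
```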

Complete Workflow Example

Standard Illumina Pipeline

bash

fastp \
    -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
    -o clean_R1.fastq.gz -O clean_R2.fastq.gz \
    --detect_adapter_for_pe \
    --cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
    -q 20 -l 36 \
    --thread 8 \
    -h sample_fastp.html -j sample_fastp.json

NovaSeq/NextSeq Pipeline

bash

fastp \
    -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
    -o clean_R1.fastq.gz -O clean_R2.fastq.gz \
    --detect_adapter_for_pe \
    --trim_poly_g \
    --cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
    -q 20 -l 36 \
    --thread 8 \
    -h sample_fastp.html -j sample_fastp.json

RNA-seq Pipeline

bash

fastp \
    -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
    -o clean_R1.fastq.gz -O clean_R2.fastq.gz \
    --detect_adapter_for_pe \
    --cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
    -q 20 -l 50 \
    --thread 8 \
    -h sample_fastp.html -j sample_fastp.json
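To run these pipelines over many samples, a small Python driver can build one command per sample. The following is a sketch that assumes hypothetical input names `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` in `raw_dir`. It reproduces the Standard Illumina flags, with `poly_g=True` for the NovaSeq/NextSeq variant and `min_len=50` for the RNA-seq one. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def fastp_pe_command(sample: str, raw_dir: str, out_dir: str, threads: int = 8,
                     min_len: int = 36, poly_g: bool = False) -> list[str]:
    """Build the standard paired-end fastp call for one sample (hypothetical file naming)."""
    raw, out = Path(raw_dir), Path(out_dir)
    cmd = [
        "fastp",
        "-i", str(raw / f"{sample}_R1.fastq.gz"),
        "-I", str(raw / f"{sample}_R2.fastq.gz"),
        "-o", str(out / f"{sample}_clean_R1.fastq.gz"),
        "-O", str(out / f"{sample}_clean_R2.fastq.gz"),
        "--detect_adapter_for_pe",
        "--cut_right", "--cut_right_window_size", "4", "--cut_right_mean_quality", "20",
        "-q", "20", "-l", str(min_len),
        "--thread", str(threads),
        "-h", str(out / f"{sample}_fastp.html"),
        "-j", str(out / f"{sample}_fastp.json"),
    ]
    if poly_g:
        cmd.append("--trim_poly_g")  # NovaSeq/NextSeq variant
    return cmd


def run_samples(samples, raw_dir, out_dir, dry_run=True, **options):
    """Print (and, unless dry_run, execute) one fastp command per sample."""
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
    for sample in samples:
        cmd = fastp_pe_command(sample, raw_dir, out_dir, **options)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing sample


run_samples(["sampleA", "sampleB"], "raw", "clean")  # dry run: prints commands only
```

Passing the arguments as a list avoids shell-quoting problems with unusual sample names. `shlex.join` is used only to print a copy-pasteable command.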

Output Files

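| File | Description |
|------|-------------|
| `*.html` | Interactive HTML report (`-h`/`--html`; default `fastp.html`) |
| `*.json` | Machine-readable statistics (`-j`/`--json`; default `fastp.json`) |
| Output FASTQ | Processed reads (`-o`/`-O`) |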

JSON Report Structure

python

import json

with open('sample_fastp.json') as f:
    report = json.load(f)

summary = report['summary']
print(f"Total reads: {summary['before_filtering']['total_reads']}")
print(f"Passed reads: {summary['after_filtering']['total_reads']}")
print(f"Q20 rate: {summary['after_filtering']['q20_rate']:.2%}")
print(f"Q30 rate: {summary['after_filtering']['q30_rate']:.2%}")
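The same fields can be collected across many samples for a quick overview before running MultiQC. This Python sketch assumes reports are named `<sample>_fastp.json`, as in the pipelines above. It reads the `duplication.rate` field with `.get()` so a report without that section yields `None` instead of an error. The function names are hypothetical.

```python
import json
from pathlib import Path


def summarize_report(report: dict, sample: str) -> dict:
    """Pull headline numbers out of one parsed fastp JSON report."""
    before = report["summary"]["before_filtering"]
    after = report["summary"]["after_filtering"]
    total = before["total_reads"]
    return {
        "sample": sample,
        "reads_before": total,
        "reads_after": after["total_reads"],
        "pass_rate": after["total_reads"] / total if total else 0.0,
        "q30_after": after["q30_rate"],
        # .get(): tolerate reports without a duplication section
        "dup_rate": report.get("duplication", {}).get("rate"),
    }


def summarize_dir(directory: str, pattern: str = "*_fastp.json") -> list[dict]:
    """Summarize every fastp JSON report matching `pattern` in `directory`."""
    rows = []
    for path in sorted(Path(directory).glob(pattern)):
        with open(path) as fh:
            report = json.load(fh)
        sample = path.name.removesuffix(pattern.lstrip("*"))
        rows.append(summarize_report(report, sample))
    return rows


for row in summarize_dir("."):
    print(f"{row['sample']}\t{row['pass_rate']:.2%}\t{row['q30_after']:.2%}")
```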

Performance

bash

# Set threads (default 3)
fastp -i in.fq -o out.fq --thread 8

# Discard the HTML report by writing it to /dev/null
fastp -i in.fq -o out.fq --html /dev/null

# Process from stdin
zcat in.fq.gz | fastp --stdin -o out.fq

Related Skills

quality-reports - MultiQC can aggregate fastp JSON reports
adapter-trimming - Cutadapt for complex adapter scenarios
quality-filtering - Trimmomatic alternative

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-read-qc-fastp-workflow
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw
