Agent skills
bio-metabolomics-xcms-preproce...

Agent skill

bio-metabolomics-xcms-preprocessing

XCMS3 workflow for LC-MS/MS metabolomics preprocessing. Covers peak detection, retention time alignment, correspondence (grouping), and gap filling. Use when processing raw LC-MS data into a feature table for untargeted metabolomics.

View SKILL.md on GitHub Repository

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-metabolomics-xcms-preprocessing

SKILL.md

Version Compatibility

Reference examples tested with: MSnbase 2.28+, scanpy 1.10+, xcms 4.0+

Before using code patterns, verify installed versions match. If versions differ:

R: packageVersion('<pkg>') then ?function_name to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

XCMS Metabolomics Preprocessing

Requires Bioconductor 3.18+ with xcms 4.0+ and MSnbase 2.28+.

Load Raw Data

Goal: Import raw LC-MS files into R for downstream peak detection and alignment.

Approach: Read mzML/mzXML files into an OnDiskMSnExp object using MSnbase for memory-efficient access.

"Process my raw LC-MS data into a feature table" → Detect chromatographic peaks, align retention times across samples, group corresponding peaks, and fill missing values to produce a sample-by-feature intensity matrix.

library(xcms)
library(MSnbase)

# Read mzML/mzXML files
raw_files <- list.files('raw_data', pattern = '\\.(mzML|mzXML)$', full.names = TRUE)

# Create OnDiskMSnExp object
raw_data <- readMSData(raw_files, mode = 'onDisk')

# Check data
raw_data
table(msLevel(raw_data))

Define Sample Groups

Goal: Attach sample metadata (group labels, injection order) to the raw data object.

Approach: Create a data frame of sample information and assign it to the phenoData slot.

# Sample metadata
sample_info <- data.frame(
    sample_name = basename(raw_files),
    sample_group = c(rep('Control', 5), rep('Treatment', 5), rep('QC', 3)),
    injection_order = 1:length(raw_files)
)

# Assign to phenoData
pData(raw_data) <- sample_info

Peak Detection (Centroided)

Goal: Identify chromatographic peaks in centroided LC-MS data.

Approach: Use the CentWave algorithm which detects peaks by continuous wavelet transform on regions of interest defined by m/z and RT.

# CentWave algorithm for centroided data
cwp <- CentWaveParam(
    peakwidth = c(5, 30),       # Peak width range in seconds
    ppm = 15,                    # m/z tolerance
    snthresh = 10,               # Signal-to-noise threshold
    prefilter = c(3, 1000),      # Min peaks and intensity
    mzdiff = 0.01,               # Minimum m/z difference
    noise = 1000,                # Noise level
    integrate = 1                # Integration method
)

# Run peak detection
xdata <- findChromPeaks(raw_data, param = cwp)

# Summary
head(chromPeaks(xdata))
cat('Peaks found:', nrow(chromPeaks(xdata)), '\n')

Peak Detection (Profile Data)

Goal: Detect peaks in profile (non-centroided) LC-MS data.

Approach: Use the MatchedFilter algorithm designed for continuum data, which convolves with a Gaussian model peak.

# MatchedFilter for profile/continuum data
mfp <- MatchedFilterParam(
    binSize = 0.1,
    fwhm = 30,
    snthresh = 10,
    step = 0.1,
    mzdiff = 0.8
)

xdata_profile <- findChromPeaks(raw_data, param = mfp)

Retention Time Alignment

Goal: Correct retention time drift across samples to enable peak correspondence.

Approach: Apply Obiwarp alignment which uses dynamic time warping on the TIC profiles to compute sample-wise RT corrections.

# Obiwarp alignment (recommended)
obp <- ObiwarpParam(
    binSize = 0.5,
    response = 1,
    distFun = 'cor_opt',
    gapInit = 0.3,
    gapExtend = 2.4
)

xdata <- adjustRtime(xdata, param = obp)

# Check alignment
plotAdjustedRtime(xdata)

Peak Correspondence (Grouping)

Goal: Group corresponding chromatographic peaks across samples into consensus features.

Approach: Use peak density-based grouping which models the RT distribution of peaks in m/z slices to identify features present across samples.

# Group peaks across samples
pdp <- PeakDensityParam(
    sampleGroups = pData(xdata)$sample_group,
    bw = 5,                      # RT bandwidth
    minFraction = 0.5,           # Min fraction of samples
    minSamples = 1,              # Min samples per group
    binSize = 0.025              # m/z bin size
)

xdata <- groupChromPeaks(xdata, param = pdp)

# Check feature definitions
featureDefinitions(xdata)
cat('Features:', nrow(featureDefinitions(xdata)), '\n')

Gap Filling

Goal: Recover signal for features that were missed during initial peak detection in some samples.

Approach: Integrate intensity in the expected m/z-RT region for features with missing values using ChromPeakAreaParam.

# Fill in missing peaks
fpp <- ChromPeakAreaParam()
xdata <- fillChromPeaks(xdata, param = fpp)

# Alternative: FillChromPeaksParam for more control
fpp2 <- FillChromPeaksParam(
    expandMz = 0,
    expandRt = 0,
    ppm = 0
)

Extract Feature Table

Goal: Generate a samples-by-features intensity matrix with m/z and RT annotations for downstream analysis.

Approach: Extract feature values and definitions from the processed XCMSnExp object and combine into an exportable table.

# Get feature values (intensity matrix)
feature_values <- featureValues(xdata, method = 'maxint', value = 'into')

# Feature definitions (m/z, RT)
feature_defs <- featureDefinitions(xdata)
feature_defs <- as.data.frame(feature_defs)
feature_defs$feature_id <- rownames(feature_defs)

# Combine
feature_table <- cbind(feature_defs[, c('feature_id', 'mzmed', 'rtmed')], feature_values)
rownames(feature_table) <- feature_table$feature_id

# Save
write.csv(feature_table, 'feature_table.csv', row.names = FALSE)

Quality Control

Goal: Assess preprocessing quality through TIC plots, peak counts, RT correction, and PCA.

Approach: Visualize total ion chromatograms, per-sample peak counts, RT adjustment, and PCA of the feature matrix.

# TIC for each sample
tic <- chromatogram(raw_data, aggregationFun = 'sum')
plot(tic)

# Peak count per sample
peak_counts <- table(chromPeaks(xdata)[, 'sample'])
barplot(peak_counts, main = 'Peaks per sample')

# Check RT correction
par(mfrow = c(1, 2))
plotAdjustedRtime(xdata, col = pData(xdata)$sample_group)

# PCA of features
library(pcaMethods)
log_values <- log2(feature_values + 1)
log_values[is.na(log_values)] <- 0
pca <- pca(t(log_values), nPcs = 3, method = 'ppca')
plotPcs(pca, col = as.factor(pData(xdata)$sample_group))

CAMERA Annotation (Isotopes/Adducts)

Goal: Identify isotope patterns and adduct groups among detected peaks to reduce feature redundancy.

Approach: Use CAMERA to group peaks by RT correlation, assign isotope clusters, and annotate adduct types.

library(CAMERA)

# Create CAMERA object
xsa <- xsAnnotate(as(xdata, 'xcmsSet'))

# Group by RT
xsa <- groupFWHM(xsa, perfwhm = 0.6)

# Find isotopes
xsa <- findIsotopes(xsa, mzabs = 0.01, ppm = 10)

# Find adducts
xsa <- findAdducts(xsa, polarity = 'positive')

# Get annotated peak list
camera_results <- getPeaklist(xsa)

Export for MetaboAnalyst

Goal: Format the XCMS feature table for import into MetaboAnalyst web or R package.

Approach: Transpose the matrix, create M/Z-RT feature names, and prepend sample group information.

# Format for MetaboAnalyst web or R package
export_data <- t(feature_values)
colnames(export_data) <- paste0('M', round(feature_defs$mzmed, 4), 'T', round(feature_defs$rtmed, 1))

# Add sample info
export_df <- data.frame(Sample = rownames(export_data), Group = pData(xdata)$sample_group, export_data)

write.csv(export_df, 'metaboanalyst_input.csv', row.names = FALSE)

Related Skills

metabolite-annotation - Identify metabolites
normalization-qc - Normalize feature table
statistical-analysis - Differential analysis

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-metabolomics-xcms-preprocessing
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Version Compatibility

XCMS Metabolomics Preprocessing

Load Raw Data

Define Sample Groups

Peak Detection (Centroided)

Peak Detection (Profile Data)

Retention Time Alignment

Peak Correspondence (Grouping)

Gap Filling

Extract Feature Table

Quality Control

CAMERA Annotation (Isotopes/Adducts)

Export for MetaboAnalyst

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations