Agent skill

bio-stats-ml-reporting

Aggregate results, train ML models, and produce reports with validated references.

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/bio-stats-ml-reporting

SKILL.md

Bio Stats ML Reporting

When to use

  • You need to aggregate results tables, train and evaluate ML models on them, and produce a report whose references have been validated.

Prerequisites

  • Tools installed via pixi (see pixi.toml).
  • Results tables and metadata are available.

Inputs

  • results/*.parquet or results/*.tsv
  • metadata.tsv

Outputs

  • results/bio-stats-ml-reporting/models/
  • results/bio-stats-ml-reporting/metrics.tsv
  • results/bio-stats-ml-reporting/report.md
  • results/bio-stats-ml-reporting/logs/

Steps

  1. Join the results tables with metadata.tsv in DuckDB and build feature tables.
  2. Train baseline models and evaluate them with cross-validation.
  3. Generate report.md and validate its references via the Crossref API.

QC gates

  • Model performance sanity checks pass.
  • Reference validation passes.
  • On failure: retry with alternative parameters; if the gate still fails, record the failure in the report and exit with a non-zero status.
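
The retry-then-fail behaviour of the QC gates might look like the sketch below. The function name `run_with_qc`, the callable `train_fn`, the parameter list, and the score threshold are hypothetical; the skill does not specify which sanity checks or thresholds it applies.

```python
def run_with_qc(train_fn, param_grid, min_score, report_path):
    """Try each parameter set in order and stop at the first one that passes.

    Returns (exit_code, attempts). exit_code is 0 on a pass and 1 if every
    parameter set fails, in which case the attempts are appended to the report.
    All names and the threshold rule are assumptions made for this example.
    """
    attempts = []
    for params in param_grid:
        score = train_fn(**params)
        attempts.append((params, score))
        if score >= min_score:
            return 0, attempts

    # Every attempt failed: record the failure before signalling it.
    with open(report_path, "a", encoding="utf-8") as fh:
        fh.write("\nQC gate failed\n\n")
        for params, score in attempts:
            fh.write(
                f"- params={params} score={score:.3f} "
                f"(threshold {min_score:.3f})\n"
            )
    return 1, attempts
```

A caller would pass the returned code to `sys.exit` so that a pipeline runner sees the failure.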

Validation

  • Verify input tables are readable and schema-consistent.
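
For the TSV inputs, a readability and schema-consistency check could be sketched with the standard library alone. The helper name `check_tsv_schemas` and the required column `sample_id` are assumptions for illustration; Parquet inputs would need a Parquet-aware reader such as DuckDB and are not covered here.

```python
import csv


def check_tsv_schemas(paths, required=("sample_id",)):
    """Return a list of problems found; an empty list means the check passed.

    A file passes if it can be read, has a header, every row has as many
    fields as the header, all `required` columns are present, and its header
    matches the header of the first readable file.
    """
    problems = []
    reference = None
    for path in paths:
        try:
            with open(path, newline="", encoding="utf-8") as fh:
                reader = csv.reader(fh, delimiter="\t")
                header = next(reader, None)
                if header is None:
                    problems.append(f"{path}: empty file")
                    continue
                for line_no, row in enumerate(reader, start=2):
                    if len(row) != len(header):
                        problems.append(
                            f"{path}:{line_no}: expected {len(header)} "
                            f"fields, got {len(row)}"
                        )
                        break
        except (OSError, UnicodeDecodeError) as exc:
            problems.append(f"{path}: unreadable ({exc})")
            continue

        missing = [col for col in required if col not in header]
        if missing:
            problems.append(f"{path}: missing columns {missing}")
        if reference is None:
            reference = header
        elif header != reference:
            problems.append(f"{path}: header {header} differs from {reference}")
    return problems
```

Returning a list of messages rather than raising on the first problem lets all issues be written to the log in one pass.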

Tools

  • duckdb v1.4.3
  • scikit-learn v1.8.0
  • xgboost v3.1.3
  • crossrefapi v1.7.0

Paper summaries (2023-2025)

  • summaries/ (include example use cases and tool settings used)

Tool documentation

  • DuckDB - In-process analytical database for data aggregation
  • scikit-learn - Machine learning library
  • XGBoost - Gradient boosting framework
  • Crossref API - Reference validation and metadata retrieval

References

  • See ../bio-skills-references.md
