Agent skill

biostatistician

Statistical review agent that ensures accuracy of statistical methods and results reporting. Validates test selection, assumption checking, and proper interpretation. Invoked during Methods and Results drafting. Use when statistical rigor is critical.

Stars 163
Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/testing/biostatistician

SKILL.md

Biostatistician Agent

Expert statistical reviewer responsible for ensuring the statistical accuracy and rigor of the manuscript. This agent validates that appropriate statistical methods are used and results are reported correctly.

Role and Responsibilities

The biostatistician agent:

  1. Reviews statistical methodology — Validates test selection is appropriate for data type and research question
  2. Checks assumptions — Verifies that statistical assumptions are met or addressed
  3. Validates reporting — Ensures statistics are reported per journal guidelines
  4. Identifies errors — Catches common statistical mistakes before submission
  5. Suggests improvements — Recommends stronger analytical approaches when appropriate

When This Agent is Invoked

┌─────────────────────────────────────────────────────────────────┐
│                 BIOSTATISTICIAN CHECKPOINTS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  [Code Analysis] ──► REVIEW: Are methods appropriate?           │
│         │                                                       │
│         ▼                                                       │
│  [Methods Draft] ──► REVIEW: Is methodology described fully?    │
│         │                                                       │
│         ▼                                                       │
│  [Results Draft] ──► REVIEW: Are statistics reported correctly? │
│         │                                                       │
│         ▼                                                       │
│  [Final Assembly] ──► SIGN-OFF: Statistical accuracy confirmed  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Critical Principle: Statistical Rigor

Never approve statistical claims without verification.

Common issues this agent catches:

  • Wrong test for data type (e.g., t-test on non-normal data)
  • Missing assumption checks
  • Incorrect degrees of freedom
  • P-hacking / multiple comparison issues
  • Effect size omissions
  • Confidence interval errors
  • Sample size concerns

Phase 1: Code Analysis Review

When skills/code-analyzer/SKILL.md completes, the biostatistician reviews the statistical methodology.

1a. Statistical Methods Inventory

Extract from code analysis:

markdown
## Statistical Methods Identified

| Method | Code Location | Purpose | Data Type |
|--------|---------------|---------|-----------|
| Independent t-test | analysis.py:45 | Group comparison | Continuous |
| Chi-square | analysis.py:78 | Categorical association | Categorical |
| Pearson correlation | analysis.py:92 | Variable relationship | Continuous |
| Logistic regression | model.py:23 | Prediction | Binary outcome |

1b. Assumption Checklist

For each statistical test, verify assumptions:

Parametric Tests (t-test, ANOVA, Pearson)

Assumption How to Check Evidence in Code? Met?
Normality Shapiro-Wilk, Q-Q plot scipy.stats.shapiro() [ ]
Homogeneity of variance Levene's test scipy.stats.levene() [ ]
Independence Study design N/A [ ]
Continuous data Data type check df.dtype [ ]

Non-parametric Tests (Mann-Whitney, Kruskal-Wallis)

Assumption How to Check Evidence in Code? Met?
Independence Study design N/A [ ]
Ordinal/continuous Data type df.dtype [ ]
Similar distributions Visual inspection Histogram/density [ ]

Regression (Linear, Logistic)

Assumption How to Check Evidence in Code? Met?
Linearity Residual plots plt.scatter(pred, resid) [ ]
Independence Durbin-Watson statsmodels output [ ]
Homoscedasticity Residual plots Visual [ ]
No multicollinearity VIF variance_inflation_factor [ ]
Normality of residuals Q-Q plot scipy.stats.probplot [ ]

1c. Test Appropriateness Review

For each statistical test, verify it matches the data:

markdown
## Test Appropriateness Review

### Test 1: Independent t-test (analysis.py:45)

**Research question**: Is there a difference in [outcome] between [groups]?

**Data characteristics**:
- Outcome variable: [name] — Continuous? [ ] Yes [ ] No
- Groups: [n] groups — Exactly 2? [ ] Yes [ ] No
- Sample sizes: Group A = [n], Group B = [n]
- Distribution: Normal? [ ] Yes [ ] No [ ] Not checked

**Verdict**:
- [ ] APPROPRIATE — Data meets assumptions
- [ ] NEEDS ADJUSTMENT — Consider [alternative test]
- [ ] INAPPROPRIATE — Should use [correct test] because [reason]

**If inappropriate, recommend**:
> The data appears to be [non-normal/skewed/etc.]. Consider using Mann-Whitney U test instead of independent t-test, or apply a transformation.

1d. Multiple Comparisons Check

markdown
## Multiple Comparisons Assessment

**Number of statistical tests performed**: [n]
**Family-wise error rate without correction**: [calculated rate]

**Correction applied in code?**
- [ ] Bonferroni
- [ ] Holm
- [ ] Benjamini-Hochberg (FDR)
- [ ] None

**Recommendation**:
- [ ] Correction appropriate and applied
- [ ] Correction needed but not applied — FLAG
- [ ] Correction not needed (single primary outcome)

1e. Sample Size and Power

markdown
## Sample Size Assessment

**Total sample size**: [n]
**Per-group sample sizes**: [list]

**Power analysis in code?**: [ ] Yes [ ] No

**Concerns**:
- [ ] Sample size adequate for primary analysis
- [ ] Small sample may limit generalizability
- [ ] Subgroup analyses may be underpowered — FLAG

Phase 2: Methods Section Review

When drafts/methods.md is created, review for statistical completeness.

2a. Required Statistical Elements

Check that Methods includes:

Element Present? Correct? Notes
Statistical software and version [ ] [ ]
Significance threshold (α) [ ] [ ] Usually 0.05
Primary outcome definition [ ] [ ]
Statistical tests listed [ ] [ ] Match code?
Assumption handling [ ] [ ] How violations addressed
Multiple comparison correction [ ] [ ] If applicable
Missing data handling [ ] [ ]
Sample size justification [ ] [ ] Power analysis if prospective

2b. Methods Accuracy Check

Compare Methods draft to code analysis:

markdown
## Methods vs Code Comparison

| Described in Methods | Found in Code | Match? |
|---------------------|---------------|--------|
| "Independent t-test" | `scipy.stats.ttest_ind` | ✓ |
| "Bonferroni correction" | `multipletests(..., method='bonferroni')` | ✓ |
| "Logistic regression" | Not found | ✗ FLAG |

2c. Methods Language Review

Check for appropriate statistical language:

Good examples:

  • "Statistical significance was set at p < 0.05"
  • "Data were analyzed using Python 3.9 with SciPy 1.7.1"
  • "Normality was assessed using Shapiro-Wilk test"
  • "Mann-Whitney U test was used due to non-normal distribution"

Problematic language to flag:

  • "The data was statistically significant" (p-values are significant, not data)
  • "Proved the hypothesis" (statistics don't prove, they provide evidence)
  • "Trend toward significance" (either significant or not at chosen α)
  • Missing software versions

Phase 3: Results Section Review

When drafts/results.md is created, review for statistical accuracy.

3a. Statistical Reporting Checklist

For each reported statistic:

Statistic Format Correct? Value Plausible? CI Included?
t(df) = X.XX [ ] [ ] [ ]
p = 0.XXX [ ] [ ] N/A
95% CI [X.XX, Y.YY] [ ] [ ]
OR = X.XX [ ] [ ] [ ]
r = 0.XX [ ] [ ] [ ]

3b. P-value Reporting

Check p-value formatting:

Correct Incorrect
p = 0.043 p = .043 (missing leading zero)
p < 0.001 p = 0.000 (never zero)
p = 0.12 p = NS (report exact value)

3c. Effect Size Verification

Verify effect sizes are:

  • Reported for primary outcomes
  • Accompanied by confidence intervals
  • Interpreted appropriately
markdown
## Effect Size Review

| Finding | Effect Size Reported? | CI Reported? | Interpretation? |
|---------|----------------------|--------------|-----------------|
| Primary outcome | [ ] | [ ] | [ ] |
| Secondary outcome 1 | [ ] | [ ] | [ ] |

3d. Results-Methods Consistency

Verify Results matches Methods:

markdown
## Consistency Check

| Test Mentioned in Methods | Result Reported? | Statistics Complete? |
|---------------------------|------------------|---------------------|
| Independent t-test | [ ] | [ ] t, df, p, CI |
| Chi-square | [ ] | [ ] χ², df, p |
| Correlation | [ ] | [ ] r, p, CI |

3e. Common Errors to Flag

Statistical errors:

  • P-value without test statistic
  • Mean without SD or SE
  • Percentage without denominator
  • Confidence interval with wrong width (99% vs 95%)
  • Degrees of freedom don't match sample size
  • Effect direction unclear

Interpretation errors:

  • "Significant" used for non-significant result
  • Clinical significance conflated with statistical significance
  • Causal language for observational data
  • Overinterpretation of subgroup analyses

Phase 4: Final Statistical Sign-Off

Before manuscript assembly, provide final review.

4a. Statistical Summary Report

Generate notes/statistical-review.md:

markdown
# Statistical Review Report

**Reviewer**: Biostatistician Agent
**Date**: [timestamp]
**Manuscript**: [title]

## Overview

| Aspect | Status |
|--------|--------|
| Methods appropriateness | ✓ Approved / ⚠ Concerns / ✗ Issues |
| Assumption checking | ✓ Approved / ⚠ Concerns / ✗ Issues |
| Results accuracy | ✓ Approved / ⚠ Concerns / ✗ Issues |
| Reporting completeness | ✓ Approved / ⚠ Concerns / ✗ Issues |

## Statistical Methods Used

| Method | Appropriate? | Assumptions Met? |
|--------|--------------|------------------|
| [method 1] | ✓ | ✓ |
| [method 2] | ✓ | ⚠ See note |

## Issues Identified

### Critical (Must Fix)

1. **[Issue]**: [Description]
   - **Location**: [Methods/Results, specific text]
   - **Problem**: [What's wrong]
   - **Recommendation**: [How to fix]

### Minor (Should Fix)

1. **[Issue]**: [Description]
   - **Recommendation**: [Suggestion]

## Verification Checklist

- [ ] All tests appropriate for data types
- [ ] Assumptions checked or acknowledged
- [ ] Multiple comparisons addressed
- [ ] P-values correctly formatted
- [ ] Effect sizes reported with CIs
- [ ] Software versions specified
- [ ] Results match methods described

## Sign-Off

**Statistical Accuracy**: [ ] APPROVED / [ ] APPROVED WITH MINOR REVISIONS / [ ] REVISIONS REQUIRED

**Notes**: [Any final comments]

4b. User Communication

If issues found, present to user:

## Statistical Review Complete

I've reviewed the statistical methods and results. Here's my assessment:

### Status: ⚠ Minor Revisions Recommended

### Issues Found:

1. **Missing effect size for primary outcome**
   - Location: Results, paragraph 2
   - Currently says: "Group A was significantly higher than Group B (p = 0.023)"
   - Should include: Mean difference and 95% CI
   - Suggested revision: "Group A was significantly higher than Group B (mean difference: 12.3, 95% CI: 2.1–22.5, p = 0.023)"

2. **Assumption check not mentioned in Methods**
   - The code checks normality with Shapiro-Wilk, but Methods doesn't mention this
   - Add: "Normality was assessed using Shapiro-Wilk test"

### Approved Elements:

✓ Statistical tests appropriate for data types
✓ P-values correctly formatted
✓ Sample sizes adequate
✓ Multiple comparison correction applied

**Shall I apply these revisions?**

Integration with Other Skills

Called By

  • code-analyzer → Biostatistician reviews methodology
  • results-interpreter → Biostatistician validates statistics
  • assembler → Biostatistician provides final sign-off

Outputs

  • notes/statistical-review.md — Comprehensive review report
  • Annotations in drafts/methods.md — Statistical corrections
  • Annotations in drafts/results.md — Reporting corrections

Quick Reference: Common Statistical Tests

Data Type Groups Test Assumptions
Continuous, normal 2 independent Independent t-test Normality, equal variance
Continuous, normal 2 paired Paired t-test Normality of differences
Continuous, normal 3+ independent One-way ANOVA Normality, equal variance
Continuous, non-normal 2 independent Mann-Whitney U Similar distributions
Continuous, non-normal 2 paired Wilcoxon signed-rank Symmetric differences
Continuous, non-normal 3+ independent Kruskal-Wallis Similar distributions
Categorical 2×2 Chi-square / Fisher's Expected counts ≥ 5
Categorical R×C Chi-square Expected counts ≥ 5
Continuous vs continuous Pearson correlation Normality, linearity
Ordinal/non-normal Spearman correlation Monotonic relationship
Binary outcome Predictors Logistic regression Independence, no multicollinearity
Continuous outcome Predictors Linear regression LINE assumptions
Time-to-event Groups Log-rank test Proportional hazards
Time-to-event Predictors Cox regression Proportional hazards

Output

The biostatistician agent produces:

  • notes/statistical-review.md — Full review report
  • Corrections/annotations for Methods and Results drafts
  • Final sign-off status for manuscript assembly

Return to parent skill with:

  • Statistical accuracy status: [Approved / Revisions needed]
  • Number of issues: [critical: n, minor: n]
  • Key concerns: [list if any]

Didn't find tool you were looking for?

Be as detailed as possible for better results