Agent skill
statistical-analysis-when-to-use
Sub-skill of statistical-analysis: When to Use (+4).
Install this agent skill to your Project
npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/_archive/data/analytics/statistical-analysis/when-to-use
When to Use
Use hypothesis testing when you need to determine whether an observed difference is likely real or could be due to random chance. Common scenarios:
- A/B test results: Is variant B actually better than A?
- Before/after comparison: Did the product change actually move the metric?
- Segment comparison: Do enterprise customers really have higher retention?
The Framework
- Null hypothesis (H0): There is no difference (the default assumption)
- Alternative hypothesis (H1): There is a difference
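- Choose significance level (alpha): Typically 0.05 (accepting a 5% chance of a false positive when H0 is true)
- Compute test statistic and p-value (the probability of seeing a difference at least this large if H0 were true)
- Interpret: If p < alpha, reject H0 (evidence of a real difference). Otherwise, fail to reject H0, which is not proof that there is no difference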
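The framework steps can be sketched for the A/B test scenario as a two-sided z-test for two proportions. This is a minimal illustration using only the Python standard library, not a prescribed implementation: the function name `two_proportion_z_test` and the conversion counts are hypothetical, made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference between two proportions.

    H0: both variants share the same conversion rate.
    H1: the conversion rates differ.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled proportion: the best estimate of the shared rate if H0 is true.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical counts: 200/5000 conversions for A, 250/5000 for B.
z, p_value, reject_h0 = two_proportion_z_test(
    conv_a=200, n_a=5000, conv_b=250, n_b=5000
)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

The normal approximation used here assumes reasonably large counts in each group; with very few conversions an exact test is usually a safer choice.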
Common Tests
| Scenario | Test | When to Use |
|---|---|---|
| Compare two group means | t-test (independent) | Roughly normal data or large samples, two independent groups |
| Compare two group proportions | z-test for proportions | Conversion rates, binary outcomes |
| Compare paired measurements | Paired t-test | Before/after on same entities |
| Compare 3+ group means | ANOVA | Multiple segments or variants |
| Non-normal data, two groups | Mann-Whitney U test | Skewed metrics, ordinal data |
| Association between categories | Chi-squared test | Two categorical variables |
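As one worked row from the table, the sketch below runs a chi-squared test of association on a 2x2 table, which fits the segment comparison scenario (segment versus retained or churned). It is a hand-rolled illustration using only the standard library: the helper name `chi_squared_2x2` and the counts are hypothetical, and it applies no continuity correction, so library routines that apply one may report a slightly different p-value.

```python
from math import erfc, sqrt


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, where the upper-tail
    # probability of the chi-squared distribution reduces to erfc.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are segments, columns are (retained, churned).
counts = [
    [180, 20],   # enterprise
    [320, 80],   # self-serve
]
chi2, p_value = chi_squared_2x2(counts)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

The `erfc` shortcut only holds for 1 degree of freedom; larger tables need a general chi-squared distribution function from a statistics library.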
Practical Significance vs. Statistical Significance
Statistical significance means a difference this large would be unlikely to appear by chance alone if there were no true difference.
Practical significance means the difference is large enough to matter for business decisions.
A difference can be statistically significant but practically meaningless (common with large samples). Always report:
- Effect size: How big is the difference? (e.g., "Variant B improved conversion by 0.3 percentage points")
- Confidence interval: What's the range of plausible true effects?
- Business impact: What does this translate to in revenue, users, or other business terms?
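The three items to report can be computed together. The sketch below assumes a large-sample A/B test and uses a normal-approximation (Wald) interval for the difference in conversion rates; the function name `diff_in_proportions_ci`, the counts, and the monthly traffic figure are all hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Effect size (B minus A) with a normal-approximation confidence interval."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error: each group keeps its own variance estimate.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, diff - z_crit * se, diff + z_crit * se


# Hypothetical large test: 10.0% vs 10.3% conversion, 1,000,000 users per arm.
diff, low, high = diff_in_proportions_ci(100_000, 1_000_000, 103_000, 1_000_000)
print(f"Effect size: {diff * 100:.2f} percentage points")
print(f"95% CI: {low * 100:.2f} to {high * 100:.2f} percentage points")

# Business impact under an assumed 500,000 visitors per month.
monthly_visitors = 500_000
print(
    f"Extra conversions per month: about {diff * monthly_visitors:.0f} "
    f"(plausible range {low * monthly_visitors:.0f} to {high * monthly_visitors:.0f})"
)
```

With samples this large the interval excludes zero, yet the effect is only about 0.3 percentage points; whether that matters depends on the business translation in the last step.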
Sample Size Considerations
- Small samples produce unreliable results, even with significant p-values
- Rule of thumb for proportions: Need at least 30 events per group for basic reliability
- For detecting small effects (e.g., a 1 percentage point change in conversion rate), you may need thousands to tens of thousands of observations per group, depending on the baseline rate
- If your sample is small, say so: "With only 200 observations per group, we have limited power to detect effects smaller than X%"
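To put a rough number on "how many observations", the sketch below applies the standard normal-approximation formula for comparing two proportions. It is an estimate under stated assumptions, not a substitute for a proper power analysis: the function name `sample_size_per_group`, the 10% baseline, the 11% target, and the 80% power default are all assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_baseline, p_variant, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    # Sum of the two binomial variances at the assumed rates.
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = p_variant - p_baseline
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Assumed scenario: detect a lift from 10% to 11% conversion.
n_per_group = sample_size_per_group(0.10, 0.11)
print(f"Observations needed per group: {n_per_group}")

# Halving the detectable effect roughly quadruples the requirement.
n_smaller_effect = sample_size_per_group(0.10, 0.105)
print(f"For 10% to 10.5%: {n_smaller_effect}")
```

Because the required sample grows with the inverse square of the effect, running the calculation before the experiment helps set expectations about which effects are detectable at all.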