Agent skill
data-analysis
Use to investigate data for surprising, actionable insights
Install this agent skill to your Project
npx add-skill https://github.com/sanand0/scripts/tree/main/agents/data-analysis
SKILL.md
Investigative Data Analysis
Hunt for stories that make smart readers lean forward and say "wait, really?" — findings that are high-impact, surprising, actionable, and defensible.
This is a DETAILED process. Create a PLAN and execute step by step.
1 — Understand the Data
- Structure: Dimensions (categorical) vs. measures (numeric), types, granularity, field relationships.
- Quality: Completeness, missing values, outliers, duplicates, encoding issues.
- Distribution: Value ranges, (log) normality, skewness, heavy tails, zero-inflation.
- Derived potential: Computable metrics (features, targets), joins, aggregations, time-series constructions.
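The quality and distribution checks above can be sketched for a single numeric column as follows. This is a minimal illustration using only the Python standard library; the function name `profile_measure`, the use of `None` for missing values, and the `order_values` data are all hypothetical.

```python
import statistics

def profile_measure(values):
    """Summarise one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = statistics.fmean(present)
    sd = statistics.pstdev(present)
    # Population skewness: mean cubed z-score. Positive means a long right tail.
    skew = sum(((v - mean) / sd) ** 3 for v in present) / n if sd > 0 else 0.0
    return {
        "missing_share": 1 - n / len(values),
        "zero_share": sum(1 for v in present if v == 0) / n,
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
        "skewness": skew,
    }

# Hypothetical order values: several zeros, two missing entries, one huge order
order_values = [0, 0, 0, 12, 15, None, 18, 20, 0, 25, None, 950]
report = profile_measure(order_values)
for key, value in report.items():
    print(f"{key}: {value:.3f}")
```

A high zero share combined with strong positive skew, as in this made-up column, is a hint to look at medians and log scales rather than means.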
2 — Define What Matters
- Who are the audiences and what are their key questions?
- What decisions could findings actually inform? What's actionable vs. merely interesting?
- What would contradict conventional wisdom or reveal hidden patterns?
3 — Hunt for Signal
Apply a diverse analysis toolkit to widen the pool of candidate insights: statistical tests, plus geospatial, network, NLP, time-series, cohort, segmentation, and survival analysis.
Look for stories that confirm something suspected but never proven, or overturn something everyone assumes is true:
- Extreme/unexpected distributions: What's at the tails? What shouldn't be there?
- Pattern breaks: Where does a trend suddenly shift? What changed, and when?
- Surprising correlations: What moves together that shouldn't? What's independent that should correlate?
- Standout entities: Who dramatically overperforms or underperforms relative to peers? Who drives trends vs. bucks them?
- Hidden populations: What patterns disappear in aggregate but emerge in subgroups? (Watch for Simpson's Paradox.)
- Connected dots: What patterns emerge when combining fields that seem unrelated at first?
- Clusters: What clusters or communities emerge? Where are the overlaps and outliers?
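One of the checks in this list, the hidden-populations question, can be sketched as a comparison of the aggregate winner against the winner inside each subgroup. The counts, channel names, and segment names below are hypothetical, chosen so that the reversal actually occurs.

```python
# Hypothetical (successes, trials) per outreach channel and deal size
outcomes = {
    "email": {"small": (81, 87), "large": (192, 263)},
    "phone": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

def aggregate_rate(segments):
    """Pool all segments of one channel into a single success rate."""
    successes = sum(s for s, _ in segments.values())
    trials = sum(t for _, t in segments.values())
    return rate(successes, trials)

overall = {channel: aggregate_rate(segs) for channel, segs in outcomes.items()}
overall_winner = max(overall, key=overall.get)

# Winner within each segment, ignoring the other segment entirely
segment_winners = {}
for segment in ["small", "large"]:
    segment_winners[segment] = max(
        outcomes, key=lambda channel: rate(*outcomes[channel][segment])
    )

# Simpson's Paradox: every subgroup disagrees with the aggregate
reversal = all(winner != overall_winner for winner in segment_winners.values())

print("Overall winner:", overall_winner)
print("Segment winners:", segment_winners)
print("Reversal detected:", reversal)
```

In this made-up example the aggregate favours one channel only because it handled far more of the easy segment, which is exactly the kind of pattern a subgroup breakdown is meant to expose.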
Search internally / externally:
- Discover domain-specific rules and context that affect the findings
- Search for WHY this happened
- Surface confounders
- Explore prior research
Find Leverage Points:
- Underutilized resources or capabilities
- Phase transitions: thresholds where behavior shifts nonlinearly
- Tipping points: what small change would move the aggregate needle?
- What actions are specific and implementable, not just directionally correct?
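A simple way to look for a phase transition is to search for the single cut-off that best splits an outcome into two flat levels. The sketch below does this by minimising squared error over candidate splits; the function `best_threshold`, the `min_side` parameter, and the wait-time and churn figures are hypothetical, and real data would usually need a proper change-point method and a significance check.

```python
import statistics

def best_threshold(xs, ys, min_side=3):
    """Return (cut-off x, mean below, mean at or above) for the best two-level split."""
    pairs = sorted(zip(xs, ys))
    best = None
    # Keep at least min_side points on each side of the split
    for i in range(min_side, len(pairs) - min_side + 1):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = statistics.fmean(left)
        right_mean = statistics.fmean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, pairs[i][0], left_mean, right_mean)
    return best[1], best[2], best[3]

# Hypothetical data: support wait time (minutes) vs. monthly churn (%)
wait_minutes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
churn_pct = [2.1, 1.9, 2.0, 2.2, 2.0, 2.1, 6.8, 7.1, 7.0, 6.9]

threshold, low_level, high_level = best_threshold(wait_minutes, churn_pct)
print(f"Churn jumps from {low_level:.2f}% to {high_level:.2f}% at {threshold} minutes")
```

A finding of this shape is specific enough to act on: it names the threshold to stay under, not just a direction.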
4 — Verify & Stress-Test
Cross-check externally: Is there outside evidence (benchmarks, research, industry data) that supports, refines, or contradicts the finding?
Test robustness: Does the finding hold under cross-model checks, alternative model specs, thresholds, sub-samples, or time windows? Does a placebo test (shuffled labels, random baseline) reproduce it? If so, it's probably noise.
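A shuffled-label placebo test can be sketched as a permutation test on the difference in group means. This is a minimal illustration, not a full robustness suite; the function `permutation_p_value`, its parameters, and the days-to-close figures are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Share of label shuffles whose mean gap is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # break any real link between label and value
        gap = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if gap >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_shuffles + 1)

# Hypothetical days-to-close for two regions with a clear gap
northeast = [12, 14, 11, 13, 15, 12, 10, 13]
elsewhere = [21, 25, 19, 24, 22, 26, 20, 23]
p_real = permutation_p_value(northeast, elsewhere)

# Hypothetical groups drawn from the same mix of values: no real gap
noise_a = [12, 21, 14, 25, 11, 19, 13, 24]
noise_b = [15, 22, 12, 26, 10, 20, 13, 23]
p_noise = permutation_p_value(noise_a, noise_b)

print(f"Clear gap: p = {p_real:.4f}")
print(f"No gap:    p = {p_noise:.4f}")
```

If shuffled labels reproduce the observed gap often (a large p-value), the finding is not distinguishable from noise on this evidence.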
Check for errors & bias: Examine data provenance, definitions, collection methodology. Control for confounders, base rates, uncertainty. What's missing? Selection and survivorship bias are silent killers.
Check for logical fallacies:
- Correlation vs. causation — is there a plausible mechanism, or just co-movement?
- Goodhart's Law — is the metric gamed? Does measuring it change behavior?
- Simpson's Paradox — does segmentation flip the trend?
- Regression to the mean — are extreme values just natural variation reverting?
- Occam's Razor — is there a simpler explanation you're overlooking?
- Survivorship/selection bias — what's missing from the data entirely?
- Second-order effects — what happens downstream beyond the immediate impact?
- Inversion — try to disprove the finding. If you can't, it's more credible.
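Regression to the mean, one of the traps listed above, is easy to demonstrate with a simulation in which nothing real changes between periods. The sketch below is illustrative only: every unit is assumed to have the same true performance, and the unit count, noise level, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)
n_units = 2000

# Every hypothetical unit has the same true performance (100); only noise differs.
period_1 = [100 + rng.gauss(0, 10) for _ in range(n_units)]
period_2 = [100 + rng.gauss(0, 10) for _ in range(n_units)]

# Pick the "star performers": the top 10% in period 1
cutoff = sorted(period_1)[int(0.9 * n_units)]
top = [i for i, value in enumerate(period_1) if value >= cutoff]

top_before = statistics.fmean(period_1[i] for i in top)
top_after = statistics.fmean(period_2[i] for i in top)

print(f"Top decile, period 1: {top_before:.1f}")
print(f"Same units, period 2: {top_after:.1f}")
```

The apparent decline of the star performers here comes entirely from selecting on a noisy measurement, so a real-world drop after an extreme value needs a comparison group before it is credited to any intervention.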
Consider limitations: What cannot be concluded? What caveats must accompany the finding to avoid misuse?
5 — Prioritize & Package
Select insights that are high-impact (meaningful effect sizes vs. base rates, not incremental), actionable (specific and implementable, not just "invest more in X"), surprising (challenges assumptions, reveals hidden patterns), and defensible (robust under scrutiny, bias-checked).
Lead with the most compelling finding → evidence → caveats → what to do with it.
Tone: Write like a journalist, not a statistician. Say "Sales reps in the Northeast close 2× faster — but only for deals under $10K", not "There may be some regional variation." Findings should make a smart reader lean forward.