Agent skill
data-validation-data-quality-checks
Sub-skill of data-validation: Data Quality Checks (+3).
Install this agent skill to your Project
npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/_archive/data/analytics/data-validation/data-quality-checks
SKILL.md
Data Quality Checks (+3)
Data Quality Checks
- Source verification: Confirmed which tables/data sources were used. Are they the right ones for this question?
- Freshness: Data is current enough for the analysis. Noted the "as of" date.
- Completeness: No unexpected gaps in time series or missing segments.
- Null handling: Checked null rates in key columns. Nulls are handled appropriately (excluded, imputed, or flagged).
- Deduplication: Confirmed no double-counting from bad joins or duplicate source records.
- Filter verification: All WHERE clauses and filters are correct. No unintended exclusions.
Calculation Checks
- Aggregation logic: GROUP BY includes all non-aggregated columns. Aggregation level matches the analysis grain.
- Denominator correctness: Rate and percentage calculations use the right denominator. Denominators are non-zero.
- Date alignment: Comparisons use the same time period length. Partial periods are excluded or noted.
- Join correctness: JOIN types are appropriate (INNER vs LEFT). Many-to-many joins haven't inflated counts.
- Metric definitions: Metrics match how stakeholders define them. Any deviations are noted.
- Subtotals sum: Parts add up to the whole where expected. If they don't, explain why (e.g., overlap).
Reasonableness Checks
- Magnitude: Numbers are in a plausible range. Revenue isn't negative. Percentages are between 0-100%.
- Trend continuity: No unexplained jumps or drops in time series.
- Cross-reference: Key numbers match other known sources (dashboards, previous reports, finance data).
- Order of magnitude: Total revenue is in the right ballpark. User counts match known figures.
- Edge cases: What happens at the boundaries? Empty segments, zero-activity periods, new entities.
Presentation Checks
- Chart accuracy: Bar charts start at zero. Axes are labeled. Scales are consistent across panels.
- Number formatting: Appropriate precision. Consistent currency/percentage formatting. Thousands separators where needed.
- Title clarity: Titles state the insight, not just the metric. Date ranges are specified.
- Caveat transparency: Known limitations and assumptions are stated explicitly.
- Reproducibility: Someone else could recreate this analysis from the documentation provided.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
gsd-complete-milestone
Archive completed milestone and prepare for next version
gsd-reapply-patches
Reapply local modifications after a GSD update
gsd-verify-work
Validate built features through conversational UAT
gsd-thread
Manage persistent context threads for cross-session work
clinical-trial-protocol
Generate clinical trial protocols for medical devices or drugs through a modular, waypoint-based architecture with research-only and full protocol modes.
single-cell-rna-qc
Performs quality control on single-cell RNA-seq data (.h5ad or .h5 files) using scverse best practices with MAD-based filtering and comprehensive visualizations.
Didn't find tool you were looking for?