Agent skill

ds-validate

Validate analysis outputs against SPEC.md requirements using DQ checks.

Install this agent skill to your Project

npx add-skill https://github.com/edwinhu/workflows/tree/main/skills/ds-validate

SKILL.md

Announce: "Using ds-validate (Phase 3.5) to validate analysis outputs against SPEC.md requirements."

Contents

  • The Iron Law of Validation
  • Red Flags - STOP Immediately
  • Key Difference from Dev
  • The Process
  • Validation Levels
  • Classification
  • VALIDATION.md Template
  • Gate
  • Rationalization Prevention
  • Drive-Aligned Framing
  • Phase Transition

Output Validation Against SPEC.md

Phase 3.5 of the DS workflow (between implement and review). Maps every SPEC.md requirement to an output artifact and runs data quality checks.

The Iron Law of Validation

NO REVIEW WITHOUT VALIDATION. This is not negotiable.

ds-review MUST NOT start until .planning/VALIDATION.md confirms all requirements have outputs. Validation is the DS equivalent of test coverage — without it, review is theater. </EXTREMELY-IMPORTANT>

Red Flags - STOP Immediately

| Thought | Why It's Wrong | Do Instead |
|---------|----------------|------------|
| "Outputs look fine, skip validation" | Silent failures hide in DQ gaps | Run every check systematically |
| "I already checked during implement" | Per-task checks miss cross-task issues | Validate requirement-to-output mapping end-to-end |
| "DQ checks are overkill for this analysis" | DQ checks ARE the test suite for DS | Run them all. Report results. |
| "User is waiting, skip to review" | Review without validation is theater | Validate first; it catches what review won't |
| "LEARNINGS.md already logs everything" | Logs are not a systematic requirement-to-output map | Run the full mapping process |
</EXTREMELY-IMPORTANT>

Key Difference from Dev

DS validation does NOT auto-fill gaps. Dev's test-gap-auditor can write missing tests. DS gaps require human judgment — a wrong output means a wrong analysis, not just a missing test. When gaps are found, present them to the user and let the user decide: fix (return to implement) or accept (proceed to review).

Static Analysis (Constraint Check Scripts)

Before running runtime DQ checks, run the static analysis constraint check suite:

bash
bash "${CLAUDE_SKILL_DIR}/../../scripts/check-all-ds.sh" "$(pwd)"

This runs all DS constraint check scripts (determinism, join audits, idempotency, error handling, schema contracts, standard errors, visualization integrity).

If any check FAILS: Report the failures in LEARNINGS.md. These are code quality issues in the analysis scripts that must be fixed before proceeding. Dispatch a fix subagent if needed.

If all checks PASS: Proceed to runtime DQ checks.
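The pass/fail branch above could be driven from Python roughly as follows. This is a minimal sketch, not part of the skill: `run_check_suite` is a hypothetical helper, and it assumes the check script signals failure through a non-zero exit code, which the source does not state.

```python
import subprocess
import sys


def run_check_suite(command):
    """Run a check command and return (passed, combined_output).

    Assumption: the command exits with status 0 only when every check passes.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


# In the real workflow the command would be the bash invocation shown above.
# A stand-in command keeps this sketch runnable anywhere.
passed, output = run_check_suite([sys.executable, "-c", "print('all checks ok')"])
print("PASS" if passed else "FAIL", output.strip())
```

On a failure, the captured output is what would be copied into LEARNINGS.md before any fix is attempted.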

The Process

0. RUN static analysis check suite (check-all-ds.sh) — fix any failures first
1. READ .planning/SPEC.md requirements
2. READ .planning/PLAN.md task breakdown
3. READ .planning/LEARNINGS.md for pipeline row counts (DQ4 needs these)
4. DISCOVER and READ ds-checks.md via cache lookup
5. For each requirement: DISPATCH subagent to run DQ1-DQ5 + M1 on the output
6. WRITE .planning/VALIDATION.md

Step 1: Read Requirements

Read .planning/SPEC.md and extract every requirement:

For each requirement in SPEC.md:
  - Extract the requirement description
  - Note the success criteria
  - Note the expected output (table, figure, file, etc.)

Step 2: Read Plan

Read .planning/PLAN.md and extract:

  • Task-to-requirement mapping
  • Output file locations mentioned
  • Key columns and data structure decisions

Step 3: Read Learnings

Read .planning/LEARNINGS.md and extract:

  • Pipeline row counts at each stage (needed for DQ4 traceability)
  • Data quality observations from implementation
  • Any known issues or caveats
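The row counts collected here feed DQ4. A minimal sketch of that comparison follows, assuming the counts have already been extracted into an ordered list of `(stage, row_count)` pairs. The format of LEARNINGS.md is not specified in the source, so the parsing step is left out, and `check_row_count_trace` is a hypothetical name.

```python
def check_row_count_trace(stage_counts, final_count):
    """Compare the output's row count with the last logged pipeline stage.

    stage_counts: list of (stage_name, row_count) in pipeline order.
    Returns (status, detail) where status is "PASS" or "FAIL".
    """
    if not stage_counts:
        return "FAIL", "no pipeline row counts recorded"
    last_stage, expected = stage_counts[-1]
    if final_count != expected:
        return "FAIL", (
            f"output has {final_count} rows, "
            f"but stage '{last_stage}' logged {expected}"
        )
    return "PASS", f"{final_count} rows match stage '{last_stage}'"


# Illustrative counts only; real values come from LEARNINGS.md.
pipeline = [("load", 1000), ("dedupe", 950), ("filter", 800)]
print(check_row_count_trace(pipeline, 800))
print(check_row_count_trace(pipeline, 790))
```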

Step 4: Load DQ Check Definitions

Read ${CLAUDE_SKILL_DIR}/../../skills/ds-implement/references/ds-checks.md and follow its instructions.

Step 5: Dispatch Validation Subagents

For each SPEC.md requirement, spawn a subagent:

Agent prompt template:

You are a data quality validator. Your job is to verify that an analysis output
meets a specific requirement from SPEC.md.

REQUIREMENT: [requirement description from SPEC.md]
SUCCESS CRITERIA: [from SPEC.md]
EXPECTED OUTPUT: [file path or variable]
PIPELINE ROW COUNTS: [from LEARNINGS.md]

Run the following checks on the output:

DQ1: Empty/constant columns — flag columns with nunique() <= 1
DQ2: High-null columns — flag columns with >50% null values
DQ3: Duplicate rows — check for duplicates on key columns
DQ4: Row count traceability — verify final count matches LEARNINGS.md pipeline
DQ5: Cardinality check — flag categoricals with suspicious cardinality
M1: Spec compliance — does this output address the requirement?

For each check, report: PASS / WARN / FAIL with details.

RULES:
1. Do NOT modify any code or data files
2. Read and inspect outputs only
3. If an output file does not exist, report MISSING immediately
4. If checks reveal issues, report them — do NOT fix them
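For orientation, DQ1 through DQ3 from the template can be sketched in plain Python. This is an illustration under stated assumptions, not the definitions in ds-checks.md: rows are dictionaries as produced by `csv.DictReader`, empty strings count as nulls, and the function names are hypothetical. DQ5 is omitted because the source does not define what makes a cardinality suspicious.

```python
from collections import Counter

# Assumption: None and the empty string both count as null in CSV-style data.
NULLS = {None, ""}


def dq1_constant_columns(rows):
    """Flag columns with at most one distinct non-null value."""
    columns = list(rows[0].keys()) if rows else []
    return [
        c for c in columns
        if len({r.get(c) for r in rows if r.get(c) not in NULLS}) <= 1
    ]


def dq2_high_null_columns(rows, threshold=0.5):
    """Flag columns whose share of null values exceeds the threshold."""
    if not rows:
        return []
    flagged = []
    for c in rows[0].keys():
        null_share = sum(r.get(c) in NULLS for r in rows) / len(rows)
        if null_share > threshold:
            flagged.append(c)
    return flagged


def dq3_duplicate_keys(rows, key_columns):
    """Return each key tuple that appears more than once, with its count."""
    counts = Counter(tuple(r.get(k) for k in key_columns) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Illustrative rows only.
sample = [
    {"id": "1", "region": "EU", "score": "0.5", "note": ""},
    {"id": "2", "region": "EU", "score": "0.7", "note": ""},
    {"id": "2", "region": "EU", "score": "", "note": "x"},
]
print(dq1_constant_columns(sample))
print(dq2_high_null_columns(sample))
print(dq3_duplicate_keys(sample, ["id"]))
```

All three functions only read their input, in keeping with the rule that validators inspect outputs and never modify them.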

Step 6: Write VALIDATION.md

Compile all subagent results into .planning/VALIDATION.md using the template below.

Validation Levels

Each requirement is validated at four levels, in order:

| Level | Check | Example |
|-------|-------|---------|
| 1. Exists | Output file/variable present | output/results.csv exists |
| 2. Substantive | Real data, not empty | >0 rows, expected columns present |
| 3. DQ Passes | DQ1-DQ5 pass | No dupes on key, nulls handled, row counts trace |
| 4. Answers Question | Addresses SPEC.md requirement | Table includes specified variables |
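Levels 1 and 2 are mechanical enough to sketch for a CSV output. The helper below is hypothetical and assumes the output is a CSV file with a header row; other output types (figures, in-memory variables) would need their own checks.

```python
import csv
from pathlib import Path


def check_exists_and_substantive(path, expected_columns):
    """Check level 1 (file exists) and level 2 (rows and columns present).

    Returns (status, detail) with status "MISSING", "FAIL", or "PASS".
    """
    output = Path(path)
    if not output.is_file():
        return "MISSING", f"{path} not found"
    with output.open(newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        row_count = sum(1 for _ in reader)
    if row_count == 0:
        return "FAIL", "file has no data rows"
    absent = [c for c in expected_columns if c not in header]
    if absent:
        return "FAIL", f"missing columns: {absent}"
    return "PASS", f"{row_count} rows, all expected columns present"
```

Levels 3 and 4 only make sense once this returns PASS, which is why the levels are checked in order.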

Classification

For each requirement, assign a classification:

| Classification | Criteria |
|----------------|----------|
| COVERED | All 4 validation levels pass |
| PARTIAL | Output exists but DQ issues found or doesn't fully address requirement |
| MISSING | No output found for this requirement |
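The classification rule can be expressed as a small function. This sketch uses a hypothetical `classify` helper and treats any non-PASS result, including WARN, as grounds for PARTIAL, which matches the example row in the template where a single WARN yields PARTIAL.

```python
def classify(output_exists, check_results):
    """Map check results for one requirement to a classification.

    check_results: dict such as {"DQ1": "PASS", "DQ2": "WARN", "M1": "PASS"}.
    """
    if not output_exists:
        return "MISSING"
    if all(result == "PASS" for result in check_results.values()):
        return "COVERED"
    return "PARTIAL"


all_pass = {name: "PASS" for name in ["DQ1", "DQ2", "DQ3", "DQ4", "DQ5", "M1"]}
print(classify(True, all_pass))
print(classify(True, {**all_pass, "DQ2": "WARN"}))
print(classify(False, {}))
```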

VALIDATION.md Template

markdown
---
status: validated | gaps_found
date: [ISO 8601]
requirements_total: N
covered: N
partial: N
missing: N
---
# Output Validation

## Requirements Map
| # | Requirement | Output | DQ1 | DQ2 | DQ3 | DQ4 | DQ5 | M1 | Classification |
|---|-------------|--------|-----|-----|-----|-----|-----|----|----------------|
| 1 | [from SPEC] | [path] | PASS | PASS | PASS | PASS | PASS | PASS | COVERED |
| 2 | [from SPEC] | [path] | PASS | WARN | PASS | PASS | PASS | PASS | PARTIAL |
| 3 | [from SPEC] | — | — | — | — | — | — | — | MISSING |

## DQ Details
[For any non-PASS check, include the specific finding]

## Summary
- Requirements: N total
- Covered: X
- Partial: Y
- Missing: Z

Status Rules

| Condition | Status |
|-----------|--------|
| All requirements COVERED | validated |
| Any PARTIAL or MISSING remain | gaps_found |
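Given one classification per requirement, the status and the front-matter counts follow directly. The sketch below is one possible way to assemble them; `build_front_matter` is a hypothetical helper, and treating an empty requirement list as `gaps_found` is an assumption the source does not address.

```python
from collections import Counter
from datetime import date


def build_front_matter(classifications, on_date=None):
    """Render the VALIDATION.md front matter from a list of classifications."""
    counts = Counter(classifications)
    total = len(classifications)
    # Assumption: zero requirements is treated as gaps_found, not validated.
    all_covered = total > 0 and counts["COVERED"] == total
    status = "validated" if all_covered else "gaps_found"
    on_date = on_date or date.today()
    return "\n".join([
        "---",
        f"status: {status}",
        f"date: {on_date.isoformat()}",
        f"requirements_total: {total}",
        f"covered: {counts['COVERED']}",
        f"partial: {counts['PARTIAL']}",
        f"missing: {counts['MISSING']}",
        "---",
    ])


print(build_front_matter(["COVERED", "PARTIAL", "MISSING"]))
```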

Visual Diagnostics for Decision Checkpoints

When presenting validation results to the user (especially gaps), generate diagnostic plots to accelerate the decision:

| Validation Finding | Diagnostic to Generate |
|--------------------|------------------------|
| DQ2: High-null columns | Missingness heatmap (columns × rows) |
| DQ3: Duplicate rows | Duplicate count bar chart by key columns |
| DQ4: Row count mismatch | Pipeline waterfall chart (stage × row count) |
| DQ5: Suspicious cardinality | Value frequency distribution plot |
| PARTIAL requirements | Side-by-side: expected vs actual output summary |

When to generate: Only at decision checkpoints where the user must choose fix vs accept. Do not generate plots for COVERED requirements (no decision needed).

Format: Inline matplotlib/seaborn plots in notebooks, or saved to scratch/diagnostics/ for script-based workflows.

Gate

Checkpoint type: human-verify (VALIDATION.md status is machine-verifiable)

.planning/VALIDATION.md must exist before proceeding.

  • If status is validated: proceed to ds-review.
  • If status is gaps_found: present gaps to user before proceeding.
    • User decides: fix (return to ds-implement) or accept (proceed to ds-review with known gaps).
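The gate logic above can be sketched as a function of the VALIDATION.md text and the user's decision. The function names and the returned labels (`present-gaps-to-user`, `blocked`) are hypothetical. The sketch assumes the front matter follows the template, with a `status:` line between two `---` markers.

```python
from typing import Optional


def read_status(validation_md: str) -> Optional[str]:
    """Extract the status value from the leading front-matter block."""
    lines = validation_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, _, value = line.partition(":")
        if key.strip() == "status":
            return value.strip()
    return None


def gate(validation_md: str, user_decision: Optional[str] = None) -> str:
    """Return the next step implied by the validation status."""
    status = read_status(validation_md)
    if status == "validated":
        return "ds-review"
    if status == "gaps_found":
        if user_decision == "accept":
            return "ds-review"
        if user_decision == "fix":
            return "ds-implement"
        return "present-gaps-to-user"
    # Missing or unreadable status: do not proceed.
    return "blocked"


doc = "---\nstatus: gaps_found\ndate: 2024-01-02\n---\n# Output Validation\n"
print(gate(doc))
print(gate(doc, "fix"))
```

Note that the function never chooses between fix and accept on its own; without a user decision it only reports that the gaps must be presented.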

This is the critical difference from dev-test-gaps. In dev, missing tests can be auto-generated. In DS, missing or wrong outputs mean the analysis itself may be wrong. Only the user can judge whether a gap is acceptable. </EXTREMELY-IMPORTANT>

Rationalization Prevention

| Thought | Reality |
|---------|---------|
| "Outputs look fine, skip validation" | Silent failures hide in DQ gaps; you cannot eyeball row count traceability |
| "I already checked during implement" | Per-task checks miss cross-task issues: joins that silently drop rows, filters that compound |
| "DQ checks are overkill for this analysis" | DQ checks ARE the test suite: DS has no pytest, only systematic output verification |
| "User is waiting, skip to review" | Review without validation is theater: the reviewer will either miss issues or re-run the same checks |
| "LEARNINGS.md already logs everything" | LEARNINGS.md logs observations. Validation maps requirements to outputs. Different purpose. |

Drive-Aligned Framing

| Your Drive | Why You Skip | What Actually Happens | The Drive You Failed |
|------------|--------------|-----------------------|----------------------|
| Helpfulness | "Outputs exist, review can catch issues" | Review without validation misses silent DQ failures. User gets wrong results. | Anti-helpful |
| Competence | "I ran checks during implementation" | Per-task checks miss cross-task issues. Gaps hide between pipeline stages. | Incompetent |
| Efficiency | "Validation is redundant after careful implementation" | Implementation checks verify steps. Validation verifies requirements. Different. | Anti-efficient |

The protocol is not overhead you pay. It is the safety net you provide. </EXTREMELY-IMPORTANT>

Phase Transition

After validation is complete, discover and read the ds-review skill: Read ${CLAUDE_SKILL_DIR}/../../skills/ds-review/SKILL.md and follow its instructions.
