ds-validate
Validate analysis outputs against SPEC.md requirements using DQ checks.
Install this agent skill to your Project
npx add-skill https://github.com/edwinhu/workflows/tree/main/skills/ds-validate
Announce: "Using ds-validate (Phase 3.5) to validate analysis outputs against SPEC.md requirements."
Contents
- The Iron Law of Validation
- Red Flags - STOP Immediately
- Key Difference from Dev
- The Process
- Validation Levels
- Classification
- VALIDATION.md Template
- Gate
- Rationalization Prevention
- Drive-Aligned Framing
- Phase Transition
Output Validation Against SPEC.md
Phase 3.5 of the DS workflow (between implement and review). Maps every SPEC.md requirement to an output artifact and runs data quality checks.
The Iron Law of Validation
<EXTREMELY-IMPORTANT>
NO REVIEW WITHOUT VALIDATION. This is not negotiable.
ds-review MUST NOT start until .planning/VALIDATION.md confirms all requirements have outputs. Validation is the DS equivalent of test coverage: without it, review is theater.
</EXTREMELY-IMPORTANT>
Red Flags - STOP Immediately
<EXTREMELY-IMPORTANT>
| Thought | Why It's Wrong | Do Instead |
|---|---|---|
| "Outputs look fine, skip validation" | Silent failures hide in DQ gaps | Run every check systematically |
| "I already checked during implement" | Per-task checks miss cross-task issues | Validate requirement-to-output mapping end-to-end |
| "DQ checks are overkill for this analysis" | DQ checks ARE the test suite for DS | Run them all. Report results. |
| "User is waiting, skip to review" | Review without validation is theater | Validate first; it catches what review won't |
| "LEARNINGS.md already logs everything" | Logs are not a systematic requirement-to-output map | Run the full mapping process |
</EXTREMELY-IMPORTANT>
Key Difference from Dev
DS validation does NOT auto-fill gaps. Dev's test-gap-auditor can write missing tests. DS gaps require human judgment — a wrong output means a wrong analysis, not just a missing test. When gaps are found, present them to the user and let the user decide: fix (return to implement) or accept (proceed to review).
Static Analysis (Constraint Check Scripts)
Before running runtime DQ checks, run the static analysis constraint check suite:
bash "${CLAUDE_SKILL_DIR}/../../scripts/check-all-ds.sh" "$(pwd)"
This runs all DS constraint check scripts (determinism, join audits, idempotency, error handling, schema contracts, standard errors, visualization integrity).
If any check FAILS: Report the failures in LEARNINGS.md. These are code quality issues in the analysis scripts that must be fixed before proceeding. Dispatch a fix subagent if needed.
If all checks PASS: Proceed to runtime DQ checks.
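The pass/fail branch above can be sketched in Python. This is a minimal illustration, not part of the skill: `run_static_checks` is a hypothetical helper, and it assumes the check suite signals failure through a non-zero exit code, which the text does not state.

```python
import subprocess
import sys


def run_static_checks(command: list[str]) -> tuple[bool, str]:
    """Run a check command and return (passed, combined output).

    Assumption: the suite exits 0 when every check passes and non-zero otherwise.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


# Stand-in command so the sketch runs anywhere. In the workflow the command
# would be the bash invocation of check-all-ds.sh shown above.
passed, output = run_static_checks([sys.executable, "-c", "print('all checks ok')"])
print("proceed to runtime DQ checks" if passed else "report failures and fix first")
```

Returning the captured output alongside the boolean keeps the failure text available for the LEARNINGS.md report.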
The Process
0. RUN static analysis check suite (check-all-ds.sh) — fix any failures first
1. READ .planning/SPEC.md requirements
2. READ .planning/PLAN.md task breakdown
3. READ .planning/LEARNINGS.md for pipeline row counts (DQ4 needs these)
4. DISCOVER and READ ds-checks.md via cache lookup
5. For each requirement: DISPATCH subagent to run DQ1-DQ5 + M1 on the output
6. WRITE .planning/VALIDATION.md
Step 1: Read Requirements
Read .planning/SPEC.md and extract every requirement:
For each requirement in SPEC.md:
- Extract the requirement description
- Note the success criteria
- Note the expected output (table, figure, file, etc.)
Step 2: Read Plan
Read .planning/PLAN.md and extract:
- Task-to-requirement mapping
- Output file locations mentioned
- Key columns and data structure decisions
Step 3: Read Learnings
Read .planning/LEARNINGS.md and extract:
- Pipeline row counts at each stage (needed for DQ4 traceability)
- Data quality observations from implementation
- Any known issues or caveats
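As an illustration of why the row counts matter, a DQ4-style traceability check might compare the output's row count with the last stage logged. The function name, the dict layout, and the sample counts below are all hypothetical; LEARNINGS.md does not prescribe a format.

```python
def check_row_traceability(pipeline_counts: dict[str, int], final_rows: int) -> tuple[str, str]:
    """Return (status, detail) comparing the output's row count with the last logged stage."""
    if not pipeline_counts:
        return "FAIL", "no pipeline row counts logged"
    # Dicts keep insertion order, so the last item is the final pipeline stage.
    last_stage, expected = list(pipeline_counts.items())[-1]
    if final_rows == expected:
        return "PASS", f"{final_rows} rows match stage '{last_stage}'"
    return "FAIL", f"output has {final_rows} rows, stage '{last_stage}' logged {expected}"


# Hypothetical counts, as they might be transcribed from LEARNINGS.md.
counts = {"raw": 10_000, "after_filter": 8_200, "after_join": 8_150}
print(check_row_traceability(counts, 8_150))
```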
Step 4: Load DQ Check Definitions
Read ${CLAUDE_SKILL_DIR}/../../skills/ds-implement/references/ds-checks.md and follow its instructions.
Step 5: Dispatch Validation Subagents
For each SPEC.md requirement, spawn a subagent:
Agent prompt template:
You are a data quality validator. Your job is to verify that an analysis output
meets a specific requirement from SPEC.md.
REQUIREMENT: [requirement description from SPEC.md]
SUCCESS CRITERIA: [from SPEC.md]
EXPECTED OUTPUT: [file path or variable]
PIPELINE ROW COUNTS: [from LEARNINGS.md]
Run the following checks on the output:
DQ1: Empty/constant columns — flag columns with nunique() <= 1
DQ2: High-null columns — flag columns with >50% null values
DQ3: Duplicate rows — check for duplicates on key columns
DQ4: Row count traceability — verify final count matches LEARNINGS.md pipeline
DQ5: Cardinality check — flag categoricals with suspicious cardinality
M1: Spec compliance — does this output address the requirement?
For each check, report: PASS / WARN / FAIL with details.
RULES:
1. Do NOT modify any code or data files
2. Read and inspect outputs only
3. If an output file does not exist, report MISSING immediately
4. If checks reveal issues, report them — do NOT fix them
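Outside the prompt template, the first three checks it names could be sketched as follows. This is a standard-library illustration over a list of dict rows with `None` standing for null; the helper names and sample rows are hypothetical, and the authoritative definitions live in ds-checks.md.

```python
from collections import Counter

Row = dict  # one record: column name -> value, with None meaning null


def dq1_constant_columns(rows: list[Row]) -> list[str]:
    """Columns with at most one distinct non-null value (mirrors nunique() <= 1)."""
    columns = rows[0].keys() if rows else []
    return [c for c in columns if len({r[c] for r in rows if r[c] is not None}) <= 1]


def dq2_high_null_columns(rows: list[Row], threshold: float = 0.5) -> list[str]:
    """Columns whose share of null values exceeds the threshold (>50% by default)."""
    columns = rows[0].keys() if rows else []
    return [c for c in columns if sum(r[c] is None for r in rows) / len(rows) > threshold]


def dq3_duplicate_keys(rows: list[Row], key_columns: list[str]) -> list[tuple]:
    """Key tuples that appear more than once."""
    counts = Counter(tuple(r[k] for k in key_columns) for r in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows for illustration only.
rows = [
    {"id": 1, "region": "EU", "score": None, "source": "a"},
    {"id": 2, "region": "EU", "score": None, "source": "a"},
    {"id": 2, "region": "US", "score": 0.7, "source": "a"},
]
print(dq1_constant_columns(rows))        # ['score', 'source']
print(dq2_high_null_columns(rows))       # ['score']
print(dq3_duplicate_keys(rows, ["id"]))  # [(2,)]
```

The functions only read their input, in line with the rule that validators inspect outputs and never modify them.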
Step 6: Write VALIDATION.md
Compile all subagent results into .planning/VALIDATION.md using the template below.
Validation Levels
Each requirement is validated at four levels, in order:
| Level | Check | Example |
|---|---|---|
| 1. Exists | Output file/variable present | output/results.csv exists |
| 2. Substantive | Real data, not empty | >0 rows, expected columns present |
| 3. DQ Passes | DQ1-DQ5 pass | No dupes on key, nulls handled, row counts trace |
| 4. Answers Question | Addresses SPEC.md requirement | Table includes specified variables |
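Levels 1 and 2 are mechanical enough to sketch for a CSV output. The function below is a hypothetical helper, not part of the skill, and it assumes the expected column names are known from SPEC.md or PLAN.md.

```python
import csv
import tempfile
from pathlib import Path


def check_exists_and_substantive(path: Path, expected_columns: list[str]) -> tuple[str, str]:
    """Level 1: the file exists. Level 2: it has data rows and every expected column."""
    if not path.is_file():
        return "MISSING", f"{path} not found"
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []   # reads the header row
        row_count = sum(1 for _ in reader)  # counts the remaining data rows
    if row_count == 0:
        return "FAIL", "file has no data rows"
    absent = [c for c in expected_columns if c not in header]
    if absent:
        return "FAIL", f"missing columns: {absent}"
    return "PASS", f"{row_count} rows, all expected columns present"


# Demonstration on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "results.csv"
    out.write_text("id,estimate\n1,0.42\n2,0.37\n")
    print(check_exists_and_substantive(out, ["id", "estimate"]))
    print(check_exists_and_substantive(Path(tmp) / "absent.csv", ["id"]))
```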
Classification
For each requirement, assign a classification:
| Classification | Criteria |
|---|---|
| COVERED | All 4 validation levels pass |
| PARTIAL | Output exists but DQ issues found or doesn't fully address requirement |
| MISSING | No output found for this requirement |
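The classification table reduces to a small decision function. The sketch below uses a hypothetical `classify` helper and treats any non-PASS result, including WARN, as grounds for PARTIAL, which matches the WARN row in the template but is otherwise an assumption.

```python
def classify(output_exists: bool, check_results: dict[str, str]) -> str:
    """Map per-check results (e.g. {"DQ1": "PASS", "M1": "WARN"}) to a classification."""
    if not output_exists:
        return "MISSING"
    # An existing output with no recorded checks is not treated as covered.
    if check_results and all(result == "PASS" for result in check_results.values()):
        return "COVERED"
    return "PARTIAL"


print(classify(True, {"DQ1": "PASS", "DQ2": "PASS", "M1": "PASS"}))  # COVERED
print(classify(True, {"DQ1": "PASS", "DQ2": "WARN", "M1": "PASS"}))  # PARTIAL
print(classify(False, {}))                                           # MISSING
```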
VALIDATION.md Template
---
status: validated | gaps_found
date: [ISO 8601]
requirements_total: N
covered: N
partial: N
missing: N
---
# Output Validation
## Requirements Map
| # | Requirement | Output | DQ1 | DQ2 | DQ3 | DQ4 | DQ5 | M1 | Classification |
|---|-------------|--------|-----|-----|-----|-----|-----|----|----------------|
| 1 | [from SPEC] | [path] | PASS | PASS | PASS | PASS | PASS | PASS | COVERED |
| 2 | [from SPEC] | [path] | PASS | WARN | PASS | PASS | PASS | PASS | PARTIAL |
| 3 | [from SPEC] | — | — | — | — | — | — | — | MISSING |
## DQ Details
[For any non-PASS check, include the specific finding]
## Summary
- Requirements: N total
- Covered: X
- Partial: Y
- Missing: Z
Status Rules
| Condition | Status |
|---|---|
| All requirements COVERED | validated |
| Any PARTIAL or MISSING remain | gaps_found |
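One way to derive the status and the front-matter counts from the per-requirement classifications is sketched below. `render_front_matter` is a hypothetical helper, and treating an empty requirement list as `gaps_found` is an assumption the rules above do not cover.

```python
from collections import Counter
from datetime import date


def render_front_matter(classifications: list[str], today: date) -> str:
    """Build the VALIDATION.md front matter from per-requirement classifications."""
    tally = Counter(classifications)
    all_covered = bool(classifications) and tally["COVERED"] == len(classifications)
    lines = [
        "---",
        f"status: {'validated' if all_covered else 'gaps_found'}",
        f"date: {today.isoformat()}",
        f"requirements_total: {len(classifications)}",
        f"covered: {tally['COVERED']}",
        f"partial: {tally['PARTIAL']}",
        f"missing: {tally['MISSING']}",
        "---",
    ]
    return "\n".join(lines)


print(render_front_matter(["COVERED", "PARTIAL", "MISSING"], date.today()))
```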
Visual Diagnostics for Decision Checkpoints
When presenting validation results to the user (especially gaps), generate diagnostic plots to accelerate the decision:
| Validation Finding | Diagnostic to Generate |
|---|---|
| DQ2: High-null columns | Missingness heatmap (columns × rows) |
| DQ3: Duplicate rows | Duplicate count bar chart by key columns |
| DQ4: Row count mismatch | Pipeline waterfall chart (stage × row count) |
| DQ5: Suspicious cardinality | Value frequency distribution plot |
| PARTIAL requirements | Side-by-side: expected vs actual output summary |
When to generate: Only at decision checkpoints where the user must choose fix vs accept. Do not generate plots for COVERED requirements (no decision needed).
Format: Inline matplotlib/seaborn plots in notebooks, or saved to scratch/diagnostics/ for script-based workflows.
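The "when to generate" rule can be expressed as a lookup that returns nothing for COVERED requirements. This sketch only selects which diagnostics to produce and does not draw them; the names `DIAGNOSTICS` and `diagnostics_to_generate` are hypothetical.

```python
# Finding-to-diagnostic pairs taken from the table above.
DIAGNOSTICS = {
    "DQ2": "missingness heatmap",
    "DQ3": "duplicate count bar chart by key columns",
    "DQ4": "pipeline waterfall chart",
    "DQ5": "value frequency distribution plot",
}


def diagnostics_to_generate(classification: str, check_results: dict[str, str]) -> list[str]:
    """List the plots worth generating for one requirement at a decision checkpoint."""
    if classification == "COVERED":
        return []  # no decision needed, so no plots
    plots = [
        DIAGNOSTICS[check]
        for check, result in check_results.items()
        if result != "PASS" and check in DIAGNOSTICS
    ]
    if classification == "PARTIAL":
        plots.append("expected vs actual output summary")
    return plots


print(diagnostics_to_generate("PARTIAL", {"DQ2": "WARN", "DQ3": "PASS"}))
```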
Gate
Checkpoint type: human-verify (VALIDATION.md status is machine-verifiable)
.planning/VALIDATION.md must exist before proceeding.
- If status is validated: proceed to ds-review.
- If status is gaps_found: present gaps to user before proceeding.
  - User decides: fix (return to ds-implement) or accept (proceed to ds-review with known gaps).
<EXTREMELY-IMPORTANT>
This is the critical difference from dev-test-gaps. In dev, missing tests can be auto-generated. In DS, missing or wrong outputs mean the analysis itself may be wrong. Only the user can judge whether a gap is acceptable.
</EXTREMELY-IMPORTANT>
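Because the status field is machine-verifiable, the gate logic can be sketched as a parser plus a decision function. Both helpers are hypothetical, and the `ask-user` return value is an illustrative placeholder for "present gaps and wait for the user's decision".

```python
from typing import Optional


def read_status(validation_text: str) -> str:
    """Pull the status field out of the VALIDATION.md front matter."""
    lines = validation_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("VALIDATION.md has no front matter")
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        key, _, value = line.partition(":")
        if key.strip() == "status":
            return value.strip()
    raise ValueError("front matter has no status field")


def next_step(status: str, user_accepts_gaps: Optional[bool] = None) -> str:
    """Decide where the workflow goes; gaps are never resolved without the user."""
    if status == "validated":
        return "ds-review"
    if status == "gaps_found":
        if user_accepts_gaps is None:
            return "ask-user"
        return "ds-review" if user_accepts_gaps else "ds-implement"
    raise ValueError(f"unknown status: {status!r}")


# Made-up file contents for illustration.
sample = "---\nstatus: gaps_found\ndate: 2024-01-02\n---\n# Output Validation\n"
print(next_step(read_status(sample)))  # ask-user
```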
Rationalization Prevention
| Thought | Reality |
|---|---|
| "Outputs look fine, skip validation" | Silent failures hide in DQ gaps — you cannot eyeball row count traceability |
| "I already checked during implement" | Per-task checks miss cross-task issues: joins that silently drop rows, filters that compound |
| "DQ checks are overkill for this analysis" | DQ checks ARE the test suite — DS has no pytest, only systematic output verification |
| "User is waiting, skip to review" | Review without validation is theater — reviewer will either miss issues or re-run the same checks |
| "LEARNINGS.md already logs everything" | LEARNINGS.md logs observations. Validation maps requirements to outputs. Different purpose. |
Drive-Aligned Framing
| Your Drive | Why You Skip | What Actually Happens | The Drive You Failed |
|---|---|---|---|
| Helpfulness | "Outputs exist, review can catch issues" | Review without validation misses silent DQ failures. User gets wrong results. | Anti-helpful |
| Competence | "I ran checks during implementation" | Per-task checks miss cross-task issues. Gaps hide between pipeline stages. | Incompetent |
| Efficiency | "Validation is redundant after careful implementation" | Implementation checks verify steps. Validation verifies requirements. Different. | Anti-efficient |
<EXTREMELY-IMPORTANT>
The protocol is not overhead you pay. It is the safety net you provide.
</EXTREMELY-IMPORTANT>
Phase Transition
After validation is complete, discover and read the ds-review skill:
Read ${CLAUDE_SKILL_DIR}/../../skills/ds-review/SKILL.md and follow its instructions.