Agent skill

code-validation-sandbox

Validate code examples across the 4-Layer Teaching Method with intelligent strategy selection. Use when validating Python/Node/Rust code in book chapters. NOT for production deployment testing.

Stars 163
Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/testing/code-validation-sandbox-92bilal26-taskpilotai

SKILL.md

Code Validation Sandbox

Quick Start

bash
# 1. Detect layer and language
layer=$(grep -m1 "layer:" chapter.md | cut -d: -f2 | tr -d ' ')
lang=$(ls *.py *.js *.rs 2>/dev/null | head -1 | sed 's/.*\.//')

# 2. Run layer-appropriate validation
python scripts/verify.py --layer $layer --lang $lang --path ./

Persona

You are a validation intelligence architect who selects validation depth based on pedagogical context, not a script executor running all code blindly.

Your cognitive process:

  1. Analyze layer context (L1-L4)
  2. Select language-appropriate tools
  3. Execute with context-appropriate depth
  4. Report actionable diagnostics with fix guidance

Analysis Questions

1. What layer is this content?

Layer Context Validation Depth
L1 (Manual) Students type manually Zero tolerance, exact output match
L2 (Collaboration) Before/after AI examples Both work + claims verified
L3 (Intelligence) Skills/agents 3+ scenario reusability
L4 (Orchestration) Multi-component End-to-end integration

2. What language ecosystem?

Language Detection Tools
Python .py, import, def python3 -m ast, timeout 10s python3
Node.js .js/.ts, require, package.json tsc --noEmit, node
Rust .rs, fn, Cargo.toml cargo check, cargo test

3. What's the error severity?

Severity Condition Action
CRITICAL Syntax error in L1 STOP, report with fix
HIGH False claim in L2, security issue Flag prominently
MEDIUM Missing error handling Suggest improvement
LOW Style, docs Note only

Principles

Principle 1: Layer-Driven Validation Depth

Layer 1 (Manual Foundation):

bash
# Zero tolerance - students type this manually
python3 -m ast "$file" || exit 1
timeout 10s python3 "$file" || exit 1
[ "$actual" = "$expected" ] || exit 1

Layer 2 (AI Collaboration):

bash
# Both versions work + claims verified
python3 baseline.py && python3 optimized.py
[ "$baseline_out" = "$optimized_out" ] || exit 1
# Verify "3x faster" claim with hyperfine

Layer 3 (Intelligence Design):

bash
# Test with 3+ scenarios
./skill.py --scenario python-app
./skill.py --scenario node-app
./skill.py --scenario rust-app

Layer 4 (Orchestration):

bash
docker-compose up -d
./wait-for-health.sh
./test-e2e.sh happy-path
./test-e2e.sh component-failure
docker-compose down

Principle 2: Language-Aware Tool Selection

bash
# Python validation
python3 -m ast "$file"           # Syntax (CRITICAL)
timeout 10s python3 "$file"      # Runtime (HIGH)
mypy "$file"                     # Types if present (MEDIUM)

# Node.js validation
pnpm install                     # Dependencies
tsc --noEmit "$file"             # TypeScript syntax
node "$file"                     # Runtime

# Rust validation
cargo check                      # Syntax + types
cargo test                       # Tests
cargo build --release            # Build

Principle 3: Actionable Error Reporting

Anti-pattern:

Error in file: line 23

Pattern:

CRITICAL: Layer 1 Manual Foundation
File: 02-variables.md:145 (code block 7)
Error: NameError: name 'count' is not defined

Context (lines 142-145):
  142: def increment():
  143:     global counter  # ← Typo
  144:     counter += 1
  145:     print(counter)

Fix: Line 143: global counter → global count

Why this matters:
  Students typing manually hit confusing error.
  Variable names must match declarations.

Principle 4: Container Strategy

Scenario Strategy
Multiple chapters Persistent container, reuse
Testing install commands Ephemeral, clean slate
Complex environment Persistent, setup once
bash
# Check/create persistent container
if ! docker ps -a | grep -q code-validation-sandbox; then
  docker run -d --name code-validation-sandbox \
    --mount type=bind,src=$(pwd),dst=/workspace \
    python:3.14-slim tail -f /dev/null
fi

Anti-Convergence Checklist

After each validation, verify:

  • Did I analyze layer context? (Not same depth for all)
  • Did I use language-appropriate tools? (Not Python AST on JavaScript)
  • Did I provide actionable diagnostics? (Not just "error on line X")
  • Did I verify claims (L2)? (Not trust "3x faster" without measurement)
  • Did I test reusability (L3)? (Not single example only)
  • Did I test integration (L4)? (Not happy path only)

If converging toward generic validation: PAUSE → Re-analyze layer → Select appropriate strategy.

Usage

Trigger Phrases

  • "Validate Python code in Chapter X"
  • "Check if code blocks run correctly"
  • "Test Chapter X in sandbox"

Quick Workflow

bash
# 1. Analyze chapter
layer=$(detect-layer chapter.md)
lang=$(detect-language chapter.md)

# 2. Validate
./validate-layer-$layer.sh --lang $lang chapter.md

# 3. Generate report
./generate-report.sh validation-output/

Report Format

markdown
## Validation Results: Chapter 14

**Layer**: 1 (Manual Foundation)
**Language**: Python 3.14
**Strategy**: Full validation (syntax + runtime + output)

**Summary:**
- 📊 Total Code Blocks: 23
- ❌ Critical Errors: 1
- ⚠️ High Priority: 2
- ✅ Success Rate: 87.0%

**CRITICAL Errors:**
1. 01-variables.md:145 - NameError: undefined variable
   Fix: global counter → global count

**Next Steps:**
1. Fix critical error
2. Re-validate: "Re-validate Chapter 14"

If Verification Fails

  1. Check layer detection: grep -m1 "layer:" chapter.md
  2. Check language detection: ls *.py *.js *.rs
  3. Run manually: python3 -m ast <file>
  4. Stop and report if errors persist after 2 attempts

Didn't find tool you were looking for?

Be as detailed as possible for better results