Agent skill
nw-agent-testing
5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance
Install this agent skill to your Project
npx add-skill https://github.com/nWave-ai/nWave/tree/main/plugins/nw/skills/nw-agent-testing
Agent Testing Framework
5-Layer Testing Approach
Layer 1: Output Quality (Unit-Level)
Validate agent produces correct, well-structured outputs for typical inputs.
Test: Agent follows workflow phases | Outputs match expected format/structure | Domain-specific rules correctly applied | Token efficiency within bounds
How: Manual invocation with representative inputs. Check against acceptance criteria in agent description.
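Part of the Layer 1 check can be scripted once the acceptance criteria are written down. The following is a minimal sketch, not part of the skill itself: `check_output`, `OutputCheck`, the section names, and the whitespace word count used as a stand-in for token counting are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OutputCheck:
    """Result of a structural check on one agent output (hypothetical type)."""
    missing_sections: list[str] = field(default_factory=list)
    approx_tokens: int = 0
    within_budget: bool = True

    @property
    def passed(self) -> bool:
        return not self.missing_sections and self.within_budget


def check_output(text: str, required_sections: list[str], max_tokens: int) -> OutputCheck:
    """Check that every required heading appears and the output stays within a size budget.

    A section counts as present when some line, with leading '#' and spaces
    removed, equals the section name (case-insensitive).
    """
    headings = {line.lstrip("# ").strip().lower() for line in text.splitlines()}
    missing = [s for s in required_sections if s.lower() not in headings]
    # Assumption: a whitespace word count is only a crude proxy for real token usage.
    approx = len(text.split())
    return OutputCheck(missing, approx, approx <= max_tokens)
```

Running this over the 2-3 representative inputs used for manual invocation gives a repeatable pass/fail signal for format and size, while correctness of domain rules still needs a human or reviewer agent.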
Layer 2: Integration / Handoff Validation
Validate correct input/output between agents in workflows.
Test: Input parsing handles upstream format | Output format matches downstream expectations | Error signals propagate correctly | Subagent mode activation works (skip greet, execute autonomously)
How: End-to-end workflow execution through full agent chain (e.g., DISCUSS -> DESIGN -> DELIVER).
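Before running a full chain, the handoff contracts can be compared statically. This sketch assumes each agent's inputs and outputs can be summarized as a set of named fields; `Stage`, `handoff_gaps`, and the field names in the usage note are hypothetical and not defined by the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One agent in a workflow chain, described by the fields it reads and writes."""
    name: str
    consumes: frozenset[str]
    produces: frozenset[str]


def handoff_gaps(chain: list[Stage]) -> dict[str, list[str]]:
    """For each adjacent pair, list fields the downstream stage needs
    that the stage directly upstream does not produce."""
    gaps: dict[str, list[str]] = {}
    for upstream, downstream in zip(chain, chain[1:]):
        missing = sorted(downstream.consumes - upstream.produces)
        if missing:
            gaps[f"{upstream.name} -> {downstream.name}"] = missing
    return gaps
```

For example, if a hypothetical DELIVER stage consumes a `status` field that DESIGN never emits, the function reports it under the key `DESIGN -> DELIVER`, which is the kind of dropped error signal Layer 2 is meant to catch. An empty result only shows the declared contracts line up; the end-to-end run is still needed to confirm actual behavior.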
Layer 3: Adversarial Output Validation
Challenge validity of agent outputs rather than accepting at face value.
Test: Source verification (cited sources real and accurate?) | Bias detection (favors one approach without evidence?) | Edge case coverage | Completeness (required sections present?)
How: Peer review by the matching -reviewer agent, using structured critique dimensions.
Layer 4: Adversarial Verification (Peer Review)
Independent review to catch biases and blind spots in agent design.
Test: Definition follows validation checklist? | Redundant Claude default instructions? | Over/under-specified? | Could simpler agent achieve same results?
How: @nw-agent-builder validates via 11-point checklist or @agent-builder-reviewer runs structured review.
Layer 5: Security Validation
Test resilience against misuse and prompt injection.
Test: Tool restriction enforcement | maxTurns respected | Permission mode correctly scoped | Agent stays within declared scope
How: Frontmatter fields are enforced at the platform level, so validation here means verifying the configuration rather than testing prompt wording.
Prompt Injection Resistance
The Claude Code platform provides injection resistance through: subagent isolation (own context, no sub-subagents) | tool restriction via frontmatter tools | permission modes via permissionMode | hook-based validation (PreToolUse, PostToolUse)
Do NOT add prose-based injection defense. Configure platform features:
---
tools: Read, Glob, Grep # Only tools this agent needs
maxTurns: 30 # Prevents runaway execution
permissionMode: default # User approves dangerous actions
---
Security Validation Checklist
- tools restricted to minimum necessary (least privilege)
- maxTurns set to prevent runaway execution
- permissionMode appropriate for risk level
- No Bash unless agent requires command execution
- No Write unless agent creates/modifies files
- Description accurately describes scope
- Subagent mode handles autonomous execution correctly
- No sensitive data hardcoded in definition
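The mechanical items in the checklist above can be checked by a small script. This is a sketch under stated assumptions: `parse_frontmatter` handles only flat `key: value # comment` lines like the sample in this document (it is not a YAML parser), and `audit` with its `needs_shell` / `needs_write` flags is a hypothetical helper, not a platform feature. Scope accuracy, subagent behavior, and hardcoded secrets still need manual review.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat `key: value  # comment` lines; skips '---' markers and blanks."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not line or line == "---" or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def audit(fields: dict[str, str], needs_shell: bool = False,
          needs_write: bool = False) -> list[str]:
    """Return checklist findings; an empty list means no mechanical issue was found."""
    findings: list[str] = []
    tools = [t.strip() for t in fields.get("tools", "").split(",") if t.strip()]
    if not tools:
        # Assumption: a missing tools field is treated as "not restricted".
        findings.append("tools is not restricted")
    if "Bash" in tools and not needs_shell:
        findings.append("Bash granted but agent does not need command execution")
    if "Write" in tools and not needs_write:
        findings.append("Write granted but agent does not create or modify files")
    turns = fields.get("maxTurns", "")
    if not (turns.isdigit() and int(turns) > 0):
        findings.append("maxTurns is missing or not a positive integer")
    if "permissionMode" not in fields:
        findings.append("permissionMode is not set")
    return findings
```

Applied to the frontmatter sample shown earlier (Read, Glob, Grep with maxTurns 30 and permissionMode default), the audit returns no findings. Whether a given permissionMode is appropriate for the agent's risk level remains a judgment call the script does not attempt.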
Testing Workflow for New Agents
- Create with minimal definition
- Layer 1: Invoke with 2-3 representative inputs, check outputs
- Layer 2: Run in workflow chain if applicable
- Fix failures observed
- Validate: Run 11-point checklist
- Iterate: Add instructions only for observed failure modes