nw-property-based-testing
Property-based testing strategies, mutation testing, shrinking, and combined PBT+mutation workflow for test quality validation
Property-Based Testing and Mutation Testing
Deferred to Phase 2.25: Mutation testing runs ONCE per feature as final quality gate at orchestrator Phase 2.25 (after all steps complete). Do NOT run mutation testing during inner TDD loop.
Property-Based Testing (PBT)
Instead of examples ("given X, expect Y"), write properties ("for all valid inputs, condition Z holds"). The framework generates hundreds or thousands of inputs and checks the property against each one, covering far more of the input space than hand-picked examples.
Property Patterns
- Invariants: "for all inputs, condition holds" (sorted list is ordered, balance >= 0)
- Roundtrip: "encode then decode = original" (serialize/deserialize, compress/decompress)
- Oracle: "compare against reference implementation" (optimized vs correct-but-slow)
- Metamorphic: "related inputs give predictably related outputs" (add(a,b)==add(b,a), filter can't increase size)
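The four patterns above can be sketched with the standard library alone. `for_all` is a hypothetical mini-runner standing in for what a framework such as Hypothesis provides (without shrinking); `gen_int_list` and `insertion_sort` are illustrative names, not part of any library.

```python
import json
import random

def for_all(gen, prop, runs=200, seed=0):
    """Hypothetical mini-runner: check `prop` on `runs` random inputs."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    for _ in range(runs):
        value = gen(rng)
        assert prop(value), f"property failed for {value!r}"
    return True

def gen_int_list(rng):
    """Random list of 0 to 10 small integers."""
    return [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]

def insertion_sort(xs):
    """Slow but easy-to-verify reference implementation, used as the oracle."""
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

# Invariant: a sorted list is ordered.
def prop_sorted_is_ordered(xs):
    ys = sorted(xs)
    return all(a <= b for a, b in zip(ys, ys[1:]))

# Roundtrip: encode then decode gives back the original.
def prop_json_roundtrip(xs):
    return json.loads(json.dumps(xs)) == xs

# Oracle: the optimized implementation agrees with the reference.
def prop_matches_oracle(xs):
    return sorted(xs) == insertion_sort(xs)

# Metamorphic: filtering can't increase size.
def prop_filter_never_grows(xs):
    return len([x for x in xs if x > 0]) <= len(xs)

ALL_PROPERTIES = [
    prop_sorted_is_ordered,
    prop_json_roundtrip,
    prop_matches_oracle,
    prop_filter_never_grows,
]

def run_all():
    return all(for_all(gen_int_list, prop) for prop in ALL_PROPERTIES)
```

In a real project the generator and runner would come from the PBT framework; only the property functions would be written by hand.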
Shrinking
When a property fails, the framework automatically searches for a minimal failing input, which makes debugging much faster. Algorithm: find failing input -> try simpler variants -> if a variant still fails, use it as the new candidate -> repeat until no simpler variant fails.
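A minimal sketch of that loop for lists of integers, assuming "simpler" means shorter lists and values closer to zero. Real frameworks use more sophisticated strategies; `shrink_candidates` and `shrink` are hypothetical names.

```python
def shrink_candidates(xs):
    """Yield simpler variants: shorter lists first, then values nearer zero."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]                  # drop one element
    for i, x in enumerate(xs):
        if x != 0:
            halved = int(x / 2)                    # truncates toward zero
            stepped = x - 1 if x > 0 else x + 1    # one step toward zero
            yield xs[:i] + [halved] + xs[i + 1:]
            yield xs[:i] + [stepped] + xs[i + 1:]

def shrink(failing, prop):
    """Greedy loop: adopt the first simpler variant that still fails."""
    assert not prop(failing), "shrink() needs an input that fails the property"
    current = failing
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(current):
            if not prop(candidate):    # still fails, so it is the new candidate
                current = candidate
                improved = True
                break
    return current

def prop_all_below_ten(xs):
    """Deliberately false property: every element is below 10."""
    return all(x < 10 for x in xs)

minimal = shrink([3, 37, -4, 12, 8], prop_all_below_ten)
# minimal == [10]: the smallest list that still breaks the property
```

The loop terminates because every adopted candidate is either shorter or has a smaller absolute value somewhere.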
PBT Tools by Language
| Language | Framework |
|---|---|
| Python | Hypothesis |
| JavaScript/TypeScript | fast-check |
| Haskell | QuickCheck |
| Rust | quickcheck |
| Java | jqwik |
| C# | FsCheck |
Adopted by Amazon, Volvo, Stripe, Jane Street (ICSE 2024 study).
When PBT Adds Value
HIGH value: algorithms | data structures | serialization | business rules (validation, calculations) | protocols/state machines.
LOW value: simple CRUD | UI logic | external API integrations.
PBT complements example-based testing, doesn't replace it.
PBT + TDD Integration
- Start with example-based TDD for specific cases (drives detailed design)
- Once basic implementation works, write properties to generalize
- If a property fails: either found a bug, or the implementation (or the property itself) needs refining
- Refactor freely - properties verify behavior preservation
Properties = higher-level spec that survives refactoring better than examples.
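As an illustration of the steps above, assume a hypothetical `clamp` function developed with example-based TDD first and then generalized with properties. The random loop is a stdlib stand-in for a PBT framework.

```python
import random

def clamp(value, low, high):
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))

def test_clamp_examples():
    # Step 1: specific cases that drove the design.
    assert clamp(5, 0, 10) == 5
    assert clamp(-3, 0, 10) == 0
    assert clamp(42, 0, 10) == 10

def test_clamp_properties(runs=500, seed=1):
    # Step 2: generalize once the basic implementation works.
    rng = random.Random(seed)
    for _ in range(runs):
        low = rng.randint(-100, 100)
        high = rng.randint(low, 100)          # guarantees low <= high
        value = rng.randint(-1000, 1000)
        result = clamp(value, low, high)
        assert low <= result <= high          # result stays within bounds
        if low <= value <= high:
            assert result == value            # in-range values are unchanged
        assert clamp(result, low, high) == result  # clamping is idempotent
```

The properties say nothing about how `clamp` is written, so they keep passing through any refactoring that preserves behavior.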
Mutation Testing
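Evaluates test suite quality by introducing artificial bugs (mutations) and checking whether tests catch them. Mutation score = killed mutants / total mutants. Stronger metric than code coverage: coverage shows which code ran, mutation score shows whether the tests notice when that code changes.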
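The score calculation can be sketched by hand. The mutants below are written manually for a hypothetical `is_adult` function; a real tool generates them automatically.

```python
def is_adult(age):
    """Hypothetical function under test."""
    return age >= 18

# Hand-written mutants, each mimicking one small artificial bug.
MUTANTS = {
    ">= changed to >": lambda age: age > 18,
    ">= changed to <": lambda age: age < 18,
    "18 changed to 21": lambda age: age >= 21,
    "always True": lambda age: True,
}

def weak_suite(fn):
    """Return True when every check passes. No boundary cases."""
    return fn(30) is True and fn(5) is False

def strong_suite(fn):
    """Same checks plus the boundary at 18."""
    return weak_suite(fn) and fn(18) is True and fn(17) is False

def mutation_score(suite, mutants):
    """killed mutants / total mutants; a mutant is killed when the suite fails."""
    killed = sum(1 for mutant in mutants.values() if not suite(mutant))
    return killed / len(mutants)

weak_score = mutation_score(weak_suite, MUTANTS)      # 0.5
strong_score = mutation_score(strong_suite, MUTANTS)  # 1.0
```

Both suites execute every line of `is_adult`, so line coverage cannot tell them apart; only the mutation score shows the missing boundary checks.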
Mutation Score Targets
| Score | Quality |
|---|---|
| < 60% | Weak suite, significant gaps |
| 60-80% | Moderate, some gaps |
| > 80% | Strong, few gaps |
Target: 75-80% minimum. Not every surviving mutant indicates a bad test: equivalent mutants behave identically to the original code, so no test can kill them.
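A small illustration of an equivalent mutant, using a hypothetical `total` function. Because `i` starts at 0 and increases by exactly 1, replacing `<` with `!=` in the loop condition cannot change the result.

```python
import random

def total(xs):
    """Hypothetical original: sum a list with an explicit index loop."""
    i, acc = 0, 0
    while i < len(xs):
        acc += xs[i]
        i += 1
    return acc

def total_mutant(xs):
    """Mutant: `<` replaced by `!=`. Behaves the same for every list."""
    i, acc = 0, 0
    while i != len(xs):
        acc += xs[i]
        i += 1
    return acc

def behaves_identically(runs=500, seed=3):
    """Random comparison; no input separates the two versions."""
    rng = random.Random(seed)
    for _ in range(runs):
        xs = [rng.randint(-9, 9) for _ in range(rng.randint(0, 8))]
        if total(xs) != total_mutant(xs):
            return False
    return True
```

A survivor like this lowers the score without pointing to any missing test, which is why a 100% target is usually unrealistic.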
Mutation Operators
Change == to != | + to - | remove method call | change constant | modify loop boundary | alter comparison.
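Two of these operators can be sketched with Python's `ast` module. This is a simplified illustration, not how any particular tool works: it mutates every matching site at once, whereas mutation tools typically create one mutant per site. The class and function names are hypothetical.

```python
import ast

class SwapAddToSub(ast.NodeTransformer):
    """Operator: change + to -."""
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

class FlipEqToNotEq(ast.NodeTransformer):
    """Operator: change == to !=."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.NotEq() if isinstance(op, ast.Eq) else op
                    for op in node.ops]
        return node

SOURCE = """
def add(a, b):
    return a + b

def same(a, b):
    return a == b
"""

def load(source, transformer=None):
    """Parse the source, optionally mutate it, and return its namespace."""
    tree = ast.parse(source)
    if transformer is not None:
        tree = transformer.visit(tree)
        ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace

original = load(SOURCE)
plus_to_minus = load(SOURCE, SwapAddToSub())
eq_to_neq = load(SOURCE, FlipEqToNotEq())
```

A test such as `add(2, 3) == 5` passes against `original` and fails against `plus_to_minus`, which is what "killing" a mutant means.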
Mutation Testing Tools
| Language | Tool |
|---|---|
| Java | PIT |
| JavaScript/TypeScript/C# | Stryker |
| Python | mutmut, Cosmic Ray |
Computationally expensive. Run incrementally: changed code only in PRs, full codebase weekly.
Combined PBT + Mutation Workflow
- Write example-based tests (TDD) -> cover known scenarios
- Apply mutation testing -> identify assertion gaps -> write more tests
- Add PBT for complex logic -> cover input space systematically
- Mutation testing again -> verify properties are comprehensive
Quality ratchet: each technique exposes gaps others miss. Prioritize critical paths and complex algorithms.
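The ratchet can be sketched end to end on a hypothetical `discount` function: example tests pass, a hand-written mutant survives them, and an added property kills it. The random loop is a stdlib stand-in for a PBT framework.

```python
import random

def discount(price_cents, percent):
    """Hypothetical function under test: price after a percentage discount."""
    return price_cents * (100 - percent) // 100

def discount_mutant(price_cents, percent):
    """Mutant: `-` changed to `+`."""
    return price_cents * (100 + percent) // 100

def example_suite(fn):
    """Step 1: example-based tests covering the known scenarios."""
    return fn(1000, 0) == 1000 and fn(0, 50) == 0

def property_suite(fn, runs=300, seed=2):
    """Step 3: a discount never goes below zero or above the original price."""
    rng = random.Random(seed)
    for _ in range(runs):
        price = rng.randint(0, 100_000)
        percent = rng.randint(0, 100)
        if not (0 <= fn(price, percent) <= price):
            return False
    return True

# Step 2: mutation testing exposes the gap, the mutant survives the examples.
survives_examples = example_suite(discount_mutant)
# Step 4: mutation testing again, the property now kills the mutant.
killed_by_property = not property_suite(discount_mutant)
```

Both suites pass on the original `discount`, so the only new information comes from running them against the mutant.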
PBT Performance Guidance
- Fast feedback: ~100 examples | CI/CD: ~1000 examples | Nightly builds: ~10000+ examples
Modern frameworks allow configuring example count per context.
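Frameworks expose this in different ways: Hypothesis has settings profiles with a `max_examples` value, and fast-check has a `numRuns` parameter. A framework-neutral sketch of the selection logic, assuming a hypothetical `PBT_CONTEXT` environment variable:

```python
import os

# Example counts taken from the guidance above.
EXAMPLE_COUNTS = {"dev": 100, "ci": 1_000, "nightly": 10_000}

def example_count(env=None):
    """Pick the example count for the current context, defaulting to dev."""
    env = os.environ if env is None else env
    context = env.get("PBT_CONTEXT", "dev")  # hypothetical variable name
    return EXAMPLE_COUNTS.get(context, EXAMPLE_COUNTS["dev"])
```

The returned value would then be passed to whatever example-count setting the chosen framework provides.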