/sw:qa - Quality Assessment Command

IMPORTANT: You MUST invoke the CLI specweave qa command using the Bash tool. The slash command provides guidance and orchestration only.

Purpose

Run comprehensive quality assessment on an increment using:

✅ Gate 1: Rule-based validation (130+ automated checks)
✅ Gate 2: LLM-as-Judge (AI quality assessment with chain-of-thought reasoning)
✅ Gate 3: Risk scoring (BMAD Probability × Impact quantitative assessment)
✅ Quality gate decisions (PASS/CONCERNS/FAIL)

LLM-as-Judge Pattern

This command implements the LLM-as-Judge pattern - an established AI/ML evaluation technique where an LLM evaluates outputs using structured reasoning.

How it works:

┌─────────────────────────────────────────────────────────────┐
│                    LLM-as-Judge Gate                        │
├─────────────────────────────────────────────────────────────┤
│  Input: spec.md, plan.md, tasks.md                         │
│                                                             │
│  Process:                                                   │
│  1. Chain-of-thought analysis (7 dimensions)               │
│  2. Evidence-based scoring (0-100 per dimension)           │
│  3. Risk identification (BMAD P×I formula)                 │
│  4. Formal verdict (PASS/CONCERNS/FAIL)                    │
│                                                             │
│  Output: Structured quality report with:                   │
│  - Blockers (MUST fix)                                     │
│  - Concerns (SHOULD fix)                                   │
│  - Recommendations (NICE to fix)                           │
└─────────────────────────────────────────────────────────────┘

Why LLM-as-Judge?

Consistency: Applies uniform evaluation criteria
Depth: Catches nuanced issues humans might miss
Speed: ~30 seconds vs hours of manual review
Documented reasoning: Explains WHY something is an issue

Usage

bash

/sw:qa <increment-id> [options]

Examples

bash

# Quick mode (default)
/sw:qa 0008

# Pre-implementation check
/sw:qa 0008 --pre

# Quality gate check (comprehensive)
/sw:qa 0008 --gate

# Export blockers to tasks.md
/sw:qa 0008 --export

# CI mode (exit 1 on FAIL)
/sw:qa 0008 --ci

# Skip AI assessment (rule-based only)
/sw:qa 0008 --no-ai

# Force run even if rule-based fails
/sw:qa 0008 --force

Options

--quick - Quick mode (default) - Fast assessment with core checks
--pre - Pre-implementation mode - Check before starting work
--gate - Quality gate mode - Comprehensive check before closing
--full - Full multi-agent mode (Phase 3)
--ci - CI mode - Exit 1 on FAIL (for automation)
--no-ai - Skip AI assessment - Rule-based validation only (free, fast)
--export - Export blockers/concerns to tasks.md
--force - Force run even if rule-based validation fails
-v, --verbose - Show recommendations in addition to blockers/concerns

What It Does

Step 1: Rule-Based Validation (Always First, Always Free)

The command runs 120+ validation checks on increment files:

✅ File existence (spec.md, plan.md, tasks.md)
✅ YAML frontmatter structure
✅ AC-ID traceability (spec.md → tasks.md)
✅ Link integrity
✅ Format consistency

If rule-based validation fails, the command stops before the AI step (to avoid spending AI tokens) unless the --force flag is used.
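The stop-or-continue rule above can be sketched as a small predicate. This is an illustration only: the `RunFlags` shape and the function name are hypothetical, and the choice to let `--no-ai` win over `--force` is an assumption, not documented CLI behavior.

```typescript
// Hypothetical flag shape; only --no-ai and --force matter for this decision.
interface RunFlags {
  noAi: boolean;  // --no-ai: rule-based validation only
  force: boolean; // --force: continue even if rule-based validation fails
}

// Decide whether the AI assessment step should run after Step 1.
function shouldRunAiAssessment(ruleBasedPassed: boolean, flags: RunFlags): boolean {
  // Assumption: --no-ai always skips the AI step, even when --force is set.
  if (flags.noAi) return false;
  // Otherwise run the AI step if validation passed or the user forced it.
  return ruleBasedPassed || flags.force;
}
```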

Step 2: AI Quality Assessment (Optional, skip with `--no-ai`)

IMPORTANT: This step uses the increment-quality-judge-v2 skill (auto-activated).

The skill provides guidance and the CLI handles execution:

bash

# CLI invokes quality assessment directly
specweave qa 0008 --pre

DO NOT spawn agents for quality assessment - use the CLI command which handles everything internally.

The assessment evaluates:

7 Dimensions:
1. Clarity (18% weight)
2. Testability (22% weight)
3. Completeness (18% weight)
4. Feasibility (13% weight)
5. Maintainability (9% weight)
6. Edge Cases (9% weight)
7. Risk Assessment (11% weight)
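As a rough illustration of how these weights could combine into a single spec quality score, the sketch below computes a weighted average of the seven 0-100 dimension scores. The weights come from the list above; the plain weighted average and all identifier names are assumptions, since the source does not state the exact aggregation.

```typescript
// Dimension weights from the list above, in percent (they sum to 100).
const DIMENSION_WEIGHTS = {
  clarity: 18,
  testability: 22,
  completeness: 18,
  feasibility: 13,
  maintainability: 9,
  edgeCases: 9,
  riskAssessment: 11,
} as const;

type Dimension = keyof typeof DIMENSION_WEIGHTS;
type DimensionScores = Record<Dimension, number>; // each score is 0-100

// Assumption: the overall score is a plain weighted average of the dimensions.
function overallSpecQuality(scores: DimensionScores): number {
  let weightedTotal = 0;
  for (const dim of Object.keys(DIMENSION_WEIGHTS) as Dimension[]) {
    weightedTotal += scores[dim] * DIMENSION_WEIGHTS[dim];
  }
  return weightedTotal / 100;
}
```

With every dimension at 100 except Clarity at 50, this gives 100 - (50 × 18 / 100) = 91.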

Risk Assessment uses a quantitative method:

Probability (0.0-1.0) × Impact (1-10) = Risk Score (0.0-10.0)
4 categories: Security, Technical, Implementation, Operational
Severity: CRITICAL (≥9.0), HIGH (6.0-8.9), MEDIUM (3.0-5.9), LOW (<3.0)
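The Probability × Impact formula and the severity bands can be sketched as follows. The function and type names are hypothetical. The listed bands leave scores between 8.9 and 9.0 unassigned, so this sketch assumes anything from 6.0 up to (but not including) 9.0 is HIGH.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

// Risk score = probability (0.0-1.0) × impact (1-10), giving 0.0-10.0.
function riskScore(probability: number, impact: number): number {
  if (probability < 0 || probability > 1) {
    throw new RangeError("probability must be between 0.0 and 1.0");
  }
  if (impact < 1 || impact > 10) {
    throw new RangeError("impact must be between 1 and 10");
  }
  return probability * impact;
}

// Map a score to a severity band using the thresholds listed above.
function severityOf(score: number): Severity {
  if (score >= 9.0) return "CRITICAL";
  if (score >= 6.0) return "HIGH"; // assumption: covers 6.0 up to just under 9.0
  if (score >= 3.0) return "MEDIUM";
  return "LOW";
}
```

For example, a risk with probability 0.5 and impact 8 scores 4.0, which falls in the MEDIUM band.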

Step 3: Quality Gate Decision

Based on thresholds:

FAIL if any:

Risk score ≥ 9.0 (CRITICAL)
Test coverage < 60%
Spec quality < 50
Critical security vulnerabilities ≥ 1

CONCERNS if any:

Risk score 6.0-8.9 (HIGH)
Test coverage < 80%
Spec quality < 70
High security vulnerabilities ≥ 1

PASS otherwise
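The thresholds above translate into a short decision function. This is a minimal sketch, not the CLI's actual implementation: the `GateInputs` field names are hypothetical, and it assumes "risk score" means the highest single risk score found.

```typescript
type Decision = "PASS" | "CONCERNS" | "FAIL";

// Hypothetical input shape for the gate decision.
interface GateInputs {
  maxRiskScore: number;            // highest P×I score, 0.0-10.0 (assumption)
  testCoveragePct: number;         // 0-100
  specQuality: number;             // 0-100
  criticalVulnerabilities: number; // count
  highVulnerabilities: number;     // count
}

function decide(m: GateInputs): Decision {
  // FAIL if any blocking threshold is crossed.
  const fail =
    m.maxRiskScore >= 9.0 ||
    m.testCoveragePct < 60 ||
    m.specQuality < 50 ||
    m.criticalVulnerabilities >= 1;
  if (fail) return "FAIL";

  // CONCERNS if any warning threshold is crossed.
  const concerns =
    m.maxRiskScore >= 6.0 ||
    m.testCoveragePct < 80 ||
    m.specQuality < 70 ||
    m.highVulnerabilities >= 1;
  return concerns ? "CONCERNS" : "PASS";
}
```

Checking the FAIL conditions first matters: an increment with 55% coverage also satisfies the CONCERNS condition (coverage < 80%), but it must be reported as FAIL.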

Step 4: Display Report

Show results with:

🟢 PASS / 🟡 CONCERNS / 🔴 FAIL decision
Blockers (MUST fix)
Concerns (SHOULD fix)
Recommendations (NICE to fix, with --verbose)
Spec quality scores (7 dimensions)
Summary (duration, tokens, cost)

Step 5: Export (Optional)

If the --export flag is provided:

Append blockers/concerns to tasks.md
Add priority (P0 for blockers, P1 for concerns)
Include mitigation strategies
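The priority mapping in the export step could look like the sketch below. Only the P0/P1 mapping comes from the text above. The `Finding` shape and the checkbox line format are hypothetical, since the source does not show how exported entries appear in tasks.md.

```typescript
// Hypothetical shape of an exported finding.
interface Finding {
  kind: "blocker" | "concern";
  title: string;
  mitigation: string;
}

// Blockers map to P0 and concerns to P1, as described above.
function priorityOf(finding: Finding): "P0" | "P1" {
  return finding.kind === "blocker" ? "P0" : "P1";
}

// Assumption: one checkbox line per finding; the real tasks.md format may differ.
function toTaskLine(finding: Finding): string {
  return `- [ ] [${priorityOf(finding)}] ${finding.title} (mitigation: ${finding.mitigation})`;
}
```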

Implementation

When the user runs /sw:qa <increment-id>:

Parse and normalize arguments

typescript

let incrementId = args[0]; // e.g., "0008" or "0008-feature-name"

// Normalize increment ID
if (incrementId.includes('-')) {
  // Extract numeric portion: "0008-feature-name" → "0008"
  incrementId = incrementId.split('-')[0];
}
// Convert to 4-digit format: "8" → "0008"
incrementId = incrementId.padStart(4, '0');

const options = parseOptions(args.slice(1));

Both formats work: /sw:qa 0153 or /sw:qa 0153-feature-name

Invoke CLI command via Bash tool

bash

specweave qa 0008 --pre --export

CLI handles everything:

Rule-based validation
AI assessment invocation
Quality gate decision
Report display
Export to tasks.md

Return result to user

Show CLI output (already formatted)
Suggest next steps based on decision
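The argument-parsing step calls a `parseOptions` helper that is not shown. A minimal sketch of what such a helper might look like follows. The `QaOptions` shape is hypothetical, and two behaviors are assumptions rather than documented CLI behavior: the last mode flag wins, and unknown flags raise an error.

```typescript
// Hypothetical options shape covering the flags listed under Options.
interface QaOptions {
  mode: "quick" | "pre" | "gate" | "full";
  ci: boolean;
  noAi: boolean;
  exportToTasks: boolean;
  force: boolean;
  verbose: boolean;
}

function parseOptions(flags: string[]): QaOptions {
  // Quick mode is the documented default.
  const opts: QaOptions = {
    mode: "quick",
    ci: false,
    noAi: false,
    exportToTasks: false,
    force: false,
    verbose: false,
  };
  for (const flag of flags) {
    switch (flag) {
      case "--quick": opts.mode = "quick"; break;
      case "--pre": opts.mode = "pre"; break;
      case "--gate": opts.mode = "gate"; break;
      case "--full": opts.mode = "full"; break;
      case "--ci": opts.ci = true; break;
      case "--no-ai": opts.noAi = true; break;
      case "--export": opts.exportToTasks = true; break;
      case "--force": opts.force = true; break;
      case "-v":
      case "--verbose": opts.verbose = true; break;
      default:
        // Assumption: reject anything not in the Options list.
        throw new Error(`Unknown option: ${flag}`);
    }
  }
  return opts;
}
```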

Modes Explained

Quick Mode (Default)

Use when: Quick check during development
Checks: Rule-based + AI spec quality + risk assessment
Time: ~30 seconds
Cost: ~$0.025-$0.050

Pre-Implementation Mode (`--pre`)

Use when: Before starting increment work
Checks: All quick mode checks + architecture review
Time: ~1 minute
Cost: ~$0.05-$0.10

Quality Gate Mode (`--gate`)

Use when: Before closing increment (via /sw:done)
Checks: All pre-implementation checks + test coverage + security audit
Time: ~2-3 minutes
Cost: ~$0.10-$0.20

Full Multi-Agent Mode (`--full`, Phase 3)

Use when: Comprehensive audit for critical increments
Checks: 6 specialized subagents in parallel
Time: ~5 minutes
Cost: ~$0.50-$1.00

Cost Breakdown

Mode	Tokens	Cost (USD)	Time
Quick	~2,500	~$0.025	30s
Pre	~5,000	~$0.050	1m
Gate	~10,000	~$0.100	2-3m
Full	~50,000	~$0.500	5m

Optimization: Use Haiku model by default (cheapest, fastest)
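The rows of the table are consistent with a flat rate of roughly $0.01 per 1,000 tokens (for example, 2,500 tokens → ~$0.025). The sketch below uses that rate for a back-of-the-envelope estimate. The rate is inferred from the table, not a documented price, and the names are hypothetical.

```typescript
// Assumption: flat rate inferred from the table above, not an official price.
const ASSUMED_USD_PER_1K_TOKENS = 0.01;

// Rough cost estimate for a run that consumes the given number of tokens.
function estimateCostUsd(tokens: number): number {
  return (tokens / 1000) * ASSUMED_USD_PER_1K_TOKENS;
}
```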

Exit Codes (for CI)

When the --ci flag is used:

Exit 0: PASS or CONCERNS (warning, but not blocking)
Exit 1: FAIL (blocking issues found)
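The CI exit-code mapping can be expressed in a few lines. This sketch covers only the `--ci` case described above; the function name is hypothetical, and behavior without `--ci` is not specified in the source.

```typescript
type Decision = "PASS" | "CONCERNS" | "FAIL";

// In CI mode only FAIL is blocking; CONCERNS is a non-blocking warning.
function ciExitCode(decision: Decision): 0 | 1 {
  return decision === "FAIL" ? 1 : 0;
}
```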

CI Integration Example:

yaml

# .github/workflows/qa-check.yml
- name: Run QA Check
  run: specweave qa ${{ env.INCREMENT_ID }} --gate --ci

Error Handling

Common errors:

❌ Increment not found → Check ID format (4 digits: 0001, 0008)
❌ Missing files → Run /sw:inc to create increment first
❌ Rule-based fails → Fix validation errors before AI assessment
❌ AI timeout → Retry with --quick mode or --no-ai

Integration Points

Auto-invoked by:

/sw:done - Runs --gate mode before closing increment
Post-task-completion hook (optional) - Runs --quick mode after tasks complete

Manual invocation:

During development - /sw:qa 0008 for quick checks
Before commit - /sw:qa 0008 --pre to catch issues early
Before PR - /sw:qa 0008 --gate --export for comprehensive check

Best Practices

Run early and often - Use --quick during development
Fix blockers immediately - Don't proceed with FAIL decision
Address concerns before release - CONCERNS = should fix
Use risk scores to prioritize - Fix CRITICAL (≥9.0) risks first
Export to tasks.md - Convert blockers/concerns to actionable tasks
CI integration - Block PRs with FAIL decision

Related

Skill: increment-quality-judge-v2 (7 dimensions with risk assessment)
Command: /sw:done (auto-runs QA gate)
CLI: specweave qa (direct invocation)
Types: src/core/qa/types.ts (TypeScript definitions)
Tests: tests/unit/qa/ (58 test cases, 100% passing)

Example Session

User: /sw:qa 0008
