Agent skill
quality-capture-baseline
Captures quality metrics baseline (tests, coverage, type errors, linting, dead code) by running quality gates and storing results in memory for regression detection. Use at feature start, before refactor work, or after major changes to establish baseline. Triggers on "capture baseline", "establish baseline", or PROACTIVELY at start of any feature/refactor work. Works with pytest output, pyright errors, ruff warnings, vulture results, and memory MCP server for baseline storage.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/quality-capture-baseline
SKILL.md
Capture Quality Baseline
Table of Contents
Core Sections
- Purpose - Baseline establishment for regression detection
- Quick Start - When to invoke and basic usage
- Instructions - Complete implementation guide
- Step 1: Prepare Context - Feature name, git commit, timestamp
- Step 2: Run Quality Checks - Primary and fallback methods
- Step 3: Parse Metrics - Extract tests, coverage, type errors, linting, dead code
- Step 4: Create Memory Entity - Store baseline in memory system
- Step 5: Return Baseline Reference - Success output format
- Agent Integration - @planner, @implementer, @statuser workflows
- Edge Cases - Quality check failures, no git, missing scripts, memory unavailable
- Anti-Patterns - What to avoid and what to do instead
- Integration with Other Skills - run-quality-gates, validate-refactor-adr, manage-refactor-markers, manage-todo
- Examples - Comprehensive scenarios reference
- Reference - Parsing, lifecycle, quality gates documentation
- Success Criteria - Validation checklist
- Troubleshooting - Common problems and solutions
Purpose
Establish a quality metrics baseline at the start of a feature or refactor so that subsequent quality gate runs can detect regressions. This skill is a critical component of the project's regression detection strategy.
When to Use
Use this skill when:
- Starting any new feature (before writing code)
- Beginning refactor work (to document pre-state)
- After major changes (dependency upgrades, architecture shifts)
- When @planner creates a new todo.md
- When @implementer starts task 0 of a feature
- User asks to "capture baseline" or "establish baseline metrics"
- Before major architectural changes to track impact
Quick Start
When to invoke this skill:
- ✅ At the START of any new feature (before writing code)
- ✅ Before beginning refactor work (to document pre-state)
- ✅ After major changes (dependency upgrades, architecture shifts)
- ❌ NOT after you've already started coding (too late)
Basic usage:
1. Run quality checks: ./scripts/check_all.sh
2. Parse metrics (tests, coverage, type errors, linting, dead code)
3. Create memory entity: baseline_{feature}_{date}
4. Return baseline reference for future comparison
Instructions
Step 1: Prepare Context
Before running quality checks, gather:
- Feature name or refactor identifier (e.g., "auth", "service-result-migration")
- Current git commit hash:
git rev-parse HEAD - Current timestamp:
date +"%Y-%m-%d %H:%M:%S"
Example:
FEATURE="user_profile"
GIT_COMMIT=$(git rev-parse HEAD)
TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
DATE=$(date +"%Y-%m-%d")
Step 2: Run Quality Checks
Primary method (recommended):
./scripts/check_all.sh
Fallback (if check_all.sh fails):
# Run individual checks
uv run pytest tests/ -v --cov=src --cov-report=term-missing
uv run pyright src/
uv run ruff check src/
uv run vulture src/
Capture output:
- Store stdout and stderr
- Measure execution time
- Note any failures (document them, don't skip)
Step 3: Parse Metrics
Use the parsing patterns from references/parsing.md. Quick reference:
Tests:
# Pattern: "X passed" or "X failed" or "X skipped"
PASSED=$(grep -oE "[0-9]+ passed" output.txt | grep -oE "[0-9]+")
FAILED=$(grep -oE "[0-9]+ failed" output.txt | grep -oE "[0-9]+")
SKIPPED=$(grep -oE "[0-9]+ skipped" output.txt | grep -oE "[0-9]+")
Coverage:
# Pattern: "TOTAL ... X%"
COVERAGE=$(grep "TOTAL" output.txt | grep -oE "[0-9]+%" | tail -1)
Type Errors:
# Pattern: "X errors, Y warnings" or "0 errors"
TYPE_ERRORS=$(grep -oE "[0-9]+ error" output.txt | grep -oE "[0-9]+" | head -1)
Linting:
# Pattern: "Found X errors"
LINTING_ERRORS=$(grep -oE "Found [0-9]+ error" output.txt | grep -oE "[0-9]+")
Dead Code:
# Pattern: Count vulture output lines or "X% unused"
DEAD_CODE=$(vulture src/ 2>&1 | grep -v "^$" | wc -l | xargs)
Execution Time:
# Measure with time command or calculate from timestamps
EXEC_TIME=$(echo "scale=1; $END_TIME - $START_TIME" | bc)
Step 4: Create Memory Entity
Entity structure:
name: quality-capture-baseline
type: quality_baseline
observations:
- "Tests: {passed} passed, {failed} failed, {skipped} skipped"
- "Coverage: {coverage}%"
- "Type errors: {type_errors}"
- "Linting errors: {linting_errors}"
- "Dead code: {dead_code} items"
- "Execution time: {exec_time}s"
- "Git commit: {git_hash}"
- "Timestamp: {timestamp}"
- "Feature: {feature_name}"
Example:
mcp__memory__create_entities({
"entities": [{
"name": f"baseline_{feature}_{date}",
"type": "quality_baseline",
"observations": [
f"Tests: {passed} passed, {failed} failed, {skipped} skipped",
f"Coverage: {coverage}%",
f"Type errors: {type_errors}",
f"Linting errors: {linting_errors}",
f"Dead code: {dead_code} items",
f"Execution time: {exec_time}s",
f"Git commit: {git_commit}",
f"Timestamp: {timestamp}",
f"Feature: {feature}"
]
}]
})
If pre-existing issues exist: Add additional observation:
"Pre-existing issues: 3 type errors in legacy code (documented, will not be fixed)"
Step 5: Return Baseline Reference
Success output format:
✅ Baseline captured: baseline_{feature}_{date}
Metrics:
- Tests: {passed} passed ({coverage}% coverage)
- Type safety: {status} ({type_errors} errors)
- Code quality: {status} ({linting_errors} linting, {dead_code} dead code items)
- Execution: {exec_time}s
- Git: {git_hash}
Use this baseline for regression detection.
With pre-existing issues:
✅ Baseline captured: baseline_{feature}_{date}
Metrics:
- Tests: 152 passed (89% coverage)
- Type safety: 3 errors (pre-existing, documented)
- Code quality: Clean
- Git: abc123f
⚠️ Note: 3 pre-existing type errors documented, will not cause regression failures.
Use this baseline for regression detection.
Agent Integration
@planner Usage
When: At plan creation, BEFORE creating todo.md
Workflow:
- Receive feature request from user
- Analyze requirements
- → Invoke capture-quality-baseline
- Receive baseline reference
- Create todo.md with baseline in header
- Create memory entities linking to baseline
Example:
# Todo: User Profile Feature
Baseline: baseline_user_profile_2025-10-16
Created: 2025-10-16
## Tasks
- [ ] Task 1 (references baseline for quality gates)
...
@implementer Usage
When: At feature start (task 0) or before refactor work
Workflow:
- Validate ADR (if refactor) using validate-refactor-adr
- → Invoke capture-quality-baseline
- Receive baseline reference
- Reference baseline in quality gate runs
- Compare final metrics against baseline
Example:
User: "Start implementing authentication"
@implementer:
1. Invoke capture-quality-baseline
2. Receives: baseline_auth_2025-10-16
3. Begin task 1 implementation
4. After each task: Run quality gates, compare to baseline
5. Report: "All metrics maintained or improved from baseline"
@statuser Usage
When: User requests baseline recapture or progress check
Workflow:
- Receive progress check request
- If major changes detected: → Invoke capture-quality-baseline
- Compare new baseline to original
- Report delta and recommend re-baseline if appropriate
Example:
User: "Check progress after dependency upgrade"
@statuser:
1. Detects major change (dependency upgrade)
2. Invokes capture-quality-baseline
3. Receives: baseline_post_upgrade_2025-10-16
4. Compares to original baseline
5. Reports: "2 tests removed (deprecated APIs), coverage maintained, type errors resolved"
Edge Cases
Quality Checks Fail
Scenario: Some checks fail during baseline capture
Handling:
- Still capture the baseline (document failures)
- Add observation: "Baseline captured WITH failures"
- Flag in output: "⚠️ Baseline has failures - must fix before feature work"
- Return baseline reference (still usable for comparison)
Example:
✅ Baseline captured: baseline_auth_2025-10-16
⚠️ WARNING: Baseline has failures
Metrics:
- Tests: 140 passed, 5 FAILED, 2 skipped
- Type safety: 2 errors
- Code quality: 3 linting errors
🛑 FIX these failures before starting feature work.
No Git Repository
Scenario: Not in a git repository (or git not available)
Handling:
- Skip git commit capture
- Continue with other metrics
- Add observation: "No git commit (not in repository)"
- Note in output: "Git: Not available"
Script Not Found
Scenario: ./scripts/check_all.sh doesn't exist
Handling:
- Try individual checks (pytest, pyright, ruff, vulture)
- If any work: Use those results
- If none work: Report error and abort
Error message:
❌ Cannot capture baseline: Quality check scripts not found
Tried:
- ./scripts/check_all.sh (not found)
- uv run pytest (not found)
- uv run pyright (not found)
Cannot establish baseline without quality checks.
Memory Service Unavailable
Scenario: Cannot create memory entity (MCP server down)
Handling:
- Store baseline in local file:
.quality-baseline-{feature}-{date}.json - Return file path as reference
- Note in output: "Baseline stored locally (memory unavailable)"
Fallback file format:
{
"baseline_name": "baseline_auth_2025-10-16",
"metrics": {
"tests": {"passed": 145, "failed": 0, "skipped": 2},
"coverage": 87,
"type_errors": 0,
"linting_errors": 0,
"dead_code": 3,
"execution_time": 8.3,
"git_commit": "abc123f",
"timestamp": "2025-10-16 10:30:00"
}
}
Anti-Patterns
❌ DON'T:
- Skip baseline capture (regression detection requires it)
- Reuse old baselines for new features (each feature needs its own)
- Capture baseline after starting work (too late)
- Ignore failing checks in baseline (document them)
- Forget to reference baseline in todo.md
✅ DO:
- Capture baseline BEFORE any feature work
- Create unique baseline per feature
- Document pre-existing issues in baseline
- Reference baseline in todo.md and memory
- Re-baseline after major changes (upgrades, migrations)
Integration with Other Skills
Skill: run-quality-gates
- capture-quality-baseline: Establishes the reference point
- run-quality-gates: Compares current metrics against baseline
Skill: validate-refactor-adr
- validate-refactor-adr: Validates ADR completeness
- capture-quality-baseline: Documents pre-refactor state
Skill: manage-refactor-markers
- capture-quality-baseline: Establishes pre-refactor metrics
- manage-refactor-markers: Tracks refactor progress
Skill: manage-todo
- capture-quality-baseline: Provides baseline reference
- manage-todo: Stores baseline name in todo.md header
Examples
See examples.md for comprehensive scenarios:
- Feature start baseline (clean state)
- Refactor start baseline (with pre-existing issues)
- Re-baseline after major change
- Baseline with failures
- Agent auto-invocation patterns
- Cross-agent baseline sharing
- Baseline comparison workflows
- Recovery from failed baseline capture
Reference
- Metrics parsing: See
references/parsing.mdfor detailed parsing logic - Baseline lifecycle: See
references/lifecycle.mdfor management best practices - Quality gates: See project's
../run-quality-gates/references/shared-quality-gates.mdfor gate definitions
Success Criteria
- Quality checks execute successfully (or failures documented)
- All 5 metrics extracted (tests, coverage, type errors, linting, dead code)
- Memory entity created with baseline data
- Baseline name returned for reference
- Execution time < 30 seconds (typically ~8s)
- Git commit captured (if available)
- Timestamp recorded
- Pre-existing issues documented (if any)
Troubleshooting
Problem: Metrics parsing fails (empty values)
Solution:
- Check quality check output format (may have changed)
- Review parsing patterns in
references/parsing.md - Update regex patterns if needed
- Fallback: Manually parse critical metrics
Problem: Baseline creation succeeds but can't be found later
Solution:
- Verify memory entity name format:
baseline_{feature}_{date} - Search memory:
mcp__memory__search_memories(query="baseline") - Check fallback file:
.quality-baseline-*.json
Problem: Quality checks take too long (> 30s)
Solution:
- Check if Neo4j is running (connection timeout)
- Skip slow tests:
pytest -m "not slow" - Run checks in parallel (check_all.sh does this)
- Report timing issue to user
Last Updated: 2025-10-16 Priority: High (critical for regression detection) Dependencies: Quality gate scripts, memory MCP server
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
Didn't find tool you were looking for?