Agent skill
silent-degradation-audit
Production-ready skill for detecting silent degradation across codebases. Uses multi-wave audit system with 6 specialized category agents, multi-agent validation panel, and convergence detection.
Install this agent skill to your Project
npx add-skill https://github.com/rysweet/amplihack/tree/main/.claude/skills/silent-degradation-audit
SKILL.md
Silent Degradation Audit Skill
Overview
Production-ready skill for detecting silent degradation across codebases. Uses multi-wave audit system with 6 specialized category agents, multi-agent validation panel, and convergence detection. Battle-tested on CyberGym codebase (~250 bugs found).
When to Use This Skill
Use this skill when:
- Code has reliability issues but unclear where
- Systems fail silently without operator visibility
- Error handling exists but effectiveness unknown
- Need comprehensive audit across multiple failure modes
- Preparing for production deployment
- Post-mortem analysis after silent failures
Don't use for:
- Code style or formatting issues (use linters)
- Performance optimization (use profilers)
- Security vulnerabilities (use security scanners)
- Simple one-off code reviews (use /analyze)
Key Features
Multi-Wave Progressive Audit
- Wave 1: Broad scan, finds obvious issues (40-50% of total)
- Wave 2-3: Deeper analysis, finds hidden issues (30-40%)
- Wave 4-6: Edge cases and subtleties (10-20%)
- Convergence: Stops when < 10 new findings or < 5% of Wave 1
6 Category Agents
- Dependency Failures (Category A): "What happens when X is down?"
- Config Errors (Category B): "What happens when config is wrong?"
- Background Work (Category C): "What happens when background work fails?"
- Test Effectiveness (Category D): "Do tests actually detect failures?"
- Operator Visibility (Category E): "Is the error visible to operators?"
- Functional Stubs (Category F): "Does this code actually do what its name says?"
Multi-Agent Validation Panel
- 3 agents review findings: Security, Architect, Builder
- 2/3 consensus required to validate finding
- Prevents false positives and unnecessary changes
- Tracks strong vs weak consensus
Language-Agnostic
Supports 9 languages with language-specific patterns:
- Python, JavaScript, TypeScript
- Rust, Go, Java, C#
- Ruby, PHP
Integration Modes
Standalone Invocation
Direct skill invocation for focused audit:
/silent-degradation-audit path/to/codebase
Sub-Loop in Quality Audit Workflow
Integrated as Phase 2 of quality-audit-workflow:
quality-audit-workflow calls silent-degradation-audit
→ Returns findings to quality workflow
→ Quality workflow applies fixes
→ Continues to next phase
Usage
Basic Usage
# Audit entire codebase
/silent-degradation-audit .
# Audit specific directory
/silent-degradation-audit ./src
# With custom exclusions
/silent-degradation-audit . --exclusions .my-exclusions.json
Configuration
Create .silent-degradation-config.json in codebase root:
{
"convergence": {
"absolute_threshold": 10,
"relative_threshold": 0.05
},
"max_waves": 6,
"exclusions": {
"patterns": ["*.test.js", "test_*.py", "**/__tests__/**"]
},
"categories": {
"enabled": [
"dependency-failures",
"config-errors",
"background-work",
"test-effectiveness",
"operator-visibility",
"functional-stubs"
]
}
}
Exclusion Lists
Global Exclusions
Edit ~/.amplihack/.claude/skills/silent-degradation-audit/exclusions-global.json:
[
{
"pattern": "*.test.*",
"reason": "Test files excluded from production audits",
"category": "*"
},
{
"pattern": "**/vendor/**",
"reason": "Third-party code",
"category": "*"
}
]
Repository-Specific Exclusions
Create .silent-degradation-exclusions.json in repository root:
[
{
"pattern": "src/legacy/*.py",
"reason": "Legacy code being replaced",
"category": "*",
"wave": 1
},
{
"pattern": "api/endpoints.py:42",
"reason": "Empty dict is valid API response",
"category": "functional-stubs",
"type": "exact"
}
]
Output
Report Format
Generates .silent-degradation-report.md:
# Silent Degradation Audit Report
## Summary
- **Total Waves**: 4
- **Total Findings**: 137
- **Converged**: Yes
- **Convergence Ratio**: 4.2%
## Convergence Progress
Wave 1: ██████████████████████████████████████████████████ 120
Wave 2: ███████████████████████████ 65 (54.2% of Wave 1)
Wave 3: ████████ 18 (15.0% of Wave 1)
Wave 4: ██ 5 (4.2% of Wave 1)
Status: ✓ CONVERGED
Reason: Relative threshold met: 4.2% < 5.0%
## Findings by Category
### dependency-failures (42 findings)
- High: 15
- Medium: 20
- Low: 7
[... continues for all 6 categories ...]
Findings Format
Generates .silent-degradation-findings.json:
[
{
"id": "dep-001",
"category": "dependency-failures",
"severity": "high",
"file": "src/payments.py",
"line": 89,
"description": "Payment API failure silently falls back to mock",
"impact": "Production system using mock payments, no real charges",
"visibility": "None - no logs or metrics",
"recommendation": "Add explicit failure logging and metric, or fail fast",
"wave": 1,
"validation": {
"result": "VALIDATED",
"consensus": "strong",
"votes": {
"security": "APPROVE",
"architect": "APPROVE",
"builder": "APPROVE"
}
}
},
...
]
Workflow Details
Phase 1: Initialization
- Create convergence tracker with thresholds
- Initialize exclusion manager
- Set up audit state
Phase 2: Language Detection
- Scan codebase for file extensions
- Identify languages (> 5 files or > 5% threshold)
- Load language-specific patterns
Phase 3: Load Exclusions
- Load global exclusions from skill directory
- Load repository-specific exclusions
- Merge into single exclusion list
Phase 4: Wave Loop
For each wave (until convergence):
-
Category Analysis (6 agents in parallel)
- Each agent scans for category-specific issues
- Uses language-specific patterns
- Excludes previous findings
-
Validation Panel (3 agents in parallel)
- Security agent reviews security implications
- Architect agent reviews design impact
- Builder agent reviews implementation feasibility
-
Vote Tallying
- Require 2/3 consensus (APPROVE)
- Track strong vs weak consensus
- Flag inconclusive for human review
-
Exclusion Filtering
- Apply global and repo-specific exclusions
- Filter out duplicates
-
State Update
- Add new findings to total
- Record wave metrics
-
Convergence Check
- Absolute: < 10 new findings
- Relative: < 5% of Wave 1 findings
- Break if converged
Phase 5: Report Generation
- Generate convergence plot
- Calculate metrics summary
- Categorize findings by type and severity
- Write markdown report
- Write JSON findings
Architecture
Directory Structure
.claude/skills/silent-degradation-audit/
├── SKILL.md # This file
├── reference.md # Detailed patterns and examples
├── examples.md # Usage examples
├── patterns.md # Language-specific patterns
├── README.md # Quick start
├── category_agents/ # 6 category agent definitions
│ ├── dependency-failures.md
│ ├── config-errors.md
│ ├── background-work.md
│ ├── test-effectiveness.md
│ ├── operator-visibility.md
│ └── functional-stubs.md
├── validation_panel/ # Validation panel specs
│ ├── panel-spec.md
│ └── voting-rules.md
├── recipe/ # Recipe-based workflow
│ └── audit-workflow.yaml
└── tools/ # Python utilities
├── exclusion_manager.py
├── language_detector.py
├── convergence_tracker.py
└── __init__.py
Component Responsibilities
Category Agents:
- Scan codebase for category-specific issues
- Use language-specific patterns
- Produce findings with severity, impact, recommendation
Validation Panel:
- Review findings from multiple perspectives
- Vote APPROVE/REJECT/ABSTAIN
- Require 2/3 consensus
Convergence Tracker:
- Track findings per wave
- Calculate convergence metrics
- Determine when to stop
Exclusion Manager:
- Load and merge exclusion lists
- Filter findings against patterns
- Add new exclusions
Language Detector:
- Identify languages in codebase
- Load language-specific patterns
- Support 9 languages
Best Practices
Running First Audit
- Start with small scope: Audit single service/module first
- Review Wave 1 carefully: Establishes baseline
- Tune exclusions: Add false positives to exclusion list
- Verify fixes: Test fixes before applying broadly
Exclusion Management
When to add exclusions:
- False positives (finding not actually an issue)
- Intentional design (behavior is correct as-is)
- Legacy code (not worth fixing right now)
- Third-party code (can't modify)
When NOT to add exclusions:
- Real issues you don't want to fix
- Issues without time to fix now
- Issues that seem hard
Better approach: Fix real issues, prioritize by severity.
Validation Tuning
If too many false positives:
- Review validation panel prompts
- Increase consensus threshold (require unanimous)
- Add category-specific validation rules
If missing real issues:
- Review category agent patterns
- Add language-specific patterns
- Decrease consensus threshold (1/3 approval)
Wave Management
Typical wave characteristics:
- Wave 1: 40-50% of findings (obvious issues)
- Wave 2: 25-30% (deeper issues)
- Wave 3: 15-20% (subtle issues)
- Wave 4+: < 10% each (edge cases)
If waves not converging:
- Check for duplicate findings (exclusion not working)
- Review category agent overlap (agents finding same things)
- Consider lowering convergence threshold
Metrics and Monitoring
Success Metrics
Track these over time:
Audit Success:
- Convergence reached: Yes/No
- Waves to convergence: 4 (target: 3-5)
- Total findings: 137 (varies by codebase)
- Validation rate: 75% (target: 60-80%)
Finding Distribution:
- High severity: 15% (target: < 20%)
- Medium severity: 45% (target: 40-60%)
- Low severity: 40% (target: 30-50%)
Panel Effectiveness:
- Strong consensus: 60% (target: > 50%)
- Weak consensus: 30% (target: 20-40%)
- Inconclusive: 10% (target: < 10%)
- Abstention rate: 5% (target: < 10%)
Quality Indicators
Healthy audit:
- Converges in 3-5 waves
- Validation rate 60-80%
- Strong consensus > 50%
- Abstention rate < 10%
Warning signs:
- Doesn't converge after 6 waves (agents finding same things)
- Validation rate > 95% (rubber stamping)
- Validation rate < 40% (too strict)
- Inconclusive rate > 20% (poor context)
Troubleshooting
"Audit not converging"
Symptoms: Reaches max waves without convergence
Causes:
- Category agents finding duplicate issues
- Exclusion filtering not working
- Convergence threshold too tight
Solutions:
- Review findings for duplicates
- Check exclusion patterns are matching
- Increase relative threshold to 10%
- Reduce max waves to 5
"Too many false positives"
Symptoms: Validation rate > 95%, many non-issues
Causes:
- Category agents too aggressive
- Validation panel too permissive
- Patterns not tuned for codebase
Solutions:
- Review category agent patterns
- Add exclusions for false positive patterns
- Require unanimous validation (3/3)
- Tune language-specific patterns
"Missing real issues"
Symptoms: Known issues not in findings
Causes:
- Category agent gaps
- Exclusion too broad
- Validation panel too strict
Solutions:
- Check if issue matches any category
- Review exclusion list for overly broad patterns
- Lower consensus threshold to 1/3
- Add specific patterns for missed issues
"Validation panel abstaining"
Symptoms: High abstention rate (> 20%)
Causes:
- Insufficient context in findings
- Agent prompts unclear
- Findings outside agent expertise
Solutions:
- Include more code context in findings
- Review and improve agent prompts
- Add fourth "generalist" agent
- Improve finding descriptions
Advanced Configuration
Custom Category Agents
Create custom category agent in category_agents/custom.md:
# Category Custom: My Special Cases
## Core Question
"What happens when [specific scenario]?"
## Detection Focus
[Patterns to detect...]
## Language-Specific Patterns
[Language examples...]
Then enable in config:
{
"categories": {
"enabled": [
"dependency-failures",
"config-errors",
"background-work",
"test-effectiveness",
"operator-visibility",
"functional-stubs",
"custom"
]
}
}
Custom Validation Panel
Override validation panel with different agents:
# In recipe/audit-workflow.yaml
validation_panel:
agents:
- security
- architect
- builder
- domain-expert # Add domain-specific agent
consensus:
required: 0.75 # Require 3/4 approval
Staged Rollout
Audit codebase incrementally:
# Phase 1: Critical services only
/silent-degradation-audit ./services/payments ./services/auth
# Phase 2: All services
/silent-degradation-audit ./services
# Phase 3: Full codebase
/silent-degradation-audit .
See Also
reference.md- Detailed technical referenceexamples.md- Real-world usage examplespatterns.md- Language-specific degradation patternsREADME.md- Quick start guidecategory_agents/- Individual category agent documentationvalidation_panel/- Validation panel specifications
Changelog
Version 1.0.0 (2025-02-24)
- Initial release
- 6 category agents (A-F)
- Multi-agent validation panel (2/3 consensus)
- Convergence detection (dual thresholds)
- Language-agnostic (9 languages)
- Battle-tested on CyberGym (~250 bugs)
- Integration modes: standalone + sub-loop
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
learning-path-builder
Creates personalized learning paths for technologies, frameworks, or concepts. Use for user-interactive session only for onboarding new technologies, hackathon skill-building, or personal development planning. Not for use in automated development or investigation. Sequences resources (docs, tutorials, exercises) based on current skill level and learning goals. Adapts to learning style: hands-on, theory-first, project-based.
gh-work-report
Generates comprehensive GitHub activity reports across all authenticated accounts. Gathers repos, PRs, features, and themes for configurable time periods (1/5/7/30/90 days). Produces shareable markdown with tables, mermaid charts, and executive summaries. Can create a private repo with GitHub Actions automation and GitHub Pages aggregation site. Use when: "github report", "work report", "activity summary", "what did I work on", "gh-work-report", "show my github activity".
pr-review-assistant
Philosophy-aware PR reviews checking alignment with amplihack principles. Use when reviewing PRs to ensure ruthless simplicity, modular design, and zero-BS implementation. Suggests simplifications, identifies over-engineering, verifies brick module structure. Posts detailed, constructive review comments with specific file:line references.
code-smell-detector
Identifies anti-patterns specific to amplihack philosophy. Use when reviewing code for quality issues or refactoring. Detects: over-abstraction, complex inheritance, large functions (>50 lines), tight coupling, missing __all__ exports. Provides specific fixes and explanations for each smell.
biologist-analyst
Analyzes living systems and biological phenomena through biological lens using evolution, molecular biology, ecology, and systems biology frameworks. Provides insights on mechanisms, adaptations, interactions, and life processes. Use when: Biological systems, health issues, evolutionary questions, ecological problems, biotechnology. Evaluates: Function, structure, heredity, evolution, interactions, molecular mechanisms.
Didn't find tool you were looking for?