Agent skill
kubernetes-health
Comprehensive Kubernetes cluster health diagnostics using dynamic API discovery. Use when checking cluster health, troubleshooting K8s issues, or running health assessments.
Install this agent skill to your Project
npx add-skill https://github.com/nodnarbnitram/claude-code-extensions/tree/main/.claude/skills/kubernetes-health
Kubernetes Health Diagnostics
Dynamic, discovery-driven health checks for any Kubernetes cluster configuration
BEFORE YOU START
| Impact | Value |
|---|---|
| Token Savings | ~70% vs manual kubectl exploration |
| Setup Time | 0 min (uses existing kubectl config) |
| Coverage | Adapts to installed operators automatically |
Known Issues Prevented
| Problem | Root Cause | How This Skill Helps |
|---|---|---|
| Missing operator health | Static checklists miss CRDs | Dynamic API discovery detects all installed operators |
| Stale diagnostics | Manual checks become outdated | Real-time cluster API interrogation |
| Incomplete coverage | Unknown cluster configuration | Automatically activates relevant sub-agents |
Quick Start
- Verify cluster access: Ensure `kubectl` is configured and can reach your cluster
- Run discovery: Execute `discover_apis.py` to detect installed operators
- Dispatch agents: Use the orchestrator to run health checks based on discovery
# Step 1: Verify kubectl context
kubectl config current-context
kubectl cluster-info
# Step 2: Run API discovery
uv run .claude/skills/kubernetes-health/scripts/discover_apis.py
# Step 3: Review detected operators and dispatch health agents
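As a rough illustration of what Step 2 involves, the sketch below lists API groups by parsing the output of `kubectl api-versions`. It is not the bundled `discover_apis.py`, whose internals are not shown here; the function names `parse_api_groups` and `discover_api_groups` are hypothetical.

```python
import subprocess


def parse_api_groups(api_versions_output: str) -> set[str]:
    """Extract API group names from `kubectl api-versions` output.

    Each line has the form `group/version` (for example `apps/v1`).
    The legacy core group appears as a bare version such as `v1`
    and is skipped because it has no group name.
    """
    groups = set()
    for line in api_versions_output.splitlines():
        line = line.strip()
        if "/" in line:
            groups.add(line.split("/", 1)[0])
    return groups


def discover_api_groups() -> set[str]:
    """Run read-only discovery against the current kubectl context.

    Hypothetical helper: assumes kubectl is on PATH and already
    points at the intended cluster.
    """
    result = subprocess.run(
        ["kubectl", "api-versions"],
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return parse_api_groups(result.stdout)
```

Calling `sorted(discover_api_groups())` gives a stable list of groups that can then be matched against the supported operators.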
Critical Rules
Always
- Verify kubectl context before running health checks
- Use read-only kubectl commands (get, describe, logs)
- Run core health checks before operator-specific checks
- Aggregate results using the provided scoring methodology
Never
- Modify cluster resources during health checks
- Expose secret values in health reports (metadata only)
- Skip context verification for production clusters
- Assume operator presence without API discovery
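One way to enforce the read-only rule is to check each kubectl invocation against an allowlist before it runs. The sketch below is an assumption about how such a guard could look, not part of the skill; `READ_ONLY_VERBS` mirrors the verbs named above and `is_read_only` is a hypothetical helper.

```python
# Verbs taken from the "Always" list above; extend deliberately if other
# read-only commands (for example cluster-info) are needed.
READ_ONLY_VERBS = {"get", "describe", "logs"}


def is_read_only(kubectl_args: list[str]) -> bool:
    """Return True if the first non-flag argument is an allowed verb.

    The check is conservative: a flag whose value is passed as a separate
    argument before the verb (e.g. `--context prod get pods`) is rejected,
    because the value is mistaken for the verb. Use `--context=prod` instead.
    """
    for arg in kubectl_args:
        if arg.startswith("-"):
            continue
        return arg in READ_ONLY_VERBS
    return False
```

Rejecting ambiguous input rather than guessing keeps the failure mode safe: a legitimate command may need rewriting, but a mutating one is never let through by accident.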
Common Mistakes
| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| Hardcoding operator checks | Misses operators that are installed and checks for ones that are not | Use API discovery to detect what's installed |
| Sequential agent dispatch | Slow for multi-operator clusters | Run operator agents in parallel (they share the same priority) |
| Raw kubectl output | Token-inefficient and hard to parse | Use scripts for condensed JSON output |
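The parallel-dispatch advice can be illustrated with a thread pool. In the skill the units of work are sub-agents; the sketch below uses plain Python callables as stand-ins, and `run_checks_in_parallel` is a hypothetical name, so treat it as a shape to adapt rather than the orchestrator's actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_checks_in_parallel(
    checks: dict[str, Callable[[], dict]],
) -> dict[str, dict]:
    """Run independent health checks concurrently, keyed by name.

    A check that raises is recorded as an ERROR entry so that one
    failing operator does not abort the others.
    """
    results: dict[str, dict] = {}
    if not checks:
        return results  # ThreadPoolExecutor rejects max_workers=0
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = {"status": "ERROR", "detail": str(exc)}
    return results
```

Threads suit this case because each check mostly waits on the cluster API; core checks would still run first, before the operator checks are fanned out.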
Bundled Resources
Scripts
| Script | Purpose |
|---|---|
| `scripts/discover_apis.py` | Discovers all API groups and detects installed operators |
| `scripts/health_orchestrator.py` | Maps discovered APIs to specialized health agents |
| `scripts/aggregate_report.py` | Aggregates multi-agent results into unified report |
References
| File | Contents |
|---|---|
| `references/operator-checks.md` | Detailed health checks for each supported operator |
| `references/health-scoring.md` | Scoring methodology and weight assignments |
Templates
| File | Purpose |
|---|---|
| `templates/health-report.json` | JSON schema for health report output |
Dependencies
Required
| Package | Version | Purpose |
|---|---|---|
| kubectl | Latest | Cluster interaction |
| Python | >= 3.11 | Script execution |
| uv | Latest | Python script runner |
Optional
| Package | Version | Purpose |
|---|---|---|
| kubernetes | >= 28.1.0 | Python client (for advanced discovery) |
Supported Operators
The skill automatically detects and dispatches specialized agents for:
| Operator | API Group | Agent |
|---|---|---|
| Core K8s | (always) | k8s-core-health-agent |
| Crossplane | crossplane.io | k8s-crossplane-health-agent |
| ArgoCD | argoproj.io | k8s-argocd-health-agent |
| Cert-Manager | cert-manager.io | k8s-certmanager-health-agent |
| Prometheus | monitoring.coreos.com | k8s-prometheus-health-agent |
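The table above amounts to a lookup from API group to agent. A minimal sketch of that mapping follows; the agent names and groups are copied from the table, while `select_agents` and the suffix-matching rule (so that a group such as `pkg.crossplane.io` counts as Crossplane) are assumptions rather than the documented behavior of `health_orchestrator.py`.

```python
# API group -> agent, copied from the Supported Operators table.
OPERATOR_AGENTS = {
    "crossplane.io": "k8s-crossplane-health-agent",
    "argoproj.io": "k8s-argocd-health-agent",
    "cert-manager.io": "k8s-certmanager-health-agent",
    "monitoring.coreos.com": "k8s-prometheus-health-agent",
}
CORE_AGENT = "k8s-core-health-agent"


def select_agents(discovered_groups: set[str]) -> list[str]:
    """Return the agents to dispatch, core agent first.

    Assumption: a discovered group matches when it equals the table
    entry or is a subdomain of it.
    """
    agents = [CORE_AGENT]
    for group, agent in OPERATOR_AGENTS.items():
        if any(g == group or g.endswith("." + group) for g in discovered_groups):
            agents.append(agent)
    return agents
```

Keeping the core agent unconditional matches the "(always)" entry in the table, and an operator agent is only added when discovery actually reports its group.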
Health Scoring
| Status | Score Range | Criteria |
|---|---|---|
| HEALTHY | 90-100 | All checks pass, no warnings |
| DEGRADED | 60-89 | Some warnings, no critical issues |
| CRITICAL | 0-59 | Critical issues affecting availability |
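The score ranges translate directly into a small classifier. The sketch below also shows one plausible way to combine per-agent scores with a weighted mean; the actual weights live in `references/health-scoring.md` and are not reproduced here, so the weights are supplied by the caller and both function names are hypothetical. Fractional scores between 89 and 90 are treated as DEGRADED, which is an assumption the table does not settle.

```python
def classify_score(score: float) -> str:
    """Map a 0-100 score to the status bands in the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "HEALTHY"
    if score >= 60:
        return "DEGRADED"
    return "CRITICAL"


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-agent scores into one score using caller-supplied weights.

    Hypothetical aggregation: a weighted mean over the agents that
    reported a score.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight
```

For example, with illustrative weights of 3 for core and 1 for ArgoCD, scores of 100 and 50 combine to 87.5, which falls in the DEGRADED band.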
Troubleshooting
kubectl connection issues
# Verify context
kubectl config current-context
# Test connectivity
kubectl cluster-info
# Check permissions
kubectl auth can-i get pods --all-namespaces
Discovery returns empty results
- Ensure cluster is reachable
- Check RBAC permissions for API discovery
- Verify kubectl version compatibility (kubectl is supported within one minor version of the cluster's API server)
Agent dispatch failures
- Confirm discovered API group matches agent trigger
- Check agent file exists in `.claude/agents/specialized/kubernetes/`
- Review agent tool restrictions
Setup Checklist
- kubectl configured and connected to cluster
- Python 3.11+ installed
- uv installed for script execution
- Read permissions on cluster resources
- Agent files present in `.claude/agents/specialized/kubernetes/`
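The local items on this checklist can be verified with a short preflight script. The sketch below only covers what can be checked without touching the cluster (tools on PATH and the Python version); connectivity, permissions, and agent files still need the manual steps above. `preflight` is a hypothetical helper, not one of the bundled scripts.

```python
import shutil
import sys


def preflight(required_tools=("kubectl", "uv"), min_python=(3, 11)) -> list[str]:
    """Return a list of problems found; an empty list means local checks passed."""
    problems = []
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {wanted}+ required, found {found}")
    return problems
```

Returning the problems as a list, rather than exiting on the first one, lets a single run report everything that needs fixing.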