Agent skill

kubernetes-health

Comprehensive Kubernetes cluster health diagnostics using dynamic API discovery. Use when checking cluster health, troubleshooting K8s issues, or running health assessments.

Install this agent skill in your project:

npx add-skill https://github.com/nodnarbnitram/claude-code-extensions/tree/main/.claude/skills/kubernetes-health

SKILL.md

Kubernetes Health Diagnostics

Dynamic, discovery-driven health checks for any Kubernetes cluster configuration

BEFORE YOU START

| Impact | Value |
|---|---|
| Token Savings | ~70% vs manual kubectl exploration |
| Setup Time | 0 min (uses existing kubectl config) |
| Coverage | Adapts to installed operators automatically |

Known Issues Prevented

| Problem | Root Cause | How This Skill Helps |
|---|---|---|
| Missing operator health | Static checklists miss CRDs | Dynamic API discovery detects all installed operators |
| Stale diagnostics | Manual checks become outdated | Real-time cluster API interrogation |
| Incomplete coverage | Unknown cluster configuration | Automatically activates relevant sub-agents |

Quick Start

  1. Verify cluster access: Ensure kubectl is configured and can reach your cluster
  2. Run discovery: Execute discover_apis.py to detect installed operators
  3. Dispatch agents: Use the orchestrator to run health checks based on discovery
```bash
# Step 1: Verify kubectl context
kubectl config current-context
kubectl cluster-info

# Step 2: Run API discovery
uv run .claude/skills/kubernetes-health/scripts/discover_apis.py

# Step 3: Review detected operators and dispatch health agents
```
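To make the discovery step concrete, here is a minimal sketch of how API groups could be extracted from the output of `kubectl api-versions`. It is not the implementation of `discover_apis.py`, whose internals are not shown here; the function names `parse_api_groups` and `discover_api_groups` are hypothetical and exist only for illustration.

```python
import subprocess


def parse_api_groups(api_versions_output: str) -> set[str]:
    """Extract API group names from `kubectl api-versions` output.

    Each line is either `group/version` (for example `cert-manager.io/v1`)
    or a bare version such as `v1` for the core group, reported here as "".
    """
    groups = set()
    for line in api_versions_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last "/" so the version is dropped and the group kept.
        group, sep, _version = line.rpartition("/")
        groups.add(group if sep else "")
    return groups


def discover_api_groups() -> set[str]:
    """Run the read-only `kubectl api-versions` command and parse its output."""
    result = subprocess.run(
        ["kubectl", "api-versions"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_api_groups(result.stdout)
```

Keeping the parser separate from the `kubectl` call means it can be exercised against captured output without cluster access.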

Critical Rules

Always

  • Verify kubectl context before running health checks
  • Use read-only kubectl commands (get, describe, logs)
  • Run core health checks before operator-specific checks
  • Aggregate results using the provided scoring methodology

Never

  • Modify cluster resources during health checks
  • Expose secret values in health reports (report secret metadata only)
  • Skip context verification for production clusters
  • Assume operator presence without API discovery

Common Mistakes

| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| Hardcoding operator checks | Misses installed operators, checks missing ones | Use API discovery to detect what's installed |
| Sequential agent dispatch | Slow for multi-operator clusters | Run operator agents in parallel (same priority) |
| Raw kubectl output | Token inefficient, hard to parse | Use scripts for condensed JSON output |
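As an illustration of the "condensed JSON output" idea, the sketch below reduces the output of `kubectl get pods -A -o json` to a short summary. The function name `condense_pods` and the shape of the summary are assumptions made for this example; the bundled scripts may condense output differently.

```python
import json
from collections import Counter


def condense_pods(pod_list_json: str) -> dict:
    """Summarize a `kubectl get pods -o json` document into a few fields."""
    pod_list = json.loads(pod_list_json)
    phases = Counter()
    unhealthy = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        phases[phase] += 1
        # Anything not Running or Succeeded is listed by namespace/name.
        if phase not in ("Running", "Succeeded"):
            meta = pod.get("metadata", {})
            unhealthy.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return {
        "total": sum(phases.values()),
        "phases": dict(phases),
        "unhealthy": sorted(unhealthy),
    }
```

The summary keeps counts per phase and names only the pods that need attention, which is usually much smaller than the raw pod list.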

Bundled Resources

Scripts

| Script | Purpose |
|---|---|
| scripts/discover_apis.py | Discovers all API groups and detects installed operators |
| scripts/health_orchestrator.py | Maps discovered APIs to specialized health agents |
| scripts/aggregate_report.py | Aggregates multi-agent results into unified report |

References

| File | Contents |
|---|---|
| references/operator-checks.md | Detailed health checks for each supported operator |
| references/health-scoring.md | Scoring methodology and weight assignments |

Templates

| File | Purpose |
|---|---|
| templates/health-report.json | JSON schema for health report output |

Dependencies

Required

| Package | Version | Purpose |
|---|---|---|
| kubectl | Latest | Cluster interaction |
| Python | >= 3.11 | Script execution |
| uv | Latest | Python script runner |

Optional

| Package | Version | Purpose |
|---|---|---|
| kubernetes | >= 28.1.0 | Python client (for advanced discovery) |

Supported Operators

The skill automatically detects and dispatches specialized agents for:

| Operator | API Group | Agent |
|---|---|---|
| Core K8s | (always) | k8s-core-health-agent |
| Crossplane | crossplane.io | k8s-crossplane-health-agent |
| ArgoCD | argoproj.io | k8s-argocd-health-agent |
| Cert-Manager | cert-manager.io | k8s-certmanager-health-agent |
| Prometheus | monitoring.coreos.com | k8s-prometheus-health-agent |
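The mapping above can be expressed as a small lookup. The sketch below is illustrative and is not the logic of `health_orchestrator.py`; in particular, matching subgroups by suffix (so that a group such as `pkg.crossplane.io` counts as Crossplane) is an assumption, and `select_agents` is a hypothetical name.

```python
CORE_AGENT = "k8s-core-health-agent"  # always dispatched

# API group -> agent, copied from the Supported Operators list.
OPERATOR_AGENTS = {
    "crossplane.io": "k8s-crossplane-health-agent",
    "argoproj.io": "k8s-argocd-health-agent",
    "cert-manager.io": "k8s-certmanager-health-agent",
    "monitoring.coreos.com": "k8s-prometheus-health-agent",
}


def select_agents(api_groups: set[str]) -> list[str]:
    """Return the core agent plus one agent per detected operator API group."""
    agents = [CORE_AGENT]
    for group, agent in OPERATOR_AGENTS.items():
        # Assumption: a subgroup such as "pkg.crossplane.io" also counts.
        if any(g == group or g.endswith("." + group) for g in api_groups):
            agents.append(agent)
    return agents
```

The core agent comes first in the returned list, which matches the rule that core health checks run before operator-specific ones.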

Health Scoring

| Status | Score Range | Criteria |
|---|---|---|
| HEALTHY | 90-100 | All checks pass, no warnings |
| DEGRADED | 60-89 | Some warnings, no critical issues |
| CRITICAL | 0-59 | Critical issues affecting availability |
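The score ranges translate directly into a classification function. The sketch below also shows one possible way to combine per-agent scores with a weighted average; the actual weights live in `references/health-scoring.md` and are not reproduced here, so the `weights` argument, its default of 1.0 per agent, and both function names are assumptions for illustration only.

```python
from typing import Optional


def classify(score: float) -> str:
    """Map a 0-100 score to the status bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "HEALTHY"
    if score >= 60:
        return "DEGRADED"
    return "CRITICAL"


def aggregate(scores: dict[str, float],
              weights: Optional[dict[str, float]] = None) -> float:
    """Weighted average of per-agent scores (hypothetical weighting scheme).

    Agents without an explicit weight default to 1.0.
    """
    weights = weights or {}
    total_weight = sum(weights.get(name, 1.0) for name in scores)
    if total_weight == 0:
        raise ValueError("no scores to aggregate")
    weighted = sum(s * weights.get(name, 1.0) for name, s in scores.items())
    return weighted / total_weight
```

Fractional scores between two bands (for example 89.5) fall into the lower band in this sketch, since each band starts at its integer lower bound.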

Troubleshooting

kubectl connection issues

```bash
# Verify context
kubectl config current-context

# Test connectivity
kubectl cluster-info

# Check permissions
kubectl auth can-i get pods --all-namespaces
```

Discovery returns empty results

  • Ensure cluster is reachable
  • Check RBAC permissions for API discovery
  • Verify kubectl version compatibility

Agent dispatch failures

  • Confirm discovered API group matches agent trigger
  • Check agent file exists in .claude/agents/specialized/kubernetes/
  • Review agent tool restrictions
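The agent file check above can be automated with a short helper. This is a sketch only: it assumes each agent is stored as `<agent-name>.md` inside the agents directory, a naming convention this document does not confirm, and `missing_agent_files` is a hypothetical function.

```python
from pathlib import Path

# Directory named in the troubleshooting step above.
AGENT_DIR = Path(".claude/agents/specialized/kubernetes")


def missing_agent_files(agent_names, agent_dir=AGENT_DIR):
    """Return the agents with no matching file in the agents directory.

    Assumption: each agent lives in a file named `<agent-name>.md`.
    """
    return [
        name
        for name in agent_names
        if not (Path(agent_dir) / f"{name}.md").is_file()
    ]
```

An empty result means every selected agent has a file on disk; any names returned are candidates for the dispatch failure.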

Setup Checklist

  • kubectl configured and connected to cluster
  • Python 3.11+ installed
  • uv installed for script execution
  • Read permissions on cluster resources
  • Agent files present in .claude/agents/specialized/kubernetes/
