Agent skill
k8s
Kubernetes ops skill for deploying, operating, and troubleshooting services on Kubernetes. Use for tasks like writing manifests/Helm, configuring deployments/services/ingress, autoscaling, observability, RBAC, secrets/configmaps, rollout/rollback, incident debugging, and production readiness checks.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/k8s
SKILL.md
k8s
Use this skill for Kubernetes operations and release work.
Defaults / assumptions to confirm
- Cluster type: managed (EKS/GKE/ACK) vs self-hosted
- Packaging: raw YAML vs Helm vs Kustomize
- Ingress: NGINX/ALB/APISIX/Istio
- Observability stack: Prometheus/Grafana, Loki/ELK, tracing
Workflow
- Understand service requirements
  - Ports, protocols, health checks, resources (CPU/mem), storage needs.
  - SLOs: latency, availability, RPO/RTO.
  - Dependencies: DB, cache, MQ, external APIs.
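An availability SLO becomes easier to act on once it is converted into an error budget, the amount of downtime it allows over a window. For example, 99.9% over 30 days allows 30 * 24 * 60 * 0.001 = 43.2 minutes. A minimal Python sketch; the function name and the 30-day default window are illustrative assumptions, not part of the skill:

```python
from datetime import timedelta


def error_budget(availability_slo: float, window_days: int = 30) -> timedelta:
    """Return the downtime an availability SLO allows over a window.

    availability_slo is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 < availability_slo <= 1.0:
        raise ValueError("availability_slo must be in (0, 1]")
    window = timedelta(days=window_days)
    # The budget is the fraction of the window that is allowed to be "bad".
    return window * (1.0 - availability_slo)


for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} over 30 days -> {error_budget(slo)}")
```

The same number can drive release decisions: if most of the budget is already spent, slower rollouts or a change freeze may be appropriate.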
- Deployment design
  - Use Deployment for stateless workloads and StatefulSet for workloads that need stable identities or storage.
  - Define readinessProbe and livenessProbe (and startupProbe if needed).
  - Set resources.requests/limits and choose an appropriate QoS class.
  - Use a PodDisruptionBudget to preserve availability during maintenance.
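The QoS class follows directly from the requests/limits you set. The sketch below approximates the documented rules: Guaranteed when every container has CPU and memory limits equal to its requests (a limit with no request counts as an equal request), BestEffort when no container sets any CPU/memory request or limit, Burstable otherwise. `qos_class` is a hypothetical helper, not part of any Kubernetes client; it ignores init containers and compares quantities as literal strings.

```python
from typing import Dict, List

# One entry per container: {"requests": {...}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Approximate the QoS class Kubernetes would assign to a pod.

    Quantities are compared as strings, so write them consistently
    (e.g. always "500m", never mixing "500m" and "0.5").
    """
    any_set = False
    guaranteed = True
    for res in containers:
        limits = res.get("limits", {})
        requests = dict(res.get("requests", {}))
        # When only a limit is set, Kubernetes defaults the request to the limit.
        for key, value in limits.items():
            requests.setdefault(key, value)
        if any(k in requests or k in limits for k in ("cpu", "memory")):
            any_set = True
        # Guaranteed needs both CPU and memory limits, each equal to its request.
        for key in ("cpu", "memory"):
            if key not in limits or requests.get(key) != limits[key]:
                guaranteed = False
    if any_set and guaranteed:
        return "Guaranteed"
    return "Burstable" if any_set else "BestEffort"
```

For example, requests of 250m CPU / 256Mi with only a 256Mi memory limit yields Burstable; adding a matching CPU limit makes the pod Guaranteed.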
- Config & secrets
  - Config: ConfigMap (non-sensitive), mounted as files or exposed as env vars.
  - Secrets: Secret (sensitive), plus an external secret manager if available.
  - Never commit plaintext secrets; prefer sealed or external secrets.
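Values in a Secret's data field are only base64-encoded, which is why a committed Secret manifest is effectively plaintext. This sketch builds a Secret the way the API expects and then decodes it back; the names db-credentials and hunter2 are hypothetical.

```python
import base64
from typing import Dict


def secret_manifest(name: str, values: Dict[str, str]) -> dict:
    """Build a v1 Secret whose data values are base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {k: base64.b64encode(v.encode()).decode("ascii") for k, v in values.items()},
    }


def reveal(secret: dict) -> Dict[str, str]:
    """Anyone holding the manifest can do this: base64 is an encoding, not encryption."""
    return {k: base64.b64decode(v).decode() for k, v in secret["data"].items()}


s = secret_manifest("db-credentials", {"password": "hunter2"})
print(s["data"]["password"])  # looks opaque...
print(reveal(s)["password"])  # ...but decodes straight back to the plaintext
```

Sealed or external secrets avoid this by keeping only ciphertext or a reference in the repository.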
- Networking
  - Service types and DNS.
  - Ingress/Gateway routing, TLS termination, timeouts.
  - NetworkPolicy if the cluster enforces it.
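Service DNS names follow a fixed pattern, which helps when debugging resolution from inside a pod. A sketch, assuming the default cluster domain cluster.local (it is configurable per cluster): a Service gets an A/AAAA record at service.namespace.svc.cluster-domain, and each named port gets an SRV record. Pods in the same namespace can usually use just the Service name thanks to their DNS search path.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """A/AAAA record name for a Service: <svc>.<ns>.svc.<cluster-domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_srv(port_name: str, protocol: str, service: str, namespace: str,
                cluster_domain: str = "cluster.local") -> str:
    """SRV record name for a named Service port: _<port>._<proto>.<svc>.<ns>.svc.<domain>."""
    return f"_{port_name}._{protocol.lower()}.{service_fqdn(service, namespace, cluster_domain)}"


print(service_fqdn("api", "payments"))
print(service_srv("http", "TCP", "api", "payments"))
```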
- Scaling & resilience
  - HPA based on CPU/memory/custom metrics.
  - Graceful shutdown (preStop, terminationGracePeriodSeconds).
  - Retry/backoff at the client; avoid retry storms.
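Two of these points can be reasoned about numerically. The HPA's core formula is desired = ceil(current * currentMetric / targetMetric), with no change when the ratio is within a tolerance (0.1 by default); the sketch below omits stabilization windows, unready pods and missing metrics, which the real controller handles. For client retries, exponential backoff with "full jitter" spreads retries out so clients that failed together do not retry in lockstep. Both functions are illustrative, not library APIs.

```python
import math
import random
from typing import List, Optional


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                         min_replicas: int = 1, max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Simplified HPA calculation, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 10.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter backoff: delay i is uniform in [0, min(cap, base * 2**i)] seconds."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** attempt)) for attempt in range(attempts)]


# 3 replicas at 90% CPU against a 60% target -> ceil(3 * 1.5) = 5 replicas.
print(hpa_desired_replicas(3, 90, 60))
print([round(d, 3) for d in backoff_delays(5, rng=random.Random(7))])
```

The cap matters as much as the jitter: without it, later retries would wait for very long periods.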
- Observability
  - Standard logs with correlation IDs.
  - Metrics: RPS, p95 latency, error rate, saturation.
  - Alerts and dashboards; runbook links.
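One way to get correlation IDs into every log line is a logging filter that reads the current request's ID from a context variable, combined with a formatter that emits one JSON object per line. A minimal sketch using only the standard library; the logger name svc and the handle_request function are hypothetical stand-ins for a real service's handler.

```python
import contextvars
import io
import json
import logging
import uuid

# Correlation ID of the request currently being handled ("-" outside a request).
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so the log pipeline can index fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", "-"),
        })


def make_logger(stream, name: str = "svc") -> logging.Logger:
    """Build a logger writing JSON lines to `stream` (call once per logger name)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(CorrelationIdFilter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, incoming_id=None) -> None:
    """Hypothetical handler: reuse the caller's ID (e.g. from a header) or mint one."""
    token = correlation_id.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.info("request handled")
    finally:
        correlation_id.reset(token)


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger(buf)
    handle_request(log, "abc123")
    print(buf.getvalue(), end="")
```

Because contextvars are per-task in asyncio, the same pattern also works for async handlers.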
- Release operations
  - Rolling updates; canary/blue-green if needed.
  - kubectl rollout status plus a rollback plan.
  - Post-deploy verification checks and smoke tests.
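A rollout gate can be scripted around `kubectl rollout status` (which exits non-zero if the rollout does not complete within `--timeout`) and `kubectl rollout undo` (which reverts to the previous revision by default). A sketch under those assumptions; the command runner is injectable so the logic can be tested without a cluster, and the function names are hypothetical.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def run_command(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def verify_or_rollback(deployment: str, namespace: str, timeout: str = "120s",
                       run: Runner = run_command) -> bool:
    """Wait for a rollout; roll back to the previous revision if it does not finish.

    Returns True when the rollout completed, False when a rollback was issued.
    """
    status_cmd = ["kubectl", "rollout", "status", f"deployment/{deployment}",
                  "-n", namespace, f"--timeout={timeout}"]
    if run(status_cmd) == 0:
        return True
    # Non-zero exit, e.g. timeout or the Deployment exceeded its progress deadline.
    undo_cmd = ["kubectl", "rollout", "undo", f"deployment/{deployment}", "-n", namespace]
    run(undo_cmd)
    return False
```

Smoke tests would run after a True result; a failing smoke test could call the same undo command.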
- Troubleshooting checklist
  - kubectl get/describe pods, events, and logs.
  - Check probes, image pull, env/config, DNS, network, and resource throttling.
  - For performance: node pressure, HPA behavior, GC/heap, connection pool limits.
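The first pass of this checklist can be partly automated by reading container states from `kubectl get pods -o json`, where each item has status.phase and status.containerStatuses with state.waiting.reason and lastState.terminated.reason. A sketch; the hint wording and the triage function are hypothetical, and the hints are starting points rather than diagnoses.

```python
from typing import List, Tuple

# Hypothetical hint table: common container states -> first thing to check.
HINTS = {
    "ImagePullBackOff": "check image name/tag and imagePullSecrets; see events in kubectl describe pod",
    "ErrImagePull": "check image name/tag and imagePullSecrets; see events in kubectl describe pod",
    "CrashLoopBackOff": "read kubectl logs --previous; check probes, env/config and dependencies",
    "CreateContainerConfigError": "a referenced ConfigMap/Secret or key is probably missing",
    "OOMKilled": "container hit its memory limit; check heap/GC settings or the limit itself",
}


def triage(pod_list: dict) -> List[Tuple[str, str, str]]:
    """Return (pod, container, hint) rows from parsed `kubectl get pods -o json` output."""
    findings = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        statuses = status.get("containerStatuses", [])
        if status.get("phase") == "Pending" and not statuses:
            findings.append((name, "-", "not started; check events for insufficient "
                                        "resources, taints or unbound PVCs"))
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting", {}).get("reason")
            last = cs.get("lastState", {}).get("terminated", {}).get("reason")
            # Current waiting reason first, then why the previous run ended.
            for reason in (waiting, last):
                if reason in HINTS:
                    findings.append((name, cs["name"], f"{reason}: {HINTS[reason]}"))
    return findings
```

To use it, pass the parsed JSON, e.g. json.loads of the stdout of kubectl get pods -n your-namespace -o json.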
Output expectations when making changes
- Provide manifests (or Helm values/templates) + brief deployment notes.
- Include resource sizing rationale and probe settings.
- Include rollback instructions and verification steps.
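As an illustration of the expected output, the sketch below generates a Deployment and a PodDisruptionBudget with explicit probes, resources and graceful-shutdown settings, wrapped in a v1 List that `kubectl apply -f` accepts as JSON. Everything specific here is a placeholder assumption: the app name and image, port 8080, the /healthz and /ready paths, the resource numbers, and a preStop hook that assumes the image ships `sh`.

```python
import json


def http_probe(path: str, port: int, period: int, failures: int) -> dict:
    """HTTP GET probe; the paths used below are placeholders for the service's own."""
    return {"httpGet": {"path": path, "port": port},
            "periodSeconds": period, "failureThreshold": failures}


def build_manifests(name: str, image: str, port: int = 8080, replicas: int = 3) -> dict:
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Placeholder sizing: derive real numbers from observed usage plus headroom.
        "resources": {
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"memory": "256Mi"},
        },
        # Allows up to 30 x 5s for startup; liveness/readiness wait until it succeeds.
        "startupProbe": http_probe("/healthz", port, period=5, failures=30),
        "livenessProbe": http_probe("/healthz", port, period=10, failures=3),
        "readinessProbe": http_probe("/ready", port, period=5, failures=3),
        # Short delay so endpoints stop routing traffic before SIGTERM arrives.
        "lifecycle": {"preStop": {"exec": {"command": ["sh", "-c", "sleep 5"]}}},
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {"type": "RollingUpdate",
                         "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1}},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"terminationGracePeriodSeconds": 30, "containers": [container]},
            },
        },
    }
    pdb = {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {"maxUnavailable": 1, "selector": {"matchLabels": labels}},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, pdb]}


if __name__ == "__main__":
    print(json.dumps(build_manifests("web", "registry.example.com/web:1.0.0"), indent=2))
```

The accompanying deployment notes would then explain why these sizes and probe timings were chosen and how to roll back.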