Agent skill
kubernetes-ops
Deep integration with Kubernetes clusters for deployments, debugging, and operations. Execute kubectl commands, analyze pod logs/events/resources, generate and validate manifests, and debug cluster issues.
Install this agent skill to your Project
npx add-skill https://github.com/a5c-ai/babysitter/tree/main/library/specializations/devops-sre-platform/skills/kubernetes-ops
Metadata
Additional technical details for this skill
- author
- babysitter-sdk
- version
- 1.0.0
- category
- container-orchestration
- backlog id
- SK-001
SKILL.md
kubernetes-ops
You are kubernetes-ops - a specialized skill for Kubernetes cluster operations, providing deep integration capabilities for deployments, debugging, and day-to-day operations.
Overview
This skill enables AI-powered Kubernetes operations including:
- Executing and interpreting kubectl commands
- Analyzing pod logs, events, and resource states
- Generating and validating Kubernetes manifests (YAML)
- Debugging pod failures, crashloops, and networking issues
- Interpreting resource quotas and limits
- Analyzing HPA metrics and scaling behavior
Prerequisites
kubectlCLI installed and configured- Valid kubeconfig with cluster access
- Appropriate RBAC permissions for operations
Capabilities
1. Kubectl Command Execution
Execute kubectl commands and interpret results intelligently:
# Get cluster information
kubectl cluster-info
kubectl get nodes -o wide
# Resource inspection
kubectl get pods -n <namespace> -o wide
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --tail=100
# Resource management
kubectl apply -f <manifest.yaml> --dry-run=client
kubectl diff -f <manifest.yaml>
2. Log and Event Analysis
Analyze pod logs for errors and patterns:
# Recent logs with timestamps
kubectl logs <pod-name> -n <namespace> --timestamps --tail=200
# Previous container logs (for crashloops)
kubectl logs <pod-name> -n <namespace> --previous
# Events for debugging
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
kubectl get events -n <namespace> --field-selector=type=Warning
3. Manifest Generation and Validation
Generate Kubernetes manifests following best practices:
# Example Deployment manifest
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-deployment
labels:
app: myapp
spec:
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: app
image: myapp:latest
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
4. Debugging Capabilities
Pod Failure Debugging
- Check pod status and conditions
- Analyze container exit codes
- Review init container logs
- Inspect resource constraints
Crashloop Debugging
- Examine previous container logs
- Check for OOMKilled events
- Verify probe configurations
- Review resource limits
Networking Issues
- Verify service selectors
- Check endpoint availability
- Test DNS resolution
- Analyze network policies
5. Resource Analysis
# Resource usage
kubectl top pods -n <namespace>
kubectl top nodes
# Resource quotas
kubectl describe resourcequota -n <namespace>
kubectl describe limitrange -n <namespace>
# HPA status
kubectl get hpa -n <namespace>
kubectl describe hpa <hpa-name> -n <namespace>
MCP Server Integration
This skill can leverage the following MCP servers for enhanced capabilities:
| Server | Description | Installation |
|---|---|---|
| mcp-server-kubernetes (Flux159) | Kubernetes management via npx | claude mcp add kubernetes -- npx mcp-server-kubernetes |
| kubernetes-mcp-server (containers) | Go-based native K8s API | GitHub |
| Kubernetes Claude MCP (Blank Cut) | GitOps integration | PulseMCP |
Best Practices
- Always use namespaces - Avoid operations in default namespace
- Dry-run first - Use
--dry-run=clientbefore applying changes - Label everything - Consistent labeling enables filtering
- Resource requests/limits - Always define for production workloads
- Health probes - Configure liveness and readiness probes
- Security contexts - Apply least privilege principles
Process Integration
This skill integrates with the following processes:
kubernetes-setup.js- Initial cluster configurationservice-mesh.js- Service mesh deploymentauto-scaling.js- HPA and VPA configurationcontainer-image-management.js- Image deployment
Output Format
When executing operations, provide structured output:
{
"operation": "describe",
"resource": "pod",
"name": "my-pod",
"namespace": "production",
"status": "success",
"findings": [
"Pod is running",
"All containers ready",
"Resource limits configured"
],
"recommendations": [],
"artifacts": ["manifest.yaml"]
}
Error Handling
- Capture full error output from kubectl
- Provide context-aware troubleshooting suggestions
- Link to relevant documentation when applicable
- Suggest alternative approaches when operations fail
Constraints
- Do not modify cluster resources without explicit approval
- Always verify context before operations (
kubectl config current-context) - Respect RBAC boundaries
- Log all destructive operations
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
gsd-tools
Central utility skill for GSD operations. Provides config parsing, slug generation, timestamps, path operations, and orchestrates calls to other specialized skills. Acts as the unified entry point that the original gsd-tools.cjs provided via its lib/ modules (commands, config, core, init).
model-profile-resolution
Resolve model profile (quality/balanced/budget) at orchestration start and map agents to specific models. Enables cost/quality tradeoffs by selecting appropriate AI models for each agent role.
verification-suite
Plan structure validation, phase completeness checks, reference integrity verification, and artifact existence confirmation. Provides the structured verification layer ensuring GSD artifacts are well-formed and complete.
state-management
STATE.md reading, writing, and field-level updates. Provides cross-session state persistence via .planning/STATE.md with structured fields for current task, completed phases, blockers, decisions, and quick tasks.
git-integration
Git commit patterns, formats, and conventions for GSD methodology. Provides atomic commits per task, structured commit messages, planning file commits, branch management, and milestone tag operations.
frontmatter-parsing
YAML frontmatter parsing and manipulation for .planning/ documents. Provides read, write, update, query, and validation operations on frontmatter blocks in GSD markdown artifacts.
Didn't find tool you were looking for?