Kubernetes Quick Reference
Version: 1.0.0 | Target Size: <100KB | Purpose: Fast reference for Kubernetes manifest development and deployment
Overview
Kubernetes is a container orchestration platform for automating deployment, scaling, and management of containerized applications. This quick reference provides essential patterns for creating production-ready Kubernetes manifests with security hardening and best practices.
When to Load This Skill:
- Detected: *.yaml with apiVersion: v1|apps/v1, kind: Deployment|Service|Pod, kustomization.yaml
- Manual: --tools=kubernetes flag
- Use Case: Container orchestration and production deployments
Progressive Disclosure:
- This file (SKILL.md): Quick reference for immediate use
- REFERENCE.md: Comprehensive guide with advanced patterns and 20+ production examples
Table of Contents
- Core Resources Quick Reference
- Security Hardening Checklist
- Resource Requests and Limits Guidelines
- Networking Basics
- Storage Overview
- RBAC Basics
- Common kubectl Commands
- Health Checks and Probes
- Configuration Management
- Troubleshooting Quick Guide
Core Resources Quick Reference
Pod
Basic unit of deployment - one or more containers running together:
apiVersion: v1
kind: Pod
metadata:
  name: webapp
  labels:
    app: webapp
spec:
  containers:
    - name: app
      image: nginx:1.21
      ports:
        - containerPort: 80
Key Concepts:
- Smallest deployable unit in Kubernetes
- Containers in the same pod share a network namespace and can share volumes
- Typically managed by higher-level controllers (Deployment, StatefulSet)
- Create bare Pods only for debugging; use a controller for production deployments
Deployment
Declarative pod management with rolling updates and rollbacks:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  labels:
    app: webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: app
          image: nginx:1.21
          ports:
            - containerPort: 80
Key Features:
- Manages ReplicaSets for pod scaling
- Rolling updates (zero downtime when readiness probes and enough replicas are configured)
- Rollback to previous versions
- Self-healing (replaces failed or deleted pods)
Service
Network abstraction providing stable endpoint for pods:
ClusterIP (internal only):
apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  type: ClusterIP  # Default
  selector:
    app: webapp
  ports:
    - port: 80        # Service port
      targetPort: 80  # Container port
NodePort (external access via node port):
apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  type: NodePort
  selector:
    app: webapp
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30080  # External port (30000-32767)
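The nodePort value has to fall inside the cluster's NodePort range, which is 30000-32767 by default as the comment above notes. A minimal sketch of a pre-flight check, assuming the default range is in effect; `valid_node_port` is a hypothetical helper, not a kubectl feature:

```shell
# Hypothetical helper: succeeds when the port lies in the default NodePort range.
# Assumes the cluster keeps the default range of 30000-32767.
valid_node_port() {
  port=$1
  [ "$port" -ge 30000 ] && [ "$port" -le 32767 ]
}

if valid_node_port 30080; then
  echo "30080: ok"
fi
if ! valid_node_port 8080; then
  echo "8080: outside the NodePort range"
fi
```

Running a check like this before `kubectl apply` only catches the obvious mistake of reusing the container port as the nodePort; the API server remains the authority on the configured range.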
LoadBalancer (cloud load balancer):
apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  type: LoadBalancer
  selector:
    app: webapp
  ports:
    - port: 80
      targetPort: 80
Service Types:
- ClusterIP: Internal access only (default)
- NodePort: External access via node IP:port
- LoadBalancer: Cloud provider load balancer
- ExternalName: DNS CNAME record
ConfigMap
Non-sensitive configuration data:
apiVersion: v1
kind: ConfigMap
metadata:
  name: webapp-config
data:
  app.conf: |
    [server]
    port = 8080
    timeout = 30
  database.host: postgres.default.svc.cluster.local
  cache.ttl: "3600"
Usage in Pod:
spec:
  containers:
    - name: app
      envFrom:
        - configMapRef:
            name: webapp-config
      # Or individual keys
      env:
        - name: DB_HOST
          valueFrom:
            configMapKeyRef:
              name: webapp-config
              key: database.host
      # Or mount as volume
      volumeMounts:
        - name: config
          mountPath: /etc/config
  volumes:
    - name: config
      configMap:
        name: webapp-config
Secret
Sensitive data storage (values are base64-encoded, which is an encoding, not encryption):
apiVersion: v1
kind: Secret
metadata:
  name: webapp-secrets
type: Opaque
data:
  # Base64 encoded values
  db-password: cGFzc3dvcmQxMjM=
  api-key: YWJjZGVmZ2hpamts
stringData:
  # Plain text (auto-encoded)
  admin-password: changeme
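The values under data have to be base64-encoded by hand, while stringData values are encoded by the API server. A short sketch of producing and inspecting such values, assuming a `base64` utility that accepts `--decode` (GNU coreutils and recent macOS versions do):

```shell
# Encode a value for the Secret's data field.
# printf avoids the trailing newline that echo would add to the encoded value.
encoded=$(printf '%s' 'password123' | base64)
echo "$encoded"   # prints cGFzc3dvcmQxMjM=

# Decode an existing value to inspect it.
decoded=$(printf '%s' 'YWJjZGVmZ2hpamts' | base64 --decode)
echo "$decoded"   # prints abcdefghijkl
```

Because decoding is this easy, anyone who can read the Secret object can read the value; access control and encryption at rest still need to be handled separately.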
Usage in Pod:
spec:
  containers:
    - name: app
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: webapp-secrets
              key: db-password
      # Or mount as volume
      volumeMounts:
        - name: secrets
          mountPath: /etc/secrets
          readOnly: true
  volumes:
    - name: secrets
      secret:
        secretName: webapp-secrets
Secret Types:
- Opaque: Generic secret (default)
- kubernetes.io/dockerconfigjson: Docker registry credentials
- kubernetes.io/tls: TLS certificate and key
- kubernetes.io/service-account-token: Service account token
Ingress
HTTP/HTTPS routing to services:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - www.example.com
      secretName: webapp-tls
  rules:
    - host: www.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webapp
                port:
                  number: 80
Path Types:
- Prefix: Matches path prefix (/app matches /app, /app/page)
- Exact: Exact path match only
- ImplementationSpecific: Ingress controller-specific
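Prefix matching works on whole path segments rather than raw string prefixes, so /app matches /app/page but not /application. The sketch below mimics that rule for illustration only; `prefix_matches` is a hypothetical helper, and real ingress controllers may differ in edge cases:

```shell
# Hypothetical helper: segment-wise prefix match, as described for pathType: Prefix.
# Usage: prefix_matches <rule-path> <request-path>
prefix_matches() {
  rule=${1%/}   # a trailing slash on the rule is ignored
  path=$2
  [ -z "$rule" ] && return 0   # the rule "/" matches every path
  case "$path" in
    "$rule"|"$rule"/*) return 0 ;;
    *) return 1 ;;
  esac
}

prefix_matches /app /app/page    && echo "/app/page matches"
prefix_matches /app /application || echo "/application does not match"
```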
Security Hardening Checklist
Essential Security Settings
From infrastructure-developer best practices (production-validated):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      # Pod-level security context
      securityContext:
        runAsNonRoot: true        # ✅ Prevent root execution
        runAsUser: 1000           # ✅ Specific non-root user
        fsGroup: 2000             # ✅ File system group
        seccompProfile:
          type: RuntimeDefault    # ✅ Seccomp profile
      containers:
        - name: app
          image: myapp:1.2.3      # ✅ Pinned version (not :latest)
          # Container-level security context
          securityContext:
            allowPrivilegeEscalation: false  # ✅ No privilege escalation
            readOnlyRootFilesystem: true     # ✅ Immutable filesystem
            capabilities:
              drop:
                - ALL                        # ✅ Drop all capabilities
          # Resource limits (prevent DoS)
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          # Health checks
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
Security Context Fields
Pod-level:
securityContext:
  runAsNonRoot: true          # Enforce non-root user
  runAsUser: 1000             # UID to run as
  runAsGroup: 3000            # GID to run as
  fsGroup: 2000               # Volume ownership group
  fsGroupChangePolicy: "OnRootMismatch"
  seccompProfile:
    type: RuntimeDefault      # Seccomp profile
  supplementalGroups: [4000]
Container-level:
securityContext:
  allowPrivilegeEscalation: false
  privileged: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE  # Only if needed for ports <1024
  seccompProfile:
    type: RuntimeDefault
Common Security Anti-Patterns
❌ INSECURE (avoid):
spec:
  containers:
    - name: app
      image: myapp:latest      # ❌ Mutable tag
      # No security context    # ❌ Running as root
      # No resource limits     # ❌ Resource exhaustion risk
      # No health checks       # ❌ No failure detection
✅ SECURE (production-ready):
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  containers:
    - name: app
      image: myapp:1.2.3  # ✅ Immutable tag
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: [ALL]
      resources:
        limits:
          memory: "512Mi"
          cpu: "500m"
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
Resource Requests and Limits Guidelines
Resource Specification
resources:
  requests:          # Guaranteed allocation
    memory: "256Mi"
    cpu: "250m"
  limits:            # Maximum allocation
    memory: "512Mi"
    cpu: "500m"
CPU Units:
- 1 = 1 vCPU/core
- 500m = 0.5 vCPU (500 millicores)
- 100m = 0.1 vCPU
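CPU quantities can be written either as whole or fractional cores or with the m suffix, and both forms describe the same amount. A minimal sketch of normalizing them to millicores, assuming `awk` is available; `to_millicores` is a hypothetical helper name:

```shell
# Hypothetical helper: convert a CPU quantity ("500m", "0.5", "1") to millicores.
to_millicores() {
  case "$1" in
    *m) printf '%s\n' "${1%m}" ;;   # already in millicores: strip the suffix
    *)  awk -v cores="$1" 'BEGIN { printf "%d\n", cores * 1000 + 0.5 }' ;;
  esac
}

to_millicores 1      # prints 1000
to_millicores 500m   # prints 500
to_millicores 0.1    # prints 100
```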
Memory Units:
- Mi = Mebibytes (1024^2 bytes)
- Gi = Gibibytes (1024^3 bytes)
- M = Megabytes (1000^2 bytes)
- G = Gigabytes (1000^3 bytes)
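The binary and decimal suffixes are easy to confuse, and the gap grows with the size of the value. The arithmetic below is a small illustration using 512 as an example figure:

```shell
# Bytes represented by 512Mi (binary suffix) and 512M (decimal suffix).
mebibytes=$((512 * 1024 * 1024))
megabytes=$((512 * 1000 * 1000))

echo "512Mi = $mebibytes bytes"                       # 536870912
echo "512M  = $megabytes bytes"                       # 512000000
echo "difference = $((mebibytes - megabytes)) bytes"  # 24870912
```

Using one suffix family consistently across requests and limits avoids accidental mismatches of this kind.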
Resource Guidelines by Workload
Small web service:
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "200m"
Medium application:
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
Large application:
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"
Database/stateful:
resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"
QoS Classes
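Kubernetes assigns a QoS class based on the resource requests and limits of a pod's containers:
- Guaranteed (highest priority):
  - Every container sets CPU and memory limits, and requests equal limits
  - Last to be evicted
- Burstable (medium priority):
  - At least one container sets a request or limit, but the pod does not meet the Guaranteed criteria
  - Evicted after BestEffort
- BestEffort (lowest priority):
  - No requests or limits on any container
  - First to be evicted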
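The classification rules can be sketched as a small decision function. This is a simplified illustration for a single-container pod, not the kubelet's implementation: `qos_class` is a hypothetical helper, it compares quantities as plain strings (so "500m" and "0.5" count as different), and it treats an omitted request as defaulting to the limit.

```shell
# Hypothetical helper: classify a single-container pod.
# Usage: qos_class <cpu-request> <memory-request> <cpu-limit> <memory-limit>
# Pass an empty string for any value that is not set.
qos_class() {
  req_cpu=$1
  req_mem=$2
  lim_cpu=$3
  lim_mem=$4
  if [ -z "$req_cpu$req_mem$lim_cpu$lim_mem" ]; then
    echo BestEffort
  elif [ -n "$lim_cpu" ] && [ -n "$lim_mem" ] &&
       [ "${req_cpu:-$lim_cpu}" = "$lim_cpu" ] &&
       [ "${req_mem:-$lim_mem}" = "$lim_mem" ]; then
    echo Guaranteed
  else
    echo Burstable
  fi
}

qos_class "" "" "" ""               # prints BestEffort
qos_class 250m 256Mi 500m 512Mi     # prints Burstable
qos_class 500m 512Mi 500m 512Mi     # prints Guaranteed
```

For a running pod, the assigned class can be read from the pod's status.qosClass field instead of being derived by hand.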
Networking Basics
Service Types Comparison
| Type | Use Case | External Access | Cloud Cost |
|---|---|---|---|
| ClusterIP | Internal only | No | Free |
| NodePort | Development/testing | Yes (node:port) | Free |
| LoadBalancer | Production | Yes (load balancer) | Paid |
| ExternalName | External DNS | N/A | Free |
Service DNS
Services are accessible via DNS:
# Within same namespace
<service-name>
# Across namespaces
<service-name>.<namespace>
# Full FQDN
<service-name>.<namespace>.svc.cluster.local
Example:
# Service in namespace "default"
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: default
---
# Access from pod in any namespace
env:
  - name: DB_HOST
    value: postgres.default.svc.cluster.local
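The fully qualified name is assembled from the Service name, its namespace, and the cluster domain. A minimal sketch, assuming the common default cluster domain cluster.local; `service_fqdn` is a hypothetical helper:

```shell
# Hypothetical helper: build the fully qualified DNS name of a Service.
# Assumes the cluster domain is "cluster.local" unless a third argument is given.
service_fqdn() {
  name=$1
  namespace=${2:-default}
  domain=${3:-cluster.local}
  printf '%s.%s.svc.%s\n' "$name" "$namespace" "$domain"
}

service_fqdn postgres default   # prints postgres.default.svc.cluster.local
```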
Headless Service
For direct pod access (StatefulSets):
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None  # Headless
  selector:
    app: postgres
  ports:
    - port: 5432
Pod DNS:
<pod-name>.<service-name>.<namespace>.svc.cluster.local
Storage Overview
PersistentVolume (PV)
Cluster-level storage resource:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp3
  hostPath:          # hostPath is suitable for single-node testing only
    path: /mnt/data
PersistentVolumeClaim (PVC)
Pod's request for storage:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: webapp-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3
Usage in Pod:
spec:
  containers:
    - name: app
      volumeMounts:
        - name: data
          mountPath: /app/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: webapp-pvc
Access Modes
- ReadWriteOnce (RWO): Single node read-write
- ReadOnlyMany (ROX): Multiple nodes read-only
- ReadWriteMany (RWX): Multiple nodes read-write
StorageClass
Dynamic provisioning template:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com  # AWS EBS CSI driver; the in-tree kubernetes.io/aws-ebs provisioner is deprecated
parameters:
  type: gp3
  iopsPerGB: "50"
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
RBAC Basics
Service Account
Pod identity:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: webapp-sa
  namespace: production
Usage in Pod:
spec:
  serviceAccountName: webapp-sa
  containers:
    - name: app
      image: myapp:1.0.0
Role and RoleBinding
Namespace-scoped permissions:
# Role: Define permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
# RoleBinding: Grant permissions to user/SA
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
  - kind: ServiceAccount
    name: webapp-sa
    namespace: production
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
ClusterRole and ClusterRoleBinding
Cluster-wide permissions:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-reader
rules:
  - apiGroups: [""]
    resources: ["nodes", "namespaces"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-cluster
subjects:
  - kind: ServiceAccount
    name: admin-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-reader
  apiGroup: rbac.authorization.k8s.io
RBAC Verbs
- get: Retrieve individual resource
- list: List resources
- watch: Watch for changes
- create: Create new resources
- update: Update existing resources
- patch: Partially update resources
- delete: Delete resources
- deletecollection: Delete multiple resources
Common kubectl Commands
Pod Management
# List pods
kubectl get pods
kubectl get pods -n production
kubectl get pods --all-namespaces
kubectl get pods -o wide # Show node and IP
# Describe pod
kubectl describe pod webapp-abc123
# View logs
kubectl logs webapp-abc123
kubectl logs webapp-abc123 -f # Follow logs
kubectl logs webapp-abc123 --previous # Previous container
kubectl logs webapp-abc123 -c container-name # Multi-container
# Execute command
kubectl exec -it webapp-abc123 -- /bin/sh
kubectl exec webapp-abc123 -- env
# Port forward
kubectl port-forward webapp-abc123 8080:80
kubectl port-forward svc/webapp 8080:80
# Delete pod
kubectl delete pod webapp-abc123
Deployment Management
# Create deployment
kubectl create deployment webapp --image=nginx:1.21
kubectl apply -f deployment.yaml
# Scale deployment
kubectl scale deployment webapp --replicas=5
# Update image
kubectl set image deployment/webapp app=nginx:1.22
# Rollout status
kubectl rollout status deployment/webapp
# Rollout history
kubectl rollout history deployment/webapp
# Rollback
kubectl rollout undo deployment/webapp
kubectl rollout undo deployment/webapp --to-revision=2
# Pause/resume
kubectl rollout pause deployment/webapp
kubectl rollout resume deployment/webapp
Service Management
# List services
kubectl get services
kubectl get svc
# Describe service
kubectl describe svc webapp
# Expose deployment
kubectl expose deployment webapp --port=80 --type=LoadBalancer
# Get endpoints
kubectl get endpoints webapp
Resource Information
# Get all resources
kubectl get all
kubectl get all -n production
# Get resource YAML
kubectl get deployment webapp -o yaml
kubectl get pod webapp-abc123 -o json
# Get resource with labels
kubectl get pods -l app=webapp
kubectl get pods --selector app=webapp,env=prod
# Get events
kubectl get events
kubectl get events --sort-by='.lastTimestamp'
# Top (resource usage)
kubectl top nodes
kubectl top pods
kubectl top pods -n production
Debugging
# Describe resource
kubectl describe pod webapp-abc123
kubectl describe node node-1
# Get cluster info
kubectl cluster-info
kubectl version
# API resources
kubectl api-resources
kubectl api-versions
# Explain resource
kubectl explain pod
kubectl explain pod.spec.containers
Health Checks and Probes
Liveness Probe
Determines if container should be restarted:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    httpHeaders:
      - name: Custom-Header
        value: Awesome
  initialDelaySeconds: 30  # Wait before first check
  periodSeconds: 10        # Check every 10s
  timeoutSeconds: 5        # Timeout after 5s
  failureThreshold: 3      # Restart after 3 failures
  successThreshold: 1      # Success after 1 check
Readiness Probe
Determines if container should receive traffic:
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
  successThreshold: 1
Startup Probe
For slow-starting containers:
startupProbe:
  httpGet:
    path: /startup
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 30  # 5 minutes (30 * 10s)
Probe Types
HTTP GET:
httpGet:
  path: /health
  port: 8080
  scheme: HTTP  # or HTTPS
TCP Socket:
tcpSocket:
  port: 8080
Exec Command:
exec:
  command:
    - cat
    - /tmp/healthy
Probe Timing
- initialDelaySeconds: Wait before first probe
- periodSeconds: Time between probes
- timeoutSeconds: Probe timeout
- failureThreshold: Failures before action
- successThreshold: Successes to recover
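These fields combine into a time budget: roughly initialDelaySeconds plus failureThreshold times periodSeconds passes before Kubernetes acts. The arithmetic below is an approximation using the values from the startup and liveness examples in this section; it ignores timeoutSeconds and scheduling jitter.

```shell
# Values taken from the startup probe example.
initial_delay=0
period=10
failure_threshold=30

# Longest time the container may take to start before it is killed.
startup_budget=$((initial_delay + failure_threshold * period))
echo "startup budget: ${startup_budget}s"   # 300s, i.e. 5 minutes

# Liveness example: periodSeconds=10, failureThreshold=3.
# Approximate time from the first failed check to a restart.
liveness_window=$((3 * 10))
echo "liveness detection window: ${liveness_window}s"   # 30s
```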
Configuration Management
Environment Variables
Direct values:
env:
  - name: LOG_LEVEL
    value: "info"
  - name: APP_ENV
    value: "production"
From ConfigMap:
env:
  - name: DB_HOST
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: database.host
From Secret:
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: app-secrets
        key: db-password
From field:
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
EnvFrom (all keys)
envFrom:
  - configMapRef:
      name: app-config
  - secretRef:
      name: app-secrets
  - prefix: DB_
    configMapRef:
      name: database-config
Volume Mounts
ConfigMap as volume:
volumes:
  - name: config
    configMap:
      name: app-config
      items:
        - key: app.conf
          path: app.conf
volumeMounts:
  - name: config
    mountPath: /etc/config
    readOnly: true
Secret as volume:
volumes:
  - name: secrets
    secret:
      secretName: app-secrets
      defaultMode: 0400  # Owner read-only (octal)
volumeMounts:
  - name: secrets
    mountPath: /etc/secrets
    readOnly: true
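The defaultMode value 0400 is an octal literal, which YAML accepts. JSON has no octal notation, so a manifest written as JSON needs the decimal equivalent. A quick way to convert, relying on the shell's printf reading a leading zero as octal:

```shell
# Convert the octal file mode used in YAML to the decimal form JSON requires.
mode_decimal=$(printf '%d' 0400)
echo "defaultMode 0400 (octal) = $mode_decimal (decimal)"   # 256
```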
Troubleshooting Quick Guide
Pod Issues
Pod Pending:
# Check events
kubectl describe pod <pod-name>
# Common causes:
# - Insufficient resources
# - PVC not bound
# - Node selector mismatch
# - Image pull errors
CrashLoopBackOff:
# Check logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous
# Common causes:
# - Application error
# - Missing environment variables
# - Failed liveness probe
# - Incorrect command/args
ImagePullBackOff:
# Describe pod
kubectl describe pod <pod-name>
# Common causes:
# - Wrong image name/tag
# - Private registry without credentials
# - Network issues
# - Rate limiting
Pod Stuck Terminating:
# Check finalizers first
kubectl get pod <pod-name> -o yaml | grep -A 5 finalizers
# Force delete (last resort: the containers may keep running on the node)
kubectl delete pod <pod-name> --grace-period=0 --force
Service Issues
Service not reachable:
# Check endpoints
kubectl get endpoints <service-name>
# Check selector labels
kubectl get pods --show-labels
kubectl describe service <service-name>
# Test from another pod
kubectl run test --rm -it --image=curlimages/curl -- \
  curl http://<service-name>.<namespace>.svc.cluster.local
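An empty endpoints list usually means the Service selector does not match the pod labels: every key=value pair in the selector has to appear among the pod's labels. The sketch below illustrates that subset rule on comma-separated strings like those printed by --show-labels; `selector_matches` is a hypothetical helper and handles equality-based selectors only.

```shell
# Hypothetical helper: succeeds when every selector pair is present in the labels.
# Usage: selector_matches <selector> <pod-labels>   (both as "k=v,k=v")
selector_matches() {
  selector=$1
  labels=",$2,"
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                     # this pair is present, keep checking
      *) IFS=$old_ifs; return 1 ;;        # missing pair: no match
    esac
  done
  IFS=$old_ifs
  return 0
}

selector_matches "app=webapp" "app=webapp,env=prod" && echo "selected"
selector_matches "app=webapp,env=prod" "app=webapp" || echo "not selected"
```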
Resource Issues
Out of resources:
# Check node resources
kubectl top nodes
kubectl describe nodes
# Check pod resources
kubectl top pods
kubectl describe pod <pod-name>
Debug Commands
# Run debug pod
kubectl run debug --rm -it --image=busybox -- /bin/sh
# Copy files from pod
kubectl cp <pod-name>:/path/to/file ./local-file
# Ephemeral debug container
kubectl debug <pod-name> -it --image=busybox
Next Steps
For Advanced Patterns:
- See REFERENCE.md for comprehensive guide with 20+ production examples
- Covers: StatefulSets, DaemonSets, Jobs, CronJobs, HPA/VPA, Network Policies, advanced RBAC, observability
Common Use Cases:
- Stateful applications → REFERENCE.md § StatefulSets
- Cluster-wide services → REFERENCE.md § DaemonSets
- Batch processing → REFERENCE.md § Jobs and CronJobs
- Auto-scaling → REFERENCE.md § HPA and VPA
- Network isolation → REFERENCE.md § Network Policies
- Production monitoring → REFERENCE.md § Observability
Progressive Disclosure: Start here for quick reference, load REFERENCE.md for comprehensive patterns and production examples.
Performance Target: <100ms skill loading (this file ~70KB)
Last Updated: 2025-10-23 | Version: 1.0.0