Agent skill

gke-expert

Expert guidance for Google Kubernetes Engine (GKE) operations including cluster management, workload deployment, scaling, monitoring, troubleshooting, and optimization. Use when working with GKE clusters, Kubernetes deployments on GCP, container orchestration, or when users need help with kubectl commands, GKE networking, autoscaling, workload identity, or GKE-specific features like Autopilot, Binary Authorization, or Config Sync.

Install this agent skill to your Project

npx add-skill https://github.com/AdminTurnedDevOps/agentic-demo-repo/tree/main/agentregistry/gke-expert/gke-expert

SKILL.md

GKE Expert

Initial Assessment

When a user requests GKE help, determine:

Cluster type: Autopilot or Standard?
Task: Create, Deploy, Scale, Troubleshoot, or Optimize?
Environment: Dev, Staging, or Production?
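The cluster-type question can often be answered from the cluster itself. The following is a minimal sketch, assuming an authenticated `gcloud` CLI and a regional cluster. `cluster_mode` is a hypothetical helper name; it reads the `autopilot.enabled` field of the cluster description.

```shell
# Hypothetical helper: prints "Autopilot" or "Standard" for a cluster.
# Assumes a regional cluster; swap --region for --zone on a zonal one.
cluster_mode() {
  cluster_name="$1"
  region="$2"
  # Fail instead of guessing if the cluster cannot be described.
  enabled=$(gcloud container clusters describe "$cluster_name" \
    --region="$region" \
    --format="value(autopilot.enabled)") || return 1
  case "$enabled" in
    True|true) echo "Autopilot" ;;
    *)         echo "Standard" ;;
  esac
}

# Usage (placeholders as elsewhere in this skill):
#   cluster_mode CLUSTER_NAME REGION
```

Any value other than true, including an empty field, is reported as Standard, so the result is only as reliable as the describe call.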

Quick Start Workflows

Create Cluster

Autopilot (recommended for most workloads):

    gcloud container clusters create-auto CLUSTER_NAME \
      --region=REGION \
      --release-channel=regular

Standard (for specific node requirements):

    gcloud container clusters create CLUSTER_NAME \
      --zone=ZONE \
      --num-nodes=3 \
      --enable-autoscaling \
      --min-nodes=2 \
      --max-nodes=10

Always authenticate after creation. Use --region=REGION for a regional cluster (including Autopilot) and --zone=ZONE for a zonal Standard cluster such as the one above:

    gcloud container clusters get-credentials CLUSTER_NAME --region=REGION

Deploy Application

Create deployment manifest:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: APP_NAME
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: APP_NAME
      template:
        metadata:
          labels:
            app: APP_NAME
        spec:
          containers:
          - name: APP_NAME
            image: gcr.io/PROJECT_ID/IMAGE:TAG
            ports:
            - containerPort: 8080
            resources:
              requests:
                cpu: 100m
                memory: 128Mi
              limits:
                cpu: 500m
                memory: 512Mi
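The manifest above uses APP_NAME, PROJECT_ID, IMAGE, and TAG as placeholders. One way to fill them in before applying is a small `sed` filter. This is only a sketch: `render_manifest` is a hypothetical helper, not part of kubectl or gcloud, and the example values in the usage comment are made up.

```shell
# Hypothetical helper: substitutes the skill's placeholders in a manifest
# read from stdin and writes the result to stdout.
# Arguments: app name, project ID, image name, tag.
# Assumes the values contain no "|", "&", or backslash, which sed would
# treat specially in the replacement text.
render_manifest() {
  sed -e "s|APP_NAME|$1|g" \
      -e "s|PROJECT_ID|$2|g" \
      -e "s|IMAGE|$3|g" \
      -e "s|TAG|$4|g"
}

# Usage (example values are illustrative):
#   render_manifest web my-project web-server v1 < deployment.yaml | kubectl apply -f -
```

The match is case-sensitive, so the lowercase `image:` key in the manifest is left alone while the uppercase IMAGE placeholder is replaced.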

Apply and expose:

    kubectl apply -f deployment.yaml
    kubectl expose deployment APP_NAME --type=LoadBalancer --port=80 --target-port=8080

Setup Autoscaling

HPA for pods:

    kubectl autoscale deployment APP_NAME --cpu-percent=70 --min=2 --max=100

Cluster autoscaling (Standard only):

    gcloud container clusters update CLUSTER_NAME \
      --enable-autoscaling --min-nodes=2 --max-nodes=10 --zone=ZONE

Configure Workload Identity

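Enable on cluster:

    gcloud container clusters update CLUSTER_NAME \
      --workload-pool=PROJECT_ID.svc.id.goog

Add --zone=ZONE or --region=REGION unless a default location is set in the gcloud configuration. On a Standard cluster, existing node pools also need the GKE metadata server enabled (gcloud container node-pools update with --workload-metadata=GKE_METADATA) before their pods can use Workload Identity.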
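To confirm the pool was applied, the cluster description can be queried. The following is a sketch, assuming a regional cluster and an authenticated `gcloud`. `workload_pool_enabled` is a hypothetical helper that compares the `workloadIdentityConfig.workloadPool` field against the expected value.

```shell
# Hypothetical helper: succeeds if the cluster's workload pool equals
# PROJECT_ID.svc.id.goog; otherwise reports what it found and fails.
# Assumes a regional cluster; swap --region for --zone on a zonal one.
workload_pool_enabled() {
  cluster_name="$1"
  region="$2"
  project_id="$3"
  expected="${project_id}.svc.id.goog"
  actual=$(gcloud container clusters describe "$cluster_name" \
    --region="$region" \
    --format="value(workloadIdentityConfig.workloadPool)") || return 1
  if [ "$actual" = "$expected" ]; then
    echo "Workload Identity enabled: $actual"
  else
    echo "Workload Identity not enabled (found: '${actual}')" >&2
    return 1
  fi
}

# Usage:
#   workload_pool_enabled CLUSTER_NAME REGION PROJECT_ID
```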

Link service accounts:

    # Create GCP service account
    gcloud iam service-accounts create GSA_NAME

    # Create K8s service account
    kubectl create serviceaccount KSA_NAME

    # Bind them ("default" is the namespace of the K8s service account)
    gcloud iam service-accounts add-iam-policy-binding \
      GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
      --role roles/iam.workloadIdentityUser \
      --member "serviceAccount:PROJECT_ID.svc.id.goog[default/KSA_NAME]"

    # Annotate K8s SA
    kubectl annotate serviceaccount KSA_NAME \
      iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com

Troubleshooting Guide

Pod Issues

    # Pod not starting - check events
    kubectl describe pod POD_NAME
    kubectl get events --field-selector involvedObject.name=POD_NAME

Common fixes:

ImagePullBackOff: Check image exists and pull secrets

CrashLoopBackOff: kubectl logs POD_NAME --previous

Pending: kubectl describe nodes (check resources)

OOMKilled: Increase memory limits
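The mapping above from pod status to first action is mechanical enough to script. In this sketch, `suggest_fix` is a hypothetical helper that only prints the next step from the list; it does not contact the cluster. `ErrImagePull` is included as an assumption because it usually precedes `ImagePullBackOff`.

```shell
# Hypothetical helper: given a pod status reason and an optional pod name,
# print the first diagnostic step from the list of common fixes.
suggest_fix() {
  reason="$1"
  pod="${2:-POD_NAME}"
  case "$reason" in
    ImagePullBackOff|ErrImagePull)
      echo "Check that the image exists and that pull secrets are configured" ;;
    CrashLoopBackOff)
      echo "kubectl logs $pod --previous" ;;
    Pending)
      echo "kubectl describe nodes" ;;
    OOMKilled)
      echo "Increase the container memory limit" ;;
    *)
      # Unknown reason: fall back to the general pod inspection command.
      echo "kubectl describe pod $pod" ;;
  esac
}

# Usage:
#   suggest_fix CrashLoopBackOff POD_NAME
```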

Service Issues

    # No endpoints
    kubectl get endpoints SERVICE_NAME
    kubectl get pods -l app=APP_NAME  # Check if pods match selector

    # Test connectivity
    kubectl run test --image=busybox -it --rm -- wget -O- SERVICE_NAME

Performance Issues

    # Check resource usage
    kubectl top nodes
    kubectl top pods --all-namespaces

    # Find bottlenecks
    kubectl describe resourcequotas
    kubectl describe limitranges

Production Patterns

Ingress with HTTPS

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: APP_NAME-ingress
      annotations:
        networking.gke.io/managed-certificates: "CERT_NAME"
    spec:
      rules:
      - host: example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: APP_NAME
                port:
                  number: 80

Pod Disruption Budget

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: APP_NAME-pdb
    spec:
      minAvailable: 1
      selector:
        matchLabels:
          app: APP_NAME

Security Context

    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      containers:
      - name: app
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]

Cost Optimization

Use Autopilot for automatic right-sizing
Enable cluster autoscaling with appropriate limits
Use Spot VMs for non-critical workloads:

    gcloud container node-pools create spot-pool \
      --cluster=CLUSTER_NAME \
      --spot \
      --num-nodes=2

Set resource requests/limits appropriately
Use VPA for recommendations: kubectl describe vpa APP_NAME-vpa
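`kubectl describe vpa` only returns recommendations if a VerticalPodAutoscaler object already exists for the workload. The following is a sketch of generating one in recommendation-only mode, assuming vertical Pod autoscaling is enabled on the cluster and the target is a Deployment. `vpa_manifest` is a hypothetical helper name.

```shell
# Hypothetical helper: prints a VerticalPodAutoscaler manifest for a
# Deployment. updateMode "Off" means the VPA only records recommendations
# and does not evict or resize pods.
vpa_manifest() {
  app="$1"
  cat <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ${app}-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${app}
  updatePolicy:
    updateMode: "Off"
EOF
}

# Usage:
#   vpa_manifest APP_NAME | kubectl apply -f -
#   kubectl describe vpa APP_NAME-vpa
```

Keeping the mode at "Off" lets the recommendations be reviewed and copied into the Deployment's requests by hand, which avoids conflicts with an HPA that scales on CPU.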

Essential Commands

    # Cluster management
    gcloud container clusters list
    kubectl config get-contexts
    kubectl cluster-info

    # Deployments
    kubectl rollout status deployment/APP_NAME
    kubectl rollout undo deployment/APP_NAME
    kubectl scale deployment APP_NAME --replicas=5

    # Debugging
    kubectl logs -f POD_NAME --tail=50
    kubectl exec -it POD_NAME -- /bin/bash
    kubectl port-forward pod/POD_NAME 8080:80

    # Monitoring
    kubectl top nodes
    kubectl top pods
    kubectl get events --sort-by='.lastTimestamp'
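Most of the `gcloud container` commands in this skill take either `--zone` or `--region`, depending on how the cluster was created. The following sketch picks the flag from the location name. It assumes the usual naming in which zones end in a letter suffix such as `-a` and regions do not; `location_flag` is a hypothetical name.

```shell
# Hypothetical helper: prints the gcloud location flag for a zone or region.
# A trailing "-<letter>" (for example us-central1-a) is treated as a zone;
# anything else (for example us-central1) is treated as a region.
location_flag() {
  case "$1" in
    *-[a-z]) echo "--zone=$1" ;;
    *)       echo "--region=$1" ;;
  esac
}

# Usage:
#   gcloud container clusters get-credentials CLUSTER_NAME "$(location_flag us-central1-a)"
```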

External Documentation

For detailed documentation beyond this skill:

Official GKE Docs: https://cloud.google.com/kubernetes-engine/docs
kubectl Reference: https://kubernetes.io/docs/reference/kubectl/
GKE Best Practices: https://cloud.google.com/kubernetes-engine/docs/best-practices
Workload Identity: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
GKE Pricing Calculator: https://cloud.google.com/products/calculator

Cleanup

    kubectl delete all -l app=APP_NAME
    kubectl drain NODE_NAME --ignore-daemonsets

Advanced Topics Reference

For complex scenarios, consult:

Stateful workloads: Use StatefulSets with persistent volumes
Batch jobs: Use Jobs/CronJobs with appropriate backoff policies
Multi-region: Use Multi-cluster Ingress or Traffic Director
Service mesh: Install Anthos Service Mesh for advanced networking
GitOps: Implement Config Sync or Flux for declarative management
Monitoring: Integrate with Cloud Monitoring or install Prometheus

Maintainer

AdminTurnedDevOps Core maintainer

Source details

Full Name: AdminTurnedDevOps/agentic-demo-repo
Branch: main
Path in repo: agentregistry/gke-expert/gke-expert
