deploying-cloud-k8s
Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Use when deploying to production, setting up GitHub Actions, troubleshooting deployments. Covers build-time vs runtime vars, architecture matching, and battle-tested debugging.
Install this agent skill in your project:
npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/asmayaseen/deploying-cloud-k8s
Deploying Cloud K8s
Quick Start
- Check cluster architecture: kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
- Match build platform to cluster (arm64 vs amd64)
- Set up GitHub Actions with path filters
- Deploy with Helm, passing secrets via --set
Critical: Build-Time vs Runtime Variables
The Problem
Next.js inlines NEXT_PUBLIC_* variables into the client bundle at build time (during next build), so setting them at runtime has no effect:
# WRONG: Runtime ENV does nothing for NEXT_PUBLIC_*
ENV NEXT_PUBLIC_API_URL=https://api.example.com
# RIGHT: declare a build ARG (before the next build step) and copy it into ENV
ARG NEXT_PUBLIC_API_URL=https://api.example.com
ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL
Build-Time (Next.js)
| Variable | Purpose |
|---|---|
| NEXT_PUBLIC_SSO_URL | SSO endpoint for browser OAuth |
| NEXT_PUBLIC_API_URL | API endpoint for browser fetch |
| NEXT_PUBLIC_APP_URL | App URL for redirects |
Runtime (ConfigMaps/Secrets)
| Variable | Source |
|---|---|
| DATABASE_URL | Secret (Neon/managed DB) |
| SSO_URL | ConfigMap (internal K8s: http://sso:3001) |
| BETTER_AUTH_SECRET | Secret |
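The build-time/runtime split above can be applied mechanically in a deploy script. A minimal sketch, assuming the NEXT_PUBLIC_ prefix is the only rule that matters; `split_env` and `docker_build_arg_flags` are hypothetical helper names, not part of Docker or Next.js:

```python
# Hypothetical helpers: route env vars by the NEXT_PUBLIC_ prefix.
# Assumption: every NEXT_PUBLIC_* name is inlined by `next build`, so it must
# be a Docker build arg; everything else is runtime config (ConfigMap/Secret).
BUILD_TIME_PREFIX = "NEXT_PUBLIC_"


def split_env(env: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (build_args, runtime_env), partitioned by name prefix."""
    build_args: dict[str, str] = {}
    runtime_env: dict[str, str] = {}
    for name, value in env.items():
        target = build_args if name.startswith(BUILD_TIME_PREFIX) else runtime_env
        target[name] = value
    return build_args, runtime_env


def docker_build_arg_flags(build_args: dict[str, str]) -> list[str]:
    """Render build args as `docker build --build-arg NAME=value` flags."""
    flags: list[str] = []
    for name, value in sorted(build_args.items()):
        flags += ["--build-arg", f"{name}={value}"]
    return flags
```

Anything that lands in build_args is baked into the image and shipped to the browser, so it must never hold a secret.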
Architecture Matching
BEFORE ANY DEPLOYMENT, check architecture:
kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
# Output: arm64 arm64 OR amd64 amd64
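The kubectl output can be turned directly into the `platforms` value for the build step. A minimal sketch, assuming the jsonpath output is the space-separated list shown above; `platforms_for_cluster` is a hypothetical helper:

```python
# Hypothetical helper: map node architectures to a buildx `platforms` string.
# Assumption: input is the raw output of
#   kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
def platforms_for_cluster(jsonpath_output: str) -> str:
    """Return e.g. 'linux/arm64', or a comma-separated list for mixed clusters."""
    archs = sorted(set(jsonpath_output.split()))  # de-duplicate node entries
    if not archs:
        raise ValueError("no node architectures found; check the kubectl context")
    return ",".join(f"linux/{arch}" for arch in archs)
```

A mixed-architecture cluster yields a multi-platform value, which needs QEMU emulation or native builders for each platform in the workflow.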
Docker Build
- uses: docker/build-push-action@v5
  with:
    platforms: linux/arm64 # MATCH YOUR CLUSTER!
    provenance: false # Avoid manifest issues
    no-cache: true # When debugging
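Why provenance: false? By default, Buildx attaches a provenance attestation, which wraps even a single-platform image in a manifest list; some registries and runtimes resolve these lists incorrectly and report "no match for platform" errors.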
GitHub Actions CI/CD
Selective Builds with Path Filters
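jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.filter.outputs.api }}
      web: ${{ steps.filter.outputs.web }}
    steps:
      - uses: actions/checkout@v4 # required by paths-filter on push events
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'apps/api/**'
            web:
              - 'apps/web/**'
  build-api:
    needs: changes
    if: needs.changes.outputs.api == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ...build and push the API image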
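To sanity-check which jobs a commit would trigger, the filter logic can be approximated locally. A minimal sketch, assuming `fnmatch` as a stand-in for the action's own glob matcher (fnmatch's `*` also crosses `/`, so edge cases may differ from dorny/paths-filter); `changed_filters` is a hypothetical helper:

```python
from fnmatch import fnmatch

# Same filters as the workflow above; fnmatch only approximates the real matcher.
FILTERS: dict[str, list[str]] = {
    "api": ["apps/api/**"],
    "web": ["apps/web/**"],
}


def changed_filters(changed_files: list[str],
                    filters: dict[str, list[str]] = FILTERS) -> dict[str, bool]:
    """Map each filter name to True if any changed file matches one of its globs."""
    return {
        name: any(fnmatch(path, pattern) for path in changed_files for pattern in patterns)
        for name, patterns in filters.items()
    }
```

Feed it the output of `git diff --name-only` for the commit range. The action itself exposes these results as the strings 'true'/'false', which is why the job condition compares against `'true'`.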
Next.js Build Args
- name: Build and push (web)
  uses: docker/build-push-action@v5
  with:
    build-args: |
      NEXT_PUBLIC_SSO_URL=https://sso.${{ vars.DOMAIN }}
      NEXT_PUBLIC_API_URL=https://api.${{ vars.DOMAIN }}
Helm Deployment
- name: Deploy
  run: |
    helm upgrade --install myapp ./helm/myapp \
      --set global.imageTag=${{ github.sha }} \
      --set "secrets.databaseUrl=${{ secrets.DATABASE_URL }}" \
      --set "secrets.authSecret=${{ secrets.BETTER_AUTH_SECRET }}"
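Values passed with --set go through Helm's own parser: an unescaped comma starts a new key=value pair, and Helm's documentation allows a backslash to escape the next character. A minimal sketch for escaping a plain string value before it is spliced into the command; `escape_set_value` and `set_arg` are hypothetical helpers, and values that need list syntax or exact typing are better placed in a values file:

```python
# Hypothetical helpers for `helm --set key=value`.
# Assumption: value is a plain string; only backslashes and commas need escaping.
def escape_set_value(value: str) -> str:
    """Escape backslashes first, then commas, so Helm reads the value literally."""
    return value.replace("\\", "\\\\").replace(",", "\\,")


def set_arg(key: str, value: str) -> str:
    """Build one `key=value` argument; still quote it when passing through a shell."""
    return f"{key}={escape_set_value(value)}"
```

Helm also converts values such as `true` or `1234` to booleans and numbers; `--set-string` keeps them as strings, which matters for secrets that happen to look numeric.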
Troubleshooting Guide
Quick Diagnosis Flow
Pod not running?
│
├─► ImagePullBackOff
│   ├─► "not found" ──► Wrong tag or registry
│   ├─► "unauthorized" ──► Auth/imagePullSecrets
│   └─► "no match for platform" ──► Architecture mismatch
│
├─► CrashLoopBackOff
│   ├─► "exec format error" ──► Wrong CPU architecture
│   ├─► Exit code 1 ──► App startup failure
│   └─► OOMKilled ──► Memory limits too low
│
└─► Pending
    ├─► Insufficient resources ──► Scale cluster
    └─► No matching node ──► Check nodeSelector
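The diagnosis tree can also be encoded as ordered patterns for triaging many pods at once. A minimal sketch: the state names are real pod states, but the regex patterns are assumptions based on typical kubelet and scheduler messages, and `diagnose` is a hypothetical helper that takes the state plus any message text copied from `kubectl describe pod`:

```python
import re

# Order matters: the platform error message itself ends with "not found".
RULES: dict[str, list[tuple[str, str]]] = {
    "ImagePullBackOff": [
        (r"no match for platform", "Architecture mismatch"),
        (r"not found", "Wrong tag or registry"),
        (r"unauthorized", "Auth/imagePullSecrets"),
    ],
    "CrashLoopBackOff": [
        (r"exec format error", "Wrong CPU architecture"),
        (r"oomkilled", "Memory limits too low"),
        (r"exit code:? 1\b", "App startup failure"),  # \b avoids matching 137
    ],
    "Pending": [
        (r"insufficient (cpu|memory)", "Scale cluster"),
        (r"didn't match", "Check nodeSelector"),
    ],
}


def diagnose(state: str, message: str) -> str:
    """Return the likely cause for a pod state, or a hint to check events."""
    text = " ".join(message.lower().split())  # collapse describe-style padding
    for pattern, cause in RULES.get(state, []):
        if re.search(pattern, text):
            return cause
    return "Unknown: check kubectl get events for this pod"
```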
Diagnostic Commands
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -E "(Image:|Failed|Error)"
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
kubectl logs <pod-name> -n <namespace> --tail=50
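When many pods are affected, the same information is available as JSON. A minimal sketch that reads the standard Pod fields `status.containerStatuses[].state.waiting` and `lastState.terminated` from the output of `kubectl get pods -n <namespace> -o json`; `stuck_containers` is a hypothetical helper:

```python
import json


def stuck_containers(pods_json: str) -> list[tuple[str, str, str, str]]:
    """Return (pod, container, waiting reason, last termination reason) tuples."""
    rows = []
    for pod in json.loads(pods_json).get("items", []):
        name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            if waiting is None:  # running or terminated: not stuck
                continue
            last = cs.get("lastState", {}).get("terminated", {}).get("reason", "")
            rows.append((name, cs["name"], waiting.get("reason", ""), last))
    return rows
```

Pods stuck in Pending that were never scheduled have no containerStatuses yet; their reason is in `status.conditions` (PodScheduled) and in the namespace events.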
Error: ImagePullBackOff "not found"
Causes:
- Tag doesn't exist (e.g., image pushed with the short SHA but deployed with the full github.sha)
- Wrong registry path
- Builds skipped by path filters
Fix: Verify image was pushed with exact tag used in deployment
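The short-vs-full SHA case can be caught before `helm upgrade` runs. A minimal sketch, assuming images are tagged with the git commit SHA (`${{ github.sha }}` is the full 40-character SHA); `tag_mismatch` is a hypothetical helper, and it cannot detect a tag that was never pushed because a path filter skipped the build:

```python
import re
from typing import Optional

FULL_SHA = re.compile(r"[0-9a-f]{40}")


def tag_mismatch(built_tag: str, deploy_tag: str) -> Optional[str]:
    """Return a description of the mismatch, or None if the tags agree."""
    if built_tag == deploy_tag:
        return None
    if FULL_SHA.fullmatch(deploy_tag) and deploy_tag.startswith(built_tag):
        return f"image pushed with short SHA {built_tag!r} but deployed with full SHA"
    if FULL_SHA.fullmatch(built_tag) and built_tag.startswith(deploy_tag):
        return f"image pushed with full SHA but deployed with short SHA {deploy_tag!r}"
    return f"tags differ: built {built_tag!r}, deploying {deploy_tag!r}"
```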
Error: "no match for platform in manifest"
Cause: Image built for wrong architecture OR buildx provenance issue
Fix:
platforms: linux/arm64 # Match cluster!
provenance: false # Simple manifest
no-cache: true # Force rebuild
Error: "exec format error"
Cause: Binary architecture doesn't match node
Fix: Rebuild with correct platform, use no-cache: true
Error: Helm comma parsing
failed parsing --set data: key "com" has no value
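Cause: Helm's --set parser treats an unescaped comma as the separator between key=value pairs, so the text after the comma is parsed as a new key
Fix: Write the overrides to a values file with a heredoc and pass it with --values: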
- name: Deploy
  run: |
    cat > /tmp/overrides.yaml << EOF
    sso:
      env:
        ALLOWED_ORIGINS: "https://a.com,https://b.com"
    EOF
    helm upgrade --install app ./chart --values /tmp/overrides.yaml
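If the deploy step is scripted, the values file can be generated instead of hand-written. A minimal sketch, assuming Helm parses values files as YAML and that JSON is valid YAML for practical purposes, so `json.dump` sidesteps both --set comma parsing and YAML quoting; `write_overrides` and `helm_upgrade_cmd` are hypothetical helpers:

```python
import json
import tempfile


def write_overrides(overrides: dict) -> str:
    """Write overrides to a temporary .json file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(overrides, f, indent=2)
        return f.name


def helm_upgrade_cmd(release: str, chart: str, values_path: str) -> list[str]:
    """Argument list for subprocess.run, so no shell quoting is involved."""
    return ["helm", "upgrade", "--install", release, chart, "--values", values_path]
```

Delete the file after the upgrade if it holds secrets.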
Error: Password authentication failed
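Cause: The password contains characters that are special in connection URLs (base64 output can include +, / and =), so it is misparsed when embedded in DATABASE_URL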
Fix: Use hex passwords:
# Wrong
openssl rand -base64 16 # Can have +/=
# Right
openssl rand -hex 16 # Alphanumeric only
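The same rule applies when passwords are generated in code. A minimal sketch: `secrets.token_hex(16)` returns 32 hex characters, the same shape as `openssl rand -hex 16`; for an existing password that must keep its special characters, percent-encoding it with `urllib.parse.quote` is an alternative that most URL parsers (libpq, SQLAlchemy) decode. The helper names and the `postgresql://` scheme are assumptions:

```python
import secrets
from urllib.parse import quote


def new_password(nbytes: int = 16) -> str:
    """Hex-only password: safe to embed in a URL without escaping."""
    return secrets.token_hex(nbytes)


def database_url(user: str, password: str, host: str, db: str) -> str:
    """postgresql:// URL with user and password percent-encoded."""
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}/{db}"
```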
Error: Logout redirects to 0.0.0.0
Cause: Behind the ingress, request.url reflects the container's bind address (e.g. 0.0.0.0:3000), not the public URL
Fix:
import { NextResponse } from "next/server";

const APP_URL = process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000";
const response = NextResponse.redirect(new URL("/", APP_URL));
Pre-Deployment Checklist
Architecture
- Checked cluster node architecture
- Build platform matches cluster
Docker Build
- provenance: false set
- platforms: linux/<arch> matches cluster
- Image tags consistent between build and deploy
CI/CD
- All NEXT_PUBLIC_* vars set as build args
- Secrets passed via --set (not in values.yaml)
- Path filters configured
Helm
- No commas in --set values
- Internal K8s service names for inter-service communication
- Password single source of truth in values.yaml
Production Debugging
Trace Request Path
# 1. Frontend logs
kubectl logs deploy/web -n myapp --tail=50
# 2. API logs
kubectl logs deploy/api -n myapp --tail=100 | grep -i error
# 3. Sidecar logs (Dapr, etc.)
kubectl logs deploy/api -n myapp -c daprd --tail=50
Common Bug Patterns
| Error | Likely Cause |
|---|---|
| AttributeError: no attribute 'X' | Model/schema mismatch |
| 404 Not Found on internal call | Wrong endpoint URL |
| Times off by hours | Timezone handling bug |
| greenlet_spawn not called | Implicit lazy load or sync I/O under async SQLAlchemy |
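When tracing a request through the logs, the signatures in this table can be counted automatically. A minimal sketch using plain substring matches taken from the table; "Times off by hours" has no fixed log signature and is not matched, and `likely_causes` is a hypothetical helper:

```python
# Substring -> likely cause, mirroring the error column of the table above.
PATTERNS: list[tuple[str, str]] = [
    ("AttributeError", "Model/schema mismatch"),
    ("404 Not Found", "Wrong endpoint URL"),
    ("greenlet_spawn", "Async SQLAlchemy misuse (lazy load outside await)"),
]


def likely_causes(log_text: str) -> dict[str, int]:
    """Count log lines matching each known signature, keyed by likely cause."""
    counts: dict[str, int] = {}
    for line in log_text.splitlines():
        for needle, cause in PATTERNS:
            if needle in line:
                counts[cause] = counts.get(cause, 0) + 1
    return counts
```

Pass it the text captured from `kubectl logs deploy/api -n myapp --tail=100`.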
GitOps with ArgoCD
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/repo.git
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true # Delete resources not in Git
      selfHeal: true # Fix drift automatically
Observability
# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
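A ServiceMonitor selects Services (not Pods) by label, and `endpoints[].port` refers to a named Service port, so an unnamed port silently produces no scrape target. A minimal sketch of that check on parsed manifests; `service_matches` is a hypothetical helper, and depending on how Prometheus was installed (kube-prometheus-stack, for example, selects ServiceMonitors by a `release` label by default) the ServiceMonitor may need an extra label too:

```python
def service_matches(monitor: dict, service: dict) -> bool:
    """True if the Service carries the selected labels and every endpoint port name exists."""
    wanted = monitor["spec"]["selector"].get("matchLabels", {})
    labels = service["metadata"].get("labels", {})
    if any(labels.get(k) != v for k, v in wanted.items()):
        return False
    named_ports = {p["name"] for p in service["spec"].get("ports", []) if "name" in p}
    return all(ep.get("port") in named_ports for ep in monitor["spec"]["endpoints"])
```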
Security
# Container securityContext (spec.containers[].securityContext)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
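These settings can be checked in CI against rendered manifests. A minimal sketch that inspects one container's securityContext as a dict (for example from `helm template` output parsed to JSON); `security_gaps` is a hypothetical helper, and it does not account for runAsNonRoot inherited from the pod-level securityContext:

```python
def security_gaps(ctx: dict) -> list[str]:
    """List settings from the hardened example that this container is missing."""
    gaps = []
    if ctx.get("runAsNonRoot") is not True:
        gaps.append("runAsNonRoot is not true")
    if ctx.get("allowPrivilegeEscalation") is not False:
        gaps.append("allowPrivilegeEscalation is not false")
    if ctx.get("readOnlyRootFilesystem") is not True:
        gaps.append("readOnlyRootFilesystem is not true")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        gaps.append("capabilities.drop does not include ALL")
    return gaps
```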
Resilience
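# HPA + PDB
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization # relative to the pods' resources.requests.cpu
          averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp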
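The HPA's core rule, as documented by Kubernetes, is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to [minReplicas, maxReplicas], with no change while the ratio is within a tolerance (0.1 by default). A minimal sketch using the values above; the real controller also applies stabilization windows and readiness handling, which are omitted here:

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float = 70,
                     min_replicas: int = 2, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a single utilization metric."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: no scaling
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 replicas at 140% CPU scale to 4; with minAvailable: 1, a node drain can then evict up to three of those four pods at once, but only one while the deployment sits at two replicas.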
See references/production-patterns.md for full GitOps, observability, security, and resilience patterns.
Verification
Run: python scripts/verify.py
Related Skills
- containerizing-applications - Docker and Helm charts
- operating-k8s-local - Local Kubernetes with Minikube
- building-nextjs-apps - Next.js patterns
References
- references/production-patterns.md - GitOps, ArgoCD, Prometheus, RBAC, HPA, PDB