aks-deployment
Deploying and debugging Toygres on AKS (Azure Kubernetes Service). Use when deploying, debugging pods, viewing logs, troubleshooting SSL, or managing Kubernetes resources.
AKS Deployment & Debugging
Deployment
# Full deploy with HTTPS
./deploy/deploy-to-aks.sh --https
# Just restart to pick up new images
kubectl rollout restart deployment/toygres-server -n toygres-system
kubectl rollout status deployment/toygres-server -n toygres-system
Viewing Logs
# Server logs
kubectl logs -n toygres-system -l app.kubernetes.io/component=server -f
# UI logs
kubectl logs -n toygres-system -l app.kubernetes.io/component=ui -f
# Previous crashed pod
kubectl logs -n toygres-system <pod-name> --previous
Pod Management
# List pods
kubectl get pods -n toygres-system
# Describe pod (see events, errors)
kubectl describe pod <pod-name> -n toygres-system
# Exec into pod
kubectl exec -it <pod-name> -n toygres-system -- /bin/sh
# Delete pod (the Deployment recreates it)
kubectl delete pod <pod-name> -n toygres-system
Common Issues
Pod CrashLoopBackOff
# Check logs for crash reason
kubectl logs <pod-name> -n toygres-system --previous
# Common causes:
# - DATABASE_URL not set or wrong
# - Missing secrets
# - Port already in use
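When several pods are failing at once, it can help to collect the previous logs for all of them in one pass. The following is a minimal sketch, not part of the Toygres tooling: crashlooping_pods and dump_crash_logs are hypothetical helper names, and the parsing assumes the default kubectl get pods column layout (NAME READY STATUS RESTARTS AGE).

```shell
# Read `kubectl get pods` output on stdin and print the names of pods
# whose STATUS column (third field, assumed default layout) is CrashLoopBackOff.
crashlooping_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Print the last 50 lines of the previous container's logs for every
# crash-looping pod in the given namespace (defaults to toygres-system).
dump_crash_logs() {
  local ns="${1:-toygres-system}" pod
  kubectl get pods -n "$ns" | crashlooping_pods | while read -r pod; do
    echo "=== $pod ==="
    kubectl logs "$pod" -n "$ns" --previous --tail=50 </dev/null
  done
}
```

Running dump_crash_logs with no arguments targets toygres-system; pass another namespace as the first argument.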
Image Not Updating
# Restart the rollout (re-pulls the image only if the tag changed or imagePullPolicy is Always)
kubectl rollout restart deployment/toygres-server -n toygres-system
# Or delete pod directly
kubectl delete pod -n toygres-system -l app.kubernetes.io/component=server
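To confirm whether a restart actually picked up a new image, comparing the resolved image digest is usually more reliable than comparing tags. This is a sketch under assumptions: running_image_ids is a hypothetical helper, and the default namespace and label selector are the ones used elsewhere in this skill.

```shell
# Print "<pod name><TAB><image ID>" for each matching pod. The imageID field
# in the pod status carries the digest of the image the container runtime resolved.
running_image_ids() {
  local ns="${1:-toygres-system}" selector="${2:-app.kubernetes.io/component=server}"
  kubectl get pods -n "$ns" -l "$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].imageID}{"\n"}{end}'
}
```

If the digest is unchanged after a restart, the node is most likely reusing a cached image for the same tag; pushing a new tag or setting imagePullPolicy to Always are the usual remedies.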
SSL Certificate Issues
# Check cert-manager
kubectl get certificate -n toygres-system
kubectl describe certificate toygres-tls -n toygres-system
# Check ingress
kubectl get ingress -n toygres-system
kubectl describe ingress toygres-ingress -n toygres-system
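For scripted checks, the Ready condition on the cert-manager Certificate resource can be read directly instead of scanning the describe output. A minimal sketch, assuming cert-manager is installed and the certificate is named toygres-tls as above; cert_is_ready is a hypothetical helper name.

```shell
# Succeed (exit 0) only when the Certificate's Ready condition is "True".
cert_is_ready() {
  local name="${1:-toygres-tls}" ns="${2:-toygres-system}" status
  status="$(kubectl get certificate "$name" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"
  [ "$status" = "True" ]
}
```

A possible usage is cert_is_ready || kubectl describe certificate toygres-tls -n toygres-system, which prints the detailed events only when the certificate is not yet ready.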
Azure Workload Identity / azcopy 403 Errors
If azcopy login --identity succeeds but operations fail with 403 AuthorizationPermissionMismatch:
Root cause: azcopy login --identity authenticates with the VM-based managed identity (IMDS), not with the AKS workload identity.
Fix: Use --login-type=workload explicitly:
# Wrong (uses IMDS, fails on AKS)
azcopy login --identity
# Correct (uses federated token)
azcopy login --login-type=workload
Debug workload identity:
# Check env vars are injected
kubectl exec <pod-name> -n <namespace> -- env | grep AZURE_
# Should see:
# AZURE_CLIENT_ID=...
# AZURE_TENANT_ID=...
# AZURE_FEDERATED_TOKEN_FILE=/var/run/secrets/azure/tokens/azure-identity-token
# Test with the Azure CLI from inside the pod (uses the federated token correctly)
az login --service-principal -u "$AZURE_CLIENT_ID" -t "$AZURE_TENANT_ID" \
  --federated-token "$(cat "$AZURE_FEDERATED_TOKEN_FILE")"
az storage blob list --account-name <acct> --container-name <container> --auth-mode login
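The environment check above can also be turned into a pass/fail test that runs inside the pod before attempting any login. The sketch below is an illustration only: check_workload_identity_env is a hypothetical helper, and it verifies nothing beyond the three variables listed above and the readability of the token file.

```shell
# Report each missing workload identity variable and return non-zero
# if any is unset or the federated token file cannot be read.
check_workload_identity_env() {
  local var val missing=0
  for var in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE; do
    eval "val=\${$var:-}"   # indirect lookup; names are fixed literals
    if [ -z "$val" ]; then
      echo "missing: $var"
      missing=1
    fi
  done
  if [ -n "${AZURE_FEDERATED_TOKEN_FILE:-}" ] && [ ! -r "$AZURE_FEDERATED_TOKEN_FILE" ]; then
    echo "unreadable token file: $AZURE_FEDERATED_TOKEN_FILE"
    missing=1
  fi
  return "$missing"
}
```

Paste the function into a shell opened with kubectl exec -it and run it; a non-zero exit suggests the workload identity webhook did not inject the pod, which is worth ruling out before debugging azcopy itself.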
Azure LoadBalancer DNS Propagation
Problem: Instance provisioning fails at test_connection even though the service has been created.
Root cause: After the LoadBalancer IP is assigned, the Azure DNS record still has to propagate, so the endpoint can take 60-90+ seconds in total to become reachable.
Timeline:
- LoadBalancer created → IP assigned (10-30s)
- DNS record created → DNS propagates (30-60+ additional seconds)
- Total wait time can be 60-90+ seconds
Fix: Use a 120s timeout for connection tests, not 60s:
// In orchestrations
RetryPolicy::new(5)
    .with_timeout(Duration::from_secs(120)) // Not 60s!
Debug DNS propagation:
# Check if service has external IP
kubectl get svc -n toygres-managed <svc-name>
# Test DNS resolution
nslookup <dns-label>.westus2.cloudapp.azure.com
# Watch for IP assignment
kubectl get svc -n toygres-managed -w
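When checking propagation by hand, a small polling loop with the same 120s budget avoids re-running nslookup manually. This is a generic sketch rather than anything from the Toygres codebase: wait_until is a hypothetical helper, and it assumes the probe command exits non-zero until the name resolves (true of common nslookup builds, but worth verifying on your system).

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
# Re-run CMD every INTERVAL seconds until it succeeds or TIMEOUT is reached.
wait_until() {
  local timeout="$1" interval="$2" waited=0
  shift 2
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "ready after ~${waited}s: $*"
}
```

For example, wait_until 120 5 nslookup <dns-label>.westus2.cloudapp.azure.com polls every 5 seconds for up to 120 seconds. The reported time counts only the sleep intervals, so it slightly underestimates the real elapsed time.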
Local Testing Before Deploy
# Pause AKS server
kubectl scale deployment toygres-server -n toygres-system --replicas=0
# Run locally
./scripts/start-control-plane.sh
# Test at http://localhost:3000
# Resume AKS server (restore the original replica count)
kubectl scale deployment toygres-server -n toygres-system --replicas=1