Agent skill
gcp-troubleshoot
Troubleshoot GCP services using tool-first access (via MCP when available), falling back to the CLI only when necessary. Focus on Firestore, Cloud Run, networking, load balancers, IAM, Pub/Sub, Cloud SQL, and Storage.
Install this agent skill to your Project
npx add-skill https://github.com/timbuchinger/loadout/tree/main/skills/gcp-troubleshoot
SKILL.md
GCP Troubleshooting Skill
General Guidance
Always attempt to investigate issues using tool-based access first (MCP tools if configured).
Only fall back to the GCP CLI (gcloud) when the tool cannot access required logs, metrics, or audit data.
All investigations should:
- Scope queries by service/resource type
- Restrict by time window
- Prefer targeted logs/metrics, not full dumps
- Diagnose root cause based on error type
- Suggest minimal, safe remediation steps
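The scoping rules above can be sketched as a small helper that always emits a resource type clause and a lower time bound. This is a minimal illustration, not part of the skill: the function name `build_log_filter` and its parameters are assumptions, and the output is a Cloud Logging query string.

```python
from datetime import datetime, timedelta, timezone


def build_log_filter(resource_type, window, extra_clauses=(), now=None):
    """Build a Cloud Logging filter scoped by resource type and time window.

    `build_log_filter` is a hypothetical helper written for this example.
    """
    now = now or datetime.now(timezone.utc)
    # Lower bound of the time window, formatted as an RFC 3339 UTC timestamp.
    start = (now - window).strftime("%Y-%m-%dT%H:%M:%SZ")
    clauses = [f'resource.type="{resource_type}"', f'timestamp>="{start}"']
    clauses.extend(extra_clauses)
    return " AND ".join(clauses)


# Fixed "now" so the example is reproducible.
example = build_log_filter(
    "cloud_run_revision",
    timedelta(hours=1),
    extra_clauses=["severity>=ERROR"],
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(example)
```

If the CLI fallback is needed, a string like this can be passed as the filter argument to gcloud logging read, together with --limit to keep the result set small.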
Core Services Covered
Firestore
Common issues:
- PERMISSION_DENIED
- Missing indexes
- Transaction contention
- Quota exceeded
Investigations:
- Query Firestore logs filtered by resource.type="firestore_database"
- Check latency, retries, aborted transactions
Cloud Run
Common issues:
- Startup failures
- Crash loops
- IAM failures calling other services
- Cold starts
Investigations:
- Query Cloud Run logs (resource.type="cloud_run_revision")
- Check revision rollout history
- Look for Cloud SQL connector errors or storage access failures
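A Cloud Run query along these lines might narrow the filter to one service, and optionally one revision, using the service_name and revision_name labels of the cloud_run_revision resource. The helper name and the service and revision names below are hypothetical.

```python
def cloud_run_error_filter(service_name, revision_name=None, min_severity="ERROR"):
    """Build a Cloud Logging filter for one Cloud Run service.

    Hypothetical helper for illustration; the label names are those of the
    cloud_run_revision monitored resource.
    """
    clauses = [
        'resource.type="cloud_run_revision"',
        f'resource.labels.service_name="{service_name}"',
    ]
    if revision_name:
        # Narrow to a single revision, e.g. the one from a failed rollout.
        clauses.append(f'resource.labels.revision_name="{revision_name}"')
    clauses.append(f"severity>={min_severity}")
    return " AND ".join(clauses)


# "checkout-api" and its revision name are made-up examples.
print(cloud_run_error_filter("checkout-api"))
print(cloud_run_error_filter("checkout-api", "checkout-api-00042-abc"))
```

Comparing the output of the per-revision filter for the previous and current revision is one way to tell a startup failure introduced by a rollout from a pre-existing error.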
Networking & Load Balancers
Common issues:
- 5xx responses
- Backend connection errors
- Firewall denies
Investigations:
- Query load balancer logs (resource.type="http_load_balancer")
- Inspect backend health logs
- Check VPC routes and firewall rules
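As a rough sketch of the load balancer investigation, the filter below restricts results to 5xx responses, and the helper tallies the jsonPayload.statusDetails field to separate backend connection errors from errors the backend itself returned. The entries are assumed to be dicts shaped like JSON-formatted log output; the sample entries are invented for illustration.

```python
from collections import Counter

# Filter for 5xx responses served through an HTTP(S) load balancer.
LB_5XX_FILTER = 'resource.type="http_load_balancer" AND httpRequest.status>=500'


def tally_status_details(entries):
    """Count log entries by jsonPayload.statusDetails (hypothetical helper)."""
    return Counter(
        entry.get("jsonPayload", {}).get("statusDetails", "unknown")
        for entry in entries
    )


# Invented sample entries, trimmed to the fields used here.
sample_entries = [
    {"httpRequest": {"status": 502}, "jsonPayload": {"statusDetails": "failed_to_connect_to_backend"}},
    {"httpRequest": {"status": 502}, "jsonPayload": {"statusDetails": "failed_to_connect_to_backend"}},
    {"httpRequest": {"status": 500}, "jsonPayload": {"statusDetails": "response_sent_by_backend"}},
]

for detail, count in tally_status_details(sample_entries).most_common():
    print(f"{detail}: {count}")
```

A tally dominated by connection failures points toward backend health, firewall rules, or routing, while responses sent by the backend point toward the application itself.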
IAM
Common issues:
- PERMISSION_DENIED
- Missing service account roles
Investigations:
- Query audit logs: protoPayload.status.code != 0
- Identify the principal, resource, and role mismatch
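To illustrate the second step, the sketch below pulls the principal, resource, and denied permissions out of a single audit log entry. It assumes the entry is a dict with the standard AuditLog fields under protoPayload; the function name and the sample entry (project, service account, bucket) are hypothetical.

```python
def summarize_denied_call(entry):
    """Extract who was denied what from one audit log entry (hypothetical helper)."""
    payload = entry.get("protoPayload", {})
    # In JSON output, "granted" is typically omitted when it is false.
    denied = [
        info.get("permission")
        for info in payload.get("authorizationInfo", [])
        if not info.get("granted", False)
    ]
    return {
        "principal": payload.get("authenticationInfo", {}).get("principalEmail"),
        "resource": payload.get("resourceName"),
        "method": payload.get("methodName"),
        "status_code": payload.get("status", {}).get("code"),
        "denied_permissions": denied,
    }


# Invented sample entry, trimmed to the fields used above.
sample_entry = {
    "protoPayload": {
        "status": {"code": 7, "message": "PERMISSION_DENIED"},
        "authenticationInfo": {"principalEmail": "runner@example-project.iam.gserviceaccount.com"},
        "methodName": "storage.objects.get",
        "resourceName": "projects/_/buckets/example-bucket/objects/report.csv",
        "authorizationInfo": [
            {"permission": "storage.objects.get", "resource": "projects/_/buckets/example-bucket/objects/report.csv"}
        ],
    }
}

print(summarize_denied_call(sample_entry))
```

The denied permission can then be matched against the roles granted to the principal to find the smallest role that covers it.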
Pub/Sub
Common issues:
- Failed push deliveries
- Ack deadlines exceeded
- DLQ accumulation
Investigations:
- Filter subscription logs
- Inspect subscriber errors and endpoint failures
Cloud SQL
Common issues:
- Connection limit reached
- Auth failures
- Private network routing failures
Investigations:
- Query Cloud SQL logs (resource.type="cloudsql_database")
- Check database flags, failover events, connection counts
Storage Buckets
Common issues:
- 403 Forbidden
- Failed precondition checks (412 Precondition Failed)
- Signed URL failures
Investigations:
- Inspect Storage logs (resource.type="gcs_bucket")
- Check IAM, bucket policies, object existence
Workflow
- Identify target resource
- Query scoped logs
- Query metrics
- Query audit logs if access or permission failures occur
- Interpret patterns
- Suggest actionable fixes
When to Warn
- Unscoped log queries
- Very wide time ranges
- Requests requiring IAM escalation
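These warning conditions could be checked before a query runs, roughly as follows. The 24-hour threshold, the function name, and the warning messages are assumptions chosen for this example; the skill itself does not define what counts as "very wide".

```python
from datetime import timedelta

# Assumed threshold for a "very wide" time range; tune to your environment.
MAX_WINDOW = timedelta(hours=24)


def query_warnings(log_filter, window, needs_iam_change=False):
    """Return warnings for a planned investigation step (hypothetical helper)."""
    warnings = []
    if "resource.type" not in log_filter:
        warnings.append("Unscoped log query: add a resource.type clause.")
    if window > MAX_WINDOW:
        warnings.append(f"Very wide time range: {window} exceeds {MAX_WINDOW}.")
    if needs_iam_change:
        warnings.append("Request requires IAM escalation: confirm with the user first.")
    return warnings


# An unscoped, week-long query triggers the first two warnings.
for message in query_warnings("severity>=ERROR", timedelta(days=7)):
    print(message)
```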