Agent skill

gcp-troubleshoot

Troubleshoot GCP services using tool-first access (via MCP when available), falling back to the CLI only when necessary. Focus on Firestore, Cloud Run, networking, load balancers, IAM, Pub/Sub, Cloud SQL, and Storage.

Install this agent skill to your Project

npx add-skill https://github.com/timbuchinger/loadout/tree/main/skills/gcp-troubleshoot

SKILL.md

GCP Troubleshooting Skill

General Guidance

Always attempt to investigate issues using tool-based access first (MCP tools if configured).
Only fall back to the GCP CLI (gcloud) when the tool cannot access required logs, metrics, or audit data.

All investigations should:

  1. Scope queries by service/resource type
  2. Restrict by time window
  3. Prefer targeted logs/metrics, not full dumps
  4. Diagnose root cause based on error type
  5. Suggest minimal, safe remediation steps
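The first three points above can be sketched as a small filter builder. This is a minimal illustration, not part of the skill itself: the function name `build_log_filter`, the 60-minute default window, and the `ERROR` default severity are assumptions. The output uses Cloud Logging query syntax (`resource.type`, `timestamp`, `severity`).

```python
from datetime import datetime, timedelta, timezone


def build_log_filter(resource_type, window_minutes=60, min_severity="ERROR", now=None):
    """Return a Cloud Logging filter scoped by resource type, time window, and severity.

    The name and defaults are hypothetical; adjust them to the incident at hand.
    """
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(minutes=window_minutes)
    start_str = start.strftime("%Y-%m-%dT%H:%M:%SZ")
    clauses = [
        f'resource.type="{resource_type}"',   # scope by service/resource type
        f'timestamp>="{start_str}"',          # restrict the time window
        f"severity>={min_severity}",          # targeted logs, not a full dump
    ]
    return " AND ".join(clauses)


# Example with a fixed clock so the result is reproducible.
fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
example_filter = build_log_filter("cloud_run_revision", window_minutes=30, now=fixed_now)
print(example_filter)
```

The resulting string can be passed to whichever log tool is available, whether that is an MCP tool or the CLI fallback.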

Core Services Covered

Firestore

Common issues:

  • PERMISSION_DENIED
  • Missing indexes
  • Transaction contention
  • Quota exceeded

Investigations:

  • Query Firestore logs filtered by resource.type="firestore_database"
  • Check latency, retries, aborted transactions

Cloud Run

Common issues:

  • Startup failures
  • Crash loops
  • IAM failures calling other services
  • Cold starts

Investigations:

  • Query Cloud Run logs (resource.type="cloud_run_revision")
  • Check revision rollout history
  • Look for Cloud SQL connector errors or storage access failures
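When the CLI fallback is needed, a scoped Cloud Run query might be assembled as follows. This is a sketch under stated assumptions: the service name `checkout-api` and project `my-project` are hypothetical placeholders, and the one-hour freshness and 50-entry limit are arbitrary defaults. It only builds the argument list for `gcloud logging read`; it does not run it.

```python
import shlex


def cloud_run_log_command(service_name, project_id, freshness="1h", limit=50):
    """Build an argv list for `gcloud logging read`, scoped to one Cloud Run service."""
    log_filter = (
        'resource.type="cloud_run_revision" '
        f'AND resource.labels.service_name="{service_name}" '
        "AND severity>=ERROR"
    )
    return [
        "gcloud", "logging", "read", log_filter,
        f"--project={project_id}",
        f"--freshness={freshness}",  # only entries newer than this
        f"--limit={limit}",          # cap the number of entries returned
        "--format=json",
    ]


# Hypothetical service and project names, for illustration only.
argv = cloud_run_log_command("checkout-api", "my-project")
print(shlex.join(argv))
```

The list form can be handed to `subprocess.run(argv, capture_output=True, text=True)` without shell quoting concerns.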

Networking & Load Balancers

Common issues:

  • 5xx responses
  • Backend connection errors
  • Firewall denies

Investigations:

  • Query load balancer logs (resource.type="http_load_balancer")
  • Inspect backend health logs
  • Check VPC routes and firewall rules
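Once load balancer entries have been fetched, grouping the 5xx responses by status and cause usually narrows things down faster than reading entries one by one. The sketch below assumes entries are dictionaries in the JSON shape of a Cloud Logging `LogEntry`, with `httpRequest.status` and `jsonPayload.statusDetails` present. The function name and the sample entries are made up for illustration.

```python
from collections import Counter


def summarize_5xx(entries):
    """Count 5xx load balancer responses grouped by (status, statusDetails)."""
    counts = Counter()
    for entry in entries:
        status = int(entry.get("httpRequest", {}).get("status", 0))
        if 500 <= status <= 599:
            details = entry.get("jsonPayload", {}).get("statusDetails", "unknown")
            counts[(status, details)] += 1
    # Most frequent combination first.
    return counts.most_common()


# Fabricated sample entries, shaped like load balancer log entries.
sample_entries = [
    {"httpRequest": {"status": 502}, "jsonPayload": {"statusDetails": "failed_to_connect_to_backend"}},
    {"httpRequest": {"status": 200}, "jsonPayload": {"statusDetails": "response_sent_by_backend"}},
    {"httpRequest": {"status": 502}, "jsonPayload": {"statusDetails": "failed_to_connect_to_backend"}},
    {"httpRequest": {"status": 504}, "jsonPayload": {"statusDetails": "backend_timeout"}},
]

summary = summarize_5xx(sample_entries)
print(summary)
```

A dominant `failed_to_connect_to_backend` bucket points toward backend health or firewall rules, while `backend_timeout` points toward slow backends or timeout settings.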

IAM

Common issues:

  • PERMISSION_DENIED
  • Missing service account roles

Investigations:

  • Query audit logs: protoPayload.status.code != 0 (code 7 is PERMISSION_DENIED)
  • Identify the principal, resource, and role mismatch
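Pulling the principal, resource, and denied permissions out of an audit log entry could look like the sketch below. It assumes the entry is a dictionary in the JSON shape of a Cloud Audit Logs `AuditLog` payload (`authenticationInfo`, `authorizationInfo`, `resourceName`, `status`). The function name and the sample entry, including the service account and bucket names, are hypothetical.

```python
def summarize_denial(entry):
    """Extract who was denied what, and on which resource, from an audit log entry."""
    payload = entry.get("protoPayload", {})
    denied = [
        info.get("permission")
        for info in payload.get("authorizationInfo", [])
        # `granted` is typically omitted from the JSON when it is false.
        if not info.get("granted", False)
    ]
    return {
        "principal": payload.get("authenticationInfo", {}).get("principalEmail"),
        "resource": payload.get("resourceName"),
        "method": payload.get("methodName"),
        "status_code": payload.get("status", {}).get("code"),
        "denied_permissions": denied,
    }


# Fabricated example entry; every name here is a placeholder.
sample_entry = {
    "protoPayload": {
        "status": {"code": 7, "message": "PERMISSION_DENIED"},
        "authenticationInfo": {"principalEmail": "app-sa@my-project.iam.gserviceaccount.com"},
        "methodName": "storage.objects.get",
        "resourceName": "projects/_/buckets/example-bucket/objects/report.csv",
        "authorizationInfo": [
            {"permission": "storage.objects.get", "resource": "projects/_/buckets/example-bucket"},
        ],
    }
}

denial = summarize_denial(sample_entry)
print(denial)
```

The denied permission can then be compared against the roles bound to the principal to find the smallest role that covers it.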

Pub/Sub

Common issues:

  • Failed push deliveries
  • Ack deadlines exceeded
  • DLQ accumulation

Investigations:

  • Filter logs by subscription (resource.type="pubsub_subscription")
  • Inspect subscriber errors and endpoint failures

Cloud SQL

Common issues:

  • Connection limit reached
  • Auth failures
  • Private network routing failures

Investigations:

  • Cloud SQL logs (resource.type="cloudsql_database")
  • Check database flags, failover events, connection counts

Storage Buckets

Common issues:

  • 403 Forbidden
  • Failed precondition checks (412 Precondition Failed)
  • Signed URL failures

Investigations:

  • Inspect Storage logs (resource.type="gcs_bucket")
  • Check IAM, bucket policies, object existence

Workflow

  1. Identify target resource
  2. Query scoped logs
  3. Query metrics
  4. Query audit logs if access or permission failures occur
  5. Interpret patterns
  6. Suggest actionable fixes

When to Warn

  • Unscoped log queries
  • Very wide time ranges
  • Requests requiring IAM escalation
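These warning conditions can be checked mechanically before a query runs. The sketch below is one possible shape, not a prescribed implementation: the function name, the 24-hour threshold for a "very wide" range, and the warning wording are all assumptions.

```python
import re
from datetime import timedelta

# Assumed threshold for what counts as a "very wide" time range.
MAX_WINDOW = timedelta(hours=24)


def query_warnings(log_filter, window, needs_iam_change=False):
    """Return a list of warnings for a planned log query or remediation step."""
    warnings = []
    if not re.search(r"resource\.type\s*=", log_filter):
        warnings.append("unscoped query: add a resource.type clause")
    if window > MAX_WINDOW:
        warnings.append("very wide time range: narrow the window")
    if needs_iam_change:
        warnings.append("IAM escalation required: confirm with the user first")
    return warnings


risky = query_warnings("severity>=ERROR", timedelta(days=7), needs_iam_change=True)
safe = query_warnings('resource.type="gcs_bucket" AND severity>=ERROR', timedelta(hours=1))
print(risky)
print(safe)
```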
