Agent skill

loki-troubleshoot

Helps craft efficient Grafana Loki (LogQL) queries for debugging logs, using label-based filtering, narrow time windows, and best-practice guidance to avoid expensive or overly broad scans.


Install this agent skill into your project:

npx add-skill https://github.com/timbuchinger/loadout/tree/main/skills/loki-troubleshoot

SKILL.md

Loki Troubleshooting & Query‑Builder Skill

What this Skill does

When a user asks you (Claude) for help inspecting logs via Loki (e.g. “find errors in service X”, “search for request_id=abc123”, “show me 500 responses between 2 and 3 PM last night”), use this skill to:

  • Ask clarifying questions if needed (e.g. which service or label context, the approximate time window, any known label names).
  • Build a recommended LogQL query that:
    • uses low‑cardinality labels (service, env, host, etc.) whenever possible
    • confines the query to a narrow time window by default (minutes to hours, not days)
    • uses cheap line filters (|= or !=) rather than regex filters or heavy parsing
    • performs JSON/structured parsing only after filtering
  • Provide human-readable explanations and warnings for expensive queries
  • Suggest alternative strategies (metrics, recording rules, structured logs)
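The query-building steps above can be sketched in Python. This is a minimal illustration under stated assumptions: build_logql is a hypothetical helper, not part of Loki or of this skill's files. It only assembles the query string in the recommended order (stream selector from stable labels, then cheap |= / != line filters, then an optional parser such as json) and does not choose labels or time windows for you.

```python
import json
from typing import Mapping, Optional, Sequence


def build_logql(labels: Mapping[str, str],
                contains: Sequence[str] = (),
                excludes: Sequence[str] = (),
                parser: Optional[str] = None) -> str:
    """Assemble a LogQL log query: stream selector, line filters, then parser.

    Hypothetical helper for illustration; not part of Loki or this skill.
    """
    if not labels:
        # Loki requires a non-empty stream selector; refuse to build one without labels.
        raise ValueError("at least one label matcher is required")
    # json.dumps yields a double-quoted, backslash-escaped string literal.
    matchers = ", ".join(f"{name}={json.dumps(value)}" for name, value in labels.items())
    stages = ["{" + matchers + "}"]
    # Cheap substring line filters go first, before any parsing.
    stages += [f"|= {json.dumps(text)}" for text in contains]
    stages += [f"!= {json.dumps(text)}" for text in excludes]
    # Structured parsing (e.g. "json" or "logfmt") runs last, on fewer lines.
    if parser:
        stages.append(f"| {parser}")
    return " ".join(stages)
```

For example, build_logql({"app": "orders", "env": "prod"}, contains=["500"]) returns {app="orders", env="prod"} |= "500". The time window is deliberately not part of the string: for a log query it is supplied as the query range.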

Best Practices

  • Prefer stable labels (app, env, namespace, cluster, etc.)
  • Avoid high-cardinality labels (user_id, request_id, UUIDs)
  • Always restrict the time window
  • Use |= before regex or parsing
  • Provide warnings for:
    • No label selector
    • Very wide time windows
    • JSON parse without pre-filter
    • Regex-heavy queries
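The warning rules above could be approximated with simple string heuristics. The sketch below is an illustration, not a LogQL parser. lint_logql is hypothetical, the 6-hour default for a "very wide" window is an arbitrary assumption, and "pre-filter" is taken to mean a |= or != line filter appearing before the json or regex stage. Regex-heavy queries are flagged here only when a regex filter has no cheaper substring filter in front of it.

```python
import re
from datetime import timedelta
from typing import List, Optional

# Heuristic patterns (assumptions, not an official LogQL grammar).
CHEAP_FILTER = r'(\|=|!=)\s*["`]'          # substring line filters
JSON_STAGE = r"\|\s*json\b"                 # json parser stage
REGEX_STAGE = r'(\|~|!~)\s*["`]|\|\s*regexp\b'  # regex line filters / regexp parser


def _first(pattern: str, text: str) -> Optional[int]:
    """Index of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.start() if m else None


def lint_logql(query: str, window: timedelta,
               max_window: timedelta = timedelta(hours=6)) -> List[str]:
    """Heuristic warnings for a LogQL query; string matching, not a real parser."""
    warnings: List[str] = []

    # No label selector: need a non-empty {...} stream selector somewhere.
    sel = re.search(r"\{([^{}]*)\}", query)
    if sel is None or not sel.group(1).strip():
        warnings.append("No label selector: add stable labels such as app or env.")
    # Only inspect the pipeline after the selector, so {env!="dev"} is not
    # mistaken for a != line filter.
    pipeline = query[sel.end():] if sel else query

    # Very wide time window.
    if window > max_window:
        warnings.append(f"Wide time window ({window}); consider narrowing to {max_window} or less.")

    cheap = _first(CHEAP_FILTER, pipeline)

    # JSON parsing without a cheap line filter in front of it.
    json_pos = _first(JSON_STAGE, pipeline)
    if json_pos is not None and (cheap is None or cheap > json_pos):
        warnings.append("JSON parsing without a |= pre-filter: filter lines first.")

    # Regex filter without a cheap line filter in front of it.
    regex_pos = _first(REGEX_STAGE, pipeline)
    if regex_pos is not None and (cheap is None or cheap > regex_pos):
        warnings.append("Regex filter without a |= pre-filter: add a substring filter first.")

    return warnings
```

Because these are string heuristics, they can miss or misflag unusual queries; they are meant to trigger a human-readable caution, not to reject a query.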

Example Queries

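  • User request: "Show me all 500 errors in the orders service in the last 2 hours."
    Resulting LogQL: {app="orders", env="prod"} |= "500"   (time range: last 2 hours)
  • User request: "Find logs for request_id abc123 in staging for the payments service over the past 30 minutes."
    Resulting LogQL: {app="payments", env="staging"} |= "abc123"   (time range: last 30 minutes)

The time window comes from the query range (the Grafana time picker, or start/end in the API), not from the log query itself. A range selector such as [2h] is only valid inside a metric function, e.g. sum(count_over_time({app="orders", env="prod"} |= "500" [2h])).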
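The narrow windows in the example queries are meant to travel alongside the query. As a sketch, assuming the generated query is later run through Loki's HTTP API (GET /loki/api/v1/query_range, which takes query, start, end, limit and direction parameters), the window can be expressed as explicit nanosecond start/end timestamps. The base URL and the query_range_url and to_unix_ns helpers are hypothetical; nothing here sends a request.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ns(dt: datetime) -> int:
    """Exact nanosecond Unix timestamp for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000


def query_range_url(base_url: str, query: str, window: timedelta,
                    end: datetime, limit: int = 100) -> str:
    """Build a query_range URL covering [end - window, end]; does not execute it."""
    start = end - window
    params = {
        "query": query,
        "start": to_unix_ns(start),
        "end": to_unix_ns(end),
        "limit": limit,
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"
```

Keeping start and end explicit makes it easy to show the user exactly which range will be scanned before anything is run.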

Limitations

  • This skill does not execute queries
