loki-troubleshoot

Help craft efficient Grafana Loki / LogQL queries for debugging logs, with label-based filtering, narrow time windows, and best-practice guidance to avoid expensive or overly broad scans.

Install this agent skill into your project:

npx add-skill https://github.com/timbuchinger/loadout/tree/main/skills/loki-troubleshoot

SKILL.md

Loki Troubleshooting & Query‑Builder Skill

What this Skill does

When a user asks you (Claude) for help inspecting logs in Loki, for example "find errors in service X", "search for request_id=abc123", or "show me 500 responses between 2-3 PM last night", use this skill to:

- Ask clarifying questions if needed (e.g. which service or label context, the approximate time window, any known label names).
- Build a recommended LogQL query that:
  - uses low-cardinality labels (service, env, host, etc.) whenever possible
  - is confined to a narrow time window (minutes to hours, not days) by default
  - uses plain line filters (|= or !=) rather than regex filters (|~, !~) or heavy parsing
  - applies parsers such as | json or | logfmt only after line filters have narrowed the results
- Provide human-readable explanations, and warnings for expensive queries.
- Suggest alternative strategies (metrics, recording rules, structured logs).
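The query shape described above (stream selector first, cheap line filters next, parsing and parsed-label filters last) can be sketched as a small builder. This is a minimal Python sketch, not part of the skill itself: `QuerySpec` and `build_logql` are hypothetical names, and the `json` parser with a `status` field assumes the service emits JSON logs containing that key.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def quote(value: str) -> str:
    """Render a double-quoted LogQL string literal (json.dumps escapes quotes and backslashes)."""
    return json.dumps(value)


@dataclass
class QuerySpec:
    # Low-cardinality stream labels, e.g. {"app": "orders", "env": "prod"}.
    labels: Dict[str, str]
    # Substrings a line must contain (rendered as |= filters).
    contains: List[str] = field(default_factory=list)
    # Substrings a line must not contain (rendered as != filters).
    excludes: List[str] = field(default_factory=list)
    # Optional parser stage such as "json" or "logfmt" (assumption: depends on the log format).
    parser: Optional[str] = None
    # Filters on labels extracted by the parser, e.g. {"status": "500"} (hypothetical field).
    parsed_filters: Dict[str, str] = field(default_factory=dict)


def build_logql(spec: QuerySpec) -> str:
    """Assemble a log query in cost order: selector, line filters, then parsing."""
    if not spec.labels:
        # Loki rejects an empty selector such as {}; fail early instead.
        raise ValueError("a label selector is required")
    selector = "{" + ", ".join(f"{k}={quote(v)}" for k, v in spec.labels.items()) + "}"
    stages = [selector]
    stages += [f"|= {quote(s)}" for s in spec.contains]
    stages += [f"!= {quote(s)}" for s in spec.excludes]
    if spec.parser:
        # Parsing only runs on lines that survived the cheap filters above.
        stages.append(f"| {spec.parser}")
        stages += [f"| {k}={quote(v)}" for k, v in spec.parsed_filters.items()]
    return " ".join(stages)


if __name__ == "__main__":
    print(build_logql(QuerySpec(labels={"app": "orders", "env": "prod"}, contains=["500"])))
    print(build_logql(QuerySpec(
        labels={"app": "orders", "env": "prod"},
        contains=["500"],
        parser="json",
        parsed_filters={"status": "500"},
    )))
```

Running it prints `{app="orders", env="prod"} |= "500"` and `{app="orders", env="prod"} |= "500" | json | status="500"`. Ordering the stages in code, rather than leaving it to whoever writes the query, is what keeps the parser behind the line filter.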

Best Practices

- Prefer stable labels (app, env, namespace, cluster, etc.)
- Avoid high-cardinality values (user_id, request_id, UUIDs) as stream labels; search for them with line filters instead
- Always restrict the time window
- Apply |= (or !=) line filters before regex filters or parsers
- Provide warnings for:
  - No label selector
  - Very wide time windows
  - JSON parsing without a preceding line filter
  - Regex-heavy queries
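The warnings listed above can be approximated with a few regular-expression checks. This Python sketch is a heuristic, not a LogQL parser: `lint_logql` is a hypothetical helper, the six-hour ceiling for a "very wide" window is an assumed default, and a quoted value containing `}` inside the selector will confuse it.

```python
import re
from datetime import timedelta
from typing import List

# Assumed ceiling for a "very wide" window; tune per deployment.
MAX_WINDOW = timedelta(hours=6)

# A selector with at least one named matcher, e.g. {app="orders"}; {} does not match.
SELECTOR = re.compile(r'\{\s*[A-Za-z_]\w*\s*(?:=~|!~|!=|=)\s*["`]')
PARSER = re.compile(r"\|\s*(?:json|logfmt|regexp|pattern)\b")
PLAIN_FILTER = re.compile(r'(?:\|=|!=)\s*["`]')
REGEX_FILTER = re.compile(r'(?:\|~|!~)\s*["`]')


def lint_logql(query: str, window: timedelta, max_window: timedelta = MAX_WINDOW) -> List[str]:
    """Return human-readable warnings for a LogQL query (heuristic only)."""
    warnings: List[str] = []

    selector = SELECTOR.search(query)
    if selector is None:
        warnings.append("no label selector: the query cannot be narrowed to specific streams")
        pipeline = query
    else:
        # Treat everything after the selector's closing brace as the pipeline,
        # so != inside the selector is not mistaken for a line filter.
        close = query.find("}", selector.end())
        pipeline = query[close + 1:] if close != -1 else ""

    if window > max_window:
        warnings.append(f"time window {window} is wider than {max_window}")

    # A parser stage should be preceded by at least one cheap line filter.
    parser = PARSER.search(pipeline)
    if parser and not PLAIN_FILTER.search(pipeline, 0, parser.start()):
        warnings.append("parser stage runs before any |= / != line filter")

    # Likewise for regex line filters.
    regex = REGEX_FILTER.search(pipeline)
    if regex and not PLAIN_FILTER.search(pipeline, 0, regex.start()):
        warnings.append("regex line filter runs before any cheaper |= / != filter")

    return warnings


if __name__ == "__main__":
    print(lint_logql('{app="orders", env="prod"} |= "500"', timedelta(hours=2)))
    print(lint_logql('{app="orders"} | json | status="500"', timedelta(days=2)))
```

The checks are deliberately shallow: they only flag ordering and scope problems the skill can explain to the user, and a query that passes is not guaranteed to be cheap.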

Example Queries

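| User request | Resulting LogQL | Time range (set on the query, not in the expression) |
|---|---|---|
| "Show me all 500 errors in the `orders` service in the last 2 hours." | `{app="orders", env="prod"} \|= "500"` | last 2h |
| "Find logs for request_id `abc123` in staging for the `payments` service over the past 30 minutes." | `{app="payments", env="staging"} \|= "abc123"` | last 30m |

A range such as `[2h]` is only valid inside a metric aggregation, for example `count_over_time({app="orders", env="prod"} |= "500" [2h])`; for a plain log query, the window comes from the query's start and end times.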
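In LogQL, a plain log query such as `{app="orders", env="prod"} |= "500"` carries no time range of its own; whatever runs it supplies the window. As an illustration only (this skill does not execute queries), the sketch below builds a request URL for Loki's `/loki/api/v1/query_range` HTTP endpoint, passing `start` and `end` as nanosecond Unix timestamps. The base URL `http://localhost:3100` and the helper names are assumptions.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ns(ts: datetime) -> int:
    """Convert a timezone-aware datetime to integer nanoseconds since the Unix epoch."""
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def query_range_url(base_url: str, logql: str, window: timedelta,
                    end: datetime, limit: int = 100) -> str:
    """Build (but do not send) a query_range URL covering [end - window, end]."""
    start = end - window
    params = {
        "query": logql,
        "start": str(to_unix_ns(start)),
        "end": str(to_unix_ns(end)),
        "limit": str(limit),       # cap on the number of returned lines
        "direction": "backward",   # newest lines first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    print(query_range_url(
        "http://localhost:3100",  # hypothetical Loki address
        '{app="orders", env="prod"} |= "500"',
        window=timedelta(hours=2),
        end=datetime.now(timezone.utc),
    ))
```

Integer arithmetic on the `timedelta` avoids the float rounding that `ts.timestamp() * 1e9` would introduce at nanosecond scale.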

Limitations

This skill does not execute queries; it only builds and explains them.