Agent skill
observability-alert-manager
Configure Grafana alerts for Claude Code anomalies and thresholds. Use when setting up monitoring alerts for sessions, errors, context usage, or subagents.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/observability-alert-manager
SKILL.md
Observability Alert Manager
Configure and manage Grafana alerts for Claude Code monitoring using enhanced telemetry.
Data Source
Primary: {job="claude_code_enhanced"} in Loki
Operations
create-alert
Define new alert rule. Parameters: name, query (LogQL), threshold, duration, severity, notification.
list-alerts
Show all configured alerts and their status.
test-alert
Simulate alert conditions.
delete-alert
Remove alert rule.
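The threshold and duration parameters work together like a pending period in Grafana (or `for` in Prometheus-style rules): the condition must keep holding for the whole duration before the rule fires. The sketch below models that behaviour for test-alert style simulation. The AlertRule class, its evaluate method and the state names are hypothetical and are not the skill's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AlertRule:
    """Hypothetical in-memory model of the create-alert parameters."""

    name: str
    query: str          # LogQL expression that yields a number (a metric query)
    threshold: float    # condition holds while the query value exceeds this
    duration_s: int     # seconds the condition must hold before the rule fires
    severity: str = "warning"
    notification: str = "dashboard"
    _pending_since: Optional[int] = field(default=None, init=False, repr=False)

    def evaluate(self, value: float, now_s: int) -> str:
        """Advance the rule by one evaluation tick and return its state."""
        if value <= self.threshold:
            self._pending_since = None  # condition cleared: reset the timer
            return "normal"
        if self._pending_since is None:
            self._pending_since = now_s  # condition has just started to hold
        if now_s - self._pending_since >= self.duration_s:
            return "firing"
        return "pending"


# test-alert style simulation: feed (timestamp, query value) pairs to one rule.
rule = AlertRule(
    "High Error Rate",
    'count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h]) > 5',
    threshold=5,
    duration_s=300,
)
for t, v in [(0, 7), (60, 8), (300, 9), (360, 2)]:
    print(t, rule.evaluate(v, t))
```

With a 300-second duration, the first two breaching ticks stay pending. The rule fires once the condition has held for 300 seconds and resets as soon as the value drops back to the threshold or below.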
Pre-built Alert Templates
Session Alerts
- Long Session Duration: Session >1 hour
```logql
{job="claude_code_enhanced", event_type="session_end"} | json | duration_seconds > 3600
```
- High Turn Count: Session >50 turns
```logql
{job="claude_code_enhanced", event_type="session_end"} | json | turn_count > 50
```
- Session Error Spike: >5 errors in session
```logql
{job="claude_code_enhanced", event_type="session_end"} | json | error_count > 5
```
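Each session template is a log query. Loki parses the session_end line with `| json` and keeps the line only if the extracted field exceeds the limit. The Python sketch below imitates that filter on a single log line. It assumes the event body is a flat JSON object using the field names above; SESSION_RULES and matching_rules are hypothetical names.

```python
import json

# Field and limit for each session template above.
SESSION_RULES = {
    "Long Session Duration": ("duration_seconds", 3600),
    "High Turn Count": ("turn_count", 50),
    "Session Error Spike": ("error_count", 5),
}


def matching_rules(log_line: str) -> list[str]:
    """Mimic `| json | <field> > <limit>` for one session_end log line."""
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        return []  # simplification: treat unparseable lines as matching nothing
    hits = []
    for rule, (field_name, limit) in SESSION_RULES.items():
        value = record.get(field_name)
        if isinstance(value, (int, float)) and value > limit:
            hits.append(rule)
    return hits


print(matching_rules('{"duration_seconds": 4000, "turn_count": 12, "error_count": 6}'))
```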
Error Alerts
- High Error Rate: >5 errors/hour
```logql
count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h]) > 5
```
- Specific Tool Failures: >3 Bash errors/hour
```logql
count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error", tool="Bash"} [1h]) > 3
```
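count_over_time counts the log lines that fall in a trailing window, and the stream-selector labels narrow which lines are counted. The following is a rough Python equivalent. It treats the window as (now - window, now], although Loki's exact boundary handling may differ, and the sample events are hypothetical.

```python
def count_over_time(events, now_s, window_s, **labels):
    """Count events in (now_s - window_s, now_s] whose labels all match."""
    return sum(
        1
        for e in events
        if now_s - window_s < e["ts"] <= now_s
        and all(e.get(k) == v for k, v in labels.items())
    )


# Hypothetical tool_result errors (ts in seconds since some reference point).
events = [
    {"ts": 100, "status": "error", "tool": "Bash"},
    {"ts": 900, "status": "error", "tool": "Bash"},
    {"ts": 1800, "status": "error", "tool": "Bash"},
    {"ts": 2400, "status": "error", "tool": "Bash"},
    {"ts": 3000, "status": "error", "tool": "Read"},
    {"ts": 3500, "status": "error", "tool": "Read"},
]
now = 3600
high_error_rate = count_over_time(events, now, 3600, status="error") > 5             # 6 > 5
bash_failures = count_over_time(events, now, 3600, status="error", tool="Bash") > 3  # 4 > 3
print(high_error_rate, bash_failures)
```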
Context Alerts
- High Context Usage: >80% context window
```logql
{job="claude_code_enhanced", event_type="context_utilization"} | json | context_percentage > 80
```
- Auto Compaction Triggered: Context full
```logql
{job="claude_code_enhanced", event_type="context_compact", trigger="auto"}
```
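Both context templates are log queries. They return matching lines rather than a number, so they need a metric wrapper before they can act as an alert condition. Below is a hypothetical helper that builds such a wrapper, assuming a 5-minute window and a "more than zero matches" threshold.

```python
def as_alert_expr(log_query: str, window: str = "5m", threshold: int = 0) -> str:
    """Wrap a LogQL log query so it evaluates to a single number to alert on.

    count_over_time turns matching lines into per-series counts; sum()
    collapses the extra series that `| json` creates from extracted labels.
    """
    return f"sum(count_over_time({log_query} [{window}])) > {threshold}"


compaction = '{job="claude_code_enhanced", event_type="context_compact", trigger="auto"}'
print(as_alert_expr(compaction))
```

Without the outer sum(), every field extracted by the json stage becomes a series label. The query would then likely return one series per distinct session instead of a single total.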
Subagent Alerts
- Excessive Subagent Spawning: >10 subagents/session
```logql
{job="claude_code_enhanced", event_type="session_end"} | json | subagents_spawned > 10
```
Activity Alerts
- Telemetry Staleness: No data >10min
```logql
absent_over_time({job="claude_code_enhanced"} [10m])
```
- Unusual Activity Spike: >100 tool calls/hour
```logql
count_over_time({job="claude_code_enhanced", event_type="tool_call"} [1h]) > 100
```
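absent_over_time returns a single series with value 1 when the selector matched nothing in the window, and an empty result otherwise. That is why the staleness template needs no comparison operator. Here is a small Python model of that behaviour, with None standing in for the empty result; the timestamps are hypothetical.

```python
from typing import Optional


def absent_over_time(timestamps, now_s, window_s) -> Optional[int]:
    """Return 1 if no entry falls in (now_s - window_s, now_s], else None (empty result)."""
    seen = any(now_s - window_s < ts <= now_s for ts in timestamps)
    return None if seen else 1


last_seen = [0, 300]  # hypothetical arrival times of the last log lines
print(absent_over_time(last_seen, 600, 600))   # data at t=300 is recent: empty result
print(absent_over_time(last_seen, 1000, 600))  # nothing since t=300: value 1, alert
```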
Prompt Pattern Alerts
- Debugging Session Spike: >10 debugging prompts/hour
```logql
count_over_time({job="claude_code_enhanced", event_type="user_prompt", pattern="debugging"} [1h]) > 10
```
Example Alert Configurations
Create High Error Rate Alert
```bash
create-alert \
  --name "High Error Rate" \
  --query 'count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h]) > 5' \
  --severity warning \
  --notification slack
```
Create Context Usage Alert
```bash
create-alert \
  --name "High Context Usage" \
  --query '{job="claude_code_enhanced", event_type="context_utilization"} | json | context_percentage > 80' \
  --severity info \
  --notification email
```
Create Session Duration Alert
```bash
create-alert \
  --name "Long Session Warning" \
  --query '{job="claude_code_enhanced", event_type="session_end"} | json | duration_seconds > 3600' \
  --severity info \
  --notification dashboard
```
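The queries contain double quotes, braces and pipes, so each --query value has to reach the shell single-quoted, as in the examples above. When generating these commands from a template list, for instance in a bulk import, Python's shlex.join can handle the quoting. The template list and helper names are hypothetical; only the flags come from the examples above.

```python
import shlex

# Hypothetical template list mirroring two of the examples above.
TEMPLATES = [
    {
        "name": "High Error Rate",
        "query": 'count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h]) > 5',
        "severity": "warning",
        "notification": "slack",
    },
    {
        "name": "Long Session Warning",
        "query": '{job="claude_code_enhanced", event_type="session_end"} | json | duration_seconds > 3600',
        "severity": "info",
        "notification": "dashboard",
    },
]


def create_alert_argv(t):
    """Build the argument list for one create-alert invocation."""
    return [
        "create-alert",
        "--name", t["name"],
        "--query", t["query"],
        "--severity", t["severity"],
        "--notification", t["notification"],
    ]


def create_alert_command(t):
    """Render the arguments as one shell-safe command line."""
    return shlex.join(create_alert_argv(t))


for t in TEMPLATES:
    print(create_alert_command(t))
```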
Grafana Alert Setup
Via Grafana UI
- Navigate to Alerting → Alert rules
- Create new rule with Loki data source
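- Enter a LogQL query from the templates above. Log queries that have no metric function (for example, the | json | duration_seconds > 3600 session templates) must first be wrapped, e.g. sum(count_over_time(<query> [5m])) > 0, so that the rule has a numeric value to evaluate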
- Configure conditions and notifications
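Via API
The payload below uses Prometheus-style rule fields (alert, expr, for), which is the format for data-source-managed (Loki ruler) rules. Grafana-managed rules sent to the /api/ruler/grafana/... endpoint wrap each rule in a grafana_alert object instead, so check the body against the ruler API of your Grafana version before relying on it.
```bash
curl -X POST http://localhost:3000/api/ruler/grafana/api/v1/rules/claude-code \
  -H "Content-Type: application/json" \
  -u admin:admin \
  -d '{
    "name": "claude-code-alerts",
    "rules": [
      {
        "alert": "HighErrorRate",
        "expr": "count_over_time({job=\"claude_code_enhanced\", status=\"error\"} [1h]) > 5",
        "for": "5m",
        "labels": {"severity": "warning"},
        "annotations": {"summary": "High error rate detected"}
      }
    ]
  }'
```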
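The same rule-group body can be generated for several templates at once instead of being written by hand. This Python sketch builds the payload and a POST request with the standard library. It reuses the URL, the claude-code namespace and the admin:admin credentials from the curl example. The BashFailures rule name and the summary text are illustrative. Like the curl example, it uses Prometheus-style rule fields, so confirm which format your Grafana endpoint expects.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"
RULER_PATH = "/api/ruler/grafana/api/v1/rules/claude-code"

# (alert name, LogQL expression, severity); "BashFailures" is a hypothetical name.
TEMPLATES = [
    ("HighErrorRate",
     'count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h]) > 5',
     "warning"),
    ("BashFailures",
     'count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error", tool="Bash"} [1h]) > 3',
     "warning"),
]


def build_rule_group(templates, pending="5m"):
    """Assemble a rule group with the same shape as the curl payload."""
    return {
        "name": "claude-code-alerts",
        "rules": [
            {
                "alert": name,
                "expr": expr,
                "for": pending,
                "labels": {"severity": severity},
                "annotations": {"summary": f"{name} threshold exceeded"},
            }
            for name, expr, severity in templates
        ],
    }


def build_request(group, user="admin", password="admin"):
    """Prepare (but do not send) the POST; send it with urllib.request.urlopen(req)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        GRAFANA_URL + RULER_PATH,
        data=json.dumps(group).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


req = build_request(build_rule_group(TEMPLATES))
print(req.get_method(), req.full_url)
```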
Notification Channels
- Slack: Webhook integration
- Email: SMTP configuration
- PagerDuty: Incident management
- Dashboard: On-screen annotations
Alert Severity Levels
| Level | Use Case |
|---|---|
| critical | Immediate action required |
| warning | Needs attention soon |
| info | Informational, no action needed |
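Because the levels are ordered, notification routing can be expressed as a minimum severity per channel. The following is a hypothetical sketch. The numeric ordering and the page-on-critical routing are assumptions, not settings taken from the skill.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the table, ordered so they can be compared."""
    INFO = 1
    WARNING = 2
    CRITICAL = 3


def meets(severity_label: str, minimum: Severity) -> bool:
    """True if an alert's severity label is at or above the given minimum."""
    return Severity[severity_label.upper()] >= minimum


# Hypothetical routing: page on critical only, send warnings and above to chat.
print(meets("critical", Severity.CRITICAL))  # True
print(meets("warning", Severity.CRITICAL))   # False
print(meets("warning", Severity.WARNING))    # True
```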
Scripts
- scripts/create-alert.sh - Create new alert
- scripts/list-alerts.sh - List all alerts
- scripts/test-alerts.sh - Test alert conditions
- scripts/import-alert-templates.sh - Import all pre-built templates