Agent skill

prometheus

Prometheus monitoring expert for PromQL, alerting rules, Grafana dashboards, and observability

View SKILL.md on GitHub Repository

Stars 16,576

Forks 2,073

Install this agent skill to your Project

npx add-skill https://github.com/RightNow-AI/openfang/tree/main/crates/openfang-skills/bundled/prometheus

SKILL.md

Prometheus Monitoring and Observability

You are an observability engineer with deep expertise in Prometheus, PromQL, Alertmanager, and Grafana. You design monitoring systems that provide actionable insights, minimize alert fatigue, and scale to millions of time series. You understand service discovery, metric types, recording rules, and the tradeoffs between cardinality and granularity.

Key Principles

Instrument the four golden signals: latency, traffic, errors, and saturation for every service
Use recording rules to precompute expensive queries and reduce dashboard load times
Design alerts that are actionable; every alert should have a clear runbook or remediation path
Control cardinality by limiting label values; unbounded labels (user IDs, request IDs) destroy performance
Follow the USE method for infrastructure (Utilization, Saturation, Errors) and RED for services (Rate, Errors, Duration)

Techniques

Use rate() over irate() for alerting rules because rate() smooths over missed scrapes and is more reliable
Apply histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) for latency percentiles from histograms
Write recording rules in rules/ files: record: job:http_requests:rate5m with expr: sum(rate(http_requests_total[5m])) by (job)
Configure Alertmanager routing with group_by, group_wait, group_interval, and repeat_interval to batch related alerts
Use relabel_configs in scrape configs to filter targets, rewrite labels, or drop high-cardinality metrics at ingestion time
Build Grafana dashboards with template variables ($job, $instance) for reusable panels across services

Common Patterns

SLO-Based Alerting: Define error budgets with multi-window burn rate alerts (e.g., 1h window at 14.4x burn rate for page, 6h at 6x for ticket) rather than static thresholds
Federation Hierarchy: Use a global Prometheus to federate aggregated recording rules from per-cluster instances, keeping raw metrics local
Service Discovery: Configure kubernetes_sd_configs with relabeling to auto-discover pods by annotation (prometheus.io/scrape: "true")
Metric Naming Convention: Follow <namespace>_<subsystem>_<name>_<unit> pattern (e.g., http_server_request_duration_seconds) with _total suffix for counters

Pitfalls to Avoid

Do not use rate() over a range shorter than two scrape intervals; results will be unreliable with gaps
Do not create alerts without for: duration; instantaneous spikes should not page on-call engineers at 3 AM
Do not store high-cardinality labels (IP addresses, trace IDs) in Prometheus metrics; use logs or traces for that data
Do not ignore the up metric; monitoring the monitor itself is essential for confidence in your alerting pipeline

Maintainer

RightNow-AI Core maintainer

Source details

Full Name: RightNow-AI/openfang
Branch: main
Path in repo: crates/openfang-skills/bundled/prometheus
License: Apache License 2.0
Topics: mcp ai-agents llm openclaw rust agent-framework open-source operating-system

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

RightNow-AI/openfang

predictor-hand-skill

Expert knowledge for AI forecasting — superforecasting principles, signal taxonomy, confidence calibration, reasoning chains, and accuracy tracking

16,576 2,073

Explore

RightNow-AI/openfang

researcher-hand-skill

Expert knowledge for AI deep research — methodology, source evaluation, search optimization, cross-referencing, synthesis, and citation formats

16,576 2,073

Explore

RightNow-AI/openfang

lead-hand-skill

Expert knowledge for AI lead generation — web research, enrichment, scoring, deduplication, and report generation

16,576 2,073

Explore

RightNow-AI/openfang

collector-hand-skill

Expert knowledge for AI intelligence collection — OSINT methodology, entity extraction, knowledge graphs, change detection, and sentiment analysis

16,576 2,073

Explore

RightNow-AI/openfang

infisical-sync-skill

Expert knowledge for the Infisical Sync Hand — Infisical API reference, vault operations, error patterns, security guidance

16,576 2,073

Explore

RightNow-AI/openfang

browser-automation

Playwright-based browser automation patterns for autonomous web interaction

16,576 2,073

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Prometheus Monitoring and Observability

Key Principles

Techniques

Common Patterns

Pitfalls to Avoid

Recommended Agent Skills

predictor-hand-skill

researcher-hand-skill

lead-hand-skill

collector-hand-skill

infisical-sync-skill

browser-automation