prometheus-metrics-specialist
Instrument services with Prometheus metrics and write PromQL queries. Guides HuleEdu naming conventions, metrics middleware setup, and business vs operational metrics. Integrates with Context7 for latest Prometheus documentation.
Install this agent skill in your project:
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/prometheus-metrics
Prometheus Metrics Specialist
Compact skill for instrumenting HuleEdu services with Prometheus metrics and querying them with PromQL.
When to Use
Activate when the user:
- Needs to add metrics instrumentation to a service
- Wants to write PromQL queries for dashboards or alerts
- Asks about metrics naming conventions
- Needs help with metrics middleware setup
- Wants to understand business vs operational metrics
- Mentions Prometheus, PromQL, metrics, instrumentation, or monitoring
- Needs to expose a /metrics endpoint
I Need To...
Get Started (First Time)
- Learn Prometheus basics → fundamentals.md - Counter, Histogram, Gauge concepts
- Set up HuleEdu service → huleedu-patterns.md - metrics.py module, /metrics endpoint
- Pick an example → See "Add Metrics to Service" below
Add Metrics to Service
| I need to instrument... | Read this file |
|---|---|
| HTTP endpoints | examples/http-and-kafka.md |
| Kafka event processing | examples/http-and-kafka.md |
| LLM API calls | examples/llm-and-batch.md |
| Batch processing | examples/llm-and-batch.md |
| Database queries | examples/database-and-business.md |
| Business logic | examples/database-and-business.md |
Choose Naming Pattern
→ huleedu-patterns.md § Naming Patterns
Write PromQL Queries
| I need to... | Read this section |
|---|---|
| Learn PromQL syntax | promql-guide.md § Basics |
| Calculate error rate | promql-guide.md § Error Rates |
| Get P95 latency | promql-guide.md § Percentiles |
| Troubleshoot issues | promql-guide.md § Troubleshooting |
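As a rough illustration of the error-rate and P95 rows above, the sketch below builds the two PromQL expressions as plain strings. The helper names, the `5m` window, and the assumption that failures are any `status` value other than `success` are illustrative and are not taken from promql-guide.md.

```python
def error_ratio_query(counter: str, window: str = "5m", ok_status: str = "success") -> str:
    """Build a PromQL expression for the share of operations that did not succeed.

    Assumes the counter carries a `status` label, as in the cheatsheet example.
    """
    failed = f'sum(rate({counter}{{status!="{ok_status}"}}[{window}]))'
    total = f"sum(rate({counter}[{window}]))"
    return f"{failed} / {total}"


def p95_latency_query(histogram: str, window: str = "5m") -> str:
    """Build a PromQL expression estimating P95 from a histogram's _bucket series."""
    return f"histogram_quantile(0.95, sum by (le) (rate({histogram}_bucket[{window}])))"


if __name__ == "__main__":
    print(error_ratio_query("spell_checker_operations_total"))
    print(p95_latency_query("spell_checker_http_request_duration_seconds"))
```

The `sum by (le)` aggregation keeps the bucket boundary label that `histogram_quantile` needs while collapsing all other labels.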
Reference
| What I need to know | Where to find it |
|---|---|
| Metric types (Counter/Histogram/Gauge) | fundamentals.md § Metric Types |
| Label design principles | fundamentals.md § Labels |
| Histogram bucket design | fundamentals.md § Buckets |
| HuleEdu architecture | huleedu-patterns.md § Architecture |
Quick Metric Type Cheatsheet
```python
from prometheus_client import Counter, Gauge, Histogram

# Counter: events that only increase (resets to 0 on process restart)
operations_total = Counter("spell_checker_operations_total", "desc", ["status"])
operations_total.labels(status="success").inc()

# Histogram: distributions (latency, sizes, scores)
duration_seconds = Histogram("spell_checker_duration_seconds", "desc")
duration_seconds.observe(0.23)  # 0.23 s = 230 ms

# Gauge: point-in-time values that go up and down
active_connections = Gauge("spell_checker_active_connections", "desc")
active_connections.set(15)
active_connections.inc()  # now 16
```
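To show how these metric types might fit together in a handler, here is a minimal sketch that times a call, counts its outcome, and renders the text that a /metrics endpoint would return. The `check_spelling` function, the help strings, and the bucket boundaries are hypothetical. The private `CollectorRegistry` is a choice made to keep the sketch isolated, not necessarily how the HuleEdu metrics.py module is organized.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A private registry keeps this sketch separate from the global default registry.
registry = CollectorRegistry()

operations_total = Counter(
    "spell_checker_operations_total",
    "Spell-check operations by outcome",
    ["status"],
    registry=registry,
)
duration_seconds = Histogram(
    "spell_checker_duration_seconds",
    "Time spent per spell-check operation",
    buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5),  # assumed bounds, in seconds
    registry=registry,
)


def check_spelling(text: str) -> str:
    """Hypothetical handler: the real work is replaced by a trivial strip()."""
    with duration_seconds.time():  # observes elapsed seconds when the block exits
        try:
            result = text.strip()
        except Exception:
            operations_total.labels(status="error").inc()
            raise
        operations_total.labels(status="success").inc()
        return result


if __name__ == "__main__":
    check_spelling(" helo ")
    # This is the exposition text a /metrics endpoint would serve.
    print(generate_latest(registry).decode())
```

Passing `registry=` explicitly also makes the metrics easy to inspect in unit tests through `registry.get_sample_value(...)`.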
HuleEdu Naming Cheatsheet
```python
# Pattern 1: Service-Prefixed (most common, operational metrics)
spell_checker_operations_total
spell_checker_http_request_duration_seconds

# Pattern 2: Business Metrics (cross-service, business logic)
huleedu_llm_prompt_tokens_total
huleedu_spellcheck_corrections_made

# Pattern 3: Legacy (being phased out)
request_count
```
Decision: Service-specific metric? → Pattern 1. Cross-service business metric? → Pattern 2.
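The decision rule above could be checked mechanically, for example in a lint step. The sketch below is one possible way to do that. The `KNOWN_SERVICE_PREFIXES` tuple and the function name are hypothetical, and the real list of service prefixes would have to come from huleedu-patterns.md.

```python
import re

# Hypothetical list; only spell_checker appears in the cheatsheet above.
KNOWN_SERVICE_PREFIXES = ("spell_checker",)

# Character set Prometheus traditionally allows in metric names.
_VALID_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def classify_metric_name(name: str) -> str:
    """Return 'service-prefixed', 'business', or 'legacy' for a metric name."""
    if not _VALID_NAME.match(name):
        raise ValueError(f"not a valid Prometheus metric name: {name!r}")
    if name.startswith("huleedu_"):
        return "business"  # Pattern 2: cross-service business metric
    if any(name.startswith(prefix + "_") for prefix in KNOWN_SERVICE_PREFIXES):
        return "service-prefixed"  # Pattern 1: operational metric of one service
    return "legacy"  # Pattern 3: no recognized prefix


if __name__ == "__main__":
    for metric in (
        "spell_checker_operations_total",
        "huleedu_llm_prompt_tokens_total",
        "request_count",
    ):
        print(metric, "->", classify_metric_name(metric))
```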
Core Capabilities
- Naming Conventions: 3 patterns (service-prefixed, business, legacy)
- Instrumentation: Counter, Histogram, Gauge, Summary usage
- Middleware Setup: Standard HTTP metrics middleware
- Service-Specific Metrics: Business logic patterns
- PromQL Queries: Troubleshooting and dashboard queries
- Scrape Configuration: Prometheus targets and discovery
- Best Practices: Metric types, cardinality, aggregation
- Context7 Integration: Latest Prometheus/PromQL docs
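As one possible shape for the HTTP metrics middleware mentioned above, the sketch below wraps a plain WSGI application. Using WSGI is an assumption made to keep the example framework-free, and the skill's standard middleware may target a different framework. The `MetricsMiddleware` class, the `spell_checker_http_requests_total` counter, and the label sets are hypothetical.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram

registry = CollectorRegistry()  # private registry, for isolation in this sketch

http_requests_total = Counter(
    "spell_checker_http_requests_total",  # hypothetical metric name
    "HTTP requests by method and status code",
    ["method", "status"],
    registry=registry,
)
http_request_duration_seconds = Histogram(
    "spell_checker_http_request_duration_seconds",
    "HTTP request handling time",
    ["method"],
    registry=registry,
)


class MetricsMiddleware:
    """Wrap a WSGI app and record one count and one duration per request."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        seen = {"status": "500"}  # reported if the app fails before responding

        def recording_start_response(status, headers, exc_info=None):
            seen["status"] = status.split(" ", 1)[0]  # "200 OK" -> "200"
            return start_response(status, headers, exc_info)

        start = time.perf_counter()
        try:
            # Note: time spent lazily generating the response body is not measured.
            return self.app(environ, recording_start_response)
        finally:
            elapsed = time.perf_counter() - start
            http_request_duration_seconds.labels(method=method).observe(elapsed)
            http_requests_total.labels(method=method, status=seen["status"]).inc()
```

Keeping the labels to method and status code, rather than the raw request path, is a deliberate choice to limit label cardinality.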
Documentation Structure
- fundamentals.md: Universal Prometheus concepts + prometheus_client API
- huleedu-patterns.md: HuleEdu naming, architecture, setup patterns
- promql-guide.md: PromQL syntax + HuleEdu query patterns
- examples/: Real-world instrumentation examples