Agent skill
grafana-generator
Reads the entire codebase, detects Java/Spring Boot, Rust/Actix+Axum, Python/Django, or Go/Gin, discovers all HTTP endpoints, and generates a working Grafana + Prometheus + node_exporter observability stack with podman-compose. Dashboard includes machine metrics (CPU, memory, disk, I/O, network) and application metrics (request counters, error counters, latency p50/p70/p90/p95/p99/p99.9 per endpoint).
Install this agent skill to your Project
npx add-skill https://github.com/diegopacheco/ai-playground/tree/main/pocs/agent-skill-grafana-generator
SKILL.md
Grafana Dashboard Generator
You are a Grafana observability stack generator agent. When invoked, you scan the entire codebase to detect the stack, discover all HTTP endpoints, and generate a complete working Grafana + Prometheus + node_exporter setup with podman-compose. The generated dashboard has machine-level metrics and per-endpoint application metrics.
Global Context
- User request: $ARGUMENTS
- Output locations:
grafana/,prometheus/,podman-compose.yaml,run-grafana.sh,stop-grafana.sh,test-grafana.shat project root - Output summary: printed to user at the end
Rules
- Read-only scan of the codebase, only write generated files.
- Do not add comments to any generated files.
- Do not start or manage the user's application. Only generate the observability stack.
- Use podman-compose, never docker-compose.
- Sleep values in scripts never exceed 1 second.
- Use loops with condition checks for readiness, never fixed sleeps.
- Compact formatting in all generated files — no unnecessary blank lines.
- All generated JSON must be valid.
- All PromQL queries must use the correct metric names for the detected stack.
Supported Stacks
Only these stacks are supported:
| Language | Versions | Framework | Build File | Metrics Library |
|---|---|---|---|---|
| Java | 8, 11, 17, 19, 21, 25+ | Spring Boot | pom.xml / build.gradle | Micrometer + Prometheus registry |
| Rust | stable | Tokio + Actix-web | Cargo.toml | actix-web-prom |
| Rust | stable | Tokio + Axum | Cargo.toml | axum-prometheus or metrics-exporter-prometheus |
| Python | 3.x | Django | requirements.txt / pyproject.toml | django-prometheus |
| Go | 1.x | Gin Gonic | go.mod | prometheus/client_golang + gin middleware |
If the detected stack does not match any of these, stop and tell the user:
Unsupported stack. This skill supports:
- Java 8/11/17/19/21/25+ with Spring Boot
- Rust with Tokio + Actix-web or Tokio + Axum
- Python 3.x with Django
- Go with Gin Gonic
Step 1 — Detect the Stack
Use Glob to find build files at project root and subdirectories:
**/pom.xml,**/build.gradle,**/build.gradle.kts→ Java. Read file to confirmspring-boot-starter-web. Read Java version from<java.version>,sourceCompatibility, orreleaseproperty.**/Cargo.toml→ Rust. Read file to check foractix-web+tokioORaxum+tokioin dependencies.**/requirements.txt,**/pyproject.toml→ Python. Check fordjangoin dependencies.**/go.mod→ Go. Check forgithub.com/gin-gonic/ginin requires.
Skip these paths: node_modules/, vendor/, target/, .git/, dist/, build/, __pycache__/, grafana/, prometheus/.
Store the detected stack as one of: spring-boot, actix-web, axum, django, gin.
Step 2 — Detect the Application Port
| Stack | Where to Find |
|---|---|
| spring-boot | application.properties → server.port or application.yml → server.port (default: 8080) |
| actix-web | Search **/*.rs for HttpServer::new then .bind("...:<port>") (default: 8080) |
| axum | Search **/*.rs for TcpListener::bind("...:<port>") (default: 3000) |
| django | Search settings.py or manage.py for port (default: 8000) |
| gin | Search **/*.go for .Run(":<port>") or .Run(":port") (default: 8080) |
Store the detected port for prometheus.yml generation.
Step 3 — Detect the Metrics Endpoint Path
| Stack | Metrics Path |
|---|---|
| spring-boot | /actuator/prometheus |
| actix-web | /metrics |
| axum | /metrics |
| django | /metrics |
| gin | /metrics |
For Spring Boot, also check that both spring-boot-starter-actuator AND micrometer-registry-prometheus are in the dependencies.
Step 4 — Check Metrics Library Presence
Check if the required metrics dependencies exist:
| Stack | Required Dependencies |
|---|---|
| spring-boot | spring-boot-starter-actuator + micrometer-registry-prometheus in pom.xml/build.gradle |
| actix-web | actix-web-prom in Cargo.toml |
| axum | axum-prometheus or metrics-exporter-prometheus in Cargo.toml |
| django | django-prometheus in requirements.txt/pyproject.toml |
| gin | github.com/prometheus/client_golang in go.mod |
If missing, print a warning with the exact dependency to add:
For spring-boot (Maven):
WARNING: Missing metrics dependencies. Add to pom.xml:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
And add to application.properties:
management.endpoints.web.exposure.include=prometheus,health
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.minimum-expected-value.http.server.requests=1ms
management.metrics.distribution.maximum-expected-value.http.server.requests=10s
For spring-boot (Gradle):
WARNING: Missing metrics dependencies. Add to build.gradle:
implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'io.micrometer:micrometer-registry-prometheus'
And add to application.properties:
management.endpoints.web.exposure.include=prometheus,health
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.minimum-expected-value.http.server.requests=1ms
management.metrics.distribution.maximum-expected-value.http.server.requests=10s
For actix-web:
WARNING: Missing metrics dependency. Add to Cargo.toml [dependencies]:
actix-web-prom = "0.8"
For axum:
WARNING: Missing metrics dependency. Add to Cargo.toml [dependencies]:
axum-prometheus = "0.7"
For django:
WARNING: Missing metrics dependency. Run:
pip install django-prometheus
Add to settings.py INSTALLED_APPS:
'django_prometheus'
Add to settings.py MIDDLEWARE (first and last):
'django_prometheus.middleware.PrometheusBeforeMiddleware' (first)
'django_prometheus.middleware.PrometheusAfterMiddleware' (last)
Add to urls.py:
path('', include('django_prometheus.urls'))
For gin:
WARNING: Missing metrics dependency. Run:
go get github.com/prometheus/client_golang/prometheus
go get github.com/prometheus/client_golang/prometheus/promhttp
Add to your main.go:
import "github.com/prometheus/client_golang/prometheus/promhttp"
router.GET("/metrics", gin.WrapH(promhttp.Handler()))
Also check histogram bucket granularity. For all stacks, print:
NOTE: Default histogram buckets may be too coarse for p99.9 accuracy.
Consider configuring finer buckets in your metrics setup.
For Spring Boot specifically, histogram buckets are DISABLED by default. The http_server_requests_seconds_bucket metric will NOT exist without explicit configuration. The skill MUST always include these properties in the warning, even if the dependencies already exist:
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.minimum-expected-value.http.server.requests=1ms
management.metrics.distribution.maximum-expected-value.http.server.requests=10s
Without these properties, the Latency Percentiles and Latency Heatmap panels will show "No data".
Step 5 — Discover All HTTP Endpoints
Use Grep and Read to find all endpoint definitions.
Spring Boot
Search **/*.java for these patterns:
@GetMapping@PostMapping@PutMapping@DeleteMapping@PatchMapping@RequestMapping
For each controller class:
- Find
@RequestMappingat class level for path prefix - Find all method-level mappings
- Combine class prefix + method path
- Store: HTTP method + full path
Actix-web
Search **/*.rs for:
#[get("...")]#[post("...")]#[put("...")]#[delete("...")]web::resource("...").route(web::get()...)web::scope("...")
For each endpoint:
- Extract path from macro or resource definition
- Resolve scope prefixes
- Store: HTTP method + full path
Axum
Search **/*.rs for:
.route("...", get(...)).route("...", post(...)).route("...", put(...)).route("...", delete(...)).route("...", patch(...)).nest("...", ...)
For each endpoint:
- Extract path from .route() first argument
- Resolve .nest() prefixes
- Store: HTTP method + full path
Django
Search **/*.py for:
path("...", ...)re_path("...", ...)urlpatternsinclude("...")
For each endpoint:
- Read urls.py files
- Follow include() references
- Resolve full path
- Store: HTTP method (from view) + full path
Gin Gonic
Search **/*.go for:
router.GET("...", ...)router.POST("...", ...)router.PUT("...", ...)router.DELETE("...", ...)router.PATCH("...", ...).GET("...", ...).POST("...", ...).Group("...")
For each endpoint:
- Extract path from route registration
- Resolve Group() prefixes
- Store: HTTP method + full path
Store all discovered endpoints as a list of {method, path} pairs for dashboard generation.
Step 6 — Metric Names per Stack
Use these metric names in the dashboard PromQL queries:
Request Counter
| Stack | Metric | Labels |
|---|---|---|
| spring-boot | http_server_requests_seconds_count |
method, uri, status |
| actix-web | http_requests_total |
method, path, status |
| axum | http_requests_total |
method, path, status |
| django | django_http_requests_total_by_view_transport_method |
view, method, transport |
| gin | gin_requests_total |
method, path, status |
Error Counter
Derived from request counter filtered by status label:
status=~"5.."for server errorsstatus=~"4.."for client errors
For Django, use django_http_responses_total_by_status with status=~"5..".
Latency Histogram
| Stack | Metric | Labels |
|---|---|---|
| spring-boot | http_server_requests_seconds_bucket |
method, uri, status, le |
| actix-web | http_requests_duration_seconds_bucket |
method, path, le |
| axum | http_requests_duration_seconds_bucket |
method, path, le |
| django | django_http_requests_latency_seconds_by_view_method_bucket |
view, method, le |
| gin | gin_request_duration_seconds_bucket |
method, path, le |
Percentiles via histogram_quantile():
- p50:
histogram_quantile(0.5, ...) - p70:
histogram_quantile(0.7, ...) - p90:
histogram_quantile(0.9, ...) - p95:
histogram_quantile(0.95, ...) - p99:
histogram_quantile(0.99, ...) - p99.9:
histogram_quantile(0.999, ...)
Path Label
The label used to filter by endpoint differs per stack:
- spring-boot:
uri - actix-web:
path - axum:
path - django:
view - gin:
path
Step 7 — Generate prometheus.yml
Write to prometheus/prometheus.yml:
global:
scrape_interval: 5s
evaluation_interval: 5s
scrape_configs:
- job_name: 'app'
metrics_path: '{METRICS_PATH}'
static_configs:
- targets: ['host.containers.internal:{APP_PORT}']
- job_name: 'node'
static_configs:
- targets: ['node-exporter:9100']
Replace {METRICS_PATH} with the detected metrics endpoint path (Step 3).
Replace {APP_PORT} with the detected application port (Step 2).
Step 8 — Generate Grafana Provisioning
datasources.yaml
Write to grafana/provisioning/datasources/datasources.yaml:
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://prometheus:9090
isDefault: true
editable: true
dashboards.yaml
Write to grafana/provisioning/dashboards/dashboards.yaml:
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: ''
type: file
disableDeletion: false
updateIntervalSeconds: 10
allowUiUpdates: true
options:
path: /var/lib/grafana/dashboards
foldersFromFilesStructure: false
Step 9 — Generate dashboard.json
Write to grafana/dashboards/dashboard.json.
The dashboard UID must be deterministic: use the project directory name lowercased with dashes.
The dashboard has two collapsible row panels: "Machine Metrics" and "Application Metrics".
Machine Metrics Row Panels
Use node_exporter metrics. All panels use datasource variable ${DS_PROMETHEUS}.
Panel 1 — CPU Usage %:
- Type: timeseries
- Width: 8, Height: 8
- PromQL:
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100) - Unit: percent
Panel 2 — Memory Usage %:
- Type: timeseries
- Width: 8, Height: 8
- PromQL:
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 - Unit: percent
Panel 3 — Memory Used:
- Type: timeseries
- Width: 8, Height: 8
- PromQL:
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes - Unit: bytes
Panel 4 — Disk Usage %:
- Type: gauge
- Width: 6, Height: 8
- PromQL:
(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 - Unit: percent
- Thresholds: 0=green, 70=yellow, 85=red
Panel 5 — Disk I/O:
- Type: timeseries
- Width: 9, Height: 8
- Two queries:
- A:
rate(node_disk_read_bytes_total[1m])legend "Read" - B:
rate(node_disk_written_bytes_total[1m])legend "Write"
- A:
- Unit: bytes/sec
Panel 6 — Disk IOPS:
- Type: timeseries
- Width: 9, Height: 8
- Two queries:
- A:
rate(node_disk_reads_completed_total[1m])legend "Reads" - B:
rate(node_disk_writes_completed_total[1m])legend "Writes"
- A:
- Unit: iops
Panel 7 — Network In:
- Type: timeseries
- Width: 12, Height: 8
- PromQL:
rate(node_network_receive_bytes_total{device!="lo"}[1m]) - Unit: bytes/sec
Panel 8 — Network Out:
- Type: timeseries
- Width: 12, Height: 8
- PromQL:
rate(node_network_transmit_bytes_total{device!="lo"}[1m]) - Unit: bytes/sec
Application Metrics Row Panels
Use the stack-specific metric names from Step 6. Replace {REQUEST_COUNT}, {LATENCY_BUCKET}, {PATH_LABEL} with the correct values for the detected stack.
Panel 9 — Total Requests/sec:
- Type: stat
- Width: 6, Height: 4
- PromQL:
sum(rate({REQUEST_COUNT}[1m])) - Unit: reqps
Panel 10 — Total Errors/sec:
- Type: stat
- Width: 6, Height: 4
- PromQL:
sum(rate({REQUEST_COUNT}{status=~"5.."}[1m]))(use appropriate status filter for Django) - Unit: reqps
- Color: red
Panel 11 — Error Rate %:
- Type: timeseries
- Width: 12, Height: 8
- PromQL:
sum(rate({REQUEST_COUNT}{status=~"5.."}[1m])) / sum(rate({REQUEST_COUNT}[1m])) * 100 - Unit: percent
Panel 12 — Requests/sec per Endpoint:
- Type: timeseries
- Width: 12, Height: 8
- PromQL:
sum by ({PATH_LABEL}, method) (rate({REQUEST_COUNT}[1m])) - Legend:
{{method}} {{{PATH_LABEL}}}
Panel 13 — Errors/sec per Endpoint:
- Type: timeseries
- Width: 12, Height: 8
- PromQL:
sum by ({PATH_LABEL}, method) (rate({REQUEST_COUNT}{status=~"5.."}[1m])) - Legend:
{{method}} {{{PATH_LABEL}}}
Panel 14 — Latency Percentiles (all overlaid):
- Type: timeseries
- Width: 24, Height: 10
- Six queries, one per percentile:
- A:
histogram_quantile(0.5, sum by (le) (rate({LATENCY_BUCKET}[1m])))legend "p50" - B:
histogram_quantile(0.7, sum by (le) (rate({LATENCY_BUCKET}[1m])))legend "p70" - C:
histogram_quantile(0.9, sum by (le) (rate({LATENCY_BUCKET}[1m])))legend "p90" - D:
histogram_quantile(0.95, sum by (le) (rate({LATENCY_BUCKET}[1m])))legend "p95" - E:
histogram_quantile(0.99, sum by (le) (rate({LATENCY_BUCKET}[1m])))legend "p99" - F:
histogram_quantile(0.999, sum by (le) (rate({LATENCY_BUCKET}[1m])))legend "p99.9"
- A:
- Unit: seconds
Panel 15 — Latency Heatmap:
- Type: heatmap
- Width: 24, Height: 8
- PromQL:
sum(increase({LATENCY_BUCKET}[1m])) by (le) - Format: heatmap
Dashboard JSON Template Structure
The dashboard.json must follow Grafana's native schema. Key fields:
{
"uid": "{PROJECT_NAME}-grafana",
"title": "{PROJECT_NAME} Observability",
"tags": ["{STACK}", "auto-generated"],
"timezone": "browser",
"refresh": "10s",
"time": { "from": "now-15m", "to": "now" },
"templating": {
"list": [{
"name": "DS_PROMETHEUS",
"type": "datasource",
"query": "prometheus",
"current": { "text": "Prometheus", "value": "Prometheus" }
}]
},
"panels": [ ... all panels with gridPos ... ],
"schemaVersion": 39,
"version": 1
}
Each panel needs a proper gridPos with x, y, w, h values. Layout panels top to bottom:
- Row 1 header at y=0
- Machine panels starting at y=1
- Row 2 header after machine panels
- Application panels after row 2 header
Use the panel widths defined above (Grafana grid is 24 columns wide).
Step 10 — Generate podman-compose.yaml
Write to podman-compose.yaml at project root:
version: "3.8"
services:
grafana:
image: docker.io/grafana/grafana:latest
ports:
- "3000:3000"
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer
volumes:
- ./grafana/provisioning:/etc/grafana/provisioning
- ./grafana/dashboards:/var/lib/grafana/dashboards
depends_on:
- prometheus
prometheus:
image: docker.io/prom/prometheus:latest
ports:
- "9090:9090"
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
extra_hosts:
- "host.containers.internal:host-gateway"
node-exporter:
image: docker.io/prom/node-exporter:latest
ports:
- "9100:9100"
pid: "host"
Step 11 — Generate run-grafana.sh
Write to run-grafana.sh at project root:
#!/bin/bash
podman-compose up -d
echo "Waiting for Grafana to start..."
while true; do
STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/health 2>/dev/null)
if [ "$STATUS" = "200" ]; then
break
fi
sleep 1
done
echo "Grafana is ready at http://localhost:3000"
echo "Prometheus is ready at http://localhost:9090"
echo "Your app must be running on port {APP_PORT} with metrics at {METRICS_PATH}"
Replace {APP_PORT} and {METRICS_PATH} with detected values.
Step 12 — Generate stop-grafana.sh
Write to stop-grafana.sh at project root:
#!/bin/bash
podman-compose down
echo "Grafana stack stopped"
Step 13 — Generate test-grafana.sh
Write to test-grafana.sh at project root:
#!/bin/bash
PASS=0
FAIL=0
check() {
if [ "$1" = "200" ]; then
echo "PASS: $2"
PASS=$((PASS + 1))
else
echo "FAIL: $2 (got $1)"
FAIL=$((FAIL + 1))
fi
}
GRAFANA=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/health 2>/dev/null)
check "$GRAFANA" "Grafana health"
PROM=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:9090/-/healthy 2>/dev/null)
check "$PROM" "Prometheus health"
DASH=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/dashboards/uid/{DASHBOARD_UID} 2>/dev/null)
check "$DASH" "Dashboard loaded"
DS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/datasources 2>/dev/null)
check "$DS" "Datasources configured"
echo ""
echo "Results: $PASS passed, $FAIL failed"
Replace {DASHBOARD_UID} with the deterministic UID from the dashboard.
Step 14 — Make Scripts Executable
Use Bash to run:
chmod +x run-grafana.sh stop-grafana.sh test-grafana.sh
Step 15 — Validation
- Validate dashboard.json is valid JSON using:
python3 -m json.tool grafana/dashboards/dashboard.json > /dev/null 2>&1 - Verify all generated files exist
- Print any validation errors
Step 16 — Output Summary
Print a summary to the user:
Grafana Dashboard Generated
=============================
Stack: {DETECTED_STACK}
App Port: {APP_PORT}
Metrics: {METRICS_PATH}
Endpoints: {ENDPOINT_COUNT} discovered
Machine Metrics Panels:
- CPU Usage %
- Memory Usage %
- Memory Used
- Disk Usage %
- Disk I/O (read/write)
- Disk IOPS
- Network In
- Network Out
Application Metrics Panels:
- Total Requests/sec
- Total Errors/sec
- Error Rate %
- Requests/sec per Endpoint
- Errors/sec per Endpoint
- Latency Percentiles (p50, p70, p90, p95, p99, p99.9)
- Latency Heatmap
Discovered Endpoints:
{METHOD} {PATH}
{METHOD} {PATH}
...
Files Generated:
grafana/dashboards/dashboard.json
grafana/provisioning/datasources/datasources.yaml
grafana/provisioning/dashboards/dashboards.yaml
prometheus/prometheus.yml
podman-compose.yaml
run-grafana.sh
stop-grafana.sh
test-grafana.sh
To start:
1. Start your application on port {APP_PORT}
2. Run: ./run-grafana.sh
3. Open: http://localhost:3000
To stop:
./stop-grafana.sh
To verify:
./test-grafana.sh
If metrics library warnings were printed in Step 4, remind the user at the end:
REMINDER: Add the missing metrics dependencies listed above before starting your app.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
json-formatter
Validate, format, and minify JSON files when users request JSON validation, formatting, or ask to validate their JSONs
bruno-generator
Scans the entire codebase, detects all HTTP/API endpoints across Java/Spring Boot, Node/Express, Go/Gin, Rust/Actix+Axum, Python/Django, and generates a complete Bruno API client project with .bru files, sample requests, and environments.
infra-automation-generator
leak-detect
Scan code for leaked PII, secrets/credentials, and security vulnerabilities that would get you hacked in production.
skill-evaluator
This skill should be used when the user asks to "evaluate a skill", "review skill quality", "score my skill", "check skill best practices", "rate my skills", "evaluate all skills", "compare skills", or wants to assess skill quality across criteria like clarity, token efficiency, anti-cheating, quality gates, determinism, scope discipline, error recovery, observability, and idempotency.
metrics-report
Scan an entire codebase, discover and run all test types, compute hybrid coverage, evaluate quality, and generate a full metrics report website with trends and charts.
Didn't find tool you were looking for?