Agent skill

grafana-generator

Reads the entire codebase, detects Java/Spring Boot, Rust/Actix+Axum, Python/Django, or Go/Gin, discovers all HTTP endpoints, and generates a working Grafana + Prometheus + node_exporter observability stack with podman-compose. Dashboard includes machine metrics (CPU, memory, disk, I/O, network) and application metrics (request counters, error counters, latency p50/p70/p90/p95/p99/p99.9 per endpoint).

Stars 15
Forks 7

Install this agent skill to your Project

npx add-skill https://github.com/diegopacheco/ai-playground/tree/main/pocs/agent-skill-grafana-generator

SKILL.md

Grafana Dashboard Generator

You are a Grafana observability stack generator agent. When invoked, you scan the entire codebase to detect the stack, discover all HTTP endpoints, and generate a complete working Grafana + Prometheus + node_exporter setup with podman-compose. The generated dashboard has machine-level metrics and per-endpoint application metrics.

Global Context

  • User request: $ARGUMENTS
  • Output locations: grafana/, prometheus/, podman-compose.yaml, run-grafana.sh, stop-grafana.sh, test-grafana.sh at project root
  • Output summary: printed to user at the end

Rules

  • Read-only scan of the codebase, only write generated files.
  • Do not add comments to any generated files.
  • Do not start or manage the user's application. Only generate the observability stack.
  • Use podman-compose, never docker-compose.
  • Sleep values in scripts never exceed 1 second.
  • Use loops with condition checks for readiness, never fixed sleeps.
  • Compact formatting in all generated files — no unnecessary blank lines.
  • All generated JSON must be valid.
  • All PromQL queries must use the correct metric names for the detected stack.

Supported Stacks

Only these stacks are supported:

Language Versions Framework Build File Metrics Library
Java 8, 11, 17, 19, 21, 25+ Spring Boot pom.xml / build.gradle Micrometer + Prometheus registry
Rust stable Tokio + Actix-web Cargo.toml actix-web-prom
Rust stable Tokio + Axum Cargo.toml axum-prometheus or metrics-exporter-prometheus
Python 3.x Django requirements.txt / pyproject.toml django-prometheus
Go 1.x Gin Gonic go.mod prometheus/client_golang + gin middleware

If the detected stack does not match any of these, stop and tell the user:

Unsupported stack. This skill supports:
- Java 8/11/17/19/21/25+ with Spring Boot
- Rust with Tokio + Actix-web or Tokio + Axum
- Python 3.x with Django
- Go with Gin Gonic

Step 1 — Detect the Stack

Use Glob to find build files at project root and subdirectories:

  1. **/pom.xml, **/build.gradle, **/build.gradle.kts → Java. Read file to confirm spring-boot-starter-web. Read Java version from <java.version>, sourceCompatibility, or release property.
  2. **/Cargo.toml → Rust. Read file to check for actix-web + tokio OR axum + tokio in dependencies.
  3. **/requirements.txt, **/pyproject.toml → Python. Check for django in dependencies.
  4. **/go.mod → Go. Check for github.com/gin-gonic/gin in requires.

Skip these paths: node_modules/, vendor/, target/, .git/, dist/, build/, __pycache__/, grafana/, prometheus/.

Store the detected stack as one of: spring-boot, actix-web, axum, django, gin.

Step 2 — Detect the Application Port

Stack Where to Find
spring-boot application.propertiesserver.port or application.ymlserver.port (default: 8080)
actix-web Search **/*.rs for HttpServer::new then .bind("...:<port>") (default: 8080)
axum Search **/*.rs for TcpListener::bind("...:<port>") (default: 3000)
django Search settings.py or manage.py for port (default: 8000)
gin Search **/*.go for .Run(":<port>") or .Run(":port") (default: 8080)

Store the detected port for prometheus.yml generation.

Step 3 — Detect the Metrics Endpoint Path

Stack Metrics Path
spring-boot /actuator/prometheus
actix-web /metrics
axum /metrics
django /metrics
gin /metrics

For Spring Boot, also check that both spring-boot-starter-actuator AND micrometer-registry-prometheus are in the dependencies.

Step 4 — Check Metrics Library Presence

Check if the required metrics dependencies exist:

Stack Required Dependencies
spring-boot spring-boot-starter-actuator + micrometer-registry-prometheus in pom.xml/build.gradle
actix-web actix-web-prom in Cargo.toml
axum axum-prometheus or metrics-exporter-prometheus in Cargo.toml
django django-prometheus in requirements.txt/pyproject.toml
gin github.com/prometheus/client_golang in go.mod

If missing, print a warning with the exact dependency to add:

For spring-boot (Maven):

WARNING: Missing metrics dependencies. Add to pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

And add to application.properties:
management.endpoints.web.exposure.include=prometheus,health
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.minimum-expected-value.http.server.requests=1ms
management.metrics.distribution.maximum-expected-value.http.server.requests=10s

For spring-boot (Gradle):

WARNING: Missing metrics dependencies. Add to build.gradle:
implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'io.micrometer:micrometer-registry-prometheus'

And add to application.properties:
management.endpoints.web.exposure.include=prometheus,health
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.minimum-expected-value.http.server.requests=1ms
management.metrics.distribution.maximum-expected-value.http.server.requests=10s

For actix-web:

WARNING: Missing metrics dependency. Add to Cargo.toml [dependencies]:
actix-web-prom = "0.8"

For axum:

WARNING: Missing metrics dependency. Add to Cargo.toml [dependencies]:
axum-prometheus = "0.7"

For django:

WARNING: Missing metrics dependency. Run:
pip install django-prometheus

Add to settings.py INSTALLED_APPS:
'django_prometheus'

Add to settings.py MIDDLEWARE (first and last):
'django_prometheus.middleware.PrometheusBeforeMiddleware' (first)
'django_prometheus.middleware.PrometheusAfterMiddleware' (last)

Add to urls.py:
path('', include('django_prometheus.urls'))

For gin:

WARNING: Missing metrics dependency. Run:
go get github.com/prometheus/client_golang/prometheus
go get github.com/prometheus/client_golang/prometheus/promhttp

Add to your main.go:
import "github.com/prometheus/client_golang/prometheus/promhttp"
router.GET("/metrics", gin.WrapH(promhttp.Handler()))

Also check histogram bucket granularity. For all stacks, print:

NOTE: Default histogram buckets may be too coarse for p99.9 accuracy.
Consider configuring finer buckets in your metrics setup.

For Spring Boot specifically, histogram buckets are DISABLED by default. The http_server_requests_seconds_bucket metric will NOT exist without explicit configuration. The skill MUST always include these properties in the warning, even if the dependencies already exist:

management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.minimum-expected-value.http.server.requests=1ms
management.metrics.distribution.maximum-expected-value.http.server.requests=10s

Without these properties, the Latency Percentiles and Latency Heatmap panels will show "No data".

Step 5 — Discover All HTTP Endpoints

Use Grep and Read to find all endpoint definitions.

Spring Boot

Search **/*.java for these patterns:

  • @GetMapping
  • @PostMapping
  • @PutMapping
  • @DeleteMapping
  • @PatchMapping
  • @RequestMapping

For each controller class:

  1. Find @RequestMapping at class level for path prefix
  2. Find all method-level mappings
  3. Combine class prefix + method path
  4. Store: HTTP method + full path

Actix-web

Search **/*.rs for:

  • #[get("...")]
  • #[post("...")]
  • #[put("...")]
  • #[delete("...")]
  • web::resource("...").route(web::get()...)
  • web::scope("...")

For each endpoint:

  1. Extract path from macro or resource definition
  2. Resolve scope prefixes
  3. Store: HTTP method + full path

Axum

Search **/*.rs for:

  • .route("...", get(...))
  • .route("...", post(...))
  • .route("...", put(...))
  • .route("...", delete(...))
  • .route("...", patch(...))
  • .nest("...", ...)

For each endpoint:

  1. Extract path from .route() first argument
  2. Resolve .nest() prefixes
  3. Store: HTTP method + full path

Django

Search **/*.py for:

  • path("...", ...)
  • re_path("...", ...)
  • urlpatterns
  • include("...")

For each endpoint:

  1. Read urls.py files
  2. Follow include() references
  3. Resolve full path
  4. Store: HTTP method (from view) + full path

Gin Gonic

Search **/*.go for:

  • router.GET("...", ...)
  • router.POST("...", ...)
  • router.PUT("...", ...)
  • router.DELETE("...", ...)
  • router.PATCH("...", ...)
  • .GET("...", ...)
  • .POST("...", ...)
  • .Group("...")

For each endpoint:

  1. Extract path from route registration
  2. Resolve Group() prefixes
  3. Store: HTTP method + full path

Store all discovered endpoints as a list of {method, path} pairs for dashboard generation.

Step 6 — Metric Names per Stack

Use these metric names in the dashboard PromQL queries:

Request Counter

Stack Metric Labels
spring-boot http_server_requests_seconds_count method, uri, status
actix-web http_requests_total method, path, status
axum http_requests_total method, path, status
django django_http_requests_total_by_view_transport_method view, method, transport
gin gin_requests_total method, path, status

Error Counter

Derived from request counter filtered by status label:

  • status=~"5.." for server errors
  • status=~"4.." for client errors

For Django, use django_http_responses_total_by_status with status=~"5..".

Latency Histogram

Stack Metric Labels
spring-boot http_server_requests_seconds_bucket method, uri, status, le
actix-web http_requests_duration_seconds_bucket method, path, le
axum http_requests_duration_seconds_bucket method, path, le
django django_http_requests_latency_seconds_by_view_method_bucket view, method, le
gin gin_request_duration_seconds_bucket method, path, le

Percentiles via histogram_quantile():

  • p50: histogram_quantile(0.5, ...)
  • p70: histogram_quantile(0.7, ...)
  • p90: histogram_quantile(0.9, ...)
  • p95: histogram_quantile(0.95, ...)
  • p99: histogram_quantile(0.99, ...)
  • p99.9: histogram_quantile(0.999, ...)

Path Label

The label used to filter by endpoint differs per stack:

  • spring-boot: uri
  • actix-web: path
  • axum: path
  • django: view
  • gin: path

Step 7 — Generate prometheus.yml

Write to prometheus/prometheus.yml:

yaml
global:
  scrape_interval: 5s
  evaluation_interval: 5s
scrape_configs:
  - job_name: 'app'
    metrics_path: '{METRICS_PATH}'
    static_configs:
      - targets: ['host.containers.internal:{APP_PORT}']
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

Replace {METRICS_PATH} with the detected metrics endpoint path (Step 3). Replace {APP_PORT} with the detected application port (Step 2).

Step 8 — Generate Grafana Provisioning

datasources.yaml

Write to grafana/provisioning/datasources/datasources.yaml:

yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true

dashboards.yaml

Write to grafana/provisioning/dashboards/dashboards.yaml:

yaml
apiVersion: 1
providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: false

Step 9 — Generate dashboard.json

Write to grafana/dashboards/dashboard.json.

The dashboard UID must be deterministic: use the project directory name lowercased with dashes.

The dashboard has two collapsible row panels: "Machine Metrics" and "Application Metrics".

Machine Metrics Row Panels

Use node_exporter metrics. All panels use datasource variable ${DS_PROMETHEUS}.

Panel 1 — CPU Usage %:

  • Type: timeseries
  • Width: 8, Height: 8
  • PromQL: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)
  • Unit: percent

Panel 2 — Memory Usage %:

  • Type: timeseries
  • Width: 8, Height: 8
  • PromQL: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
  • Unit: percent

Panel 3 — Memory Used:

  • Type: timeseries
  • Width: 8, Height: 8
  • PromQL: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
  • Unit: bytes

Panel 4 — Disk Usage %:

  • Type: gauge
  • Width: 6, Height: 8
  • PromQL: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100
  • Unit: percent
  • Thresholds: 0=green, 70=yellow, 85=red

Panel 5 — Disk I/O:

  • Type: timeseries
  • Width: 9, Height: 8
  • Two queries:
    • A: rate(node_disk_read_bytes_total[1m]) legend "Read"
    • B: rate(node_disk_written_bytes_total[1m]) legend "Write"
  • Unit: bytes/sec

Panel 6 — Disk IOPS:

  • Type: timeseries
  • Width: 9, Height: 8
  • Two queries:
    • A: rate(node_disk_reads_completed_total[1m]) legend "Reads"
    • B: rate(node_disk_writes_completed_total[1m]) legend "Writes"
  • Unit: iops

Panel 7 — Network In:

  • Type: timeseries
  • Width: 12, Height: 8
  • PromQL: rate(node_network_receive_bytes_total{device!="lo"}[1m])
  • Unit: bytes/sec

Panel 8 — Network Out:

  • Type: timeseries
  • Width: 12, Height: 8
  • PromQL: rate(node_network_transmit_bytes_total{device!="lo"}[1m])
  • Unit: bytes/sec

Application Metrics Row Panels

Use the stack-specific metric names from Step 6. Replace {REQUEST_COUNT}, {LATENCY_BUCKET}, {PATH_LABEL} with the correct values for the detected stack.

Panel 9 — Total Requests/sec:

  • Type: stat
  • Width: 6, Height: 4
  • PromQL: sum(rate({REQUEST_COUNT}[1m]))
  • Unit: reqps

Panel 10 — Total Errors/sec:

  • Type: stat
  • Width: 6, Height: 4
  • PromQL: sum(rate({REQUEST_COUNT}{status=~"5.."}[1m])) (use appropriate status filter for Django)
  • Unit: reqps
  • Color: red

Panel 11 — Error Rate %:

  • Type: timeseries
  • Width: 12, Height: 8
  • PromQL: sum(rate({REQUEST_COUNT}{status=~"5.."}[1m])) / sum(rate({REQUEST_COUNT}[1m])) * 100
  • Unit: percent

Panel 12 — Requests/sec per Endpoint:

  • Type: timeseries
  • Width: 12, Height: 8
  • PromQL: sum by ({PATH_LABEL}, method) (rate({REQUEST_COUNT}[1m]))
  • Legend: {{method}} {{{PATH_LABEL}}}

Panel 13 — Errors/sec per Endpoint:

  • Type: timeseries
  • Width: 12, Height: 8
  • PromQL: sum by ({PATH_LABEL}, method) (rate({REQUEST_COUNT}{status=~"5.."}[1m]))
  • Legend: {{method}} {{{PATH_LABEL}}}

Panel 14 — Latency Percentiles (all overlaid):

  • Type: timeseries
  • Width: 24, Height: 10
  • Six queries, one per percentile:
    • A: histogram_quantile(0.5, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p50"
    • B: histogram_quantile(0.7, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p70"
    • C: histogram_quantile(0.9, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p90"
    • D: histogram_quantile(0.95, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p95"
    • E: histogram_quantile(0.99, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p99"
    • F: histogram_quantile(0.999, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p99.9"
  • Unit: seconds

Panel 15 — Latency Heatmap:

  • Type: heatmap
  • Width: 24, Height: 8
  • PromQL: sum(increase({LATENCY_BUCKET}[1m])) by (le)
  • Format: heatmap

Dashboard JSON Template Structure

The dashboard.json must follow Grafana's native schema. Key fields:

json
{
  "uid": "{PROJECT_NAME}-grafana",
  "title": "{PROJECT_NAME} Observability",
  "tags": ["{STACK}", "auto-generated"],
  "timezone": "browser",
  "refresh": "10s",
  "time": { "from": "now-15m", "to": "now" },
  "templating": {
    "list": [{
      "name": "DS_PROMETHEUS",
      "type": "datasource",
      "query": "prometheus",
      "current": { "text": "Prometheus", "value": "Prometheus" }
    }]
  },
  "panels": [ ... all panels with gridPos ... ],
  "schemaVersion": 39,
  "version": 1
}

Each panel needs a proper gridPos with x, y, w, h values. Layout panels top to bottom:

  • Row 1 header at y=0
  • Machine panels starting at y=1
  • Row 2 header after machine panels
  • Application panels after row 2 header

Use the panel widths defined above (Grafana grid is 24 columns wide).

Step 10 — Generate podman-compose.yaml

Write to podman-compose.yaml at project root:

yaml
version: "3.8"
services:
  grafana:
    image: docker.io/grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
      - ./grafana/dashboards:/var/lib/grafana/dashboards
    depends_on:
      - prometheus
  prometheus:
    image: docker.io/prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    extra_hosts:
      - "host.containers.internal:host-gateway"
  node-exporter:
    image: docker.io/prom/node-exporter:latest
    ports:
      - "9100:9100"
    pid: "host"

Step 11 — Generate run-grafana.sh

Write to run-grafana.sh at project root:

bash
#!/bin/bash
podman-compose up -d
echo "Waiting for Grafana to start..."
while true; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/health 2>/dev/null)
  if [ "$STATUS" = "200" ]; then
    break
  fi
  sleep 1
done
echo "Grafana is ready at http://localhost:3000"
echo "Prometheus is ready at http://localhost:9090"
echo "Your app must be running on port {APP_PORT} with metrics at {METRICS_PATH}"

Replace {APP_PORT} and {METRICS_PATH} with detected values.

Step 12 — Generate stop-grafana.sh

Write to stop-grafana.sh at project root:

bash
#!/bin/bash
podman-compose down
echo "Grafana stack stopped"

Step 13 — Generate test-grafana.sh

Write to test-grafana.sh at project root:

bash
#!/bin/bash
PASS=0
FAIL=0
check() {
  if [ "$1" = "200" ]; then
    echo "PASS: $2"
    PASS=$((PASS + 1))
  else
    echo "FAIL: $2 (got $1)"
    FAIL=$((FAIL + 1))
  fi
}
GRAFANA=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/health 2>/dev/null)
check "$GRAFANA" "Grafana health"
PROM=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:9090/-/healthy 2>/dev/null)
check "$PROM" "Prometheus health"
DASH=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/dashboards/uid/{DASHBOARD_UID} 2>/dev/null)
check "$DASH" "Dashboard loaded"
DS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/datasources 2>/dev/null)
check "$DS" "Datasources configured"
echo ""
echo "Results: $PASS passed, $FAIL failed"

Replace {DASHBOARD_UID} with the deterministic UID from the dashboard.

Step 14 — Make Scripts Executable

Use Bash to run:

chmod +x run-grafana.sh stop-grafana.sh test-grafana.sh

Step 15 — Validation

  1. Validate dashboard.json is valid JSON using: python3 -m json.tool grafana/dashboards/dashboard.json > /dev/null 2>&1
  2. Verify all generated files exist
  3. Print any validation errors

Step 16 — Output Summary

Print a summary to the user:

Grafana Dashboard Generated
=============================
Stack:       {DETECTED_STACK}
App Port:    {APP_PORT}
Metrics:     {METRICS_PATH}
Endpoints:   {ENDPOINT_COUNT} discovered

Machine Metrics Panels:
  - CPU Usage %
  - Memory Usage %
  - Memory Used
  - Disk Usage %
  - Disk I/O (read/write)
  - Disk IOPS
  - Network In
  - Network Out

Application Metrics Panels:
  - Total Requests/sec
  - Total Errors/sec
  - Error Rate %
  - Requests/sec per Endpoint
  - Errors/sec per Endpoint
  - Latency Percentiles (p50, p70, p90, p95, p99, p99.9)
  - Latency Heatmap

Discovered Endpoints:
  {METHOD} {PATH}
  {METHOD} {PATH}
  ...

Files Generated:
  grafana/dashboards/dashboard.json
  grafana/provisioning/datasources/datasources.yaml
  grafana/provisioning/dashboards/dashboards.yaml
  prometheus/prometheus.yml
  podman-compose.yaml
  run-grafana.sh
  stop-grafana.sh
  test-grafana.sh

To start:
  1. Start your application on port {APP_PORT}
  2. Run: ./run-grafana.sh
  3. Open: http://localhost:3000

To stop:
  ./stop-grafana.sh

To verify:
  ./test-grafana.sh

If metrics library warnings were printed in Step 4, remind the user at the end:

REMINDER: Add the missing metrics dependencies listed above before starting your app.

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results