Agent skill

monitoring

Game server monitoring with metrics, alerting, and performance tracking for production reliability

Install this agent skill in your project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/product/monitoring

SKILL.md

Server Monitoring

Monitor game server health with metrics, logs, and alerts.

Key Game Metrics

javascript
const prometheus = require('prom-client');

// Player metrics
const activePlayers = new prometheus.Gauge({
  name: 'game_active_players',
  help: 'Currently connected players',
  labelNames: ['region', 'game_mode']
});

const matchesInProgress = new prometheus.Gauge({
  name: 'game_matches_active',
  help: 'Active matches',
  labelNames: ['game_mode']
});

// Performance metrics
const tickDuration = new prometheus.Histogram({
  name: 'game_tick_duration_seconds',
  help: 'Game loop tick duration',
  buckets: [0.001, 0.005, 0.01, 0.016, 0.033]
});

const networkLatency = new prometheus.Histogram({
  name: 'game_network_latency_ms',
  help: 'Player network latency',
  labelNames: ['region'],
  buckets: [10, 25, 50, 75, 100, 150, 200]
});
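
The histograms above only carry data once the game loop reports into them. A minimal sketch of timing one tick follows, assuming a synchronous tick callback and any object with an `observe(seconds)` method (prom-client's `Histogram` has one). The names `timeTick`, `runTick`, and `fakeHistogram` are hypothetical and not part of this skill.

```javascript
// Hypothetical helper: times one synchronous tick and records it in seconds.
// `histogram` is anything with observe(seconds), e.g. the tickDuration histogram.
// `now` returns milliseconds and can be swapped out for a fake clock in tests.
function timeTick(histogram, runTick, now = () => performance.now()) {
  const startMs = now();
  try {
    runTick();
  } finally {
    // Record even when the tick throws, so slow failing ticks stay visible.
    const seconds = (now() - startMs) / 1000;
    histogram.observe(seconds);
  }
}

// Usage with a stand-in histogram and a fake clock, so the sketch runs
// without prom-client installed.
const recorded = [];
const fakeHistogram = { observe: (s) => recorded.push(s) };
let clock = 0;
timeTick(fakeHistogram, () => { clock += 16; }, () => clock);
console.log(recorded); // one observation, expressed in seconds
```

Converting to seconds before calling `observe` keeps the values in the same unit as the `game_tick_duration_seconds` buckets.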

Alert Rules

yaml
groups:
- name: game-alerts
  rules:
  - alert: GameServerDown
    expr: up{job="game-servers"} == 0
    for: 1m
    labels:
      severity: critical

  - alert: HighTickLatency
    expr: histogram_quantile(0.99, sum by (le) (rate(game_tick_duration_seconds_bucket[5m]))) > 0.02
    for: 5m
    labels:
      severity: high

  - alert: LowPlayerCount
    expr: game_active_players < 10
    for: 10m
    labels:
      severity: warning
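
The HighTickLatency rule depends on `histogram_quantile`, which estimates a percentile from cumulative bucket counts instead of reading it from raw samples. The sketch below shows the linear interpolation idea in simplified form; it is not Prometheus's implementation, and `estimateQuantile` and the sample counts are made up for illustration. It assumes `0 < q <= 1` and at least one finite bucket.

```javascript
// Simplified sketch of bucket interpolation; not Prometheus's actual code.
// `buckets` are cumulative: [{ le: upperBound, count: observations <= le }, ...],
// sorted by le, with the last entry being le = Infinity.
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  // Quantile falls in the +Inf bucket: report the highest finite bound.
  if (buckets[i].le === Infinity) return buckets[buckets.length - 2].le;
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  // Assume observations are spread evenly inside the bucket.
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Invented counts using the tick-duration bucket bounds from this document.
const sample = [
  { le: 0.001, count: 0 },
  { le: 0.005, count: 10 },
  { le: 0.01, count: 50 },
  { le: 0.016, count: 90 },
  { le: 0.033, count: 100 },
  { le: Infinity, count: 100 },
];
const p99 = estimateQuantile(0.99, sample);
console.log(p99 > 0.02); // compare against the HighTickLatency threshold
```

Because the 0.02 s threshold sits between the 0.016 and 0.033 bucket bounds, the estimate near the threshold is only as precise as that one wide bucket. Adding a bucket boundary at or near 0.02 would likely make the alert more reliable.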

Target Thresholds

| Metric      | Target   | Alert    |
|-------------|----------|----------|
| Tick Rate   | 60 Hz    | < 55 Hz  |
| Latency P99 | < 100 ms | > 200 ms |
| Memory      | < 80%    | > 90%    |
| CPU         | < 70%    | > 85%    |
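
One way to apply these thresholds in code is a small lookup that classifies a reading. This is a sketch only: the metric keys, the `classify` function, and the intermediate `watch` state for values between target and alert are assumptions of this example, since the table itself defines only the two bounds.

```javascript
// Thresholds copied from the table above. The "watch" band between target
// and alert is an assumption of this sketch, not something the table defines.
const thresholds = {
  tickRateHz: { target: 60, alert: 55, higherIsBetter: true },
  latencyP99Ms: { target: 100, alert: 200, higherIsBetter: false },
  memoryPct: { target: 80, alert: 90, higherIsBetter: false },
  cpuPct: { target: 70, alert: 85, higherIsBetter: false },
};

// Returns 'ok', 'watch', or 'alert' for a single reading.
function classify(metric, value) {
  const t = thresholds[metric];
  if (!t) throw new Error(`unknown metric: ${metric}`);
  if (t.higherIsBetter) {
    // Tick rate: alert when it drops below the alert bound.
    if (value < t.alert) return 'alert';
    return value >= t.target ? 'ok' : 'watch';
  }
  // Latency, memory, CPU: alert when the value rises above the alert bound.
  if (value > t.alert) return 'alert';
  return value < t.target ? 'ok' : 'watch';
}

console.log(classify('tickRateHz', 57), classify('cpuPct', 50));
```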

Troubleshooting

Common Failure Modes

| Error           | Root Cause       | Solution        |
|-----------------|------------------|-----------------|
| Missing metrics | Scrape failure   | Check targets   |
| Alert storms    | Too sensitive    | Tune thresholds |
| Dashboard slow  | Too many queries | Aggregate       |
| Gaps in data    | Network issues   | Add redundancy  |

Debug Checklist

bash
# Check Prometheus targets
curl localhost:9090/api/v1/targets | jq '.data.activeTargets'

# Check firing alerts
curl localhost:9090/api/v1/alerts | jq '.data.alerts'

# Query metrics
curl 'localhost:9090/api/v1/query?query=game_active_players'
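
When the target list is long, it can help to filter the `/api/v1/targets` response down to the entries that are failing. The sketch below works on an already-parsed response body. `unhealthyTargets` is a hypothetical helper, and the sample object is hand-written in the shape of the API response, not real output.

```javascript
// Sketch: pick out targets that are not healthy from a parsed
// /api/v1/targets response. Reads only the health, labels, scrapeUrl,
// and lastError fields of each active target.
function unhealthyTargets(response) {
  const targets = (response.data && response.data.activeTargets) || [];
  return targets
    .filter((t) => t.health !== 'up')
    .map((t) => ({ job: t.labels.job, url: t.scrapeUrl, error: t.lastError }));
}

// Hand-written sample for illustration; hostnames are placeholders.
const sampleResponse = {
  status: 'success',
  data: {
    activeTargets: [
      {
        labels: { job: 'game-servers', instance: 'gs-1:9100' },
        scrapeUrl: 'http://gs-1:9100/metrics',
        lastError: '',
        health: 'up',
      },
      {
        labels: { job: 'game-servers', instance: 'gs-2:9100' },
        scrapeUrl: 'http://gs-2:9100/metrics',
        lastError: 'connection refused',
        health: 'down',
      },
    ],
  },
};

const failing = unhealthyTargets(sampleResponse);
console.log(failing);
```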

Unit Test Template

javascript
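const prometheus = require('prom-client');

// Assumes the metrics defined earlier are exported from a local module.
// './metrics' is a hypothetical path; adjust it to your project layout.
const { tickDuration } = require('./metrics');

// Promise-based delay used to simulate a tick that takes some time.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

describe('Metrics', () => {
  test('records tick duration', async () => {
    // startTimer() returns a function that records the elapsed seconds.
    const end = tickDuration.startTimer();
    await sleep(10);
    end();

    const metrics = await prometheus.register.metrics();
    expect(metrics).toContain('game_tick_duration_seconds');
  });
});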

Resources

  • assets/ - Dashboard configs
  • references/ - Alerting guides
