uptime-monitoring

Implement uptime monitoring and status page systems for tracking service availability. Use when monitoring application uptime, creating status pages, or implementing health checks.

Install this agent skill in your project:

npx add-skill https://github.com/aj-geddes/useful-ai-prompts/tree/main/skills/uptime-monitoring

SKILL.md

Uptime Monitoring

Overview

Set up comprehensive uptime monitoring with health checks, status pages, and incident tracking to ensure visibility into service availability.

When to Use

  • Service availability tracking
  • Health check implementation
  • Status page creation
  • Incident management
  • SLA monitoring
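
For the SLA monitoring case, a rough sketch of the arithmetic may help. It assumes availability is approximated as the share of successful checks in a stored history, and that each record has a hypothetical shape `{ ok: boolean }`; the function names are illustrative and do not come from the skill's reference guides.

```javascript
// Hypothetical record shape: one { ok: boolean } entry per completed check.
function uptimePercent(history) {
  if (history.length === 0) return null; // no data, so make no claim
  const okCount = history.filter((check) => check.ok).length;
  return (okCount / history.length) * 100;
}

// Downtime budget (in minutes) that a target availability leaves over a period.
function allowedDowntimeMinutes(targetPercent, periodDays) {
  const totalMinutes = periodDays * 24 * 60;
  return totalMinutes * (1 - targetPercent / 100);
}
```

Under these assumptions, a 99.9% target over 30 days leaves a budget of about 43.2 minutes of downtime. Check-based availability is only as precise as the check interval, so short outages between checks can go uncounted.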

Quick Start

Minimal working example:

javascript
// Node.js health check
const express = require("express");
const app = express();

app.get("/health", (req, res) => {
  res.json({
    status: "ok",
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
  });
});

app.get("/health/deep", async (req, res) => {
  const health = {
    status: "ok",
    checks: {
      database: "unknown",
      cache: "unknown",
      externalApi: "unknown",
    },
  };

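  try {
    // `db` is assumed to be a database client initialized elsewhere; it is not defined in this excerpt.
    const dbResult = await db.query("SELECT 1");
    health.checks.database = dbResult ? "ok" : "error";
    // ... (see reference guides for full implementation)
  } catch (err) {
    // Minimal completion so the excerpt parses: treat a failed query as an error.
    health.checks.database = "error";
  }

  // Report failure with a standard HTTP status code.
  if (health.checks.database === "error") health.status = "error";
  res.status(health.status === "ok" ? 200 : 503).json(health);
});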

Reference Guides

Detailed implementations in the references/ directory:

  • Health Check Endpoints
  • Python Health Checks
  • Uptime Monitor with Heartbeat
  • Public Status Page API
  • Kubernetes Health Probes

Best Practices

✅ DO

  • Implement comprehensive health checks
  • Check all critical dependencies
  • Use appropriate timeout values
  • Track response times
  • Store check history
  • Monitor uptime trends
  • Alert on status changes
  • Use standard HTTP status codes
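
Several of the points above (timeouts, response times, standard status codes) can be combined in one small helper. The following is a minimal sketch, not the skill's reference implementation: `runCheck`, `buildHealthReport`, and the 2000 ms default timeout are assumptions chosen for illustration, and each check is assumed to be a function that throws or rejects on failure.

```javascript
// Hypothetical helper: run one dependency check with a timeout and record its duration.
async function runCheck(name, checkFn, timeoutMs = 2000) {
  const started = Date.now();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("timed out")), timeoutMs);
  });
  try {
    // Whichever settles first wins: the check itself or the timeout.
    await Promise.race([Promise.resolve().then(checkFn), timeout]);
    return { name, status: "ok", responseTimeMs: Date.now() - started };
  } catch (err) {
    // The error message is deliberately not returned, to avoid leaking internals.
    return { name, status: "error", responseTimeMs: Date.now() - started };
  } finally {
    clearTimeout(timer);
  }
}

// Run all checks in parallel and map the overall result to an HTTP status code.
async function buildHealthReport(checks, timeoutMs = 2000) {
  const results = await Promise.all(
    Object.entries(checks).map(([name, fn]) => runCheck(name, fn, timeoutMs))
  );
  const healthy = results.every((result) => result.status === "ok");
  return {
    httpStatus: healthy ? 200 : 503,
    body: {
      status: healthy ? "ok" : "error",
      timestamp: new Date().toISOString(),
      checks: results,
    },
  };
}
```

In an Express handler like the one in Quick Start, the result could be sent with `res.status(report.httpStatus).json(report.body)`. Note that a timed-out check is only abandoned, not cancelled; a real client would need its own cancellation mechanism.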

❌ DON'T

  • Check only the application process
  • Ignore external dependencies
  • Set timeouts too low
  • Alert on every failure
  • Use health checks for load balancing
  • Expose sensitive information
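
One way to avoid alerting on every failure while still alerting on status changes is to require several consecutive results before changing state. The sketch below assumes hypothetical thresholds (three failures to go down, two successes to recover) and a hypothetical `createAlertTracker` function; suitable values depend on the check interval and how noisy the service is.

```javascript
// Hypothetical state tracker: emit an alert only when the confirmed state changes.
function createAlertTracker({ failureThreshold = 3, recoveryThreshold = 2 } = {}) {
  let state = "up";
  let consecutiveFailures = 0;
  let consecutiveSuccesses = 0;

  // Feed one check result; returns "down", "recovered", or null (no alert).
  return function record(ok) {
    if (ok) {
      consecutiveSuccesses += 1;
      consecutiveFailures = 0;
      if (state === "down" && consecutiveSuccesses >= recoveryThreshold) {
        state = "up";
        return "recovered";
      }
    } else {
      consecutiveFailures += 1;
      consecutiveSuccesses = 0;
      if (state === "up" && consecutiveFailures >= failureThreshold) {
        state = "down";
        return "down";
      }
    }
    return null;
  };
}
```

With the default thresholds, a single failed check between successes produces no alert, and a sustained outage produces exactly one "down" alert followed by one "recovered" alert.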
