Agent skill

health-probe

Health probes for both sides of the AGI stack — openclaw + arifOS MCP

Install this agent skill to your Project

npx add-skill https://github.com/ariffazil/arifOS/tree/main/openclaw-workspace/skills/health-probe

SKILL.md

Health Probe — AGI Stack Monitor

Triggers: "health", "probe", "is everything ok", "check stack", "gateway health", "arifos health", "container health", "is arifos sick", "system status"


On Trigger — Run Full Probe

1. arifOS MCP Side

bash
curl -sf http://arifosmcp:8080/health | jq '{status, tools_loaded, version, uptime}'

Expected: status: "healthy", tools_loaded: 13

Alert if: tools_loaded < 13 or status != "healthy"
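The expected values above can be turned into a severity level. The following is a minimal sketch, not the skill's own logic: `classify_mcp_health` is a hypothetical helper, the 13 and 10 cut-offs are taken from the Alert Thresholds section, and treating any status other than "healthy" as CRITICAL is an assumption.

```bash
#!/usr/bin/env bash
# classify_mcp_health STATUS TOOLS_LOADED -> prints OK, WARNING, or CRITICAL.
# Hypothetical helper; mapping status != "healthy" to CRITICAL is an assumption.
classify_mcp_health() {
  local status="$1" tools="$2"
  # A missing or non-numeric count (for example jq printing "null") counts as 0.
  case "$tools" in
    ''|*[!0-9]*) tools=0 ;;
  esac
  if [ "$status" != "healthy" ] || [ "$tools" -lt 10 ]; then
    echo "CRITICAL"
  elif [ "$tools" -lt 13 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# Usage against the live endpoint (requires curl and jq):
#   read -r status tools < <(curl -sf http://arifosmcp:8080/health \
#     | jq -r '"\(.status) \(.tools_loaded)"')
#   classify_mcp_health "$status" "$tools"
```

If the endpoint is unreachable, `read` receives nothing, both variables stay empty, and the helper reports CRITICAL.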

2. OpenClaw Gateway Self

bash
# Test curl's own exit status; piping into head would report GATEWAY_UP even when curl fails
curl -sf -o /dev/null http://localhost:18789/ && echo "GATEWAY_UP" || echo "GATEWAY_UNREACHABLE"

3. All Containers Status

bash
# -a includes stopped containers, so Exited ones appear in the listing
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" | grep -v "^NAME"

Flag any container NOT showing healthy or Up:

  • unhealthy → CRITICAL
  • Exited / Restarting → CRITICAL
  • Up X minutes without (healthy) → WARNING (check whether the container defines a healthcheck)
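The flagging rules in this list can be sketched as a small classifier over the Status column. `classify_container_status` is a hypothetical name, and mapping any unrecognised state (such as Created or Paused) to WARNING is an assumption. The sketch follows this list in treating Restarting as CRITICAL, although the thresholds section lists it as WARNING.

```bash
#!/usr/bin/env bash
# classify_container_status STATUS -> prints OK, WARNING, or CRITICAL.
# STATUS is the text docker prints in the Status column.
classify_container_status() {
  local status="$1"
  case "$status" in
    # Checked first: "(unhealthy)" also contains the substring "healthy".
    *"(unhealthy)"*)     echo "CRITICAL" ;;
    Exited*|Restarting*) echo "CRITICAL" ;;
    Up*"(healthy)"*)     echo "OK" ;;
    # Up without a healthy marker: no healthcheck, or health still starting.
    Up*)                 echo "WARNING" ;;
    # Assumption: any other state (Created, Paused, ...) deserves a look.
    *)                   echo "WARNING" ;;
  esac
}

# Usage (requires docker):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' |
#     while IFS=$'\t' read -r name status; do
#       echo "$name: $(classify_container_status "$status")"
#     done
```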

4. Disk Check

bash
df -h / | awk 'NR==2 {
  used=$5+0
  if (used > 85) print "DISK_CRITICAL: " used "% used"
  else if (used > 75) print "DISK_WARNING: " used "% used"
  else print "DISK_OK: " used "% used"
}'

5. RAM Check

bash
free -h | awk '/^Mem:/ {
  total=$2; avail=$7
  print "RAM: total=" total " available=" avail
}'
# Sort on the usage figure after the colon; the original key began with the container name
docker stats --no-stream --format "{{.Name}}: {{.MemUsage}}" | sort -t':' -k2 -rh | head -5
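The RAM check above only prints values. To compare available memory with the RAM limits listed under Alert Thresholds (3 GiB and 1.5 GiB), a numeric reading is easier to work with. This is a sketch under the assumption that `free -m` (mebibytes) is acceptable in place of `free -h`; `classify_ram_mib` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# classify_ram_mib AVAILABLE_MIB -> prints OK, WARNING, or CRITICAL.
# 3 GiB = 3072 MiB and 1.5 GiB = 1536 MiB.
classify_ram_mib() {
  local avail_mib="$1"
  if [ "$avail_mib" -lt 1536 ]; then
    echo "CRITICAL"
  elif [ "$avail_mib" -lt 3072 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# Usage (column 7 of the Mem: row is "available"):
#   classify_ram_mib "$(free -m | awk '/^Mem:/ {print $7}')"
```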

6. Ollama Models Available

bash
docker exec ollama_engine ollama list 2>/dev/null | tail -n +2

7. Log the probe result

bash
echo "{\"ts\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\",\"event\":\"health_probe\",\"agent\":\"arifOS_bot\"}" \
  >> ~/.openclaw/workspace/logs/audit.jsonl
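The entry above records that a probe ran but not what it found. One possible extension, shown as a sketch, adds the outcome to the same JSON line. The `status` field and the `audit_line` helper are assumptions, not part of the existing audit format.

```bash
#!/usr/bin/env bash
# audit_line STATUS -> prints one JSON line for the audit log.
# The "status" field is a hypothetical addition to the entry format.
# STATUS is expected to be a plain word (OK, WARNING, CRITICAL); it is not JSON-escaped.
audit_line() {
  local status="$1"
  printf '{"ts":"%s","event":"health_probe","agent":"arifOS_bot","status":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status"
}

# Usage (mkdir -p guards against a missing log directory):
#   mkdir -p ~/.openclaw/workspace/logs
#   audit_line "WARNING" >> ~/.openclaw/workspace/logs/audit.jsonl
```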

Alert Thresholds

  • Disk usage: WARNING >75%, CRITICAL >85% → Notify Arif on Telegram
  • RAM available: WARNING <3 GiB, CRITICAL <1.5 GiB → Notify + pause heavy tasks
  • tools_loaded: WARNING <13, CRITICAL <10 → Restart arifosmcp
  • Container state: WARNING Restarting, CRITICAL Exited/unhealthy → docker compose up -d
  • Model count: 0 → docker exec ollama_engine ollama pull qwen2.5:3b
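Each metric above yields its own level, and a probe run usually needs a single overall result. A minimal sketch of that reduction follows; `worst_level` is a hypothetical helper, and ignoring any word other than WARNING or CRITICAL is an assumption.

```bash
#!/usr/bin/env bash
# worst_level LEVEL... -> prints the most severe of the given levels.
# Severity order: OK < WARNING < CRITICAL. With no arguments it prints OK.
worst_level() {
  local worst="OK" level
  for level in "$@"; do
    case "$level" in
      CRITICAL) worst="CRITICAL" ;;
      # WARNING never downgrades an earlier CRITICAL.
      WARNING)  [ "$worst" = "CRITICAL" ] || worst="WARNING" ;;
    esac
  done
  echo "$worst"
}

# Usage, given per-metric levels collected earlier in the probe:
#   overall="$(worst_level "$disk_level" "$ram_level" "$mcp_level")"
#   [ "$overall" = "CRITICAL" ] && echo "alert needed"
```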

Auto-Recovery Actions (within authority)

bash
# Restart unhealthy arifOS
docker compose -f /mnt/arifos/docker-compose.yml restart arifosmcp

# Restart unhealthy openclaw (from host — self-restart)
docker compose -f /mnt/arifos/docker-compose.yml restart openclaw

# Clear disk if >80%
docker builder prune -f
docker image prune -f --filter "dangling=true"
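The prune commands above carry a condition ("if >80%") that exists only as a comment. A guarded version might look like the sketch below. `maybe_prune_disk` and the `DRY_RUN` variable are hypothetical, and defaulting to a dry run is an assumption made for safety.

```bash
#!/usr/bin/env bash
# maybe_prune_disk USED_PCT -> prunes docker build cache and dangling images
# only when usage is above 80%. DRY_RUN (hypothetical, default 1) prints
# the decision instead of running docker.
maybe_prune_disk() {
  local used_pct="$1"
  if [ "$used_pct" -gt 80 ]; then
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "WOULD_PRUNE: ${used_pct}% used"
    else
      docker builder prune -f
      docker image prune -f --filter "dangling=true"
    fi
  else
    echo "SKIP_PRUNE: ${used_pct}% used"
  fi
}

# Usage (df -P keeps each filesystem on one line):
#   DRY_RUN=0 maybe_prune_disk "$(df -P / | awk 'NR==2 {print $5+0}')"
```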

Run this skill at the start of every session to establish a baseline. Alert Arif via Telegram if any check is CRITICAL.


Telegram Alerts (when CRITICAL)

bash
# Send alert to Arif via Telegram bot
send_telegram_alert() {
  local MESSAGE="$1"
  if [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
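    # Plain text, URL-encoded: with parse_mode=Markdown the lone "_" in
    # arifOS_bot is an unclosed italic entity and Telegram rejects the message.
    curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
      -d "chat_id=${TELEGRAM_CHAT_ID}" \
      --data-urlencode "text=⚠️ arifOS_bot: ${MESSAGE}" > /dev/null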
  fi
}

# Example alerts:
# send_telegram_alert "🔴 arifosmcp UNHEALTHY — tools_loaded=$(curl ...)"
# send_telegram_alert "💿 Disk ${DISK_PCT}% — run: docker builder prune -f"
# send_telegram_alert "🧠 RAM critical — available: ${RAM_AVAIL}MiB"

Note: TELEGRAM_CHAT_ID is the numeric chat ID of Arif's chat with @arifOS_bot. To get it: message the bot, then check https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getUpdates
