Agent skill
activity-monitor
Guardian service that monitors the active runtime agent's state and automatically restarts it if stopped. Use when checking agent liveness state or understanding the auto-restart mechanism.
Install this agent skill to your Project
npx add-skill https://github.com/zylos-ai/zylos-core/tree/main/skills/activity-monitor
SKILL.md
Activity Monitor Skill
PM2 guardian service that monitors the active runtime agent's (Claude or Codex) activity state and ensures it's always running.
When to Use
This is a PM2 service (not directly invoked by Claude). It runs continuously in the background.
What It Does
- Activity Monitoring: Tracks the agent's busy/idle state every second
- Status File: Writes
~/zylos/activity-monitor/agent-status.jsonwith current state (busy/idle, idle_seconds, health) - Guardian Mode: Automatically restarts the agent if it stops or crashes
- Maintenance Awareness: Waits for restart/upgrade scripts to complete before starting the agent
- Heartbeat Liveness Detection: Periodically sends heartbeat probes via the C4 control queue to verify the agent is responsive, triggering recovery when probes fail
- Health Check: Periodically enqueues system health checks (PM2, disk, memory) via the C4 control queue
- Daily Upgrade: Enqueues a Claude Code upgrade via the C4 control queue at 5:00 AM local time daily (disabled by default; enable with
zylos config set daily_upgrade_enabled true) - Context Monitoring: Receives context usage data after every turn; triggers early memory sync and new-session handoff at configurable thresholds (Claude default 70%, Codex default 75%)
Status File Format
{
"state": "idle",
"last_activity": 1738675200,
"last_check": 1738675210,
"last_check_human": "2026-02-04 20:30:10",
"idle_seconds": 5,
"inactive_seconds": 10,
"source": "conv_file",
"health": "ok"
}
state: "busy" | "idle" | "stopped" | "offline"idle_seconds: Time since entering idle state (0 when busy)source: "conv_file" (reliable) | "tmux_activity" (fallback)health: "ok" | "recovering" | "down" — liveness health from the heartbeat engine
PM2 Management
# Start
pm2 start ~/zylos/.claude/skills/activity-monitor/scripts/activity-monitor.js --name activity-monitor
# Restart
pm2 restart activity-monitor
# View logs
pm2 logs activity-monitor
# Check status
pm2 list
Guardian Behavior
- Detection: Checks if the agent is running every second
- Restart Delay: Waits 5 seconds of continuous "not running" before restarting
- Maintenance Wait: Detects restart/upgrade scripts and waits for completion
- Recovery Prompt: Sends catch-up message via C4 after restart
How It Works
-
Monitor Loop (every 1s):
- Check if tmux session exists
- Check if agent process is running
- Detect activity (conversation file mtime for Claude; tmux activity for Codex)
- Calculate idle/busy state
- Write status to ~/zylos/activity-monitor/agent-status.json
-
Guardian Logic:
- If agent not running for 5+ seconds → restart
- Wait for maintenance scripts to complete
- Send recovery prompt via C4 after successful restart
-
Activity Detection:
- Claude primary: Conversation file modification time (reliable)
- Codex / fallback: tmux window activity
- Threshold: 3 seconds without activity = idle
Heartbeat Liveness Detection
The heartbeat engine runs inside the activity monitor and uses the C4 control queue to verify Claude is actually responsive (not just process-alive).
State Machine
heartbeat interval elapsed
┌─────────────────────────────────────┐
│ ▼
│ ok ──[primary fails]──► ok (verify phase)
│ ▲ │
│ │ [verify fails]
│ │ ▼
│ │ recovering ──[recovery fails]──► recovering
│ │ │ │
│ │ [ack received] [max failures reached]
│ │ │ │
│ └──────────────────────────────┘ ▼
│ down
│ │
│ [ack received after manual fix]
└─────────────────────────────────────────────────────────────────┘
Phases
| Phase | Trigger | On Success | On Failure |
|---|---|---|---|
| Primary | Heartbeat interval elapsed (default 30min) | Reset timer, stay ok |
Enter verify phase |
| Verify | Primary probe failed | Return to ok |
Kill tmux, enter recovering |
| Recovery | In recovering state, Claude restarted |
Return to ok, notify pending channels |
Kill tmux, retry (up to max) |
| Down | Max restart failures reached (default 3) | Return to ok, notify pending channels |
Stay down, wait for manual fix |
Ack Deadline
Each heartbeat probe is enqueued with an ack deadline. If Claude does not acknowledge the probe before the deadline expires, the control record transitions to timeout status and the engine treats it as a failure.
Recovery Behavior
When health transitions back to ok, the engine reads ~/zylos/activity-monitor/pending-channels.jsonl and sends a recovery notification to each recorded channel/endpoint via C4, then clears the file.
Health Check
The activity monitor periodically enqueues system health checks via the C4 control queue.
- Interval: Every 24 hours (86400 seconds)
- Persisted state:
~/zylos/activity-monitor/health-check-state.json(survives restarts) - Priority: 3 (normal)
- Gated by health: Only enqueued when
health === 'ok' - Gated by agent: Only enqueued when the agent process is running
The health check control message instructs Claude to:
- Check PM2 services via
pm2 jlist - Check disk space via
df -h - Check memory via
free -m - If issues found, notify the most recent communication channel
- Log results to
~/zylos/logs/health.log - Acknowledge the control message
Daily Upgrade
The activity monitor enqueues a daily Claude Code upgrade via the C4 control queue.
- Enabled: Off by default. Enable with
zylos config set daily_upgrade_enabled true - Schedule: 5:00 AM local time (configured by
DAILY_UPGRADE_HOUR) - Timezone: Loaded from
~/zylos/.envTZfield, falls back toprocess.env.TZ, thenUTC - Persisted state:
~/zylos/activity-monitor/daily-upgrade-state.json(tracks last upgrade date) - Priority: 3 (normal)
- Gated by health: Only enqueued when
health === 'ok' - Gated by config: Only enqueued when
daily_upgrade_enabledistruein config - Once per day: Checks local date to avoid duplicate enqueues
The control message instructs Claude to use the upgrade-claude skill, which handles idle detection, /exit, upgrade, and automatic restart.
Context Monitoring
Event-driven context monitoring via Claude Code's statusLine feature, replacing the old hourly polling.
- Mechanism:
context-monitor.jsruns as a statusLine command — Claude Code pipes context data to it via stdin after every turn (zero turn cost) - Status file: Writes
~/zylos/activity-monitor/statusline.jsonwith full context data (used_percentage, remaining_percentage, cost, session_id) - Two thresholds:
- Early memory sync at 80% of threshold (e.g. 56% for Claude default 70%): Enqueues a prompt for Claude to run memory sync as a background task, so it completes well before the session switch. Only triggers when unsummarized conversation count exceeds the checkpoint threshold (30).
- New-session handoff at threshold (Claude default
70%, Codex default75%, configurable vianew_session_threshold/codex_new_session_thresholdin config.json): Enqueues the new-session trigger. Priority 1, 5-minute cooldown.
- Delivery: Enqueues via C4 control queue with bypass_state — ensures the trigger reaches Claude even during long tasks
- Two-stage design: The trigger message instructs Claude to start the new-session handoff flow; the actual
/clearis gated by block-queue-until-idle in the new-session skill's final step - Log:
~/zylos/activity-monitor/context-monitor.log
The early memory sync decouples memory sync from the session switch. Memory sync is also triggered by the new session's startup hook if unsummarized conversations exceed the threshold, so data is never lost.
Daily Memory Commit
The activity monitor runs a daily git commit of the memory/ directory.
- Schedule: 3:00 AM local time
- Timezone: Same as daily upgrade (from
~/zylos/.envTZ) - Persisted state:
~/zylos/activity-monitor/daily-memory-commit-state.json - Script: Calls
zylos-memory/scripts/daily-commit.jsdirectly (no C4, no Claude needed) - Once per day: Date-based deduplication
- Idempotent: If no memory changes exist, the script skips the commit
Timing Guarantees
Both daily tasks (upgrade, memory commit) use the same DailySchedule class:
- Window: Each target hour provides a ~3600s window; with ~1s loop interval it is virtually impossible to miss entirely
- Dedup: Date-based (
last_date === today) ensures exactly-once per day, even with imprecise timing - Persistence: State files survive activity monitor restarts
Permission Auto-Approve
When Claude Code shows a permission prompt (PermissionRequest event), the hook-auth-prompt.js hook automatically sends an Enter keystroke to approve it.
This is intentional by design. Zylos operates in a full-delegation model where the machine's permissions are entirely granted to the AI agent. Auto-approving permission prompts is the expected behavior, not a security gap.
How It Works
- Claude Code fires a PermissionRequest hook when a tool call requires user approval
hook-auth-prompt.jslogs the event tohook-timing.log- If
auto_approve_permissionis true (default), it enqueues a[KEYSTROKE]Entercontrol at priority 0 with bypass-state and 1-second delay - The C4 dispatcher delivers the Enter keystroke to the tmux session, auto-confirming the prompt
Configuration
# Disable auto-approve (require manual confirmation)
zylos config set auto_approve_permission false
# Re-enable (default)
zylos config set auto_approve_permission true
Config key: auto_approve_permission in ~/.zylos/config.json (default: true).
Log File
Activity log: ~/zylos/activity-monitor/activity.log
- Auto-truncates to 500 lines daily
- Logs state changes and guardian actions
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
web-console
Built-in web interface for communicating with Claude without external services. Use when setting up or configuring the web console channel, or troubleshooting browser-based access.
component-management
Guidelines for managing zylos components via CLI and C4 channels. Use when installing, upgrading, or uninstalling components, or when user asks about available components.
restart-claude
Use when the user asks to restart Claude Code, or after changing settings/hooks/keybindings.
comm-bridge
C4 communication bridge — central gateway for ALL external communication (Telegram, Lark, etc.). Use when replying to users via the "reply via" path, sending proactive messages to external channels, querying recent conversations or checkpoint status (prefer c4-db.js CLI; sqlite3 OK for unsupported queries), fetching conversation history for Memory Sync, or creating checkpoints after sync. Incoming messages are queued by channel bots and delivered to Claude via a PM2 dispatcher daemon. Session-start hooks automatically provide conversation context and can trigger Memory Sync when unsummarized conversations exceed the configured threshold.
new-session
Start a new session when context is high. Claude uses /clear, Codex uses /exit. Use when context is high or when a fresh session is needed.
http
Caddy-based web server providing web console hosting, file sharing, and health check endpoints. Use when configuring HTTP access, setting up file sharing, or troubleshooting web connectivity.
Didn't find tool you were looking for?