debug

Investigate stuck runs and execution failures by tracing Symphony and Codex logs with issue/session identifiers; use when runs stall, retry repeatedly, or fail unexpectedly.

Install this agent skill to your Project

npx add-skill https://github.com/ReinaMacCredy/maestro/tree/main/.codex/skills/maestro:symphony-setup/reference/codex-skills/debug

Debug

Goals

  • Find why a run is stuck, retrying, or failing.
  • Quickly correlate a Linear issue with its Codex session.
  • Read the right logs in the right order to isolate root cause.

Log Sources

  • Primary runtime log: log/symphony.log
    • The default path comes from SymphonyElixir.LogFile (log/symphony.log).
    • Includes orchestrator, agent runner, and Codex app-server lifecycle logs.
  • Rotated runtime logs: log/symphony.log*
    • Check these when the relevant run is older.

Correlation Keys

  • issue_identifier: human ticket key (example: MT-625)
  • issue_id: Linear UUID (stable internal ID)
  • session_id: Codex thread-turn pair (<thread_id>-<turn_id>)

elixir/docs/logging.md requires these fields for issue/session lifecycle logs. Use them as your join keys during debugging.
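
To use the three keys as join keys in a script, it helps to pull a single field out of a log line. The following is a minimal sketch, not part of Symphony: log_field is a hypothetical helper, and it assumes fields are written as key=value and end at a space or semicolon (the same assumption the session_id pattern in the Commands section makes).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one key=value field
# from the first matching log line on stdin.
# Assumption: a value ends at a space or a semicolon.
log_field() {
  local key="$1"
  # Require start-of-line, space, or semicolon before the key so that
  # a short key such as "id" cannot match inside "issue_id".
  grep -oE "(^|[ ;])${key}=[^ ;]+" | head -n 1 | sed "s/^.*${key}=//"
}
```

For example, piping one matching line into log_field session_id prints only the <thread_id>-<turn_id> value, ready to feed into the next search.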

Quick Triage (Stuck Run)

  1. Confirm scheduler/worker symptoms for the ticket.
  2. Find recent lines for the ticket (issue_identifier first).
  3. Extract session_id from matching lines.
  4. Trace that session_id across start, stream, completion/failure, and stall handling logs.
  5. Decide the class of failure: timeout/stall, app-server startup failure, turn failure, or orchestrator retry loop.

Commands

```bash
# 1) Narrow by ticket key (fastest entry point)
rg -n "issue_identifier=MT-625" log/symphony.log*

# 2) If needed, narrow by Linear UUID
rg -n "issue_id=<linear-uuid>" log/symphony.log*

# 3) Pull session IDs seen for that ticket
#    (-I drops the file-name prefix so sort -u deduplicates across rotated logs)
rg -I "issue_identifier=MT-625" log/symphony.log* | rg -o "session_id=[^ ;]+" | sort -u

# 4) Trace one session end-to-end
rg -n "session_id=<thread>-<turn>" log/symphony.log*

# 5) Focus on stuck/retry signals
rg -n "Issue stalled|scheduling retry|turn_timeout|turn_failed|Codex session failed|Codex session ended with error" log/symphony.log*
```
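
Steps 1 and 3 can be combined into one reusable function. This is a sketch under assumptions: sessions_for_ticket is a hypothetical name, the key=value layout is assumed as above, and grep is used only so the snippet runs where ripgrep is not installed (rg -I and rg -o are drop-in replacements for the two grep calls).

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the unique session IDs logged for one ticket key.
# Usage: sessions_for_ticket MT-625 log/symphony.log*
sessions_for_ticket() {
  local key="$1"
  shift
  # The trailing ([ ;]|$) stops MT-625 from also matching MT-6250.
  # -h suppresses file names so duplicates across rotated logs collapse.
  grep -hE "issue_identifier=${key}([ ;]|\$)" "$@" \
    | grep -oE 'session_id=[^ ;]+' \
    | sort -u
}
```

Each output line is a complete session_id=... token, so it can be passed straight to the step 4 search.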

Investigation Flow

  1. Locate the ticket slice:
    • Search by issue_identifier=<KEY>.
    • If noise is high, add issue_id=<UUID>.
  2. Establish timeline:
    • Identify the first Codex session started ... session_id=... line.
    • Follow with Codex session completed, ended with error, or worker exit lines.
  3. Classify the problem:
    • Stall loop: Issue stalled ... restarting with backoff.
    • App-server startup: Codex session failed ....
    • Turn execution failure: turn_failed, turn_cancelled, turn_timeout, or ended with error.
    • Worker crash: Agent task exited ... reason=....
  4. Validate scope:
    • Check whether failures are isolated to one issue/session or repeating across multiple tickets.
  5. Capture evidence:
    • Save key log lines with timestamps, issue_identifier, issue_id, and session_id.
    • Record probable root cause and the exact failing stage.
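
The classification in step 3 can be sketched as a small shell function. The marker strings are the ones listed above; the class labels (stall-loop, app-server-startup, turn-failure, worker-crash) and the function names are hypothetical, chosen only for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map one log line to a failure class
# using the marker strings from step 3.
classify_line() {
  case "$1" in
    *"Issue stalled"*)        echo "stall-loop" ;;
    *"Codex session failed"*) echo "app-server-startup" ;;
    *turn_failed*|*turn_cancelled*|*turn_timeout*|*"ended with error"*)
                              echo "turn-failure" ;;
    *"Agent task exited"*)    echo "worker-crash" ;;
    *)                        echo "unclassified" ;;
  esac
}

# Read log lines on stdin and print a count per failure class,
# skipping lines that match no marker.
failure_summary() {
  while IFS= read -r line; do
    classify_line "$line"
  done | grep -v '^unclassified$' | sort | uniq -c
}
```

Feeding a ticket slice into failure_summary gives a quick view of which class dominates; running it on the whole log helps with the scope check in step 4.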

Reading Codex Session Logs

In Symphony, Codex session diagnostics are emitted into log/symphony.log and keyed by session_id. Read them as a lifecycle:

  1. Codex session started ... session_id=...
  2. Session stream/lifecycle events for the same session_id
  3. Terminal event:
    • Codex session completed ..., or
    • Codex session ended with error ..., or
    • Issue stalled ... restarting with backoff
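
As a rough sketch of reading that lifecycle, the function below scans one session's lines and reports the last terminal event it sees. session_outcome and its output labels are hypothetical, and the sketch assumes the lines arrive oldest first and were already filtered to a single session.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one session's log lines on stdin (oldest first)
# and print the last terminal event found, if any.
session_outcome() {
  local line outcome="no-terminal-event"
  while IFS= read -r line; do
    case "$line" in
      *"Codex session completed"*)        outcome="completed" ;;
      *"Codex session ended with error"*) outcome="ended-with-error" ;;
      *"Issue stalled"*)                  outcome="stalled" ;;
    esac
  done
  echo "$outcome"
}
```

A result of no-terminal-event means the slice has a start but no recorded end, which is worth checking against the rotated logs before treating the session as hung.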

For one specific session investigation, keep the trace narrow:

  1. Capture one session_id for the ticket.
  2. Build a timestamped slice for only that session:
    • rg -n "session_id=<thread>-<turn>" log/symphony.log*
  3. Mark the exact failing stage:
    • Startup failure before stream events (Codex session failed ...).
    • Turn/runtime failure after stream events (turn_* / ended with error).
    • Stall recovery (Issue stalled ... restarting with backoff).
  4. Pair findings with issue_identifier and issue_id from nearby lines to confirm you are not mixing concurrent retries.

Always pair session findings with issue_identifier/issue_id to avoid mixing concurrent runs.
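
One way to check for mixed runs is the reverse lookup: list every ticket key that appears on lines carrying a given session ID. This is a sketch with the same assumptions as before (key=value fields ending at a space or semicolon); issues_for_session is a hypothetical name, and grep stands in for rg for portability.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the ticket keys that appear alongside a session ID.
# Usage: issues_for_session <thread>-<turn> log/symphony.log*
issues_for_session() {
  local sid="$1"
  shift
  # The trailing ([ ;]|$) keeps t1-u1 from also matching t1-u10.
  grep -hE "session_id=${sid}([ ;]|\$)" "$@" \
    | grep -oE 'issue_identifier=[^ ;]+' \
    | sort -u
}
```

More than one line of output suggests the slice mixes tickets and should be narrowed before drawing conclusions.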

Notes

  • Prefer rg over grep for speed on large logs.
  • Check rotated logs (log/symphony.log*) before concluding data is missing.
  • If required context fields are missing from new log statements, update those statements to follow the conventions in elixir/docs/logging.md.
