Agent skill
mcaf-observability
Design or improve observability for application and delivery flows: logs, metrics, traces, correlation, alerts, and operational diagnostics. Use when a change affects runtime visibility, failure diagnosis, SLOs, or alerting.
Install this agent skill to your Project
npx add-skill https://github.com/managedcode/MCAF/tree/main/skills/mcaf-observability
SKILL.md
MCAF: Observability
Trigger On
- a change affects runtime visibility or failure diagnosis
- logs, metrics, traces, or alerts are missing or vague
- the team cannot answer "how will we know this broke?"
Value
- produce a concrete project delta: code, docs, config, tests, CI, or review artifact
- reduce ambiguity through explicit planning, verification, and final validation skills
- leave reusable project context so future tasks are faster and safer
Do Not Use For
- feature behaviour work with no runtime visibility impact
- generic monitoring talk with no concrete flow to instrument
Inputs
- the critical user or system flow under change
- current logs, metrics, traces, dashboards, and alerts
- operator expectations for diagnosis and response
Quick Start
- Read the nearest AGENTS.md and confirm scope and constraints.
- Run this skill's Workflow through the Ralph Loop until outcomes are acceptable.
- Return the Required Result Format with concrete artifacts and verification evidence.
Workflow
- Identify the critical user or system flow that needs visibility.
- Define what must be observable:
  - success and failure
  - latency and throughput
  - correlation across boundaries
  - actionable alerting
- Treat observability as part of done, not an afterthought.
- Load only the references that match the affected runtime concern.
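The signals listed in the workflow can be sketched in a few lines. The following is a minimal illustration, not part of the skill itself: `observe_flow`, its `sink` parameter, and the event field names are hypothetical, and only the Python standard library is used. It emits one structured event per flow run that carries the outcome, the latency, and a correlation ID that the caller can pass across a boundary. Throughput is not measured directly here; it would be derived by counting these events over time.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager

logger = logging.getLogger("flow")


@contextmanager
def observe_flow(name, correlation_id=None, sink=None):
    """Emit one structured event per flow run (hypothetical helper).

    The event records success or failure, latency in milliseconds, and a
    correlation ID so that events from different services can be joined.
    """
    # Reuse the caller's correlation ID when one is supplied; otherwise mint one.
    cid = correlation_id or uuid.uuid4().hex
    event = {"flow": name, "correlation_id": cid}
    start = time.perf_counter()
    try:
        # The caller may attach extra fields to the event inside the block.
        yield event
        event["outcome"] = "success"
    except Exception as exc:
        event["outcome"] = "failure"
        event["error_type"] = type(exc).__name__
        raise  # never swallow the failure; only record it
    finally:
        event["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
        logger.info(json.dumps(event, sort_keys=True))
        if sink is not None:
            # Optional in-memory sink, assumed here to make the sketch testable.
            sink.append(event)
```

A caller would wrap the critical flow, for example `with observe_flow("checkout", correlation_id=incoming_id) as ev: ...`, and forward the same ID to downstream calls so that a failure can be traced end to end.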
Deliver
- observability requirements for the changed flow
- updated logging, metrics, traces, or alerting guidance
- clear operator and engineer visibility expectations
Validate
- a failure can be detected and diagnosed from the chosen signals
- alerts are actionable, not noise
- cross-boundary correlation is possible where the flow needs it
- the observability plan matches user impact and operator needs
Ralph Loop
Use the Ralph Loop for every task, including docs, architecture, testing, and tooling work.
- Brainstorm first (mandatory):
  - analyze current state
  - define the problem, target outcome, constraints, and risks
  - generate options and think through trade-offs before committing
  - capture the recommended direction and open questions
- Plan second (mandatory):
  - write a detailed execution plan from the chosen direction
  - list final validation skills to run at the end, with order and reason
- Execute one planned step and produce a concrete delta.
- Review the result and capture findings with actionable next fixes.
- Apply fixes in small batches and rerun the relevant checks or review steps.
- Update the plan after each iteration.
- Repeat until outcomes are acceptable or only explicit exceptions remain.
- If a dependency is missing, bootstrap it or return status: not_applicable with an explicit reason and fallback path.
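The execute, review, fix, and repeat part of the loop can be pictured as a small driver. This is only a sketch under assumptions: `ralph_loop`, its callbacks, and the `max_iterations` budget are hypothetical names, and returning `blocked` when the budget runs out is one possible policy, not something the skill prescribes. The brainstorm and plan stages are assumed to have already produced `plan_steps`.

```python
def ralph_loop(plan_steps, execute, review, max_iterations=10):
    """Run planned steps one at a time and feed review findings back into the plan.

    execute(step) returns a concrete delta; review(step, delta) returns a list
    of follow-up fixes, which are scheduled ahead of the remaining steps.
    """
    plan = list(plan_steps)
    history = []
    iterations = 0
    while plan and iterations < max_iterations:
        step = plan.pop(0)
        delta = execute(step)           # execute one planned step
        findings = review(step, delta)  # capture actionable next fixes
        history.append({"step": step, "delta": delta, "findings": findings})
        plan = list(findings) + plan    # update the plan after each iteration
        iterations += 1
    # Assumed policy: leftover work after the budget is reported as blocked.
    status = "complete" if not plan else "blocked"
    return {"status": status, "history": history, "remaining": plan}
```

Putting findings at the front of the plan keeps fixes in small batches close to the step that produced them, which matches the intent of rerunning the relevant checks before moving on.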
Required Result Format
- status: complete | clean | improved | configured | not_applicable | blocked
- plan: concise plan and current iteration step
- actions_taken: concrete changes made
- validation_skills: final skills run, or skipped with reasons
- verification: commands, checks, or review evidence summary
- remaining: top unresolved items or none
For setup-only requests with no execution, return status: configured and exact next commands.
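To make the format concrete, here is a hedged sketch of a checker for a result expressed as a Python dictionary. The field names and status values come from the list above; `check_result` and the sample values are hypothetical, and the skill does not state how the result should be serialized.

```python
ALLOWED_STATUSES = {
    "complete", "clean", "improved", "configured", "not_applicable", "blocked",
}
REQUIRED_FIELDS = (
    "status", "plan", "actions_taken",
    "validation_skills", "verification", "remaining",
)


def check_result(result):
    """Return a list of problems; an empty list means the shape is acceptable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in result]
    status = result.get("status")
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status}")
    return problems


# Illustrative values only; none of these come from a real run.
example = {
    "status": "improved",
    "plan": "instrument the background worker; iteration 2 of 3",
    "actions_taken": ["added correlation ID to worker logs"],
    "validation_skills": ["skipped: no validation skill configured"],
    "verification": "reviewed log output for a forced failure",
    "remaining": "none",
}
```

Calling `check_result(example)` returns an empty list, while a result with a status outside the allowed set or with missing fields returns one message per problem.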
Load References
- read references/observability.md first
- open references/alerting.md, references/best-practices.md, references/correlation-id.md, references/log-vs-metric-vs-trace.md, or references/pitfalls.md only when needed
Example Requests
- "Add observability requirements for this background worker."
- "We have logs but still cannot debug failures. Fix the plan."
- "Define alerts and traces for this API flow."