Agent skill
troubleshoot-sandbox
Troubleshoot OpenSandbox issues by running diagnostics (logs, inspect, events, summary) via CLI or HTTP API to diagnose sandbox failures like OOM, crash, image pull errors, network problems, etc.
Install this agent skill to your Project
npx add-skill https://github.com/alibaba/OpenSandbox/tree/main/skills/troubleshoot-sandbox
OpenSandbox Troubleshooting
Troubleshoot sandbox $ARGUMENTS using the OpenSandbox diagnostics API.
There are two ways to call the diagnostics API: the CLI (if the opensandbox CLI is installed) or HTTP (curl against the server directly). Use whichever is available. The HTTP approach works regardless of how the sandbox was created (SDK, API, or CLI).
Workflow
Step 1: Confirm sandbox state
CLI:
opensandbox sandbox get <sandbox-id>
HTTP:
curl http://<server-domain>/v1/sandboxes/<sandbox-id>
If the server requires authentication, add -H "OPEN-SANDBOX-API-KEY: <your-key>" to all curl commands.
Check the sandbox status (Running, Pending, Paused, Failed, etc.). If the sandbox is not found, it may have been deleted or expired.
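A minimal sketch of the HTTP variant of Step 1, assuming a POSIX shell with curl available. The function names `sandbox_url` and `get_sandbox` and the environment variables `OPENSANDBOX_SERVER` and `OPENSANDBOX_API_KEY` are hypothetical names chosen for illustration; only the endpoint path and the `OPEN-SANDBOX-API-KEY` header come from the text above.

```shell
#!/bin/sh
# Hypothetical helpers. OPENSANDBOX_SERVER and OPENSANDBOX_API_KEY are
# illustrative variable names, not settings defined by OpenSandbox.

# Build the sandbox URL from a server domain ($1) and a sandbox id ($2).
sandbox_url() {
  printf 'http://%s/v1/sandboxes/%s\n' "$1" "$2"
}

# Fetch the sandbox state for the sandbox id in $1.
# The API key header is sent only when OPENSANDBOX_API_KEY is non-empty.
get_sandbox() {
  if [ -z "${OPENSANDBOX_SERVER:-}" ]; then
    echo "OPENSANDBOX_SERVER is not set" >&2
    return 1
  fi
  url=$(sandbox_url "$OPENSANDBOX_SERVER" "$1")
  if [ -n "${OPENSANDBOX_API_KEY:-}" ]; then
    curl -sS -H "OPEN-SANDBOX-API-KEY: ${OPENSANDBOX_API_KEY}" "$url"
  else
    curl -sS "$url"
  fi
}
```

With the two variables exported, `get_sandbox <sandbox-id>` would print the server's response so the status can be read from it.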
Step 2: Get diagnostics summary (recommended first action)
CLI:
opensandbox devops summary <sandbox-id>
HTTP:
curl http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/summary
This returns a combined plain-text view of:
- Inspect: container/pod details (status, resources, network, labels)
- Events: state transitions, OOM kills, errors
- Logs: recent container output
Read the output carefully and look for common failure patterns listed below.
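Scanning the summary for known markers can be partly automated. The sketch below is a hypothetical helper, `scan_summary`, that reads diagnostics text on standard input and prints each failure marker it finds. The marker strings are copied from the symptom table in Step 4; whether the real summary output spells them exactly this way is an assumption, so treat a miss as "not detected" rather than "healthy".

```shell
#!/bin/sh
# Hypothetical helper: print every known failure marker found in the
# diagnostics text supplied on stdin. Marker spellings are taken from the
# symptom table and may not match the server's actual output exactly.
scan_summary() {
  input=$(cat)
  for marker in 'OOMKilled=true' 'ImagePullBackOff' 'CrashLoopBackOff' \
                'Exit Code 137' 'Exit Code 126' 'Exit Code 127'; do
    # -F treats the marker as a fixed string, -i ignores case, -q is silent.
    if printf '%s\n' "$input" | grep -qiF "$marker"; then
      printf '%s\n' "$marker"
    fi
  done
}
```

A typical use would be to pipe the summary into it: `curl -sS "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/summary" | scan_summary`.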
Step 3: Drill down if needed
If the summary is not enough, use individual endpoints for more detail:
CLI:
opensandbox devops logs <sandbox-id> --tail 500
opensandbox devops logs <sandbox-id> --since 30m
opensandbox devops inspect <sandbox-id>
opensandbox devops events <sandbox-id> --limit 100
HTTP:
# Get more log lines
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/logs?tail=500"
# Get logs from recent time window
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/logs?since=30m"
# Detailed container/pod inspection
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/inspect"
# More events
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/events?limit=100"
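The four HTTP calls above share one URL shape, so a small builder avoids retyping it. This is a sketch only: `diag_url` is a hypothetical function name, and it assumes the query string is passed already formatted (for example `tail=500` or `since=30m`).

```shell
#!/bin/sh
# Hypothetical helper: diag_url SERVER SANDBOX_ID KIND [QUERY]
# KIND must be one of the diagnostics endpoints named in this document.
diag_url() {
  case "$3" in
    summary|logs|inspect|events) ;;
    *)
      echo "unknown diagnostics endpoint: $3" >&2
      return 1
      ;;
  esac
  url="http://$1/v1/sandboxes/$2/diagnostics/$3"
  # Append the query string only when one was supplied.
  if [ -n "${4:-}" ]; then
    url="$url?$4"
  fi
  printf '%s\n' "$url"
}
```

For example, `curl -sS "$(diag_url <server-domain> <sandbox-id> logs 'tail=500')"` would request the last 500 log lines.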
Step 4: Diagnose common problems
| Symptom | What to check | Likely cause |
|---|---|---|
| Status=Pending, no IP | inspect - look for Waiting containers | Image pull failure, insufficient resources, or node scheduling problems |
| OOMKilled=true | inspect - check memory limits | Container exceeded its memory limit; increase the memory resource |
| Exit Code 137 | events + logs | OOM kill or external SIGKILL |
| Exit Code 1 | logs - check application output | Application error; check entrypoint and env vars |
| Exit Code 126/127 | logs | Entrypoint command not executable (126) or not found (127) |
| Connection refused to sandbox | inspect - check ports and network | Service not started inside sandbox, wrong port, or network policy blocking traffic |
| Sandbox stuck in Running but unresponsive | logs (tail=200) | Application hung; check for deadlocks or resource exhaustion |
| execd health check failing | logs - look for execd errors | execd daemon crashed or port conflict |
| ImagePullBackOff (K8s) | events | Wrong image name or missing registry credentials |
| CrashLoopBackOff (K8s) | events + logs | Application keeps crashing; check exit code and logs |
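The exit-code rows of the table can be expressed as a simple lookup. The sketch below uses a hypothetical function, `explain_exit_code`; it covers only the codes listed in the table and falls back to a generic hint for anything else.

```shell
#!/bin/sh
# Hypothetical helper: map a container exit code ($1) to the likely cause
# given in the symptom table. Codes not in the table get a generic hint.
explain_exit_code() {
  case "$1" in
    1)   echo "application error: check entrypoint and env vars" ;;
    126) echo "entrypoint command found but not executable" ;;
    127) echo "entrypoint command not found" ;;
    137) echo "OOM kill or external SIGKILL" ;;
    *)   echo "not covered by the table: check events and logs" ;;
  esac
}
```

Code 137 is 128 + 9, meaning the process was terminated by signal 9 (SIGKILL), which is why an OOM kill and an external kill look the same at this level and need events or inspect output to tell apart.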
Step 5: Suggest resolution
Based on the diagnosis, suggest one of:
- Image issue: Verify image name, check registry access
- OOM: Increase memory limit in sandbox creation (e.g. memory=4Gi)
- Application error: Fix the entrypoint or application code
- Network: Check network policy, verify port configuration
- Scheduling (K8s): Check node resources, check pool availability
- execd: Update execd image version, check port conflicts
API Reference
All diagnostics endpoints return text/plain and are available at:
| Endpoint | Query Params | Description |
|---|---|---|
| GET /v1/sandboxes/{id}/diagnostics/summary | tail (default 50), event_limit (default 20) | Combined inspect + events + logs |
| GET /v1/sandboxes/{id}/diagnostics/logs | tail (default 100), since (e.g. 10m, 1h) | Container/pod logs |
| GET /v1/sandboxes/{id}/diagnostics/inspect | - | Container/pod detailed state |
| GET /v1/sandboxes/{id}/diagnostics/events | limit (default 50) | Container/pod events |