troubleshoot-sandbox

Troubleshoot OpenSandbox issues by running diagnostics (logs, inspect, events, summary) via the CLI or the HTTP API to diagnose sandbox failures such as OOM kills, crashes, image pull errors, and network problems.

Install this agent skill in your project:

npx add-skill https://github.com/alibaba/OpenSandbox/tree/main/skills/troubleshoot-sandbox

SKILL.md

OpenSandbox Troubleshooting

Troubleshoot sandbox $ARGUMENTS using the OpenSandbox diagnostics.

There are two ways to interact with the diagnostics API: CLI (if opensandbox CLI is installed) or HTTP (curl against the server directly). Use whichever is available. The HTTP approach works regardless of how the sandbox was created (SDK, API, CLI).

Workflow

Step 1: Confirm sandbox state

CLI:

```bash
opensandbox sandbox get <sandbox-id>
```

HTTP:

```bash
curl http://<server-domain>/v1/sandboxes/<sandbox-id>
```

If the server requires authentication, add -H "OPEN-SANDBOX-API-KEY: <your-key>" to all curl commands.

Check the sandbox status (Running, Pending, Paused, Failed, etc.). If the sandbox is not found, it may have been deleted or expired.
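
Because the authentication header is optional, a small wrapper can keep the curl commands uniform. The sketch below is illustrative only: `osb_get`, `OSB_SERVER`, and `OSB_API_KEY` are hypothetical names, not part of OpenSandbox. Only the header name and the URL path come from the text above.

```bash
# Hypothetical helper. OSB_SERVER and OSB_API_KEY are illustrative variable
# names chosen for this sketch, not OpenSandbox settings.
osb_get() {
  # $1: API path, e.g. /v1/sandboxes/<sandbox-id>
  path="$1"
  if [ -n "${OSB_API_KEY:-}" ]; then
    # Send the API key header only when a key is configured.
    curl -sS -H "OPEN-SANDBOX-API-KEY: ${OSB_API_KEY}" "http://${OSB_SERVER}${path}"
  else
    curl -sS "http://${OSB_SERVER}${path}"
  fi
}
```

With `OSB_SERVER` set to the server domain, `osb_get /v1/sandboxes/<sandbox-id>` would issue the same request as the HTTP command in this step.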

Step 2: Get diagnostics summary (recommended first action)

CLI:

```bash
opensandbox devops summary <sandbox-id>
```

HTTP:

```bash
curl http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/summary
```

This returns a combined plain-text view of:

  • Inspect: container/pod details (status, resources, network, labels)
  • Events: state transitions, OOM kills, errors
  • Logs: recent container output

Read the output carefully and look for common failure patterns listed below.
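
As a rough first pass over the summary text, the output can be scanned for a few known markers. This is a minimal sketch: `scan_summary` is a hypothetical helper, and the markers are the literal strings used in the symptom table in Step 4. Whether the real summary prints them in exactly this form is an assumption.

```bash
# Hypothetical helper: reads diagnostics text on stdin and prints one line
# per marker found. The marker strings are taken from the Step 4 table; the
# exact wording of the real summary output is an assumption.
scan_summary() {
  input=$(cat)
  for marker in "OOMKilled=true" "Exit Code 137" "ImagePullBackOff" "CrashLoopBackOff"; do
    case "$input" in
      *"$marker"*) printf 'found: %s\n' "$marker" ;;
    esac
  done
}
```

The summary command from this step could be piped into it, for example `opensandbox devops summary <sandbox-id> | scan_summary`. An empty result does not mean the sandbox is healthy, only that none of these strings appeared.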

Step 3: Drill down if needed

If the summary is not enough, use individual endpoints for more detail:

CLI:

```bash
opensandbox devops logs <sandbox-id> --tail 500
opensandbox devops logs <sandbox-id> --since 30m
opensandbox devops inspect <sandbox-id>
opensandbox devops events <sandbox-id> --limit 100
```

HTTP:

```bash
# Get more log lines
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/logs?tail=500"

# Get logs from recent time window
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/logs?since=30m"

# Detailed container/pod inspection
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/inspect"

# More events
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/events?limit=100"
```
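
When drilling down, it can help to save each view to its own file so the outputs can be compared side by side. The following is a sketch under stated assumptions: `collect_diagnostics` and `OSB_SERVER` are hypothetical names, and the file layout is arbitrary. The endpoints and query parameters are the ones shown above.

```bash
# Hypothetical helper: fetches logs, inspect, and events for one sandbox and
# writes each response to a separate file. OSB_SERVER is an illustrative
# variable holding the server domain.
collect_diagnostics() {
  id="$1"
  outdir="${2:-./diagnostics-$1}"   # default directory name is arbitrary
  mkdir -p "$outdir" || return 1
  base="http://${OSB_SERVER}/v1/sandboxes/${id}/diagnostics"
  curl -sS "${base}/logs?tail=500"    > "${outdir}/logs.txt"
  curl -sS "${base}/inspect"          > "${outdir}/inspect.txt"
  curl -sS "${base}/events?limit=100" > "${outdir}/events.txt"
}
```

If the server requires authentication, the `OPEN-SANDBOX-API-KEY` header would need to be added to each curl call in the function.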

Step 4: Diagnose common problems

| Symptom | What to check | Likely cause |
|---|---|---|
| Status=Pending, no IP | inspect - look for Waiting containers | Image pull failure, insufficient resources, node scheduling |
| OOMKilled=true | inspect - check memory limits | Container exceeded memory limit, increase memory resource |
| Exit Code 137 | events + logs | OOM kill or external SIGKILL |
| Exit Code 1 | logs - check application output | Application error, check entrypoint and env vars |
| Exit Code 126/127 | logs | Entrypoint command not found or not executable |
| Connection refused to sandbox | inspect - check ports and network | Service not started inside sandbox, wrong port, network policy blocking |
| Sandbox stuck in Running but unresponsive | logs (tail=200) | Application hung, check for deadlocks or resource exhaustion |
| execd health check failing | logs - look for execd errors | execd daemon crashed or port conflict |
| ImagePullBackOff (K8s) | events | Wrong image name, missing registry credentials |
| CrashLoopBackOff (K8s) | events + logs | Application keeps crashing, check exit code and logs |
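
The exit-code rows of the table can be expressed as a small lookup. This is only a sketch: `explain_exit_code` is a hypothetical helper, and it covers just the codes listed above.

```bash
# Hypothetical helper: maps a container exit code to the hint from the
# symptom table. Codes not listed in the table fall through to a generic hint.
explain_exit_code() {
  case "$1" in
    # 137 = 128 + 9, i.e. the process was terminated by SIGKILL.
    137)     echo "OOM kill or external SIGKILL: check events and logs" ;;
    126|127) echo "Entrypoint command not found or not executable: check logs" ;;
    1)       echo "Application error: check logs, entrypoint, and env vars" ;;
    *)       echo "Not covered by the table: check logs and events" ;;
  esac
}
```

For example, `explain_exit_code 137` prints the OOM/SIGKILL hint, which should then be confirmed against the events output rather than taken as a diagnosis on its own.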

Step 5: Suggest resolution

Based on the diagnosis, suggest one of:

  • Image issue: Verify image name, check registry access
  • OOM: Increase memory limit in sandbox creation (e.g. memory=4Gi)
  • Application error: Fix the entrypoint or application code
  • Network: Check network policy, verify port configuration
  • Scheduling (K8s): Check node resources, check pool availability
  • execd: Update execd image version, check port conflicts

API Reference

All diagnostics endpoints return text/plain and are available at:

| Endpoint | Query Params | Description |
|---|---|---|
| GET /v1/sandboxes/{id}/diagnostics/summary | tail (default 50), event_limit (default 20) | Combined inspect + events + logs |
| GET /v1/sandboxes/{id}/diagnostics/logs | tail (default 100), since (e.g. 10m, 1h) | Container/pod logs |
| GET /v1/sandboxes/{id}/diagnostics/inspect | - | Container/pod detailed state |
| GET /v1/sandboxes/{id}/diagnostics/events | limit (default 50) | Container/pod events |
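
To illustrate how the query parameters attach to an endpoint, here is a minimal sketch that builds a logs URL. `diag_logs_url` and `OSB_SERVER` are hypothetical names. Leaving a parameter empty omits it so that the server default from the table applies. That `tail` and `since` can be combined in one request is an assumption, since the table lists both but does not say so.

```bash
# Hypothetical helper: prints the logs endpoint URL for a sandbox.
# $1: sandbox id, $2: tail (optional), $3: since (optional, e.g. 10m or 1h).
# Combining tail and since in one request is an assumption.
diag_logs_url() {
  id="$1"
  n_lines="${2:-}"
  since="${3:-}"
  url="http://${OSB_SERVER}/v1/sandboxes/${id}/diagnostics/logs"
  sep="?"
  if [ -n "$n_lines" ]; then
    url="${url}${sep}tail=${n_lines}"
    sep="&"
  fi
  if [ -n "$since" ]; then
    url="${url}${sep}since=${since}"
  fi
  printf '%s\n' "$url"
}
```

A call such as `curl "$(diag_logs_url <sandbox-id> 200)"` would then request the last 200 log lines.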
