troubleshoot-sandbox

Troubleshoot OpenSandbox issues by running diagnostics (logs, inspect, events, summary) via the CLI or the HTTP API to diagnose sandbox failures such as OOM kills, crashes, image pull errors, and network problems.

Install this agent skill in your project:

npx add-skill https://github.com/alibaba/OpenSandbox/tree/main/skills/troubleshoot-sandbox

SKILL.md

OpenSandbox Troubleshooting

Troubleshoot sandbox $ARGUMENTS using the OpenSandbox diagnostics.

There are two ways to interact with the diagnostics API: the CLI (if the opensandbox CLI is installed) or HTTP (curl against the server directly). Use whichever is available. The HTTP approach works regardless of how the sandbox was created (SDK, API, or CLI).
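
One way to decide between the two transports is to probe for the CLI on PATH. The sketch below is illustrative only: the function name pick_transport is hypothetical, and it assumes the CLI binary is named opensandbox, as in the commands in this document.

```shell
# Print "cli" if the given command is on PATH, otherwise "http".
# $1 = command name to probe (defaults to "opensandbox").
pick_transport() {
  if command -v "${1:-opensandbox}" >/dev/null 2>&1; then
    echo cli
  else
    echo http
  fi
}

# Example use (not executed here):
#   if [ "$(pick_transport)" = cli ]; then
#     opensandbox sandbox get "<sandbox-id>"
#   else
#     curl "http://<server-domain>/v1/sandboxes/<sandbox-id>"
#   fi
```

The probe only checks that the binary exists; it does not verify that the CLI is configured to reach the server.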

Workflow

Step 1: Confirm sandbox state

CLI:

bash

opensandbox sandbox get <sandbox-id>

HTTP:

bash

curl http://<server-domain>/v1/sandboxes/<sandbox-id>

If the server requires authentication, add -H "OPEN-SANDBOX-API-KEY: <your-key>" to all curl commands.

Check the sandbox status (Running, Pending, Paused, Failed, etc.). If the sandbox is not found, it may have been deleted or expired.
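
When using HTTP, the response status code can help separate "not found" from authentication problems before reading the body. The following is a minimal sketch that assumes the server follows conventional HTTP semantics (404 for a missing sandbox, 401 or 403 for a missing or invalid key); the source does not state this, and the function names are hypothetical.

```shell
# Map an HTTP status code to a short troubleshooting hint.
# Assumption: the server uses conventional status codes.
explain_status() {
  case "$1" in
    2??)     echo "found: read the status field in the response body" ;;
    401|403) echo "auth: add the OPEN-SANDBOX-API-KEY header" ;;
    404)     echo "not found: sandbox may have been deleted or expired" ;;
    *)       echo "unexpected: HTTP $1" ;;
  esac
}

# Print only the HTTP status code for a sandbox lookup.
# $1 = server domain, $2 = sandbox ID, $3 = optional API key
sandbox_http_status() {
  if [ -n "${3:-}" ]; then
    curl -sS -o /dev/null -w '%{http_code}' \
      -H "OPEN-SANDBOX-API-KEY: $3" "http://$1/v1/sandboxes/$2"
  else
    curl -sS -o /dev/null -w '%{http_code}' "http://$1/v1/sandboxes/$2"
  fi
}
```

For example, `explain_status "$(sandbox_http_status <server-domain> <sandbox-id>)"` prints a one-line hint for the lookup.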

Step 2: Get diagnostics summary (recommended first action)

CLI:

bash

opensandbox devops summary <sandbox-id>

HTTP:

bash

curl http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/summary

This returns a combined plain-text view of:

Inspect: container/pod details (status, resources, network, labels)
Events: state transitions, OOM kills, errors
Logs: recent container output

Read the output carefully and look for common failure patterns listed below.
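
Because the summary is plain text, a first pass can be automated with grep. The sketch below assumes the failure markers from the Step 4 table appear in the output in roughly this spelling; the exact output format is not specified in this document, so treat the patterns and the labels (oom-kill, image-pull, and so on) as hypothetical starting points.

```shell
# Print the label if the extended regex matches the text (case-insensitive).
# $1 = label, $2 = regex, $3 = text to search
report_if_match() {
  if printf '%s\n' "$3" | grep -Eqi -- "$2"; then
    printf '%s\n' "$1"
  fi
}

# Read diagnostics text on stdin and print one label per marker found.
scan_summary() {
  text=$(cat)
  report_if_match 'oom-kill'   'OOMKilled[=:] *true' "$text"
  report_if_match 'image-pull' 'ImagePullBackOff'    "$text"
  report_if_match 'crash-loop' 'CrashLoopBackOff'    "$text"
  report_if_match 'exit-137'   'Exit Code:? *137'    "$text"
}
```

A typical use would be `curl -sS "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/summary" | scan_summary`. An empty result does not mean the sandbox is healthy; it only means none of these markers matched.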

Step 3: Drill down if needed

If the summary is not enough, use individual endpoints for more detail:

CLI:

bash

opensandbox devops logs <sandbox-id> --tail 500
opensandbox devops logs <sandbox-id> --since 30m
opensandbox devops inspect <sandbox-id>
opensandbox devops events <sandbox-id> --limit 100

HTTP:

bash

# Get more log lines
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/logs?tail=500"

# Get logs from recent time window
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/logs?since=30m"

# Detailed container/pod inspection
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/inspect"

# More events
curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/events?limit=100"
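
To keep the drill-down output for later comparison, the three calls can be wrapped in a small helper that writes each response to a file. This is a sketch under assumptions: the variable OPENSANDBOX_SERVER, its placeholder default, and the function names are hypothetical, and no authentication header is sent.

```shell
# Hypothetical variable holding the server domain; the default is a placeholder.
: "${OPENSANDBOX_SERVER:=localhost:8080}"

# Build a diagnostics URL.
# $1 = sandbox ID, $2 = endpoint (summary|logs|inspect|events), $3 = optional query string
diag_url() {
  url="http://${OPENSANDBOX_SERVER}/v1/sandboxes/$1/diagnostics/$2"
  if [ -n "${3:-}" ]; then
    url="${url}?$3"
  fi
  printf '%s\n' "$url"
}

# Save logs, inspect, and events output into a directory.
# $1 = sandbox ID, $2 = output directory
collect_diagnostics() {
  mkdir -p "$2" || return 1
  curl -sS "$(diag_url "$1" logs 'tail=500')"    > "$2/logs.txt"
  curl -sS "$(diag_url "$1" inspect)"            > "$2/inspect.txt"
  curl -sS "$(diag_url "$1" events 'limit=100')" > "$2/events.txt"
}
```

For example, `collect_diagnostics <sandbox-id> ./diag` would leave three text files in ./diag.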

Step 4: Diagnose common problems

| Symptom | What to check | Likely cause |
| --- | --- | --- |
| Status=Pending, no IP | inspect - look for Waiting containers | Image pull failure, insufficient resources, or node scheduling problems |
| OOMKilled=true | inspect - check memory limits | Container exceeded its memory limit; increase the memory resource |
| Exit Code 137 | events + logs | OOM kill or external SIGKILL |
| Exit Code 1 | logs - check application output | Application error; check entrypoint and env vars |
| Exit Code 126/127 | logs | Entrypoint command not executable (126) or not found (127) |
| Connection refused to sandbox | inspect - check ports and network | Service not started inside sandbox, wrong port, or network policy blocking traffic |
| Sandbox stuck in Running but unresponsive | logs (tail=200) | Application hung; check for deadlocks or resource exhaustion |
| execd health check failing | logs - look for execd errors | execd daemon crashed or port conflict |
| ImagePullBackOff (K8s) | events | Wrong image name or missing registry credentials |
| CrashLoopBackOff (K8s) | events + logs | Application keeps crashing; check exit code and logs |
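
The exit codes in the table follow the usual shell convention: 126 means the command was found but could not be executed, 127 means it was not found, and values above 128 mean the process was terminated by signal (code minus 128), so 137 corresponds to signal 9 (SIGKILL). A small lookup can make this explicit; the function name explain_exit_code is hypothetical.

```shell
# Translate a container exit code into a short explanation.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit" ;;
    1)   echo "application error: check entrypoint and env vars" ;;
    126) echo "entrypoint command not executable" ;;
    127) echo "entrypoint command not found" ;;
    137) echo "SIGKILL (128+9): OOM kill or external SIGKILL" ;;
    *)
      # Codes above 128 conventionally encode 128 + signal number.
      if [ "$1" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $(( $1 - 128 ))"
      else
        echo "unrecognized exit code $1"
      fi
      ;;
  esac
}
```

Exit code 137 alone does not distinguish an OOM kill from an external SIGKILL; the OOMKilled flag from inspect is still needed for that.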

Step 5: Suggest resolution

Based on the diagnosis, suggest one of:

Image issue: Verify image name, check registry access
OOM: Increase memory limit in sandbox creation (e.g. memory=4Gi)
Application error: Fix the entrypoint or application code
Network: Check network policy, verify port configuration
Scheduling (K8s): Check node resources, check pool availability
execd: Update execd image version, check port conflicts

API Reference

All diagnostics endpoints return text/plain and are available at:

| Endpoint | Query Params | Description |
| --- | --- | --- |
| `GET /v1/sandboxes/{id}/diagnostics/summary` | `tail` (default 50), `event_limit` (default 20) | Combined inspect + events + logs |
| `GET /v1/sandboxes/{id}/diagnostics/logs` | `tail` (default 100), `since` (e.g. 10m, 1h) | Container/pod logs |
| `GET /v1/sandboxes/{id}/diagnostics/inspect` | - | Container/pod detailed state |
| `GET /v1/sandboxes/{id}/diagnostics/events` | `limit` (default 50) | Container/pod events |
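
The summary endpoint accepts both `tail` and `event_limit`, so a larger view can be requested in a single call. The helper below is a sketch that builds the query string using the defaults from the table; the function name summary_query is hypothetical.

```shell
# Build the query string for the summary endpoint.
# $1 = number of log lines (default 50), $2 = number of events (default 20)
summary_query() {
  printf 'tail=%s&event_limit=%s\n' "${1:-50}" "${2:-20}"
}
```

For example, `curl "http://<server-domain>/v1/sandboxes/<sandbox-id>/diagnostics/summary?$(summary_query 200 50)"` requests 200 log lines and 50 events.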

Maintainer

alibaba Core maintainer

Source details

Full Name: alibaba/OpenSandbox
Branch: main
Path in repo: skills/troubleshoot-sandbox
License: Apache License 2.0
Topics: ai ai-agent kubernetes ai-infra sandbox
