Agent skill

resilience-analysis

Assess error handling, isolation boundaries, and recovery mechanisms in agent frameworks. Use when (1) tracing error propagation paths, (2) evaluating sandboxing for code execution, (3) understanding retry and fallback mechanisms, (4) assessing production readiness, or (5) identifying failure modes and recovery patterns.

Stars 232
Forks 15

Install this agent skill to your Project

npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/dowwie/resilience-analysis

SKILL.md

Resilience Analysis

Assesses error handling and isolation boundaries.

Process

  1. Trace error propagation — Map exception flow from tools to agent
  2. Identify isolation — Sandbox mechanisms for dangerous operations
  3. Catalog recovery — Retry logic, fallbacks, circuit breakers
  4. Assess boundaries — What crashes propagate vs. are contained

Error Propagation Analysis

Questions to Answer

  1. Does a tool exception terminate the agent?
  2. Are LLM API errors retried automatically?
  3. Is parsing failure (malformed output) recoverable?
  4. What happens when state updates fail?

Propagation Patterns

Crash Propagation (Dangerous)

python
def run_tool(self, tool, args):
    return tool.execute(args)  # Exception bubbles up

Exception Wrapping

python
def run_tool(self, tool, args):
    try:
        return tool.execute(args)
    except Exception as e:
        raise ToolExecutionError(tool.name, e) from e

Error Containment

python
def run_tool(self, tool, args):
    try:
        return ToolResult(success=True, output=tool.execute(args))
    except Exception as e:
        return ToolResult(success=False, error=str(e))

Propagation Map Template

User Input
    ↓
┌─────────────────────────────────────────┐
│ Agent Loop                              │
│   ↓                                     │
│ ┌─────────────────────────────────────┐ │
│ │ LLM Call                            │ │
│ │ • APIError → [Retry 3x / Propagate] │ │
│ │ • RateLimit → [Backoff / Propagate] │ │
│ │ • Timeout → [Retry / Propagate]     │ │
│ └─────────────────────────────────────┘ │
│   ↓                                     │
│ ┌─────────────────────────────────────┐ │
│ │ Output Parsing                      │ │
│ │ • ParseError → [Retry / Contained]  │ │
│ │ • ValidationError → [Contained]     │ │
│ └─────────────────────────────────────┘ │
│   ↓                                     │
│ ┌─────────────────────────────────────┐ │
│ │ Tool Execution                      │ │
│ │ • ToolError → [Feedback to LLM]     │ │
│ │ • Timeout → [Kill / Continue]       │ │
│ │ • SecurityError → [Propagate]       │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────┘

Sandboxing Mechanisms

Code Execution Isolation

Mechanism Safety Level Performance Complexity
None ⚠️ Dangerous Fast None
RestrictedPython Medium Fast Low
AST Validation Low Fast Medium
Subprocess Medium Overhead Low
Docker/Container High High overhead Medium
gVisor/Firecracker Very High Medium overhead High

Detection Patterns

No Sandboxing

python
exec(user_code)  # Direct execution
eval(expression)  # Direct eval
subprocess.run(cmd, shell=True)  # Shell injection risk

Basic Sandboxing

python
# RestrictedPython
from RestrictedPython import compile_restricted
code = compile_restricted(user_code, '<string>', 'exec')

# AST validation
tree = ast.parse(user_code)
if has_dangerous_nodes(tree):
    raise SecurityError()

Process Isolation

python
# Subprocess with limits
result = subprocess.run(
    ['python', '-c', user_code],
    timeout=30,
    capture_output=True,
    user='nobody'  # Drop privileges
)

Container Isolation

python
import docker
client = docker.from_env()
container = client.containers.run(
    'python:3.11-slim',
    command=['python', '-c', user_code],
    mem_limit='256m',
    network_disabled=True,
    remove=True
)

Recovery Patterns

Retry Logic

python
# Simple retry
@retry(max_attempts=3, backoff=exponential)
def call_llm(self, prompt):
    return self.client.generate(prompt)

# Retry with error feedback
def call_with_retry(self, prompt, max_retries=3):
    errors = []
    for i in range(max_retries):
        try:
            return self.llm.generate(prompt)
        except ParseError as e:
            errors.append(str(e))
            prompt = f"{prompt}\n\nPrevious errors: {errors}"
    raise MaxRetriesExceeded(errors)

Fallback Mechanisms

python
def generate(self, prompt):
    try:
        return self.primary_llm.generate(prompt)
    except APIError:
        return self.fallback_llm.generate(prompt)

Circuit Breaker

python
class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.state = 'closed'
        self.last_failure = None
    
    def call(self, func, *args):
        if self.state == 'open':
            if time.time() - self.last_failure > self.reset_timeout:
                self.state = 'half-open'
            else:
                raise CircuitOpen()
        
        try:
            result = func(*args)
            self.failures = 0
            self.state = 'closed'
            return result
        except Exception as e:
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.failure_threshold:
                self.state = 'open'
            raise

Output Template

markdown
## Resilience Analysis: [Framework Name]

### Error Propagation Map

| Error Source | Error Type | Handling | Propagates? |
|--------------|-----------|----------|-------------|
| LLM API | RateLimitError | Retry 3x with backoff | No |
| LLM API | APIError | Retry 1x | Yes |
| Parser | ParseError | Feed back to LLM | No |
| Tool | Exception | Wrap and feed to LLM | No |
| Tool | Timeout | Kill process | No |
| State | ValidationError | Propagate | Yes |

### Sandboxing Assessment
- **Code Execution**: [Mechanism or None]
- **File System**: [Isolated/Restricted/Open]
- **Network**: [Blocked/Filtered/Open]
- **Resource Limits**: [Memory/CPU/Time limits]

### Recovery Mechanisms

| Pattern | Implementation | Location |
|---------|---------------|----------|
| Retry | Exponential backoff, 3 attempts | llm.py:L45 |
| Fallback | Secondary model | agent.py:L120 |
| Circuit Breaker | None | - |

### Risk Assessment
- **Critical Gaps**: [List any missing protections]
- **Production Ready**: [Yes/No/Needs work]

Integration

  • Prerequisite: codebase-mapping to identify execution code
  • Feeds into: antipattern-catalog for error handling issues
  • Related: execution-engine-analysis for async error handling

Expand your agent's capabilities with these related and highly-rated skills.

aiskillstore/marketplace

perigon-backend

Perigon ASP.NET Core + EF Core + Aspire conventions

232 15
Explore
aiskillstore/marketplace

perigon-agent

Pointers for Copilot/agents to apply Perigon conventions

232 15
Explore
aiskillstore/marketplace

perigon-angular

Angular 21+ standalone/Material/signal conventions for Perigon WebApp

232 15
Explore
aiskillstore/marketplace

fastapi-mastery

Comprehensive FastAPI development skill covering REST API creation, routing, request/response handling, validation, authentication, database integration, middleware, and deployment. Use when working with FastAPI projects, building APIs, implementing CRUD operations, setting up authentication/authorization, integrating databases (SQL/NoSQL), adding middleware, handling WebSockets, or deploying FastAPI applications. Triggered by requests involving .py files with FastAPI code, API endpoint creation, Pydantic models, or FastAPI-specific features.

232 15
Explore
aiskillstore/marketplace

context7-efficient

Token-efficient library documentation fetcher using Context7 MCP with 86.8% token savings through intelligent shell pipeline filtering. Fetches code examples, API references, and best practices for JavaScript, Python, Go, Rust, and other libraries. Use when users ask about library documentation, need code examples, want API usage patterns, are learning a new framework, need syntax reference, or troubleshooting with library-specific information. Triggers include questions like "Show me React hooks", "How do I use Prisma", "What's the Next.js routing syntax", or any request for library/framework documentation.

232 15
Explore
aiskillstore/marketplace

browser-use

Browser automation using Playwright MCP. Navigate websites, fill forms, click elements, take screenshots, and extract data. Use when tasks require web browsing, form submission, web scraping, UI testing, or any browser interaction.

232 15
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results