Agent skill

error-recovery-patterns

Install this agent skill to your Project

npx add-skill https://github.com/github/gh-aw/tree/main/skills/error-recovery-patterns

SKILL.md

Error Recovery Patterns Skill

This skill covers error handling patterns, recovery strategies, and debugging techniques for GitHub Agentic Workflows (gh-aw).

Purpose

Guide developers in implementing robust error recovery patterns to:

  • Reduce retry loops in agent sessions (target: under 10%, down from the current 23%)
  • Implement circuit breakers to prevent infinite retry loops
  • Add proactive recovery for installation, dependency, and API failures
  • Improve debug logging for recovery attempts

When to Use This Skill

Invoke this skill when:

  • Implementing retry logic for network operations, installations, or API calls
  • Debugging retry loop issues in workflows or agent sessions
  • Adding error recovery patterns to new or existing code
  • Understanding transient vs non-transient error classification
  • Implementing circuit breakers or exponential backoff
  • Adding debug logging for recovery attempts

Key Concepts Covered

1. Circuit Breaker Pattern

  • Maximum retry limits (standard: 3 attempts)
  • Exponential backoff strategies
  • Fail-fast on non-transient errors
  • Implementation in JavaScript, Shell, and Go
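
The three ideas above (a hard attempt limit, exponential backoff, and fail-fast on non-transient errors) fit in one small helper. The following is a minimal JavaScript sketch, not the implementation this skill ships: the name retryWithBackoff, the option names, and the 1-second base delay are assumptions made for illustration. Only the limit of 3 attempts comes from the list above.

```javascript
// Hypothetical helper for illustration; names and defaults are assumptions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff(operation, options = {}) {
  const {
    maxAttempts = 3,          // standard limit from the list above
    baseDelayMs = 1000,       // assumed starting delay
    isTransient = () => true, // caller-supplied classifier (assumption)
    wait = sleep,             // injectable so tests do not really sleep
  } = options;

  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // Fail fast: a non-transient error will not fix itself on retry.
      if (!isTransient(error)) throw error;
      // Out of attempts: stop looping instead of waiting again.
      if (attempt === maxAttempts) break;
      // Exponential backoff: base, 2x base, 4x base, ...
      await wait(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  const failure = new Error(
    `gave up after ${maxAttempts} attempts: ${lastError.message}`
  );
  failure.cause = lastError;
  throw failure;
}
```

With the defaults, a call waits 1 s after the first failure and 2 s after the second, then gives up on the third. Passing isTransient lets the caller stop immediately on errors such as failed validation.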

2. Installation Failure Recovery

  • NPM installation with cache clearing and registry fallbacks
  • Python pip installation with mirror alternatives
  • Docker image pull with retry and rate limit handling
  • Copilot CLI installation with network retry
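
Installation recovery can be treated as a short ladder of progressively more drastic steps. The sketch below shows this for the NPM case only, assuming a plain install, then a cache clear, then a fallback registry. The function names, the step names, and the injected run callback are all hypothetical, and the fallback registry URL is a value the caller must supply. The commands themselves (npm ci, npm cache clean --force, and the --registry option) are standard npm CLI usage.

```javascript
// Hypothetical recovery ladder for an npm install; names are assumptions.
function npmRecoverySteps(fallbackRegistry) {
  return [
    { name: "plain install", commands: ["npm ci"] },
    {
      name: "clear cache, then install",
      commands: ["npm cache clean --force", "npm ci"],
    },
    {
      name: "fallback registry",
      commands: [`npm ci --registry=${fallbackRegistry}`],
    },
  ];
}

// `run` executes one shell command and throws if it fails.
// It is injected so the ladder can be exercised without touching npm.
function installWithRecovery(run, steps, log = console.error) {
  const failures = [];
  for (const step of steps) {
    try {
      for (const command of step.commands) run(command);
      return { ok: true, step: step.name, failures };
    } catch (error) {
      // Log every failed step so no recovery attempt is silent.
      failures.push({ step: step.name, message: error.message });
      log(`install step "${step.name}" failed: ${error.message}`);
    }
  }
  // The ladder is finite, so this cannot loop forever.
  return { ok: false, step: null, failures };
}
```

In a real script, run could be a thin wrapper around execSync from node:child_process, which throws when the command exits with a non-zero status. The same ladder shape would apply to pip mirrors or Docker pulls with different commands.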

3. API Timeout and Rate Limit Handling

  • GitHub API rate limit detection and backoff
  • Transient error detection patterns
  • Custom retry configuration for different APIs
  • Rate limit-specific retry strategies
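
Rate limit handling differs from generic backoff because the server usually says how long to wait. The sketch below derives a delay from the retry-after, x-ratelimit-remaining, and x-ratelimit-reset response headers that the GitHub REST API sends. The function name, the plain-object header shape with lowercase keys, and the 60-second fallback for a bare 429 are assumptions for illustration, not behavior confirmed by this skill.

```javascript
// Hypothetical helper: how long to wait before retrying a GitHub API call.
// Returns a delay in milliseconds, or null when the response does not
// look like a rate limit and should not be retried on that basis.
function rateLimitDelayMs(status, headers, nowMs = Date.now()) {
  if (status !== 403 && status !== 429) return null;

  // retry-after is given in seconds and takes priority when present.
  const retryAfter = headers["retry-after"];
  if (retryAfter !== undefined && Number.isFinite(Number(retryAfter))) {
    return Number(retryAfter) * 1000;
  }

  // Primary limit exhausted: wait until the reset time (UTC epoch seconds).
  if (headers["x-ratelimit-remaining"] === "0") {
    const resetMs = Number(headers["x-ratelimit-reset"]) * 1000;
    if (Number.isFinite(resetMs)) return Math.max(0, resetMs - nowMs);
  }

  // A 429 without usable headers: assumed fallback of one minute.
  if (status === 429) return 60000;

  // A 403 without rate limit headers is most likely a permission error.
  return null;
}
```

A caller would sleep for the returned delay before the next attempt, still subject to the overall attempt limit, and treat null as a signal to fall back to ordinary error classification.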

4. Debug Logging for Recovery

  • Logger package usage for retry attempts
  • Category naming conventions (pkg:filename)
  • DEBUG environment variable patterns
  • Zero-overhead logging when disabled
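
The logging points above can be illustrated with a small category-based logger. This is a JavaScript sketch of the idea only and is not the logger package the skill refers to: createLogger, its parameters, and the pattern syntax (comma-separated patterns with * as a wildcard) are assumptions, and workflow:retry is a made-up category that follows the pkg:filename convention.

```javascript
// Hypothetical debug logger; not the logger package referenced above.
function patternToRegExp(pattern) {
  // Escape regex metacharacters, then turn "*" into "match anything".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

function createLogger(
  category,                            // e.g. "workflow:retry" (pkg:filename)
  debugEnv = process.env.DEBUG ?? "",  // assumed pattern source
  sink = console.error
) {
  const enabled = debugEnv
    .split(",")
    .map((pattern) => pattern.trim())
    .filter(Boolean)
    .some((pattern) => patternToRegExp(pattern).test(category));

  // A disabled logger is a no-op. Messages passed as functions are
  // never built, so disabled logging costs almost nothing.
  const log = enabled
    ? (message) =>
        sink(`${category} ${typeof message === "function" ? message() : message}`)
    : () => {};
  log.enabled = enabled;
  return log;
}
```

Under these assumptions, running with DEBUG=workflow:* would enable a logger created as createLogger("workflow:retry"), and a call such as log(() => `attempt ${n} of 3 failed`) would only format its message when that logger is enabled.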

5. Error Categorization

  • Transient vs non-transient errors
  • Network errors, timeout patterns
  • Transient HTTP status codes (502, 503, 504)
  • GitHub-specific errors (rate limits, abuse detection)
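
The categories above can be expressed as a single classifier that a retry helper consults before trying again. The sketch below is one possible shape and makes several assumptions: the function name, the three category labels, an error object carrying status, code, and message fields, and the specific Node.js network codes chosen. Only the 502, 503, and 504 status codes and the rate limit and abuse detection cases come from the list above.

```javascript
// Hypothetical classifier; the error shape {status, code, message} is assumed.
const TRANSIENT_STATUS = new Set([502, 503, 504]);
const TRANSIENT_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"]);
const GITHUB_THROTTLE = /rate limit|abuse detection/i;

function categorizeError(error) {
  // Gateway and availability errors usually clear up on their own.
  if (TRANSIENT_STATUS.has(error.status)) return "transient";

  // Network resets, timeouts, and temporary DNS failures.
  if (TRANSIENT_CODES.has(error.code)) return "transient";

  // Throttling: retryable, but only after a longer, server-guided wait.
  if (error.status === 429) return "rate-limited";
  if (error.status === 403 && GITHUB_THROTTLE.test(error.message ?? "")) {
    return "rate-limited";
  }

  // Everything else (validation, auth, not found) will not self-correct.
  return "non-transient";
}
```

Keeping "rate-limited" separate from "transient" lets callers apply a different wait strategy to throttling than to ordinary network failures, while "non-transient" errors are surfaced immediately.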

Anti-Patterns to Avoid

The skill calls out the following anti-patterns:

  • ❌ Infinite retry loops without maximum limits
  • ❌ Retrying validation errors that won't self-correct
  • ❌ No backoff delay between attempts
  • ❌ Silent retries without logging
  • ❌ Retrying non-transient errors

Code Examples Provided

The skill includes production-ready examples for:

  • JavaScript retry with withRetry() function
  • Shell script retry loops with exponential backoff
  • Go retry patterns with context and timeouts
  • NPM/pip/docker installation recovery
  • GitHub API rate limit handling
  • Debug logging for all recovery attempts

Related Skills

  • error-messages - Error message formatting and style guide
  • error-pattern-safety - Safety guidelines for error pattern regex
  • developer - General development guidelines and conventions

Full Documentation

Complete documentation is available at: ../../scratchpad/error-recovery-patterns.md

This skill references the comprehensive error recovery patterns document, which includes:

  • Console formatting requirements
  • Error wrapping patterns
  • Common error scenarios with step-by-step resolution
  • Error message templates
  • Debugging runbook
  • Error categorization decision trees
  • Metrics and monitoring strategies
