debug-fetcher

Automated URL fetch failure handling with strategy exhaustion, memory learning, and human-in-the-loop recovery. Use when fetches fail and you need intelligent retry, pattern learning, and human collaboration.

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/debug-fetcher

Metadata

Additional technical details for this skill

short description: Failure-to-recovery automation for URL fetching

SKILL.md

Debug-Fetcher Skill

Automated fetch failure handling that:

- Queries /memory first - applies learned strategies before trying defaults
- Exhausts all strategies - direct, playwright, wayback, brave, jina, proxy, UA rotation
- Stores successes - saves working strategies to /memory for future runs
- Collaborates with humans - uses /interview when all automated strategies fail

Quick Start

```bash
# Fetch single URL with failure handling
./run.sh fetch https://example.com

# Fetch batch with failure handling
./run.sh fetch-batch urls.txt

# Check what was learned about a domain
./run.sh recall example.com

# Export all learned strategies
./run.sh export-learnings
```

How It Works

URL Request
    │
    ▼
┌──────────────────────────┐
│  1. Query /memory        │
│  "What works for this    │
│   domain?"               │
└──────────────────────────┘
    │
    ▼
┌──────────────────────────┐
│  2. Try learned strategy │
│     (if exists)          │
└──────────────────────────┘
    │
    ▼ (fail or no learned strategy)
┌──────────────────────────┐
│  3. Exhaust strategies:  │
│  - direct fetch          │
│  - playwright            │
│  - wayback machine       │
│  - brave alternates      │
│  - jina reader           │
│  - proxy rotation        │
│  - user-agent rotation   │
└──────────────────────────┘
    │
    ▼ (all fail)
┌──────────────────────────┐
│  4. Launch /interview    │
│  Ask human for help:     │
│  - Credentials?          │
│  - Mirror URL?           │
│  - Manual download?      │
│  - Skip this URL?        │
└──────────────────────────┘
    │
    ▼
┌──────────────────────────┐
│  5. Store to /memory     │
│  - Successful strategy   │
│  - Domain patterns       │
│  - Human-provided info   │
└──────────────────────────┘
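
The flow above can be sketched as a single loop. This is a minimal illustration, not the skill's actual `strategy_engine.py`: the function name `fetch_with_recovery`, the plain `learned` dict standing in for /memory, and the reading of "max retries" as one initial attempt plus that many retries are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Default order, taken from the diagram above.
DEFAULT_ORDER = ["direct", "playwright", "wayback", "brave", "jina", "proxy", "ua_rotation"]


def fetch_with_recovery(url, strategies, learned, max_retries=2):
    """Try the learned strategy first, then exhaust the rest in default order.

    `strategies` maps a strategy name to a callable that returns content,
    or None on failure (hypothetical interface).
    Returns (strategy_name, content), or (None, None) when every strategy
    fails and the caller should escalate to a human interview.
    """
    domain = urlparse(url).hostname or ""
    order = list(DEFAULT_ORDER)

    known = learned.get(domain)            # step 1: query the stand-in memory
    if known in order:                     # step 2: put the learned strategy first
        order.remove(known)
        order.insert(0, known)

    for name in order:                     # step 3: exhaust strategies
        fetcher = strategies.get(name)
        if fetcher is None:
            continue
        for _ in range(1 + max_retries):   # assumed: 1 attempt + max_retries retries
            content = fetcher(url)
            if content is not None:
                learned[domain] = name     # step 5: remember what worked
                return name, content

    return None, None                      # step 4: nothing worked, ask a human
```

Keeping the learned strategy as a reordering of the default list, rather than a separate code path, means a stale learned entry costs only a few failed attempts before the normal exhaustion order takes over.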

Memory Schema

Each learned strategy stores:

| Field | Description |
|---|---|
| `domain` | Target domain (e.g., "nytimes.com") |
| `path_pattern` | URL path pattern (e.g., "/article/*") |
| `successful_strategy` | What worked (e.g., "playwright") |
| `headers` | Custom headers that helped |
| `timing_ms` | How long the fetch took |
| `success_rate` | Historical success rate |
| `failure_count` | How many times this domain failed |
| `last_used` | Timestamp of last use |
| `discovered_at` | When strategy was first learned |
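
As a rough picture of what one record might look like in code, here is a sketch of a dataclass with the fields listed above. The file listing later in this document mentions a `FetchStrategy` dataclass in `memory_schema.py`, but its real definition is not shown here, so the field types, defaults, and ISO-8601 timestamps below are assumptions.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict


def _now_iso() -> str:
    """Current UTC time as an ISO-8601 string (assumed timestamp format)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class FetchStrategy:
    # Field names follow the Memory Schema table; types and defaults are guesses.
    domain: str
    successful_strategy: str
    path_pattern: str = "*"
    headers: Dict[str, str] = field(default_factory=dict)
    timing_ms: int = 0
    success_rate: float = 0.0
    failure_count: int = 0
    last_used: str = field(default_factory=_now_iso)
    discovered_at: str = field(default_factory=_now_iso)


record = FetchStrategy(
    domain="nytimes.com",
    successful_strategy="playwright",
    path_pattern="/article/*",
)
payload = asdict(record)  # plain dict, ready to serialize as JSON
```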

Commands

| Command | Description |
|---|---|
| `fetch <url>` | Fetch single URL with failure handling |
| `fetch-batch <manifest>` | Fetch list of URLs with failure handling |
| `recall <domain>` | Show learned strategies for domain |
| `export-learnings` | Export all strategies to JSON |

Environment Variables

| Variable | Description |
|---|---|
| `DEBUG_FETCHER_MEMORY_SCOPE` | Memory scope for storing strategies (default: "fetcher_strategies") |
| `DEBUG_FETCHER_MAX_RETRIES` | Max retries per strategy (default: 2) |
| `DEBUG_FETCHER_INTERVIEW_THRESHOLD` | Min failures before triggering interview (default: 3) |
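
For illustration, the three variables and their documented defaults could be read like this. The `Settings` class and `load_settings` function are hypothetical names for this sketch, not part of the skill.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    memory_scope: str
    max_retries: int
    interview_threshold: int


def load_settings(env=None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        memory_scope=env.get("DEBUG_FETCHER_MEMORY_SCOPE", "fetcher_strategies"),
        max_retries=int(env.get("DEBUG_FETCHER_MAX_RETRIES", "2")),
        interview_threshold=int(env.get("DEBUG_FETCHER_INTERVIEW_THRESHOLD", "3")),
    )
```

Passing a mapping instead of reading `os.environ` directly keeps the function easy to test without touching the real environment.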

Integration with Fetcher

Debug-fetcher wraps the standard fetcher skill and adds failure handling capabilities. All fetcher environment variables (BRAVE_API_KEY, FETCHER_EMIT_MARKDOWN, etc.) are respected.

Examples

Learning from Failures

After fetching a batch of URLs, debug-fetcher stores successful strategies:

```bash
# Fetch a batch
./run.sh fetch-batch urls.txt --output results.jsonl

# View what was learned
./run.sh recall attack.mitre.org
# Output:
# Domain: attack.mitre.org
# Strategy: playwright
# Success rate: 95%
# Last used: 2025-01-30

# Next time, playwright will be tried first for attack.mitre.org
./run.sh fetch https://attack.mitre.org/techniques/T1059
```

Human-in-the-Loop Interview

When all strategies fail, an interview is generated:

```bash
# Fetch batch with failures
./run.sh fetch-batch difficult_urls.txt

# Interview generated at: /tmp/interview_abc123.json
# Run: ./agents/skills/interview/run.sh /tmp/interview_abc123.json

# Example interview questions:
# - "Failed 5 URLs from nytimes.com. Do you have credentials?"
# - "archive.org not working. Try a mirror URL?"
```
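
One way such questions could be derived from a list of failed URLs is to group failures by domain and ask only about domains that cross a threshold. The sketch below is not the skill's `interview_generator.py`. The `build_interview` name, the output shape, and the choice to apply the threshold per domain are assumptions, since the real /interview JSON format is not documented here.

```python
from collections import Counter
from urllib.parse import urlparse


def build_interview(failed_urls, threshold=3):
    """Group failed URLs by domain and emit one question per noisy domain.

    The returned structure is a made-up placeholder, not the real
    /interview schema.
    """
    counts = Counter(urlparse(u).hostname or "unknown" for u in failed_urls)
    questions = [
        {
            "domain": domain,
            "failed": n,
            "question": f"Failed {n} URLs from {domain}. Do you have credentials?",
        }
        for domain, n in sorted(counts.items())
        if n >= threshold  # assumed: threshold applies per domain
    ]
    return {"questions": questions}
```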

YouTube URL Handling

YouTube URLs are automatically detected and handled via the /ingest-youtube skill:

```bash
# YouTube URLs use transcript extraction
./run.sh fetch https://www.youtube.com/watch?v=abc123
# Uses: /ingest-youtube skill for transcript extraction
# Falls back to other strategies if transcript unavailable
```
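
How the skill detects YouTube URLs is not described. A plausible approach, sketched here with the standard library only, is to match the hostname and pull out the video ID. The `youtube_video_id` helper and the host list are assumptions for the example.

```python
from urllib.parse import parse_qs, urlparse

# Assumed set of hostnames to treat as YouTube.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def youtube_video_id(url):
    """Return the video ID for a YouTube watch or short link, else None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return None
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parts.path.lstrip("/") or None
    if parts.path == "/watch":
        # Watch links carry the ID in the `v` query parameter.
        return parse_qs(parts.query).get("v", [None])[0]
    return None
```

A caller could route the URL to transcript extraction when this returns an ID and fall through to the normal strategy list otherwise.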

Batch Analysis

After a batch run, analyze patterns:

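```python
import json

from debug_fetcher.batch_analyzer import analyze_batch, get_failure_summary

# Assumption: `results` is the list of per-URL records written by
# `fetch-batch --output results.jsonl`, one JSON object per line.
with open("results.jsonl") as f:
    results = [json.loads(line) for line in f if line.strip()]

# Get summary
summary = get_failure_summary(results)
# {
#   "total": 1000,
#   "success": 850,
#   "failed": 150,
#   "success_rate": "85.0%",
#   "top_failing_domains": [
#     {"domain": "nytimes.com", "count": 45},
#     {"domain": "wsj.com", "count": 30}
#   ],
#   "patterns": [
#     "All 45 URLs from nytimes.com returned HTTP 403",
#     "High failure rate: 50% of failures are paywalled sites"
#   ]
# }
```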
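
To make the numbers in that summary concrete, here is a stand-alone sketch of how the counting part could be computed. It is not the real `get_failure_summary`. The `summarize_failures` name and the record keys `url` and `ok` are assumptions, and the free-text `patterns` field is left out.

```python
from collections import Counter
from urllib.parse import urlparse


def summarize_failures(results, top_n=2):
    """Count successes and failures and rank the domains that fail most.

    Each record is assumed to look like {"url": str, "ok": bool}.
    """
    total = len(results)
    failed = [r for r in results if not r["ok"]]
    success = total - len(failed)
    by_domain = Counter(urlparse(r["url"]).hostname or "unknown" for r in failed)
    return {
        "total": total,
        "success": success,
        "failed": len(failed),
        "success_rate": f"{100 * success / total:.1f}%" if total else "n/a",
        "top_failing_domains": [
            {"domain": domain, "count": count}
            for domain, count in by_domain.most_common(top_n)
        ],
    }
```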

Recovery Actions

When a human provides help via the interview, the response takes one of these action types:

| Action Type | Description | Example |
|---|---|---|
| `credentials` | Login credentials provided | username/password for site |
| `mirror` | Alternative URL to try | archive.org mirror |
| `manual_file` | Human downloaded file manually | Path to local PDF |
| `skip` | URL not needed | "Not critical" |
| `retry` | Try again later | Server was down |
| `custom_strategy` | Specific approach suggested | "Use proxy" |
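
A dispatcher over these action types might look like the following sketch. The action names come from the table above. The `RecoveryPlan` class, the `plan_recovery` function, and the `next_step` labels are invented for illustration and do not describe `recovery_executor.py`.

```python
from dataclasses import dataclass
from typing import Optional

# Action type (from the table) -> hypothetical next pipeline step.
NEXT_STEP = {
    "credentials": "fetch_with_auth",
    "mirror": "fetch_mirror",
    "manual_file": "ingest_local_file",
    "skip": "drop",
    "retry": "requeue",
    "custom_strategy": "fetch_with_strategy",
}


@dataclass
class RecoveryPlan:
    url: str
    next_step: str
    detail: Optional[str] = None  # e.g. mirror URL, local path, or strategy name


def plan_recovery(url, action_type, value=None):
    """Map a human-provided action to the step the pipeline should take next."""
    try:
        step = NEXT_STEP[action_type]
    except KeyError:
        raise ValueError(f"unknown action type: {action_type}") from None
    return RecoveryPlan(url=url, next_step=step, detail=value)
```

Rejecting unknown action types up front keeps a malformed interview response from silently dropping a URL.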

Files

.agents/skills/debug-fetcher/
├── SKILL.md           # This file
├── run.sh             # Entry point
├── pyproject.toml     # Dependencies
└── debug_fetcher/     # Python package
    ├── __init__.py
    ├── cli.py                 # CLI commands
    ├── memory_schema.py       # FetchStrategy dataclass
    ├── memory_bridge.py       # Recall/learn from /memory
    ├── strategy_engine.py     # Strategy exhaustion loop
    ├── batch_analyzer.py      # Analyze batch failures
    ├── interview_generator.py # Generate /interview JSON
    ├── interview_processor.py # Process interview responses
    ├── recovery_executor.py   # Execute recovery actions
    └── pdf_bridge.py          # Cross-skill integration with debug-pdf

Companion Skill: debug-pdf

debug-fetcher and debug-pdf work together in the pipeline:

URL → debug-fetcher → /fetcher → /extractor → debug-pdf
         ↓                           ↓
      fetch fail               extraction fail
         ↓                           ↓
    retry/recover            analyze PDF issues
         ↓                           ↓
      /memory                     /memory

Shared failure patterns:

| Pattern | debug-fetcher | debug-pdf |
|---|---|---|
| `auth_required` | HTTP 401/403 | N/A |
| `access_restricted` | HTTP 403 | N/A |
| `paywall_detected` | Soft paywall | N/A |
| `password_protected` | N/A | Encrypted PDF |
| `scanned_no_ocr` | N/A | No text layer |
| `archive_org_wrap` | Wayback wrapper | Wayback wrapper |
Cross-skill notifications:

- When debug-fetcher successfully fetches a PDF but detects issues (password protected, scanned), it notifies debug-pdf via agent-inbox
- When debug-fetcher fails to fetch a PDF URL, it notifies debug-pdf for tracking

Related Skills

- /memory - Stores learned fetch strategies
- /interview - Human collaboration for unrecoverable URLs
- /ingest-youtube - YouTube transcript extraction
- /fetcher - Core URL fetching functionality
- /extractor - Content extraction from fetched documents
- /debug-pdf - Companion skill for PDF extraction failures

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/debug-fetcher
License: MIT License
