
web-content-extraction-navigator

Use this skill when the user needs to extract specific content from websites that require interactive navigation, particularly pages with dynamic content, truncated text, or buttons that must be clicked to reveal the full information. The skill uses browser automation to navigate web pages, interact with page elements (such as expanding truncated descriptions), extract structured content from complex page layouts, and work around common web scraping obstacles such as bot detection or IP blocking. It is triggered when a website requires clicking 'show more', 'expand', or similar interactive elements to access the complete information.



Install this agent skill in your project:

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/web-content-extraction-navigator

SKILL.md

Skill: Web Content Extraction Navigator

Purpose

Extract structured content from dynamic web pages that require user interaction (clicking buttons, expanding sections, navigating through page elements) to reveal complete information.

Primary Use Case

When the target content is hidden behind:

"Show more", "Expand", "...more" buttons
Truncated descriptions or lists
Dynamic page loads requiring interaction
Pages with bot detection mechanisms

Core Strategy

Browser Automation First: Use Playwright for reliable interaction with dynamic content.
Fallback Layers: When direct APIs fail (e.g., a transcript API is blocked), fall back to navigating the page in the browser.
Progressive Discovery: Navigate through page spans/sections to locate target content.
Structured Extraction: Parse and format extracted content according to user requirements.

Execution Workflow

Phase 1: Initial Setup & Exploration

Read Format Requirements: Check for any format specifications in workspace files.
Search for Target: Use web search to locate the target URL/content.
Attempt Direct Methods: Try direct API access first (e.g., YouTube transcript API).
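Direct transcript or metadata APIs for YouTube typically take the video ID rather than the full page URL, so the first practical step is pulling that ID out of whatever link the search returned. The sketch below is a hypothetical helper, `youtube_video_id`, built only on the standard library; the hosts and path prefixes it recognizes are an assumption and not an exhaustive list of YouTube URL forms.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts treated as YouTube. This set is an assumption; extend it as needed.
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID from a common YouTube URL form, or None if not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None
    if host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/shorts/", "/embed/", "/live/"):
            if parsed.path.startswith(prefix):
                candidate = parsed.path[len(prefix):].split("/")[0]
                return candidate or None
    return None
```

If the helper returns None, that is already a signal to skip the direct API and go straight to browser navigation.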

Phase 2: Browser Navigation & Interaction

Navigate to Target URL: Use Playwright to load the page.
Handle Bot Detection: Recognize and work around sign-in prompts or bot checks.
Expand Hidden Content:
- Look for truncation indicators ("...more", "Show more", "Expand description")
- Click interactive elements to reveal full content
- Use snapshot navigation to explore different page sections
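The expansion step can be sketched with Playwright's Python sync API. This is a minimal sketch, not the skill's own implementation: the label pattern, the assumption that expanders carry the ARIA "button" role, and the helper names `expand_truncated` and `fetch_expanded_text` are all hypothetical. It clicks one visible match at a time and re-queries, because clicking often removes or relabels the button ("Show more" becomes "Show less").

```python
import re

# Labels that usually mark truncated content. This list is an assumption;
# adjust it to the wording the target site actually uses.
EXPAND_LABEL = re.compile(
    r"^\s*(?:\.{3}|…)?\s*(?:show more|read more|more|expand(?: description)?)\s*$",
    re.IGNORECASE,
)


def expand_truncated(page, max_clicks: int = 10, settle_ms: int = 500) -> int:
    """Click visible expand-style buttons one at a time; return how many were clicked.

    `page` is a Playwright sync-API Page. Expanders are assumed to be exposed
    with the "button" role; sites using plain clickable text may need
    page.get_by_text(EXPAND_LABEL) instead.
    """
    clicks = 0
    for _ in range(max_clicks):
        # Re-query every round: earlier clicks may have removed or renamed buttons.
        visible = [
            b for b in page.get_by_role("button", name=EXPAND_LABEL).all() if b.is_visible()
        ]
        if not visible:
            break
        visible[0].click()
        clicks += 1
        # Give the page a moment to render the revealed content.
        page.wait_for_timeout(settle_ms)
    return clicks


def fetch_expanded_text(url: str, selector: str = "body") -> str:
    """Load `url` in headless Chromium, expand truncated sections, return the text."""
    # Imported lazily so the helpers above can be reused without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            expand_truncated(page)
            return page.locator(selector).first.inner_text()
        finally:
            browser.close()
```

Usage would look like `fetch_expanded_text(url, selector)`, where `selector` points at the container that holds the description; the right selector depends on the site and is not specified by this skill. The `max_clicks` cap keeps "load more" buttons that never disappear from looping forever.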

Phase 3: Content Extraction & Processing

Locate Target Content: Identify the specific content area (e.g., video description, tracklist).
Extract Structured Data: Parse the content into clean, structured format.
Format According to Requirements: Apply any specified formatting templates.
Write Output File: Save extracted content to the requested location.
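For the tracklist case, Phase 3 might look like the following sketch: parse timestamped lines out of the expanded description, render them through a format template, and write the result. The line pattern is a heuristic, and the default template and the names `parse_tracklist`, `render`, and `write_output` are hypothetical; a template found in the workspace in Phase 1 would replace the default.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Heuristic for lines such as "0:00 Intro", "12:34 - Artist - Title", "1:02:03 | Outro".
_TRACK_LINE = re.compile(r"^\s*(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s*[-–|:.]?\s*(.+?)\s*$")


@dataclass
class Track:
    seconds: int
    timestamp: str
    title: str


def parse_tracklist(description: str) -> List[Track]:
    """Return one Track per line that starts with a timestamp; other lines are skipped."""
    tracks: List[Track] = []
    for line in description.splitlines():
        m = _TRACK_LINE.match(line)
        if not m:
            continue
        hours, minutes, seconds, title = m.groups()
        total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
        stamp = f"{hours}:{minutes}:{seconds}" if hours else f"{minutes}:{seconds}"
        tracks.append(Track(total, stamp, title))
    return tracks


def render(tracks: List[Track], template: str = "{index}. {timestamp} {title}") -> str:
    """Format tracks with a str.format template (the default template is an assumption)."""
    return "\n".join(
        template.format(index=i, timestamp=t.timestamp, title=t.title, seconds=t.seconds)
        for i, t in enumerate(tracks, start=1)
    )


def write_output(text: str, path: str) -> Path:
    """Write the formatted result as UTF-8, creating parent directories if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text + "\n", encoding="utf-8")
    return out
```

Keeping `seconds` alongside the original timestamp string lets later checks compare track order numerically while the output preserves the site's own notation.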

Key Techniques

Snapshot Navigation: Use browser_snapshot_navigate_to_next_span to explore large pages
Element Targeting: Click specific elements by reference ID or role/text
Content Parsing: Extract clean text from HTML structures
Error Recovery: Handle API blocks, bot detection, and loading issues
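Playwright's `inner_text()` already returns rendered text, but when all you have is raw HTML (for example from `page.content()`), a standard-library parser is enough to get clean text. The sketch below is one possible approach, not the skill's prescribed parser; which tags count as block-level or invisible is an assumption, and `html_to_text` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List

# Tags whose contents are never user-visible text.
_SKIP = {"script", "style", "noscript", "template"}
# Tags that should start a new line in the plain-text output (an assumed subset).
_BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr", "section", "article"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in _SKIP:
            self._skip_depth += 1
        elif tag in _BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in _SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in _BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return visible text with whitespace collapsed and one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```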

Common Patterns

YouTube video descriptions with tracklists
Articles with "Read more" buttons
Comment sections requiring expansion
Paginated content
Modal popups with additional information
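Paginated content usually reduces to a loop that follows a "next" cursor until there is none. In this sketch `fetch_page` is a hypothetical callable you would supply (for instance one that navigates to the next page's URL and extracts its items); the page cap, duplicate filtering, and cursor-loop check are defensive assumptions for sites that repeat items or link a page back to itself.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

# Hypothetical fetcher: given a cursor (None for the first page), return the
# items on that page and the cursor of the next page, or None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[Sequence[Hashable], Optional[str]]]


def collect_paginated(fetch_page: FetchPage, max_pages: int = 20) -> List[Hashable]:
    """Follow next-page cursors, dropping duplicate items and stopping on cursor loops."""
    items: List[Hashable] = []
    seen_items = set()
    seen_cursors = set()
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page_items, next_cursor = fetch_page(cursor)
        for item in page_items:
            if item not in seen_items:
                seen_items.add(item)
                items.append(item)
        # Stop at the last page, or if the site points back to a page already visited.
        if next_cursor is None or next_cursor in seen_cursors:
            break
        seen_cursors.add(next_cursor)
        cursor = next_cursor
    return items
```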

Output Requirements

Always verify that the extracted content matches the request
Follow any specified format templates exactly
Include source attribution when relevant
Handle edge cases (missing content, format variations)
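Verification can be made concrete with a few cheap checks before writing the output. What "matches the request" means depends on the task; the checks below (non-empty, titled, strictly increasing timestamps, optional expected count) are illustrative assumptions for a tracklist given as `(seconds, title)` pairs, and `check_tracklist` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def check_tracklist(
    tracks: Sequence[Tuple[int, str]], expected_count: Optional[int] = None
) -> List[str]:
    """Return a list of problems found in (seconds, title) pairs; empty means it passed."""
    if not tracks:
        # An empty result often means the description was never expanded.
        return ["no tracks extracted (content may still be truncated)"]
    problems: List[str] = []
    for i, (seconds, title) in enumerate(tracks, start=1):
        if not title.strip():
            problems.append(f"track {i} has an empty title")
        # tracks[i - 2] is the previous track because i is 1-based.
        if i > 1 and seconds <= tracks[i - 2][0]:
            problems.append(f"track {i} does not start after track {i - 1}")
    if expected_count is not None and len(tracks) != expected_count:
        problems.append(f"expected {expected_count} tracks, found {len(tracks)}")
    return problems
```

A non-empty result is a cue to go back to Phase 2 (for example, expand again or try another section) rather than to write a partial file silently.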

Failure Recovery

If the direct API fails → Use browser automation
If the page blocks access → Try alternative search results
If the content is not found → Search for similar pages
If the format is unclear → Use sensible defaults and document assumptions
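The recovery ladder above can be expressed as an ordered list of strategies tried until one returns content. In this sketch the strategy names and callables are hypothetical placeholders (a direct API call, a browser extraction, an alternative search result); the returned log records what was attempted, which also serves the "document assumptions" step.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A strategy is a (name, zero-argument callable) pair. The callable returns the
# extracted text, or raises / returns None when it cannot get the content.
Strategy = Tuple[str, Callable[[], Optional[str]]]


def first_successful(strategies: Sequence[Strategy]) -> Tuple[Optional[str], List[str]]:
    """Try strategies in order; return (result or None, log of each attempt)."""
    log: List[str] = []
    for name, attempt in strategies:
        try:
            result = attempt()
        except Exception as exc:  # e.g. blocked API, bot check, timeout
            log.append(f"{name}: failed ({type(exc).__name__}: {exc})")
            continue
        if result:
            log.append(f"{name}: ok")
            return result, log
        log.append(f"{name}: no content")
    return None, log
```

Catching broad exceptions is deliberate here: at this layer any failure simply means "move to the next strategy", and the message is kept in the log rather than lost.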

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/web-content-extraction-navigator
License: MIT License
