Agent skill

agent-browser

Automates browser interactions via CLI using agent-browser by Vercel Labs. Covers navigation, clicking, form filling, snapshots, refs-based selectors, agent mode with JSON output, session management, and CDP integration. Use when the user needs to automate web browsing, scrape pages, fill forms, or integrate browser automation into AI agent workflows.

Install this agent skill to your Project

npx add-skill https://github.com/partme-ai/full-stack-skills/tree/main/skills/dev-utils-skills/agent-browser

SKILL.md

When to use this skill

Use this skill whenever the user wants to:

Automate browser interactions (click, fill, navigate, screenshot) via CLI
Scrape web content or extract data from pages
Build AI agent workflows that interact with websites
Use refs-based element selection for deterministic automation
Run browser automation in agent mode with JSON output
Manage authenticated sessions with custom headers or CDP

How to use this skill

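This skill mirrors the structure of the official agent-browser documentation (https://github.com/vercel-labs/agent-browser/blob/main/README.md). When working with agent-browser, start from the quick-start workflow, then load the files listed under Detailed Documentation for the topic at hand.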

Quick-Start Example: Snapshot → Identify → Interact

bash

# 1. Install the CLI, then download the bundled Chromium
npm install -g agent-browser
agent-browser install

# 2. Open a page and take a snapshot to get element refs
agent-browser open "https://example.com"
agent-browser snapshot
# The snapshot assigns each element a ref (e.g. [ref=e1]); address it as @e1 in later commands

# 3. Click an element by ref
agent-browser click @e3

# 4. Fill a form field
agent-browser fill @e5 "hello@example.com"

# 5. Agent mode (JSON output for programmatic use)
agent-browser snapshot --json
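
To make the "identify" step concrete, here is a minimal sketch of pulling refs out of plain-text snapshot output. The helper name extract_refs is hypothetical, and the pattern assumes each element line carries a marker of the form [ref=eN]; adjust it if your version of agent-browser prints refs differently.

```shell
# Hypothetical helper: read snapshot text on stdin and print one @eN ref per line.
# Assumes each element line contains a marker like [ref=e1].
extract_refs() {
  grep -o 'ref=e[0-9][0-9]*' | sed 's/^ref=/@/'
}
```

It could be used as `agent-browser snapshot -i | extract_refs` to list the refs available for click and fill.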

Detailed Documentation

Install agent-browser:
- Load examples/getting-started/installation.md for installation instructions
Quick Start:
- Load examples/quick-start/quick-start.md for basic workflow examples
Learn core commands:
- Load examples/commands/basic-commands.md for basic commands (open, click, fill, etc.)
- Load examples/commands/advanced-commands.md for advanced commands (snapshot, eval, etc.)
- Load examples/commands/get-info/ for information retrieval commands
- Load examples/commands/check-state/ for state checking commands
- Load examples/commands/find-elements/ for semantic locator commands
- Load examples/commands/wait/ for wait commands
- Load examples/commands/mouse-control/ for mouse control commands
- Load examples/commands/browser-settings/ for browser configuration
- Load examples/commands/cookies-storage/ for cookies and storage management
- Load examples/commands/network/ for network interception
- Load examples/commands/tabs-windows/ for tab and window management
- Load examples/commands/frames/ for iframe handling
- Load examples/commands/dialogs/ for dialog handling
- Load examples/commands/debug/ for debugging commands
- Load examples/commands/navigation/ for navigation commands
- Load examples/commands/setup/ for setup commands
Understand selectors:
- Load examples/selectors/refs.md for refs-based selection (@e1, @e2, etc.)
- Load examples/selectors/traditional-selectors.md for CSS, XPath, and semantic locators
Use agent mode:
- Load examples/agent-mode/introduction.md for agent mode overview
- Load examples/agent-mode/optimal-workflow.md for optimal AI workflow
- Load examples/agent-mode/integration.md for integrating with AI agents
Advanced features:
- Load examples/advanced/sessions.md for session management
- Load examples/advanced/headed-mode.md for debugging with visible browser
- Load examples/advanced/authenticated-sessions.md for authentication via headers
- Load examples/advanced/custom-executable.md for custom browser executable
- Load examples/advanced/cdp-mode.md for Chrome DevTools Protocol integration
- Load examples/advanced/streaming.md for browser viewport streaming
- Load examples/advanced/architecture.md for architecture overview
- Load examples/advanced/platforms.md for platform support
- Load examples/advanced/usage-with-agents.md for AI agent integration patterns
Configure options:
- Load examples/options/global-options.md for global CLI options
- Load examples/options/snapshot-options.md for snapshot-specific options
- Load examples/options/session-options.md for session management options
Reference API documentation when needed:
- api/commands.md - Complete command reference
- api/selectors.md - Selector reference
- api/options.md - Options reference
Use templates for quick start:
- templates/basic-automation.md - Basic automation workflow
- templates/ai-agent-workflow.md - AI agent workflow template

Doc mapping (one-to-one with official documentation)

See examples and API files → https://github.com/vercel-labs/agent-browser

Examples and Templates

This skill includes detailed examples organized to match the official documentation structure. All examples are in the examples/ directory (see mapping above).

To use examples:

Identify the topic from the user's request
Load the appropriate example file from the mapping above
Follow the instructions, syntax, and best practices in that file
Adapt the code examples to your specific use case
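
The lookup in the first two steps can be sketched as a small shell function. The function name and the topic keywords are illustrative assumptions; the paths are the ones listed under Detailed Documentation.

```shell
# Hypothetical helper: map a topic keyword to an example file or directory
# from this skill. Keywords are made up for illustration; paths come from
# the Detailed Documentation list.
example_for_topic() {
  case "$1" in
    install)    echo "examples/getting-started/installation.md" ;;
    quickstart) echo "examples/quick-start/quick-start.md" ;;
    refs)       echo "examples/selectors/refs.md" ;;
    selectors)  echo "examples/selectors/traditional-selectors.md" ;;
    wait)       echo "examples/commands/wait/" ;;
    network)    echo "examples/commands/network/" ;;
    sessions)   echo "examples/advanced/sessions.md" ;;
    auth)       echo "examples/advanced/authenticated-sessions.md" ;;
    cdp)        echo "examples/advanced/cdp-mode.md" ;;
    *)          echo "api/commands.md" ;;  # fall back to the full command reference
  esac
}
```

Falling back to api/commands.md keeps the lookup total: an unrecognized topic still lands on the complete command reference.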

To use templates:

Reference templates in templates/ directory for common scaffolding
Adapt templates to your specific needs and coding style

API Reference

Commands API: api/commands.md - Complete command reference with syntax and examples
Selectors API: api/selectors.md - Selector types and usage reference
Options API: api/options.md - All options reference

Best Practices

Use Refs: Prefer refs (@e1, @e2) over traditional selectors for deterministic automation
Snapshot First: Always snapshot before interacting with elements to get refs
Agent Mode: Use --json flag for machine-readable output in agent mode
Session Management: Use --session to maintain state across commands
Interactive Snapshot: Use the -i flag to limit the snapshot to interactive elements (buttons, links, inputs)
Semantic Locators: Use semantic locators (role/name) when refs are not available
Error Handling: Check command exit codes and error messages
Wait for Navigation: Commands automatically wait for navigation to complete
Headed Mode: Use --headed for debugging, headless for production
CDP Integration: Use --cdp for Chrome DevTools Protocol integration
Streaming: Use AGENT_BROWSER_STREAM_PORT for live browser preview
Authenticated Sessions: Use --headers for authentication without login flows
Custom Executable: Use --executable-path for serverless deployments or custom browsers
Snapshot Options: Combine -i (interactive elements only), -c (compact), -d (depth limit), and -s (selector scope) to trim snapshot output
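
Two of the practices above, checking exit codes and passing authentication headers, can be sketched as shell helpers. The names ab, run_step, auth_headers, and the AB_BIN variable are hypothetical and not part of agent-browser; the sketch also assumes --headers takes a JSON object, as the authenticated-sessions example describes.

```shell
# Hypothetical wrapper; AB_BIN is an illustrative variable, not an
# agent-browser setting. It lets the binary be swapped, e.g. for testing.
ab() {
  "${AB_BIN:-agent-browser}" "$@"
}

# Run one command; report a non-zero exit code on stderr and pass it back
# so the caller can stop the workflow.
run_step() {
  ab_status=0
  ab "$@" || ab_status=$?
  if [ "$ab_status" -ne 0 ]; then
    echo "agent-browser $* failed with exit code $ab_status" >&2
  fi
  return "$ab_status"
}

# Build the JSON value for --headers from a bearer token.
# Assumes the token contains no quotes or backslashes.
auth_headers() {
  printf '{"Authorization": "Bearer %s"}' "$1"
}
```

For example, `run_step --session checkout open "https://example.com" && run_step --session checkout snapshot -i --json` stops at the first failing command; checkout is a made-up session name.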

Resources

GitHub Repository: https://github.com/vercel-labs/agent-browser
Official README: https://github.com/vercel-labs/agent-browser/blob/main/README.md
Agent Mode Documentation: https://agent-browser.dev/agent-mode
Issues: https://github.com/vercel-labs/agent-browser/issues

Keywords

agent-browser, CLI browser automation, AI agents, browser automation CLI, refs, snapshot, agent mode, semantic locators, browser automation tool, command-line browser, AI agent browser, deterministic selectors, accessibility tree, browser commands, web automation CLI, sessions, headed mode, authenticated sessions, CDP mode, streaming, Chrome DevTools Protocol, Playwright, browser automation for AI

Maintainer

partme-ai Core maintainer

Source details

Full Name: partme-ai/full-stack-skills
Branch: main
Path in repo: skills/dev-utils-skills/agent-browser
License: Other
Topics: claude-code agent-skills cursor skills codebuddy qoder
