Agent skill

agent-browser

Headless browser automation CLI optimized for AI agents with accessibility tree snapshots and ref-based element selection

Stars 18,484
Forks 2,621

Install this agent skill to your Project

npx add-skill https://github.com/eosphoros-ai/DB-GPT/tree/main/skills/agent-browser

Metadata

Additional technical details for this skill

clawdbot
{
    "emoji": "\ud83c\udf10",
    "homepage": "https://github.com/vercel-labs/agent-browser",
    "requires": {
        "commands": [
            "agent-browser"
        ]
    }
}

SKILL.md

Agent Browser Skill

Fast browser automation using accessibility tree snapshots with refs for deterministic element selection.

Why Use This Over Built-in Browser Tool

Use agent-browser when:

  • Automating multi-step workflows
  • Need deterministic element selection
  • Performance is critical
  • Working with complex SPAs
  • Need session isolation

Use built-in browser tool when:

  • Need screenshots/PDFs for analysis
  • Visual inspection required
  • Browser extension integration needed

Core Workflow

bash
# 1. Navigate and snapshot
agent-browser open https://example.com
agent-browser snapshot -i --json

# 2. Parse refs from JSON, then interact
agent-browser click @e2
agent-browser fill @e3 "text"

# 3. Re-snapshot after page changes
agent-browser snapshot -i --json

Key Commands

Navigation

bash
agent-browser open <url>
agent-browser back | forward | reload | close

Snapshot (Always use -i --json)

bash
agent-browser snapshot -i --json          # Interactive elements, JSON output
agent-browser snapshot -i -c -d 5 --json  # + compact, depth limit
agent-browser snapshot -s "#main" -i      # Scope to selector

Interactions (Ref-based)

bash
agent-browser click @e2
agent-browser fill @e3 "text"
agent-browser type @e3 "text"
agent-browser hover @e4
agent-browser check @e5 | uncheck @e5
agent-browser select @e6 "value"
agent-browser press "Enter"
agent-browser scroll down 500
agent-browser drag @e7 @e8

Get Information

bash
agent-browser get text @e1 --json
agent-browser get html @e2 --json
agent-browser get value @e3 --json
agent-browser get attr @e4 "href" --json
agent-browser get title --json
agent-browser get url --json
agent-browser get count ".item" --json

Check State

bash
agent-browser is visible @e2 --json
agent-browser is enabled @e3 --json
agent-browser is checked @e4 --json

Wait

bash
agent-browser wait @e2                    # Wait for element
agent-browser wait 1000                   # Wait ms
agent-browser wait --text "Welcome"       # Wait for text
agent-browser wait --url "**/dashboard"   # Wait for URL
agent-browser wait --load networkidle     # Wait for network
agent-browser wait --fn "window.ready === true"

Sessions (Isolated Browsers)

bash
agent-browser --session admin open site.com
agent-browser --session user open site.com
agent-browser session list
# Or via env: AGENT_BROWSER_SESSION=admin agent-browser ...

State Persistence

bash
agent-browser state save auth.json        # Save cookies/storage
agent-browser state load auth.json        # Load (skip login)

Screenshots & PDFs

bash
agent-browser screenshot page.png
agent-browser screenshot --full page.png
agent-browser pdf page.pdf

Network Control

bash
agent-browser network route "**/ads/*" --abort           # Block
agent-browser network route "**/api/*" --body '{"x":1}'  # Mock
agent-browser network requests --filter api              # View

Cookies & Storage

bash
agent-browser cookies                     # Get all
agent-browser cookies set name value
agent-browser storage local key           # Get localStorage
agent-browser storage local set key val

Tabs & Frames

bash
agent-browser tab new https://example.com
agent-browser tab 2                       # Switch to tab
agent-browser frame @e5                   # Switch to iframe
agent-browser frame main                  # Back to main

Snapshot Output Format

json
{
  "success": true,
  "data": {
    "snapshot": "...",
    "refs": {
      "e1": {"role": "heading", "name": "Example Domain"},
      "e2": {"role": "button", "name": "Submit"},
      "e3": {"role": "textbox", "name": "Email"}
    }
  }
}

Best Practices

  1. Always use -i flag - Focus on interactive elements
  2. Always use --json - Easier to parse
  3. Wait for stability - agent-browser wait --load networkidle
  4. Save auth state - Skip login flows with state save/load
  5. Use sessions - Isolate different browser contexts
  6. Use --headed for debugging - See what's happening

Example: Search and Extract

bash
agent-browser open https://www.google.com
agent-browser snapshot -i --json
# AI identifies search box @e1
agent-browser fill @e1 "AI agents"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot -i --json
# AI identifies result refs
agent-browser get text @e3 --json
agent-browser get attr @e4 "href" --json

Example: Multi-Session Testing

bash
# Admin session
agent-browser --session admin open app.com
agent-browser --session admin state load admin-auth.json
agent-browser --session admin snapshot -i --json

# User session (simultaneous)
agent-browser --session user open app.com
agent-browser --session user state load user-auth.json
agent-browser --session user snapshot -i --json

Installation

bash
npm install -g agent-browser
agent-browser install                     # Download Chromium
agent-browser install --with-deps         # Linux: + system deps

Credits

Skill created by Yossi Elkrief (@MaTriXy)

agent-browser CLI by Vercel Labs

Expand your agent's capabilities with these related and highly-rated skills.

eosphoros-ai/DB-GPT

financial-report-analyzer

专门用于上市公司财报(如年度报告、季度报告)的深度分析。该技能能够自动提取关键财务指标,计算核心财务比率,生成可视化图表,并结合行业背景生成专业的财务分析报告。

18,484 2,621
Explore
eosphoros-ai/DB-GPT

walmart-sales-analyzer

Analyze Walmart sales data to explore trends between store sales and unemployment rates. Generate insightful visualizations and a beautiful HTML report with deep analysis. Suitable for quick insights into the relationship between sales data and macroeconomic factors.

18,484 2,621
Explore
eosphoros-ai/DB-GPT

csv-data-analysis

This skill should be used when users need to analyze CSV or Excel files, understand data patterns, generate statistical summaries, or create data visualizations. Trigger keywords include "analyze CSV", "analyze Excel", "data analysis", "CSV analysis", "Excel analysis", "data statistics", "generate charts", "data visualization", "分析CSV", "分析Excel", "数据分析", "CSV分析", "Excel分析", "数据统计", "生成图表", "数据可视化".

18,484 2,621
Explore
eosphoros-ai/DB-GPT

skill-creator

Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an existing skill) that extends Claude's capabilities with specialized knowledge, workflows, or tool integrations.

18,484 2,621
Explore
petekp/claude-code-setup

ubiquitous-language

Extract a DDD-style ubiquitous language glossary from the current conversation, flagging ambiguities and proposing canonical terms. Saves to UBIQUITOUS_LANGUAGE.md. Use when user wants to define domain terms, build a glossary, harden terminology, create a ubiquitous language, or mentions "domain model" or "DDD".

20 6
Explore
petekp/claude-code-setup

every-style-editor

This skill should be used when reviewing or editing copy to ensure adherence to Every's style guide. It provides a systematic line-by-line review process for grammar, punctuation, mechanics, and style guide compliance.

20 6
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results