Agent skill

agent-browser

Headless browser automation CLI optimized for AI agents with accessibility tree snapshots and ref-based element selection


Stars 18,484

Forks 2,621

Install this agent skill in your project

npx add-skill https://github.com/eosphoros-ai/DB-GPT/tree/main/skills/agent-browser

Metadata

Additional technical details for this skill

clawdbot: { "emoji": "\ud83c\udf10", "homepage": "https://github.com/vercel-labs/agent-browser", "requires": { "commands": [ "agent-browser" ] } }

SKILL.md

Agent Browser Skill

Fast browser automation using accessibility tree snapshots with refs for deterministic element selection.

Why Use This Over Built-in Browser Tool

Use agent-browser when:

You are automating multi-step workflows
You need deterministic element selection
Performance is critical
You are working with complex SPAs
You need session isolation

Use built-in browser tool when:

You need screenshots or PDFs for analysis
Visual inspection is required
Browser extension integration is needed

Core Workflow

bash

# 1. Navigate and snapshot
agent-browser open https://example.com
agent-browser snapshot -i --json

# 2. Parse refs from JSON, then interact
agent-browser click @e2
agent-browser fill @e3 "text"

# 3. Re-snapshot after page changes
agent-browser snapshot -i --json
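
The three steps above can also be driven from a script. The following is a minimal sketch in Python, not part of the skill itself: `run_cli` and `click_by_name` are hypothetical helper names, and the code assumes the snapshot JSON carries a `data.refs` mapping as shown under Snapshot Output Format.

```python
import json
import subprocess
from typing import Callable, List


def run_cli(args: List[str]) -> str:
    """Hypothetical helper: run one agent-browser command, return its stdout."""
    result = subprocess.run(
        ["agent-browser", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def click_by_name(url: str, role: str, name: str,
                  run: Callable[[List[str]], str] = run_cli) -> str:
    """Open url, snapshot, click the first ref matching role and name.

    Returns the clicked ref (for example "@e2"). Assumes the envelope
    {"data": {"refs": {"e2": {"role": ..., "name": ...}}}}.
    """
    run(["open", url])                                         # 1. navigate
    snapshot = json.loads(run(["snapshot", "-i", "--json"]))   # 1. snapshot
    for ref_id, info in snapshot["data"]["refs"].items():      # 2. parse refs
        if info.get("role") == role and info.get("name") == name:
            ref = f"@{ref_id}"
            run(["click", ref])                                # 2. interact
            run(["snapshot", "-i", "--json"])                  # 3. re-snapshot
            return ref
    raise LookupError(f"no {role!r} named {name!r} in snapshot")
```

Passing a different `run` callable makes it possible to dry-run the sequence without a browser, for example by recording the argument lists instead of executing them.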

Key Commands

Navigation

bash

agent-browser open <url>
agent-browser back                        # Likewise: forward, reload, close

Snapshot (Always use -i --json)

bash

agent-browser snapshot -i --json          # Interactive elements, JSON output
agent-browser snapshot -i -c -d 5 --json  # + compact, depth limit
agent-browser snapshot -s "#main" -i --json  # Scope to selector
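
When the snapshot command is launched from code rather than typed, building the argument list programmatically keeps `-i` and `--json` always present. A possible sketch, using only the flags listed above; `snapshot_args` is a hypothetical name, and the code assumes the CLI does not care about flag order.

```python
from typing import List, Optional


def snapshot_args(compact: bool = False,
                  depth: Optional[int] = None,
                  selector: Optional[str] = None) -> List[str]:
    """Build an argv list for `agent-browser snapshot`, always with -i and --json."""
    args = ["agent-browser", "snapshot", "-i"]
    if compact:
        args.append("-c")                 # compact output
    if depth is not None:
        args += ["-d", str(depth)]        # depth limit
    if selector is not None:
        args += ["-s", selector]          # scope to a CSS selector
    args.append("--json")
    return args


print(snapshot_args(compact=True, depth=5))
```

Because the result is a list meant for `subprocess.run`, a selector such as `#main` needs no shell quoting.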

Interactions (Ref-based)

bash

agent-browser click @e2
agent-browser fill @e3 "text"
agent-browser type @e3 "text"
agent-browser hover @e4
agent-browser check @e5
agent-browser uncheck @e5
agent-browser select @e6 "value"
agent-browser press "Enter"
agent-browser scroll down 500
agent-browser drag @e7 @e8

Get Information

bash

agent-browser get text @e1 --json
agent-browser get html @e2 --json
agent-browser get value @e3 --json
agent-browser get attr @e4 "href" --json
agent-browser get title --json
agent-browser get url --json
agent-browser get count ".item" --json

Check State

bash

agent-browser is visible @e2 --json
agent-browser is enabled @e3 --json
agent-browser is checked @e4 --json

Wait

bash

agent-browser wait @e2                    # Wait for element
agent-browser wait 1000                   # Wait ms
agent-browser wait --text "Welcome"       # Wait for text
agent-browser wait --url "**/dashboard"   # Wait for URL
agent-browser wait --load networkidle     # Wait for network
agent-browser wait --fn "window.ready === true"

Sessions (Isolated Browsers)

bash

agent-browser --session admin open site.com
agent-browser --session user open site.com
agent-browser session list
# Or via env: AGENT_BROWSER_SESSION=admin agent-browser ...
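
The environment-variable form is convenient when commands are launched from a script. A minimal sketch, assuming only the `AGENT_BROWSER_SESSION` variable mentioned above; `session_env` is a hypothetical helper name.

```python
import os
from typing import Dict, Optional


def session_env(name: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with AGENT_BROWSER_SESSION set to name."""
    env = dict(os.environ if base is None else base)  # copy, never mutate the input
    env["AGENT_BROWSER_SESSION"] = name
    return env


# Usage (not executed here):
#   subprocess.run(["agent-browser", "open", "site.com"], env=session_env("admin"))
```

Copying the environment per call keeps the session choice local to one command, so two sessions can be driven from the same process without changing `os.environ`.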

State Persistence

bash

agent-browser state save auth.json        # Save cookies/storage
agent-browser state load auth.json        # Load (skip login)
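
A common pattern is to load the saved state when the file exists and otherwise log in once and save it. The sketch below only decides which commands to run; `auth_commands` and `login_steps` are hypothetical names, and whether `state load` should run before or after `open` is not specified here.

```python
from pathlib import Path
from typing import List

Command = List[str]


def auth_commands(state_file: str, login_steps: List[Command]) -> List[Command]:
    """Return commands that restore a saved state, or log in and then save one."""
    if Path(state_file).is_file():
        # A saved state exists: skip the login flow entirely.
        return [["agent-browser", "state", "load", state_file]]
    # No saved state yet: run the caller's login steps, then persist the result.
    return [*login_steps, ["agent-browser", "state", "save", state_file]]
```

Each returned command is an argument list that can be passed to `subprocess.run` in order.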

Screenshots & PDFs

bash

agent-browser screenshot page.png
agent-browser screenshot --full page.png
agent-browser pdf page.pdf

Network Control

bash

agent-browser network route "**/ads/*" --abort           # Block
agent-browser network route "**/api/*" --body '{"x":1}'  # Mock
agent-browser network requests --filter api              # View
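
Hand-writing the JSON for `--body` gets error-prone once the mock payload grows. One way to generate it, sketched in Python under the assumption that `--body` takes the response body as a single string argument, as in the mock line above; `mock_route_args` is a hypothetical name.

```python
import json
import shlex
from typing import Any, List


def mock_route_args(pattern: str, body: Any) -> List[str]:
    """Build argv for `agent-browser network route <pattern> --body <json>`."""
    payload = json.dumps(body, separators=(",", ":"))  # compact JSON, no spaces
    return ["agent-browser", "network", "route", pattern, "--body", payload]


args = mock_route_args("**/api/*", {"x": 1})
print(shlex.join(args))
# prints: agent-browser network route '**/api/*' --body '{"x":1}'
```

`shlex.join` is used only to show a copy-pasteable shell line; when calling `subprocess.run(args)` directly, no quoting is needed.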

Cookies & Storage

bash

agent-browser cookies                     # Get all
agent-browser cookies set name value
agent-browser storage local key           # Get localStorage
agent-browser storage local set key val

Tabs & Frames

bash

agent-browser tab new https://example.com
agent-browser tab 2                       # Switch to tab
agent-browser frame @e5                   # Switch to iframe
agent-browser frame main                  # Back to main

Snapshot Output Format

json

{
  "success": true,
  "data": {
    "snapshot": "...",
    "refs": {
      "e1": {"role": "heading", "name": "Example Domain"},
      "e2": {"role": "button", "name": "Submit"},
      "e3": {"role": "textbox", "name": "Email"}
    }
  }
}
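
To pick a ref from this output, a script can index `data.refs` by role. A minimal sketch using the sample above; `refs_by_role` is a hypothetical helper, and the shape of a failed response is not documented here, so the code only checks the `success` field.

```python
import json
from collections import defaultdict
from typing import Dict, List


def refs_by_role(raw: str) -> Dict[str, List[str]]:
    """Group snapshot refs by role, for example {"button": ["@e2"]}."""
    payload = json.loads(raw)
    if not payload.get("success"):
        # Failure shape is an unknown here; surface the whole payload.
        raise RuntimeError(f"snapshot failed: {payload}")
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ref_id, info in payload["data"]["refs"].items():
        grouped[info["role"]].append(f"@{ref_id}")  # "@" prefix as used by click/fill
    return dict(grouped)


sample = """
{
  "success": true,
  "data": {
    "snapshot": "...",
    "refs": {
      "e1": {"role": "heading", "name": "Example Domain"},
      "e2": {"role": "button", "name": "Submit"},
      "e3": {"role": "textbox", "name": "Email"}
    }
  }
}
"""
print(refs_by_role(sample))
# prints: {'heading': ['@e1'], 'button': ['@e2'], 'textbox': ['@e3']}
```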

Best Practices

Always use the -i flag: it limits the snapshot to interactive elements
Always use --json: structured output is easier to parse
Wait for stability: run agent-browser wait --load networkidle before re-snapshotting
Save auth state: skip login flows with state save and state load
Use sessions: isolate different browser contexts
Use --headed for debugging: watch what the browser is doing

Example: Search and Extract

bash

agent-browser open https://www.google.com
agent-browser snapshot -i --json
# AI identifies search box @e1
agent-browser fill @e1 "AI agents"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot -i --json
# AI identifies result refs
agent-browser get text @e3 --json
agent-browser get attr @e4 "href" --json

Example: Multi-Session Testing

bash

# Admin session
agent-browser --session admin open app.com
agent-browser --session admin state load admin-auth.json
agent-browser --session admin snapshot -i --json

# User session (simultaneous)
agent-browser --session user open app.com
agent-browser --session user state load user-auth.json
agent-browser --session user snapshot -i --json
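
If the two sessions really need to run at the same time, the commands can be issued from parallel threads. This is a sketch under assumptions: `run_cli`, `session_script`, and `run_sessions` are hypothetical names, and it assumes the CLI tolerates concurrent invocations that target different sessions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Command = List[str]


def run_cli(cmd: Command) -> str:
    """Run one command and return its stdout (raises on a non-zero exit)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def session_script(session: str, auth_file: str) -> List[Command]:
    """The three commands from the example above, bound to one session."""
    prefix = ["agent-browser", "--session", session]
    return [
        prefix + ["open", "app.com"],
        prefix + ["state", "load", auth_file],
        prefix + ["snapshot", "-i", "--json"],
    ]


def run_sessions(scripts: Dict[str, List[Command]],
                 run: Callable[[Command], str] = run_cli) -> Dict[str, List[str]]:
    """Run each session's commands in order, with sessions in parallel threads."""
    def run_in_order(commands: List[Command]) -> List[str]:
        return [run(cmd) for cmd in commands]

    with ThreadPoolExecutor(max_workers=max(1, len(scripts))) as pool:
        futures = {name: pool.submit(run_in_order, cmds)
                   for name, cmds in scripts.items()}
        return {name: future.result() for name, future in futures.items()}
```

Commands within one session stay sequential, since each step depends on the page state left by the previous one; only the sessions themselves overlap.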

Installation

bash

npm install -g agent-browser
agent-browser install                     # Download Chromium
agent-browser install --with-deps         # Linux: + system deps
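
The skill metadata lists `agent-browser` as a required command, so a script may want to verify that the install succeeded before issuing any browser command. A small sketch; `missing_commands` is a hypothetical helper name.

```python
import shutil
from typing import Iterable, List


def missing_commands(required: Iterable[str]) -> List[str]:
    """Return the required commands that are not found on PATH."""
    return [cmd for cmd in required if shutil.which(cmd) is None]


missing = missing_commands(["agent-browser"])
if missing:
    print("Not installed:", ", ".join(missing))
```

This only checks that the executable is on PATH; it does not confirm that Chromium was downloaded by `agent-browser install`.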

Credits

Skill created by Yossi Elkrief (@MaTriXy)

agent-browser CLI by Vercel Labs

Maintainer

eosphoros-ai Core maintainer

Source details

Full Name: eosphoros-ai/DB-GPT
Branch: main
Path in repo: skills/agent-browser
License: MIT License
Topics: agents llm gpt database hacktoberfest security gpt-4 rag deepseek bgi private vicuna
