Agent skill
agent-browser
Automates browser interactions for web testing, form filling, screenshots, and data extraction.
Install this agent skill in your project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/testing/agent-browser-fohlio-ai-tools-fohlio-ai-tools
SKILL.md
Agent Browser Skill
Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
Quick start
bash
agent-browser open <url> # Navigate to page
agent-browser snapshot -i # Get interactive elements with refs
agent-browser click @e1 # Click element by ref
agent-browser fill @e2 "text" # Fill input by ref
agent-browser close # Close browser
Core workflow
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
- Interact using refs from the snapshot
- Re-snapshot after navigation or significant DOM changes
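A script can read refs out of the snapshot text instead of hard-coding `@e1`. The helper below, `ref_for`, is a hypothetical sketch. It assumes `snapshot -i` prints entries in the `role "name" [ref=eN]` form shown in the form-submission example. It prints the first matching ref with its `@` prefix, or nothing if no element matches.

```bash
#!/bin/sh
# ref_for ROLE NAME: read `agent-browser snapshot -i` output on stdin and print
# the ref (with its @ prefix) of the first element whose role and name match.
# Assumes entries look like: textbox "Email" [ref=e1]
ref_for() {
  awk -v needle="$1 \"$2\" [ref=" '
    {
      i = index($0, needle)            # literal substring match, no regex
      if (i > 0) {
        rest = substr($0, i + length(needle))
        print "@" substr(rest, 1, index(rest, "]") - 1)
        exit
      }
    }'
}
```

Example usage: `email=$(agent-browser snapshot -i | ref_for textbox Email)`, then `agent-browser fill "$email" "user@example.com"`. Repeat the lookup after navigation, because refs from an older snapshot may no longer apply.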
Commands
Navigation
agent-browser open <url> # Navigate to URL
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
agent-browser close # Close browser
Snapshot (page analysis)
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -c # Compact output
agent-browser snapshot -d 3 # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector
Interactions (use @refs from snapshot)
agent-browser click @e1 # Click
agent-browser dblclick @e1 # Double-click
agent-browser focus @e1 # Focus element
agent-browser fill @e2 "text" # Clear and type
agent-browser type @e2 "text" # Type without clearing
agent-browser press Enter # Press key
agent-browser press Control+a # Key combination
agent-browser keydown Shift # Hold key down
agent-browser keyup Shift # Release key
agent-browser hover @e1 # Hover
agent-browser check @e1 # Check checkbox
agent-browser uncheck @e1 # Uncheck checkbox
agent-browser select @e1 "value" # Select dropdown
agent-browser scroll down 500 # Scroll page
agent-browser scrollintoview @e1 # Scroll element into view
agent-browser drag @e1 @e2 # Drag and drop
agent-browser upload @e1 file.pdf # Upload files
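Filling a multi-field form takes one `fill` call per ref. The helper below, `fill_pairs`, is a hypothetical sketch. It takes alternating ref/value arguments and stops at the first failing `fill`. The `AB` variable holding the CLI name is a convenience of this sketch, so you can substitute another binary or a test stub.

```bash
#!/bin/sh
# CLI to run; a variable so another binary or a test stub can be swapped in.
AB=${AB:-agent-browser}

# fill_pairs REF VALUE [REF VALUE ...]: clear and type each value into its ref.
fill_pairs() {
  if [ $(($# % 2)) -ne 0 ]; then
    echo "fill_pairs: expected REF VALUE pairs" >&2
    return 2
  fi
  while [ $# -gt 0 ]; do
    "$AB" fill "$1" "$2" || return   # stop at the first failing fill
    shift 2
  done
}
```

For example, `fill_pairs @e1 "user@example.com" @e2 "password123"` reproduces the two `fill` lines of the form-submission example.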
Get information
agent-browser get text @e1 # Get element text
agent-browser get html @e1 # Get innerHTML
agent-browser get value @e1 # Get input value
agent-browser get attr @e1 href # Get attribute
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count ".item" # Count matching elements
agent-browser get box @e1 # Get bounding box
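The `get` commands support simple checks in a script. The helper below, `assert_url_contains`, is a hypothetical sketch. It assumes `get url` prints the current URL as plain text on stdout. The list above does not spell out that format, so confirm it before relying on the check. As in the other sketches, the CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# assert_url_contains TEXT: succeed if the current URL contains TEXT.
# Assumes `get url` prints the URL as plain text on stdout.
assert_url_contains() {
  local url
  url=$("$AB" get url) || return
  case $url in
    *"$1"*) return 0 ;;
    *)
      printf 'expected URL containing %s, got %s\n' "$1" "$url" >&2
      return 1
      ;;
  esac
}
```

After a login step, `assert_url_contains /dashboard || exit 1` stops the script if the redirect did not happen.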
Check state
agent-browser is visible @e1 # Check if visible
agent-browser is enabled @e1 # Check if enabled
agent-browser is checked @e1 # Check if checked
Screenshots & PDF
agent-browser screenshot # Screenshot to stdout
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full # Full page
agent-browser pdf output.pdf # Save as PDF
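Numbered screenshots give step-by-step evidence of a test run and sort in the order they were taken. The helper below, `shot`, is a hypothetical sketch that saves `01-NAME.png`, `02-NAME.png`, and so on. It writes under `SHOT_DIR`, an assumed variable that defaults to `./shots`. It uses only the `screenshot path.png` form from the list above, and the CLI name comes from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}
SHOT_DIR=${SHOT_DIR:-./shots}   # hypothetical output directory
SHOT_N=0                        # step counter shared across calls

# shot NAME: save the next numbered screenshot, e.g. ./shots/01-login.png.
shot() {
  SHOT_N=$((SHOT_N + 1))
  mkdir -p "$SHOT_DIR" || return
  "$AB" screenshot "$(printf '%s/%02d-%s.png' "$SHOT_DIR" "$SHOT_N" "$1")"
}
```

Call `shot` directly (`shot login`), not inside `$(...)`. A command substitution runs in a subshell, so the counter increment would be lost.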
Video recording
agent-browser record start ./demo.webm # Start recording (uses current URL + state)
agent-browser click @e1 # Perform actions
agent-browser record stop # Stop and save video
agent-browser record restart ./take2.webm # Stop current + start new recording
Recording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it automatically returns to your current page. For smooth demos, explore first, then start recording.
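To follow the explore-first advice, you can wrap the rehearsed steps so the recording always stops, even when a step fails. The helper below, `with_recording`, is a hypothetical sketch. It runs any command between `record start` and `record stop` and returns that command's exit status. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# with_recording FILE CMD [ARGS...]: record a video of CMD, always stopping the
# recording afterwards, and return CMD's exit status.
with_recording() {
  local out status
  out=$1
  shift
  "$AB" record start "$out" || return
  "$@"
  status=$?
  "$AB" record stop
  return "$status"
}
```

For a multi-step demo, wrap the steps in a shell function, e.g. `demo() { agent-browser click @e1; agent-browser wait --text "Success"; }`, and run `with_recording ./demo.webm demo`.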
Wait
agent-browser wait @e1 # Wait for element
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Success" # Wait for text
agent-browser wait --url "**/dashboard" # Wait for URL pattern
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --fn "window.ready" # Wait for JS condition
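The core workflow says to re-snapshot after navigation. An action that navigates therefore pairs naturally with an explicit wait and a fresh snapshot. The helper below, `click_then_wait`, is a hypothetical sketch built only from `click`, `wait --url`, and `snapshot -i`. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# click_then_wait REF URL_PATTERN: click, wait until the URL matches the
# pattern, then print a fresh interactive snapshot (old refs may be stale).
click_then_wait() {
  "$AB" click "$1" || return
  "$AB" wait --url "$2" || return
  "$AB" snapshot -i
}
```

For example, `click_then_wait @e3 "**/dashboard"` submits the login form from the examples and prints the dashboard's refs.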
Mouse control
agent-browser mouse move 100 200 # Move mouse
agent-browser mouse down left # Press button
agent-browser mouse up left # Release button
agent-browser mouse wheel 100 # Scroll wheel
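When a drag target has no ref, such as a drawing on a canvas, you can combine the mouse primitives into a drag. The helper below, `mouse_drag`, is a hypothetical sketch. It assumes the coordinates given to `mouse move` are pixel positions in the page or viewport, which the list above does not specify. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# mouse_drag X1 Y1 X2 Y2: press the left button at (X1,Y1), move to (X2,Y2),
# and release it there.
mouse_drag() {
  "$AB" mouse move "$1" "$2" || return
  "$AB" mouse down left || return
  "$AB" mouse move "$3" "$4" || return
  "$AB" mouse up left
}
```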
Semantic locators (alternative to refs)
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
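Semantic locators name an element by role, text, or label instead of a ref, so a flow can be written without a snapshot step. The helper below, `login_by_label`, is a hypothetical sketch. The labels "Email" and "Password" and the button name "Submit" are assumptions about the target form. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# login_by_label EMAIL PASSWORD: fill and submit a login form located by its
# labels and button name. The label and button names are assumptions.
login_by_label() {
  "$AB" find label "Email" fill "$1" || return
  "$AB" find label "Password" fill "$2" || return
  "$AB" find role button click --name "Submit"
}
```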
Browser settings
agent-browser set viewport 1920 1080 # Set viewport size
agent-browser set device "iPhone 14" # Emulate device
agent-browser set geo 37.7749 -122.4194 # Set geolocation
agent-browser set offline on # Toggle offline mode
agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
agent-browser set credentials user pass # HTTP basic auth
agent-browser set media dark # Emulate color scheme
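Settings can be grouped into a reusable profile. The helper below, `mobile_dark_sf`, is a hypothetical sketch that reuses the example values above: iPhone 14, dark color scheme, and San Francisco coordinates. It then opens an optional URL. Applying the settings before `open` is this sketch's own choice, intended so the first page load already sees the emulated conditions. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# mobile_dark_sf [URL]: emulate a phone in dark mode located in San Francisco,
# then open URL if one is given.
mobile_dark_sf() {
  "$AB" set device "iPhone 14" || return
  "$AB" set media dark || return
  "$AB" set geo 37.7749 -122.4194 || return
  if [ -n "${1-}" ]; then
    "$AB" open "$1"
  fi
}
```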
Cookies & Storage
agent-browser cookies # Get all cookies
agent-browser cookies set name value # Set cookie
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get all localStorage
agent-browser storage local key # Get specific key
agent-browser storage local set k v # Set value
agent-browser storage local clear # Clear all
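To start each test from a known state, you can clear cookies and localStorage and then seed them. The helper below, `reset_and_seed`, is a hypothetical sketch. localStorage is scoped to an origin, so call it after `open` has loaded the site whose storage you want to change. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# reset_and_seed [KEY VALUE ...]: clear cookies and localStorage, then set each
# KEY to VALUE in localStorage (e.g. feature flags for a test run).
reset_and_seed() {
  if [ $(($# % 2)) -ne 0 ]; then
    echo "reset_and_seed: expected KEY VALUE pairs" >&2
    return 2
  fi
  "$AB" cookies clear || return
  "$AB" storage local clear || return
  while [ $# -gt 0 ]; do
    "$AB" storage local set "$1" "$2" || return
    shift 2
  done
}
```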
Network
agent-browser network route <url> # Intercept requests
agent-browser network route <url> --abort # Block requests
agent-browser network route <url> --body '{}' # Mock response
agent-browser network unroute [url] # Remove routes
agent-browser network requests # View tracked requests
agent-browser network requests --filter api # Filter requests
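A mock should usually apply only while the step that needs it runs. The helper below, `with_mock`, is a hypothetical sketch. It installs a `--body` route, runs a command, and then removes the route with `network unroute`. The list above does not say how the `<url>` argument is matched (exact URL or pattern), so the sketch passes it through unchanged. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# with_mock URL BODY CMD [ARGS...]: answer requests for URL with BODY while CMD
# runs, then remove the route and return CMD's exit status.
with_mock() {
  local url body status
  url=$1
  body=$2
  shift 2
  "$AB" network route "$url" --body "$body" || return
  "$@"
  status=$?
  "$AB" network unroute "$url"
  return "$status"
}
```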
Tabs & Windows
agent-browser tab # List tabs
agent-browser tab new [url] # New tab
agent-browser tab 2 # Switch to tab
agent-browser tab close # Close tab
agent-browser window new # New window
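The helper below, `open_tabs`, is a hypothetical sketch. It loads the first URL with `open` and each remaining URL with `tab new`, then lists the tabs so you can choose one to switch to. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# open_tabs URL [URL ...]: open the first URL in the current tab and each
# remaining URL in a new tab, then list all tabs.
open_tabs() {
  local url
  [ $# -gt 0 ] || return 2
  "$AB" open "$1" || return
  shift
  for url in "$@"; do
    "$AB" tab new "$url" || return
  done
  "$AB" tab
}
```

After `open_tabs https://example.com https://example.org`, use `agent-browser tab` with an index from the listing to switch tabs.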
Frames
agent-browser frame "#iframe" # Switch to iframe
agent-browser frame main # Back to main frame
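Judging by the listing, `frame` switches the context until `frame main` is run. A failed step inside an iframe could therefore leave later commands pointed at the frame. The helper below, `in_frame`, is a hypothetical sketch that always switches back and returns the inner command's exit status. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# in_frame SELECTOR CMD [ARGS...]: switch into the iframe, run CMD, always
# return to the main frame, and return CMD's exit status.
in_frame() {
  local sel status
  sel=$1
  shift
  "$AB" frame "$sel" || return
  "$@"
  status=$?
  "$AB" frame main
  return "$status"
}
```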
Dialogs
agent-browser dialog accept [text] # Accept dialog
agent-browser dialog dismiss # Dismiss dialog
JavaScript
agent-browser eval "document.title" # Run JavaScript
Examples
Form submission
bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
Authentication with saved state
bash
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
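The two halves of this example can be combined so a script logs in only when no saved state exists. The helper below, `ensure_session`, is a hypothetical sketch. It reads credentials from `APP_USER` and `APP_PASS`, which are also hypothetical. It reuses refs `@e1` to `@e3` from the example above, and in practice those must match what `snapshot -i` reports for your login form. The CLI name is read from an `AB` variable so you can substitute a stub. The saved file presumably holds session cookies, so keep it out of version control.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# ensure_session STATE_FILE BASE_URL: load saved auth state if the file exists,
# otherwise log in once and save it. Refs @e1-@e3 follow the example above;
# APP_USER and APP_PASS are hypothetical environment variables.
ensure_session() {
  local state base
  state=$1
  base=$2
  if [ -f "$state" ]; then
    "$AB" state load "$state" || return
    "$AB" open "$base/dashboard"
    return
  fi
  if [ -z "${APP_USER-}" ] || [ -z "${APP_PASS-}" ]; then
    echo "ensure_session: set APP_USER and APP_PASS" >&2
    return 2
  fi
  "$AB" open "$base/login" || return
  "$AB" snapshot -i > /dev/null || return
  "$AB" fill @e1 "$APP_USER" || return
  "$AB" fill @e2 "$APP_PASS" || return
  "$AB" click @e3 || return
  "$AB" wait --url "**/dashboard" || return
  "$AB" state save "$state"
}
```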
Sessions (parallel browsers)
bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
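The `--session` flag also lets a script drive several browsers at once. The helper below, `open_sessions`, is a hypothetical sketch. It takes `NAME=URL` arguments, starts each `open` in the background, and fails if any of them failed. It assumes separate named sessions can be driven concurrently, as the section heading suggests. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# open_sessions NAME=URL [NAME=URL ...]: open each URL in its own named session
# in parallel, wait for all of them, and fail if any `open` failed.
open_sessions() {
  local pair pid pids status
  pids=""
  status=0
  for pair in "$@"; do
    # Text before the first "=" is the session name; the rest is the URL.
    "$AB" --session "${pair%%=*}" open "${pair#*=}" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, run `open_sessions test1=site-a.com test2=site-b.com`, then check the result with `agent-browser session list`.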
JSON output (for parsing)
Add --json for machine-readable output:
bash
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Debugging
agent-browser open example.com --headed # Show browser window
agent-browser console # View console messages
agent-browser errors # View page errors
agent-browser record start ./debug.webm # Record from current page
agent-browser record stop # Save recording
agent-browser highlight @e1 # Highlight element
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
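When a scripted run fails, the trace, console messages, and page errors are the first things to inspect. The helper below, `with_trace`, is a hypothetical sketch. It records a trace around a command and, only on failure, prints `console` and `errors` output to stderr before saving the trace. The CLI name is read from an `AB` variable so you can substitute a stub.

```bash
#!/bin/sh
AB=${AB:-agent-browser}

# with_trace FILE CMD [ARGS...]: trace CMD; on failure also print console
# messages and page errors to stderr; always save the trace to FILE.
with_trace() {
  local out status
  out=$1
  shift
  "$AB" trace start || return
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    "$AB" console >&2
    "$AB" errors >&2
  fi
  "$AB" trace stop "$out"
  return "$status"
}
```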