Agent skill
browser
Web browser automation with AI-optimized snapshots for claude-flow agents
Install this agent skill to your Project
npx add-skill https://github.com/diegopacheco/ai-playground/tree/main/pocs/claude-flow-fun/.claude/skills/browser
SKILL.md
Browser Automation Skill
Web browser automation using agent-browser with AI-optimized snapshots. Reduces context by 93% using element refs (@e1, @e2) instead of full DOM.
Core Workflow
# 1. Navigate to page
agent-browser open <url>
# 2. Get accessibility tree with element refs
agent-browser snapshot -i # -i = interactive elements only
# 3. Interact using refs from snapshot
agent-browser click @e2
agent-browser fill @e3 "text"
# 4. Re-snapshot after page changes
agent-browser snapshot -i
Quick Reference
Navigation
| Command | Description |
|---|---|
open <url> |
Navigate to URL |
back |
Go back |
forward |
Go forward |
reload |
Reload page |
close |
Close browser |
Snapshots (AI-Optimized)
| Command | Description |
|---|---|
snapshot |
Full accessibility tree |
snapshot -i |
Interactive elements only (buttons, links, inputs) |
snapshot -c |
Compact (remove empty elements) |
snapshot -d 3 |
Limit depth to 3 levels |
screenshot [path] |
Capture screenshot (base64 if no path) |
Interaction
| Command | Description |
|---|---|
click <sel> |
Click element |
fill <sel> <text> |
Clear and fill input |
type <sel> <text> |
Type with key events |
press <key> |
Press key (Enter, Tab, etc.) |
hover <sel> |
Hover element |
select <sel> <val> |
Select dropdown option |
check/uncheck <sel> |
Toggle checkbox |
scroll <dir> [px] |
Scroll page |
Get Info
| Command | Description |
|---|---|
get text <sel> |
Get text content |
get html <sel> |
Get innerHTML |
get value <sel> |
Get input value |
get attr <sel> <attr> |
Get attribute |
get title |
Get page title |
get url |
Get current URL |
Wait
| Command | Description |
|---|---|
wait <selector> |
Wait for element |
wait <ms> |
Wait milliseconds |
wait --text "text" |
Wait for text |
wait --url "pattern" |
Wait for URL |
wait --load networkidle |
Wait for load state |
Sessions
| Command | Description |
|---|---|
--session <name> |
Use isolated session |
session list |
List active sessions |
Selectors
Element Refs (Recommended)
# Get refs from snapshot
agent-browser snapshot -i
# Output: button "Submit" [ref=e2]
# Use ref to interact
agent-browser click @e2
CSS Selectors
agent-browser click "#submit"
agent-browser fill ".email-input" "test@test.com"
Semantic Locators
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
agent-browser find testid "login-btn" click
Examples
Login Flow
agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
agent-browser click @e4
agent-browser wait --url "**/dashboard"
Form Submission
agent-browser open https://example.com/contact
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Hello, this is my message"
agent-browser click @e4
agent-browser wait --text "Thank you"
Data Extraction
agent-browser open https://example.com/products
agent-browser snapshot -i
# Iterate through product refs
agent-browser get text @e1 # Product name
agent-browser get text @e2 # Price
agent-browser get attr @e3 href # Link
Multi-Session (Swarm)
# Session 1: Navigator
agent-browser --session nav open https://example.com
agent-browser --session nav state save auth.json
# Session 2: Scraper (uses same auth)
agent-browser --session scrape state load auth.json
agent-browser --session scrape open https://example.com/data
agent-browser --session scrape snapshot -i
Integration with Claude Flow
MCP Tools
All browser operations are available as MCP tools with browser/ prefix:
browser/openbrowser/snapshotbrowser/clickbrowser/fillbrowser/screenshot- etc.
Memory Integration
# Store successful patterns
npx @claude-flow/cli memory store --namespace browser-patterns --key "login-flow" --value "snapshot->fill->click->wait"
# Retrieve before similar task
npx @claude-flow/cli memory search --query "login automation"
Hooks
# Pre-browse hook (get context)
npx @claude-flow/cli hooks pre-edit --file "browser-task.ts"
# Post-browse hook (record success)
npx @claude-flow/cli hooks post-task --task-id "browse-1" --success true
Tips
- Always use snapshots - They're optimized for AI with refs
- Prefer
-iflag - Gets only interactive elements, smaller output - Use refs, not selectors - More reliable, deterministic
- Re-snapshot after navigation - Page state changes
- Use sessions for parallel work - Each session is isolated
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
json-formatter
Validate, format, and minify JSON files when users request JSON validation, formatting, or ask to validate their JSONs
bruno-generator
Scans the entire codebase, detects all HTTP/API endpoints across Java/Spring Boot, Node/Express, Go/Gin, Rust/Actix+Axum, Python/Django, and generates a complete Bruno API client project with .bru files, sample requests, and environments.
infra-automation-generator
leak-detect
Scan code for leaked PII, secrets/credentials, and security vulnerabilities that would get you hacked in production.
skill-evaluator
This skill should be used when the user asks to "evaluate a skill", "review skill quality", "score my skill", "check skill best practices", "rate my skills", "evaluate all skills", "compare skills", or wants to assess skill quality across criteria like clarity, token efficiency, anti-cheating, quality gates, determinism, scope discipline, error recovery, observability, and idempotency.
metrics-report
Scan an entire codebase, discover and run all test types, compute hybrid coverage, evaluate quality, and generate a full metrics report website with trends and charts.
Didn't find tool you were looking for?