Agent skill
browser
Complete real user web tasks end-to-end via browser-tool: navigate, interact, wait for page state, extract results, and provide evidence when needed.
Install this agent skill to your Project
npx add-skill https://github.com/wecode-ai/Wegent/tree/main/backend/init_data/skills/browser
SKILL.md
Browser Control Skill
Goal
Finish the user’s real task reliably.
Prioritize successful completion and correct results over aggressive call minimization.
Operating Rules
- Start with the intended action directly (`navigate`/`open`/`act`/`evaluate`). Do not run `status` as a pre-check.
- Use `snapshot` only when refs are required for interaction (click/type/select/drag/scrollIntoView).
- Prefer `evaluate` for extraction. Return structured data in one comprehensive call when possible.
- Use condition waits by default (`loadState`/`url` → `selector`/`text`/`textGone` → `fn`). Avoid `timeMs` unless explicitly needed.
- Before clicking potentially off-screen elements, run `act.scrollIntoView` on the ref first.
- Keep context stable: once `targetId` is known, pass it in follow-up calls when supported.
- Avoid blind loops: every extra call must have a clear purpose.
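The condition-wait rule above can be sketched as a small payload builder. This is only an illustration: `wait_json` is a hypothetical helper, the `#order-summary` selector is a placeholder, and the `selector` key name is inferred from the rule text rather than confirmed by the Quick Examples (only the `loadState` and `url` forms appear there). The arrows in the rule are read here as an order of preference.

```shell
# Build a wait payload for act.wait.
# $1 = condition key, $2 = value, $3 = timeout in ms (defaults to 10000).
# Values are interpolated as-is, so they must not contain double quotes or backslashes.
wait_json() {
  printf '{"action":"act","request":{"kind":"wait","%s":"%s","timeoutMs":%s}}\n' \
    "$1" "$2" "${3:-10000}"
}

# Page-level readiness first.
wait_json loadState domcontentloaded 15000
# Then a URL condition, if a navigation is expected.
wait_json url checkout
# Then an element-level condition (ASSUMED key name, placeholder selector).
wait_json selector '#order-summary'
```

Each printed line is a JSON argument that could be passed to <BROWSER_TOOL_CMD>; none of them uses a fixed `timeMs` sleep.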
Reliability and Recovery
- If `Ref not found`, do not reuse stale refs. Take one fresh `snapshot`, retry once, then stop if still failing.
- For repeated failures with the same cause, stop and explain the blocker clearly instead of retrying endlessly.
- Connection recovery is built into the tool. Allow auto-recovery once; if still disconnected, instruct user to install/connect extension.
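The "retry once, then stop" budget above might look like the following sketch. `run_with_one_retry`, `click_ref`, and `take_snapshot` are hypothetical names, and the two stubs stand in for real tool calls so the example runs without a browser. In real use the retried call should take its ref from the fresh snapshot; the sketch only shows the retry budget.

```shell
# run_with_one_retry REFRESH CMD [ARGS...]
# Runs CMD once. On failure, runs REFRESH (for example a fresh snapshot)
# and tries CMD exactly one more time. Returns the status of the last attempt.
run_with_one_retry() {
  refresh="$1"
  shift
  "$@" && return 0
  "$refresh" || return 1
  "$@"
}

# Stub for a click that fails on the first attempt and succeeds on the second.
attempts=0
click_ref() { attempts=$((attempts + 1)); [ "$attempts" -ge 2 ]; }

# Stub for: <BROWSER_TOOL_CMD> '{"action":"snapshot","interactive":true}'
take_snapshot() { :; }

if run_with_one_retry take_snapshot click_ref; then
  echo "succeeded after $attempts attempt(s)"
else
  echo "still failing after $attempts attempt(s); stop and report the blocker"
fi
```

Because the helper never loops, a persistent failure surfaces after two attempts instead of turning into an endless retry.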
Screenshot Policy
- Default: no screenshot.
- Use screenshots only when user asks, or when visual proof is required.
- Prefer element screenshots (`ref` or `element`) over full-page screenshots.
- Use full-page screenshots only for page-level evidence.
Recommended Flow
- Direct action first (`navigate`/`open` or immediate `act`/`evaluate`).
- If interaction needs refs, run `snapshot` (`interactive: true` preferred).
- Wait for readiness using `act.wait` with explicit conditions.
- Interact (`scrollIntoView` → `click`/`type`/`select`/`drag` as needed).
- Extract/verify with `evaluate` (preferred) or `snapshot`.
- Provide screenshot evidence only when necessary.
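As a rough illustration of the flow above, the sketch below prints one payload per step in the listed order. `flow_payloads` is a hypothetical helper, and the payload shapes are copied from the Quick Examples. The ref (`e1` in the usage line) is a placeholder: in a real run it has to come from the snapshot output, so this is meant for reviewing the sequence, not for piping straight into the tool.

```shell
# Print the payloads for one pass through the recommended flow.
# $1 = URL to open, $2 = ref of the element to click (placeholder until a snapshot provides it).
# Arguments are interpolated as-is and must not contain double quotes or backslashes.
flow_payloads() {
  url="$1"
  ref="$2"
  # 1. Direct action first.
  printf '{"action":"navigate","url":"%s"}\n' "$url"
  # 2. Snapshot, because the click below needs a ref.
  printf '{"action":"snapshot","interactive":true}\n'
  # 3. Wait for readiness with an explicit condition.
  printf '{"action":"act","request":{"kind":"wait","loadState":"domcontentloaded","timeoutMs":15000}}\n'
  # 4. Interact: bring the element into view, then click it.
  printf '{"action":"act","request":{"kind":"scrollIntoView","ref":"%s"}}\n' "$ref"
  printf '{"action":"act","request":{"kind":"click","ref":"%s"}}\n' "$ref"
  # 5. Extract and verify in one evaluate call.
  printf '{"action":"evaluate","expression":"(() => ({title:document.title,url:location.href}))()"}\n'
}

flow_payloads "https://example.com" "e1"
```

No screenshot step is included, in line with the default of capturing evidence only when it is needed.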
Connection Handling
Connection recovery is built into the tool. On connection failure, let the tool auto-attach/launch/retry once. If still disconnected, stop and instruct the user to install/connect the extension.
Minimal CLI Usage
Use <BROWSER_TOOL_CMD> for commands:
- macOS/Linux: `~/.wegent-executor/bin/browser-tool`
- Windows: `~/.wegent-executor/bin/browser-tool.cmd`
<BROWSER_TOOL_CMD> '<json>'
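One way to resolve <BROWSER_TOOL_CMD> from a script is sketched below. It assumes `~` means `$HOME`, and that on Windows the `.cmd` launcher is started from a Unix-like shell such as Git Bash, where `uname -s` reports neither Linux nor Darwin. Both are assumptions, not something the skill states.

```shell
# Choose the launcher for the current platform (paths as listed above; ~ is taken to be $HOME).
case "$(uname -s 2>/dev/null)" in
  Linux*|Darwin*) BROWSER_TOOL_CMD="$HOME/.wegent-executor/bin/browser-tool" ;;
  *)              BROWSER_TOOL_CMD="$HOME/.wegent-executor/bin/browser-tool.cmd" ;;
esac

# The whole JSON document is passed as a single, single-quoted argument.
payload='{"action":"navigate","url":"https://example.com"}'

if [ -x "$BROWSER_TOOL_CMD" ]; then
  "$BROWSER_TOOL_CMD" "$payload"
else
  # Tool not installed here: show what would have been run.
  printf '%s %s\n' "$BROWSER_TOOL_CMD" "'$payload'"
fi
```

Keeping the payload in single quotes means the shell does not interpret the double quotes inside the JSON.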
Quick Examples
# Navigate directly
<BROWSER_TOOL_CMD> '{"action":"navigate","url":"https://example.com"}'
# Snapshot only when refs are needed
<BROWSER_TOOL_CMD> '{"action":"snapshot","interactive":true}'
# Act on ref
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"click","ref":"e1"}}'
# Ensure element is visible before click (recommended on long pages)
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"scrollIntoView","ref":"e1"}}'
# Condition wait (preferred over fixed sleep)
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"wait","loadState":"domcontentloaded","timeoutMs":15000}}'
# URL-based wait
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"wait","url":"checkout","timeoutMs":10000}}'
# Run JS in page context via act.evaluate (function or expression)
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"evaluate","fn":"() => ({title: document.title, href: location.href})"}}'
# Run JS against a target element ref via act.evaluate
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"evaluate","ref":"e1","fn":"(el) => ({text: el.textContent?.trim() || \"\"})"}}'
# Close current tab (or pass targetId)
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"close"}}'
# Element screenshot (prefer over full-page when only target proof is needed)
<BROWSER_TOOL_CMD> '{"action":"screenshot","ref":"e1","type":"jpeg"}'
# Comprehensive extraction in one evaluate
<BROWSER_TOOL_CMD> '{"action":"evaluate","expression":"(() => ({title:document.title,url:location.href}))()"}'