Agent skill

dev-browser

Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.

Stars 8,297
Forks 729

Install this agent skill to your Project

npx add-skill https://github.com/MemTensor/MemOS/tree/main/apps/openwork-memos-integration/apps/desktop/skills/dev-browser

SKILL.md

Dev Browser Skill

Browser automation that maintains page state across script executions. Write small, focused scripts to accomplish tasks incrementally. Once you've proven out part of a workflow and there is repeated work to be done, you can write a script to do the repeated work in a single execution.

Choosing Your Approach

  • Local/source-available sites: Read the source code first to write selectors directly
  • Unknown page layouts: Use getAISnapshot() to discover elements and selectSnapshotRef() to interact with them
  • Visual feedback: Take screenshots to see what the user sees

Setup

Two modes available. Ask the user if unclear which to use.

Standalone Mode (Default)

Launches a new Chromium browser for fresh automation sessions.

bash
./skills/dev-browser/server.sh &

Add --headless flag if user requests it. Wait for the Ready message before running scripts.

Extension Mode

Connects to user's existing Chrome browser. Use this when:

  • The user is already logged into sites and wants you to do things behind an authed experience that isn't local dev.
  • The user asks you to use the extension

Important: The core flow is still the same. You create named pages inside of their browser.

Start the relay server:

bash
cd skills/dev-browser && npm i && npm run start-extension &

Wait for Waiting for extension to connect... followed by Extension connected in the console. To know that a client has connected and the browser is ready to be controlled. Workflow:

  1. Scripts call client.page("name") just like the normal mode to create new pages / connect to existing ones.
  2. Automation runs on the user's actual browser session

If the extension hasn't connected yet, tell the user to launch and activate it. Download link: https://github.com/SawyerHood/dev-browser/releases

Writing Scripts

Run all scripts from skills/dev-browser/ directory. The @/ import alias requires this directory's config.

Execute scripts inline using heredocs:

bash
cd skills/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

const client = await connect();
// Create page with custom viewport size (optional)
const page = await client.page("example", { viewport: { width: 1920, height: 1080 } });

await page.goto("https://example.com");
await waitForPageLoad(page);

console.log({ title: await page.title(), url: page.url() });
await client.disconnect();
EOF

Write to tmp/ files only when the script needs reuse, is complex, or user explicitly requests it.

Key Principles

  1. Small scripts: Each script does ONE thing (navigate, click, fill, check)
  2. Evaluate state: Log/return state at the end to decide next steps
  3. Descriptive page names: Use "checkout", "login", not "main"
  4. Disconnect to exit: await client.disconnect() - pages persist on server
  5. Plain JS in evaluate: page.evaluate() runs in browser - no TypeScript syntax

Workflow Loop

Follow this pattern for complex tasks:

  1. Write a script to perform one action
  2. Run it and observe the output
  3. Evaluate - did it work? What's the current state?
  4. Decide - is the task complete or do we need another script?
  5. Repeat until task is done

No TypeScript in Browser Context

Code passed to page.evaluate() runs in the browser, which doesn't understand TypeScript:

typescript
// ✅ Correct: plain JavaScript
const text = await page.evaluate(() => {
  return document.body.innerText;
});

// ❌ Wrong: TypeScript syntax will fail at runtime
const text = await page.evaluate(() => {
  const el: HTMLElement = document.body; // Type annotation breaks in browser!
  return el.innerText;
});

Scraping Data

For scraping large datasets, intercept and replay network requests rather than scrolling the DOM. See references/scraping.md for the complete guide covering request capture, schema discovery, and paginated API replay.

Client API

typescript
const client = await connect();

// Get or create named page (viewport only applies to new pages)
const page = await client.page("name");
const pageWithSize = await client.page("name", { viewport: { width: 1920, height: 1080 } });

const pages = await client.list(); // List all page names
await client.close("name"); // Close a page
await client.disconnect(); // Disconnect (pages persist)

// ARIA Snapshot methods
const snapshot = await client.getAISnapshot("name"); // Get accessibility tree
const element = await client.selectSnapshotRef("name", "e5"); // Get element by ref

The page object is a standard Playwright Page.

Waiting

typescript
import { waitForPageLoad } from "@/client.js";

await waitForPageLoad(page); // After navigation
await page.waitForSelector(".results"); // For specific elements
await page.waitForURL("**/success"); // For specific URL

Inspecting Page State

Screenshots

typescript
await page.screenshot({ path: "tmp/screenshot.png" });
await page.screenshot({ path: "tmp/full.png", fullPage: true });

ARIA Snapshot (Element Discovery)

Use getAISnapshot() to discover page elements. Returns YAML-formatted accessibility tree:

yaml
- banner:
  - link "Hacker News" [ref=e1]
  - navigation:
    - link "new" [ref=e2]
- main:
  - list:
    - listitem:
      - link "Article Title" [ref=e8]
      - link "328 comments" [ref=e9]
- contentinfo:
  - textbox [ref=e10]
    - /placeholder: "Search"

Interpreting refs:

  • [ref=eN] - Element reference for interaction (visible, clickable elements only)
  • [checked], [disabled], [expanded] - Element states
  • [level=N] - Heading level
  • /url:, /placeholder: - Element properties

Interacting with refs:

typescript
const snapshot = await client.getAISnapshot("hackernews");
console.log(snapshot); // Find the ref you need

const element = await client.selectSnapshotRef("hackernews", "e2");
await element.click();

Error Recovery

Page state persists after failures. Debug with:

bash
cd skills/dev-browser && npx tsx <<'EOF'
import { connect } from "@/client.js";

const client = await connect();
const page = await client.page("hackernews");

await page.screenshot({ path: "tmp/debug.png" });
console.log({
  url: page.url(),
  title: await page.title(),
  bodyText: await page.textContent("body").then((t) => t?.slice(0, 200)),
});

await client.disconnect();
EOF

Expand your agent's capabilities with these related and highly-rated skills.

MemTensor/MemOS

browserwing-executor

Control browser automation through HTTP API. Supports page navigation, element interaction (click, type, select), data extraction, accessibility snapshot analysis, screenshot, JavaScript execution, and batch operations.

8,297 729
Explore
MemTensor/MemOS

browserwing-admin

Manage and operate BrowserWing — an intelligent browser automation platform. Install dependencies, configure LLM, create/manage/execute automation scripts, use AI-driven exploration to generate scripts, browse the script marketplace, and troubleshoot issues.

8,297 729
Explore
MemTensor/MemOS

memos-memory-guide

Use the MemOS Local memory system to search and use the user's past conversations. Use this skill whenever the user refers to past chats, their own preferences or history, or when you need to answer from prior context. When auto-recall returns nothing (long or unclear user query), generate your own short search query and call memory_search. Available tools: memory_search, memory_get, memory_write_public, memory_share, memory_unshare, task_summary, skill_get, skill_search, skill_install, skill_publish, skill_unpublish, network_memory_detail, network_skill_pull, network_team_info, memory_timeline, memory_viewer.

8,297 729
Explore
MemTensor/MemOS

memos-local

Persistent local memory for OpenClaw agents. Use when users say: - "install memos" - "install MemOS" - "setup memory" - "add memory plugin" - "openclaw memory" - "memos onboarding" - "memory not working" - "configure memory" - "enable memory" - "upgrade MemOS" - "update memory plugin"

8,297 729
Explore
MemTensor/MemOS

safe-file-deletion

Enforces explicit user permission before any file deletion. Activates when you're about to use rm, unlink, fs.rm, or any operation that removes files from disk. MUST be followed for all delete operations.

8,297 729
Explore
MemTensor/MemOS

ask-user-question

Ask users questions via the UI. Use when you need clarification, user preferences, or confirmation before proceeding. The user CANNOT see CLI output - this tool is the ONLY way to communicate with them.

8,297 729
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results