Agent skill

browser-tools

Interactive browser automation via Chrome DevTools Protocol. Use when you need to interact with web pages, test frontends, or when user interaction with a visible browser is required.

View SKILL.md on GitHub Repository

Stars 981

Forks 102

Install this agent skill to your Project

npx add-skill https://github.com/badlogic/pi-skills/tree/main/browser-tools

SKILL.md

Browser Tools

Chrome DevTools Protocol tools for agent-assisted web automation. These tools connect to Chrome running on :9222 with remote debugging enabled.

Setup

Run once before first use:

bash

cd {baseDir}/browser-tools
npm install

Start Chrome

bash

{baseDir}/browser-start.js              # Fresh profile
{baseDir}/browser-start.js --profile    # Copy user's profile (cookies, logins)

Launch Chrome with remote debugging on :9222. Use --profile to preserve user's authentication state.

Navigate

bash

{baseDir}/browser-nav.js https://example.com
{baseDir}/browser-nav.js https://example.com --new

Navigate to URLs. Use --new flag to open in a new tab instead of reusing current tab.

Evaluate JavaScript

bash

{baseDir}/browser-eval.js 'document.title'
{baseDir}/browser-eval.js 'document.querySelectorAll("a").length'

Execute JavaScript in the active tab. Code runs in async context. Use this to extract data, inspect page state, or perform DOM operations programmatically.

Screenshot

bash

{baseDir}/browser-screenshot.js

Capture current viewport and return temporary file path. Use this to visually inspect page state or verify UI changes.

Pick Elements

bash

{baseDir}/browser-pick.js "Click the submit button"

IMPORTANT: Use this tool when the user wants to select specific DOM elements on the page. This launches an interactive picker that lets the user click elements to select them. The user can select multiple elements (Cmd/Ctrl+Click) and press Enter when done. The tool returns CSS selectors for the selected elements.

Common use cases:

User says "I want to click that button" → Use this tool to let them select it
User says "extract data from these items" → Use this tool to let them select the elements
When you need specific selectors but the page structure is complex or ambiguous

Cookies

bash

{baseDir}/browser-cookies.js

Display all cookies for the current tab including domain, path, httpOnly, and secure flags. Use this to debug authentication issues or inspect session state.

Extract Page Content

bash

{baseDir}/browser-content.js https://example.com

Navigate to a URL and extract readable content as markdown. Uses Mozilla Readability for article extraction and Turndown for HTML-to-markdown conversion. Works on pages with JavaScript content (waits for page to load).

When to Use

Testing frontend code in a real browser
Interacting with pages that require JavaScript
When user needs to visually see or interact with a page
Debugging authentication or session issues
Scraping dynamic content that requires JS execution

Efficiency Guide

DOM Inspection Over Screenshots

Don't take screenshots to see page state. Do parse the DOM directly:

javascript

// Get page structure
document.body.innerHTML.slice(0, 5000)

// Find interactive elements
Array.from(document.querySelectorAll('button, input, [role="button"]')).map(e => ({
  id: e.id,
  text: e.textContent.trim(),
  class: e.className
}))

Complex Scripts in Single Calls

Wrap everything in an IIFE to run multi-statement code:

javascript

(function() {
  // Multiple operations
  const data = document.querySelector('#target').textContent;
  const buttons = document.querySelectorAll('button');
  
  // Interactions
  buttons[0].click();
  
  // Return results
  return JSON.stringify({ data, buttonCount: buttons.length });
})()

Batch Interactions

Don't make separate calls for each click. Do batch them:

javascript

(function() {
  const actions = ["btn1", "btn2", "btn3"];
  actions.forEach(id => document.getElementById(id).click());
  return "Done";
})()

Typing/Input Sequences

javascript

(function() {
  const text = "HELLO";
  for (const char of text) {
    document.getElementById("key-" + char).click();
  }
  document.getElementById("submit").click();
  return "Submitted: " + text;
})()

Reading App/Game State

Extract structured state in one call:

javascript

(function() {
  const state = {
    score: document.querySelector('.score')?.textContent,
    status: document.querySelector('.status')?.className,
    items: Array.from(document.querySelectorAll('.item')).map(el => ({
      text: el.textContent,
      active: el.classList.contains('active')
    }))
  };
  return JSON.stringify(state, null, 2);
})()

Waiting for Updates

If DOM updates after actions, add a small delay with bash:

bash

sleep 0.5 && {baseDir}/browser-eval.js '...'

Investigate Before Interacting

Always start by understanding the page structure:

javascript

(function() {
  return {
    title: document.title,
    forms: document.forms.length,
    buttons: document.querySelectorAll('button').length,
    inputs: document.querySelectorAll('input').length,
    mainContent: document.body.innerHTML.slice(0, 3000)
  };
})()

Then target specific elements based on what you find.

Maintainer

badlogic Core maintainer

Source details

Full Name: badlogic/pi-skills
Branch: main
Path in repo: browser-tools
License: MIT License

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

badlogic/pi-skills

brave-search

Web search and content extraction via Brave Search API. Use for searching documentation, facts, or any web content. Lightweight, no browser required.

981 102

Explore

badlogic/pi-skills

vscode

VS Code integration for viewing diffs and comparing files. Use when showing file differences to the user.

981 102

Explore

badlogic/pi-skills

gmcli

Gmail CLI for searching emails, reading threads, sending messages, managing drafts, and handling labels/attachments.

981 102

Explore

badlogic/pi-skills

youtube-transcript

Fetch transcripts from YouTube videos for summarization and analysis.

981 102

Explore

badlogic/pi-skills

gdcli

Google Drive CLI for listing, searching, uploading, downloading, and sharing files and folders.

981 102

Explore

badlogic/pi-skills

gccli

Google Calendar CLI for listing calendars, viewing/creating/updating events, and checking availability.

981 102

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Browser Tools

Setup

Start Chrome

Navigate

Evaluate JavaScript

Screenshot

Pick Elements

Cookies

Extract Page Content

When to Use

Efficiency Guide

DOM Inspection Over Screenshots

Complex Scripts in Single Calls

Batch Interactions

Typing/Input Sequences

Reading App/Game State

Waiting for Updates

Investigate Before Interacting

Recommended Agent Skills

brave-search

vscode

gmcli

youtube-transcript

gdcli

gccli