Agent skill
browser-pilot
Chrome DevTools Protocol (CDP) browser automation, web scraping, and crawling. Features: screenshot with region control, viewport control, PDF generation, web scraping, data extraction, form filling, login automation, click/input, element finder, tab management, cookie control, JavaScript execution, page navigation, wait for element, scroll, accessibility tree, console messages, network idle, back/forward, reload, file upload, React compatibility, Smart Mode with Interaction Map. Selectors: CSS selectors (ID, class, attribute), XPath selectors with wildcard * (text-based, structural), XPath indexing (select N-th element with same text). Smart Mode: text-based element search with automatic selector generation. Bot detection bypass (navigator.webdriver=false). Auto Chrome connection. Headless/headed mode. Daemon-based architecture. Interaction Map System. React/framework compatibility.
Install this agent skill to your Project
npx add-skill https://github.com/Dev-GOM/claude-code-marketplace/tree/main/plugins/browser-pilot/skills
Purpose
Automate the Chrome browser through the Chrome DevTools Protocol (CDP) with a daemon-based architecture. The daemon keeps a persistent browser connection, so commands run without reconnecting to Chrome each time. Smart Mode uses an Interaction Map to target elements by their visible text instead of brittle selectors.
Always run scripts with --help first to see usage. DO NOT read the source until you have tried running the script and found that a customized solution is absolutely necessary. These scripts can be very large and would pollute your context window. They exist to be called directly as black-box scripts rather than ingested into your context window.
When to Use
Use browser-pilot when tasks involve:
- Browser automation, web scraping, data extraction
- Screenshot capture, PDF generation
- Form filling, login automation, element interaction
- Tab management, cookie control, JavaScript execution
- Tasks requiring text-based element selection ("click the 3rd Delete button")
- Bot detection bypass requirements (navigator.webdriver = false)
⚠️ Important Guidelines
When to Ask User: Use AskUserQuestion tool if:
- Task requirements unclear or ambiguous
- Multiple implementation approaches possible
- Element selectors not working despite troubleshooting
- User intent uncertain (e.g., "automate this" without specifics)
DO NOT guess or assume user requirements. Always clarify first.
Prerequisites
Chrome must be installed. Local scripts initialize automatically on session start (no manual setup required).
Getting Help
All commands support --help for detailed options:
# See all available commands
node .browser-pilot/bp --help
# Get help for specific command
node .browser-pilot/bp <command> --help
Architecture
Daemon-based design:
- Background daemon maintains persistent CDP connection
- CLI commands communicate via IPC
- Auto-starts on first command, stops at session end
- 30-minute inactivity timeout
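The inactivity timeout can be pictured as a small idle tracker that the daemon consults between commands. This is a minimal sketch, not browser-pilot's implementation: the class name `IdleTracker` and its methods are hypothetical, and only the 30-minute figure comes from the text above.

```python
import time

IDLE_TIMEOUT_SECONDS = 30 * 60  # 30-minute inactivity timeout from the text


class IdleTracker:
    """Hypothetical helper: remembers when the last command arrived."""

    def __init__(self, timeout=IDLE_TIMEOUT_SECONDS, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._last_activity = clock()

    def touch(self):
        """Record activity; a daemon would call this for every CLI command."""
        self._last_activity = self._clock()

    def expired(self):
        """True once the idle period has reached the timeout."""
        return self._clock() - self._last_activity >= self._timeout
```

A daemon loop built on this would call `touch()` on each incoming command and shut down once `expired()` returns true.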
Interaction Map System:
- Auto-generates JSON map of interactive elements on page load
- Enables text-based search with automatic selector generation
- Handles duplicates with indexing
- 10-minute cache with auto-regeneration
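To make the map idea concrete, here is a minimal sketch of text lookup with duplicate indexing and a cache-age check. The JSON layout (`generatedAt`, `elements`, `text`, `type`, `selector`) and the function names are assumptions for illustration; the real map format is described in references/interaction-map.md and may differ. The 1-based index follows the text above, where index 2 selects the second match.

```python
import json

CACHE_TTL_SECONDS = 10 * 60  # 10-minute cache from the text

# Hypothetical map layout; the real file format may differ.
SAMPLE_MAP = json.loads("""
{
  "generatedAt": 1000.0,
  "elements": [
    {"text": "Delete", "type": "button", "selector": "#row-1 .delete"},
    {"text": "Delete", "type": "button", "selector": "#row-2 .delete"},
    {"text": "Email", "type": "input-text", "selector": "input[name='email']"}
  ]
}
""")


def is_stale(page_map, now):
    """True when the map is at least CACHE_TTL_SECONDS old and should be rebuilt."""
    return now - page_map["generatedAt"] >= CACHE_TTL_SECONDS


def find_selector(page_map, text, index=1):
    """Return the selector of the index-th element (1-based) with this exact text."""
    matches = [e for e in page_map["elements"] if e["text"] == text]
    if not 1 <= index <= len(matches):
        return None
    return matches[index - 1]["selector"]
```

With this sample data, `find_selector(SAMPLE_MAP, "Delete", index=2)` picks the second Delete button, which is how duplicate labels are told apart.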
Core Workflow
1. Extract Required Information
From user's request, identify:
- Target URL(s) to visit
- Actions to perform (screenshot, click, fill, etc.)
- Element identifiers (text content, CSS selectors, or XPath)
- Output file names (for screenshots/PDFs)
- Data to extract or forms to fill
When information is missing or ambiguous, use AskUserQuestion tool.
2. Execute Commands
All commands go through the .browser-pilot/bp wrapper script. Replace the placeholders with actual values.
Navigation:
node .browser-pilot/bp navigate -u <url>
node .browser-pilot/bp back
node .browser-pilot/bp forward
node .browser-pilot/bp reload
Interaction (Smart Mode - Recommended):
# Text-based element search (map auto-generated)
# No quotes for single words
node .browser-pilot/bp click --text Login --type button
node .browser-pilot/bp fill --text Email -v <value>
# Use quotes when text contains spaces
node .browser-pilot/bp click --text "Sign In" --type button
node .browser-pilot/bp fill --text "Email Address" -v <value>
# Handle duplicates with indexing
node .browser-pilot/bp click --text Delete --index 2
# Filter visible elements only
node .browser-pilot/bp click --text Submit --viewport-only
# Type aliases (auto-expanded)
node .browser-pilot/bp click --text Search --type input # Matches: input, input-text, input-search, etc.
# Tag-based filtering (HTML tag)
node .browser-pilot/bp click --text Submit --tag button # Matches all <button> tags
node .browser-pilot/bp fill --text Email --tag input -v user@example.com
# 3-stage fallback (automatic)
# Stage 1: Type search (with alias expansion)
# Stage 2: Tag search (if type fails)
# Stage 3: Map regeneration + retry (up to 3 attempts)
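The three stages listed above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the alias table, the element fields (`text`, `type`, `tag`), and the `load_map` / `regenerate_map` callables are all hypothetical. The source does not say how the tag is chosen when only a type is given, so this sketch takes it as an explicit argument.

```python
# Hypothetical alias table: a generic type expands to its specific variants.
TYPE_ALIASES = {"input": {"input", "input-text", "input-search"}}


def expand_type(type_name):
    """Generic names expand to a set of types; specific names match exactly."""
    return TYPE_ALIASES.get(type_name, {type_name})


def search(elements, text, type_name=None, tag=None):
    """Stage 1 (type with alias expansion), then stage 2 (tag), on one map."""
    by_text = [e for e in elements if e["text"] == text]
    if type_name is not None:
        allowed = expand_type(type_name)
        hits = [e for e in by_text if e["type"] in allowed]
        if hits:
            return hits
    if tag is not None:
        hits = [e for e in by_text if e["tag"] == tag]
        if hits:
            return hits
    if type_name is None and tag is None:
        return by_text  # no filter requested: text match alone decides
    return []


def find_with_fallback(load_map, regenerate_map, text,
                       type_name=None, tag=None, max_retries=3):
    """Stage 3: regenerate the map and retry, up to max_retries times."""
    hits = search(load_map(), text, type_name, tag)
    attempts = 0
    while not hits and attempts < max_retries:
        hits = search(regenerate_map(), text, type_name, tag)
        attempts += 1
    return hits
```

The retry loop matters mostly for pages that change after load: a cached map can miss an element that a freshly generated map contains.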
Interaction (Direct Mode - fallback for unique IDs):
node .browser-pilot/bp click -s "#login-button"
node .browser-pilot/bp fill -s "input[name='email']" -v <value>
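When these commands are launched from a script instead of typed into a shell, building the argument list directly sidesteps the quoting rules noted above. The helper names `click_args` and `fill_args` are hypothetical; the flags are only those shown in the examples above.

```python
import shlex

BP = ["node", ".browser-pilot/bp"]


def click_args(text=None, selector=None, type_name=None, index=None):
    """Build argv for `bp click` using only flags shown in the examples."""
    args = BP + ["click"]
    if text is not None:
        args += ["--text", text]
    if selector is not None:
        args += ["-s", selector]
    if type_name is not None:
        args += ["--type", type_name]
    if index is not None:
        args += ["--index", str(index)]
    return args


def fill_args(value, text=None, selector=None):
    """Build argv for `bp fill`; the value follows -v as a single argument."""
    args = BP + ["fill"]
    if text is not None:
        args += ["--text", text]
    if selector is not None:
        args += ["-s", selector]
    return args + ["-v", value]


# shlex.join shows the shell-safe form: text with spaces and selectors
# starting with '#' both come out quoted.
print(shlex.join(click_args(text="Sign In", type_name="button")))
print(shlex.join(click_args(selector="#login-button")))
```

Passing such a list to `subprocess.run(args, check=True)` hands each element to the CLI as one argument, so `Sign In` and `#login-button` need no manual quoting. In a shell, an unquoted `#login-button` would be read as the start of a comment.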
Capture:
# Screenshots saved to .browser-pilot/screenshots/
node .browser-pilot/bp screenshot -o <filename>.png
# Capture specific region
node .browser-pilot/bp screenshot -o region.png --clip-x 100 --clip-y 200 --clip-width 800 --clip-height 600
# Set viewport size for responsive testing
node .browser-pilot/bp set-viewport -w 375 -h 667 --scale 2 --mobile
# Get current viewport size
node .browser-pilot/bp get-viewport
# Get screen and viewport information
node .browser-pilot/bp get-screen-info
# PDFs saved to .browser-pilot/pdfs/
node .browser-pilot/bp pdf -o <filename>.pdf
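For orientation, the region and viewport options above correspond naturally to two CDP methods, `Page.captureScreenshot` (with a `clip` rectangle) and `Emulation.setDeviceMetricsOverride`. The sketch below only builds the JSON messages. The method and parameter names come from the Chrome DevTools Protocol, but whether browser-pilot sends exactly these payloads is an assumption, and the function names are hypothetical.

```python
import json


def screenshot_message(msg_id, x, y, width, height, scale=1):
    """CDP request capturing only the given rectangle of the page."""
    return {
        "id": msg_id,
        "method": "Page.captureScreenshot",
        "params": {
            "format": "png",
            "clip": {"x": x, "y": y, "width": width,
                     "height": height, "scale": scale},
        },
    }


def viewport_message(msg_id, width, height, scale=1, mobile=False):
    """CDP request overriding viewport size, pixel ratio and mobile emulation."""
    return {
        "id": msg_id,
        "method": "Emulation.setDeviceMetricsOverride",
        "params": {"width": width, "height": height,
                   "deviceScaleFactor": scale, "mobile": mobile},
    }


# Same values as the CLI examples above.
print(json.dumps(screenshot_message(1, 100, 200, 800, 600)))
print(json.dumps(viewport_message(2, 375, 667, scale=2, mobile=True)))
```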
Chain Mode (multiple commands):
# Basic chain (no quotes needed for single words)
node .browser-pilot/bp chain navigate -u <url> click --text Submit extract -s .result
# With spaces (quotes required)
node .browser-pilot/bp chain navigate -u <url> click --text "Sign In" fill --text Email -v <email>
# Login workflow
node .browser-pilot/bp chain navigate -u <url> fill --text Email -v <email> fill --text Password -v <password> click --text Login
# Screenshot workflow
node .browser-pilot/bp chain navigate -u <url> wait -s .content-loaded screenshot -o result.png
Chain-specific options:
- --timeout <ms>: Map wait timeout after navigation (default: 10000ms)
- --delay <ms>: Fixed delay between commands (overrides random 300-800ms)
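One way to read a chain is as a flat argument list that is split wherever a command name appears, with a pause between steps. The sketch below is a guess at that behaviour, not the tool's parser: the command set is taken from the examples above, `split_chain` and `step_delay_ms` are hypothetical names, and treating the 300-800 ms range as inclusive is an assumption.

```python
import random

# Command names taken from the chain examples; the real CLI may accept more.
CHAIN_COMMANDS = {"navigate", "click", "fill", "extract", "wait", "screenshot"}


def split_chain(argv):
    """Split a flat argument list into one list per command.

    Limitation of this sketch: an option value that equals a command
    name (for example --text click) would be mis-split.
    """
    steps = []
    for token in argv:
        if token in CHAIN_COMMANDS:
            steps.append([token])
        elif steps:
            steps[-1].append(token)
        else:
            raise ValueError(f"expected a command, got {token!r}")
    return steps


def step_delay_ms(fixed_delay_ms=None, rng=random):
    """Pause before the next command: fixed if given, else random 300-800 ms."""
    if fixed_delay_ms is not None:
        return fixed_delay_ms
    return rng.randint(300, 800)
```

A randomized pause makes the sequence look less mechanical, while a fixed delay makes runs reproducible, which is usually preferable when debugging a workflow.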
Data Extraction:
node .browser-pilot/bp extract -s <selector>
node .browser-pilot/bp content
node .browser-pilot/bp console
node .browser-pilot/bp cookies
Other Actions:
node .browser-pilot/bp wait -s <selector> -t <timeout-ms>
node .browser-pilot/bp scroll -s <selector>
node .browser-pilot/bp eval -e <javascript-expression>
3. Query Interaction Map (when needed)
# List all element types
node .browser-pilot/bp query --list-types
# Find elements by text
node .browser-pilot/bp query --text <text>
# Check map status
node .browser-pilot/bp map-status
# Force regenerate map
node .browser-pilot/bp regen-map
Best Practices
- 🌟 Use Smart Mode by default: Text-based search (--text) is more stable than CSS selectors
  - Recommended: click --text Login
  - Fallback: click -s "#login-btn" (only for unique IDs; quote the selector so the shell does not treat # as a comment)
- Maps auto-generate: No manual map generation is needed; it happens on page load
- Handle duplicates with indexing: --index 2 selects the 2nd match when multiple elements have the same text
- Filter with type aliases: --type input auto-expands to match input, input-text, input-search, etc.
  - Generic: --type input (matches all input types)
  - Specific: --type input-search (exact match only)
- Use tag-based search for flexibility: --tag button matches all <button> elements regardless of type
- 3-stage fallback is automatic: If an element is not found, the system automatically:
  - Tries type-based search (with alias expansion)
  - Falls back to tag-based search
  - Regenerates the map and retries (up to 3 attempts)
- Verify element visibility: --viewport-only ensures the element is on screen
- Use Chain Mode for workflows: Execute multiple commands in sequence for complex automation
- Check console for errors: Run node .browser-pilot/bp console after actions fail
- Let the daemon auto-manage: It starts on the first command and stops at session end
References
Detailed documentation in references/ folder (load as needed):
- references/commands-reference.md: Complete command list with all options and examples
- references/interaction-map.md: Smart Mode system, map structure, and query API
- references/selector-guide.md: Selector strategies, best practices, and troubleshooting
Load references when user needs detailed information about specific features, advanced usage patterns, or troubleshooting guidance.