Agent skill
agentic-browser
Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots. Capabilities: web scraping, form filling, clicking, typing, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet
Install this agent skill to your Project
npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/inference-sh/agentic-browser
SKILL.md
Agentic Browser
Browser automation for AI agents via inference.sh.
Quick Start
curl -fsSL https://cli.inference.sh | sh && infsh login
# Open a page and get interactive elements
infsh app run agentic-browser --function open --input '{"url": "https://example.com"}' --session new
Core Workflow
Every browser automation follows this pattern:
- Open: Navigate to URL, get element refs
- Snapshot: Re-fetch elements after DOM changes
- Interact: Use @e refs to click, fill, etc.
- Re-snapshot: After navigation, get fresh refs
# Start session
RESULT=$(infsh app run agentic-browser --function open --session new --input '{
"url": "https://example.com/login"
}')
SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')
# Elements returned like: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"
# Fill form
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
"action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
"action": "fill", "ref": "@e2", "text": "password123"
}'
# Click submit
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
"action": "click", "ref": "@e3"
}'
# Close when done
infsh app run agentic-browser --function close --session $SESSION_ID --input '{}'
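The commands above repeat the same `infsh app run agentic-browser` prefix. A small wrapper function can cut that down. The sketch below is one way to do it and is not part of the skill itself: the function name `ab` and the `INFSH_BIN` override variable are hypothetical, and only the `--function`, `--session`, and `--input` flags come from the examples above.

```shell
# Hypothetical helper: wraps the repeated CLI prefix used throughout this skill.
# INFSH_BIN is an assumed override (handy for dry runs); it defaults to infsh.
ab() {
  local fn="$1"
  local session="$2"
  local input="$3"
  # Fall back to an empty JSON object when no input is given.
  [ -n "$input" ] || input='{}'
  "${INFSH_BIN:-infsh}" app run agentic-browser \
    --function "$fn" --session "$session" --input "$input"
}
```

With this in place, a snapshot call shortens to `ab snapshot "$SESSION_ID"`, and a click to `ab interact "$SESSION_ID" '{"action": "click", "ref": "@e3"}'`. Setting `INFSH_BIN=echo` prints the command instead of running it.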
Functions
open
Navigate to URL and configure browser. Returns page snapshot with @e refs.
infsh app run agentic-browser --function open --session new --input '{
"url": "https://example.com",
"width": 1280,
"height": 720,
"user_agent": "Mozilla/5.0..."
}'
Returns:
- url: Current page URL
- title: Page title
- elements: List of interactive elements with @e refs
- screenshot: Page screenshot (for vision agents)
snapshot
Re-fetch page state after DOM changes. Always call after clicks that navigate.
infsh app run agentic-browser --function snapshot --session $SESSION_ID --input '{}'
interact
Interact with elements using @e refs from snapshot.
| Action | Description | Required Fields |
|---|---|---|
| click | Click element | ref |
| fill | Clear and type text | ref, text |
| type | Type text (no clear) | text |
| press | Press key | text (e.g., "Enter") |
| select | Select dropdown | ref, text |
| hover | Hover over element | ref |
| scroll | Scroll page | direction (up/down) |
| back | Go back in history | - |
| wait | Wait milliseconds | wait_ms |
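The table can be encoded as a small lookup so that a script can check a payload before sending it. This is a sketch that mirrors the table only: `required_fields` is a hypothetical helper name, and the service may accept optional fields that are not listed here.

```shell
# Required input fields per interact action, as listed in the table above.
# Prints the field names separated by spaces; returns 1 for an unknown action.
required_fields() {
  case "$1" in
    click|hover)  echo "ref" ;;
    fill|select)  echo "ref text" ;;
    type|press)   echo "text" ;;
    scroll)       echo "direction" ;;
    wait)         echo "wait_ms" ;;
    back)         echo "" ;;
    *)            echo "unknown action: $1" >&2; return 1 ;;
  esac
}
```

For example, `required_fields fill` prints `ref text`, while an action name that is not in the table produces an error on stderr and a non-zero exit status.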
# Click
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
"action": "click", "ref": "@e5"
}'
# Fill input
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
"action": "fill", "ref": "@e1", "text": "hello@example.com"
}'
# Press Enter
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
"action": "press", "text": "Enter"
}'
# Scroll down
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
"action": "scroll", "direction": "down"
}'
screenshot
Take page screenshot.
infsh app run agentic-browser --function screenshot --session $SESSION_ID --input '{
"full_page": true
}'
execute
Run JavaScript on the page.
infsh app run agentic-browser --function execute --session $SESSION_ID --input '{
"code": "document.title"
}'
close
Close browser session.
infsh app run agentic-browser --function close --session $SESSION_ID --input '{}'
Element Refs
Elements are returned with @e refs like:
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
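Because ref numbers change between snapshots, a script may prefer to look a ref up by its visible label rather than hard-coding `@e3`. The sketch below assumes the element listing is available as plain text, one element per line, in the format shown above. The actual response may be structured differently (for example as JSON), so treat `find_ref` as a hypothetical helper to adapt.

```shell
# Hypothetical helper: print the ref of the first element whose listing line
# contains the given quoted label. Reads the element listing from stdin and
# assumes one element per line in the textual format shown above.
find_ref() {
  grep -F -- "\"$1\"" | head -n 1 | cut -d ' ' -f 1
}
```

Piping the listing above into `find_ref "Submit"` prints `@e3`. A label that does not appear prints nothing.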
Important: Refs are invalidated after navigation and may go stale after other DOM changes. Always re-snapshot after:
- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading
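One way to make the re-snapshot habit hard to forget is to pair the click and the snapshot in a single function. This is a minimal sketch, not part of the skill: `click_and_snapshot` and the `INFSH_BIN` override are hypothetical names, and the two CLI calls are the same `interact` and `snapshot` invocations shown elsewhere in this document.

```shell
# Hypothetical helper: click a ref, then immediately re-snapshot so later steps
# use fresh refs. INFSH_BIN is an assumed override; it defaults to infsh.
click_and_snapshot() {
  local session="$1"
  local ref="$2"
  local cli="${INFSH_BIN:-infsh}"
  # The click output is discarded so stdout carries only the fresh snapshot.
  "$cli" app run agentic-browser --function interact --session "$session" \
    --input "{\"action\": \"click\", \"ref\": \"$ref\"}" >/dev/null || return 1
  "$cli" app run agentic-browser --function snapshot --session "$session" --input '{}'
}
```

A call such as `click_and_snapshot "$SESSION_ID" "@e3"` returns the post-click page state, and it stops early with a non-zero status if the click itself fails.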
Examples
Form Submission
SESSION=$(infsh app run agentic-browser --function open --session new --input '{
"url": "https://example.com/contact"
}' | jq -r '.session_id')
# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea] "Message", @e4 [button] "Send"
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'
# Check result
infsh app run agentic-browser --function snapshot --session $SESSION --input '{}'
infsh app run agentic-browser --function close --session $SESSION --input '{}'
Search and Extract
SESSION=$(infsh app run agentic-browser --function open --session new --input '{
"url": "https://google.com"
}' | jq -r '.session_id')
# Fill search box and submit
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'
# Get results page
infsh app run agentic-browser --function snapshot --session $SESSION --input '{}'
infsh app run agentic-browser --function close --session $SESSION --input '{}'
Extract Data with JavaScript
# Assumes $SESSION holds the ID of a session that is still open
infsh app run agentic-browser --function execute --session $SESSION --input '{
"code": "Array.from(document.querySelectorAll(\"h2\")).map(h => h.textContent)"
}'
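Hand-escaping the double quotes inside the `code` string, as in the example above, is easy to get wrong. The sketch below builds the payload from a raw single-line snippet instead. `execute_payload` is a hypothetical helper, and it only handles backslashes and double quotes, so multi-line scripts would need a proper JSON encoder such as `jq`.

```shell
# Hypothetical helper: wrap a single-line JavaScript snippet in the {"code": ...}
# payload used by the execute function, escaping backslashes and double quotes.
# Limitation: newlines and other control characters are not escaped.
execute_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"code": "%s"}\n' "$escaped"
}
```

The result can be passed straight to `--input`, for example `--input "$(execute_payload 'document.querySelectorAll("h2").length')"`.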
Sessions
Browser state persists within a session. Always:
- Start with --session new on first call
- Use returned session_id for subsequent calls
- Close session when done
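The last rule is the easiest to miss when a step in the middle of a script fails. A possible pattern, sketched below, is to run the work through a wrapper that closes the session afterwards regardless of the outcome. `with_session` and `INFSH_BIN` are hypothetical names. The `close` call is the one documented above.

```shell
# Hypothetical helper: run a command with the session ID appended as its last
# argument, then close the session even if that command failed.
# INFSH_BIN is an assumed override; it defaults to infsh.
with_session() {
  local session_id="$1"
  shift
  local status=0
  "$@" "$session_id" || status=$?
  "${INFSH_BIN:-infsh}" app run agentic-browser \
    --function close --session "$session_id" --input '{}'
  # Report the status of the wrapped command, not of the close call.
  return "$status"
}
```

Given a function `scrape_page` that takes a session ID, `with_session "$SESSION_ID" scrape_page` runs it and then closes the session, passing through the exit status of `scrape_page`.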
Related Skills
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search
# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models
Documentation
- inference.sh Sessions - Session management
- Multi-function Apps - How functions work