Agent skill
using-web-scraping
Search and scrape public web content with headless Chrome and DuckDuckGo using safe practices.
Install this agent skill to your Project
npx add-skill https://github.com/besoeasy/open-skills/tree/main/skills/using-web-scraping
SKILL.md
Web Scraping Skill — Chrome (Playwright) + DuckDuckGo
A privacy-minded, agent-facing web-scraping skill that uses headless Chrome (Playwright/Puppeteer) and DuckDuckGo for search. It focuses on reliable navigation, structured text extraction, respect for robots.txt, and rate limiting.
When to use
- Collect public webpage content for summarization, metadata extraction, or link discovery.
- Use DuckDuckGo for queries when you want a privacy-respecting search source.
- NOT for bypassing paywalls, scraping private/logged-in content, or violating Terms of Service.
Safety & etiquette
- Always check and respect `/robots.txt` before scraping a site.
- Rate-limit requests (default: 1 request/sec) and use polite `User-Agent` strings.
- Avoid executing arbitrary user-provided JavaScript on scraped pages.
- Only scrape public content; if login is required, return `login_required` instead of attempting to bypass.
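The first two rules above can be sketched in plain Node.js. This is a minimal illustration rather than a complete robots.txt parser: `parseRobots`, `isAllowed`, and `createThrottle` are hypothetical helper names, only `User-agent`, `Allow`, and `Disallow` lines are handled, and wildcard patterns are ignored. A production agent would probably want a dedicated robots.txt library.

```javascript
// Minimal robots.txt handling (assumption: prefix rules only, no wildcards).
function parseRobots(text) {
  const groups = []; // [{ agents: [...], rules: [{ allow, path }] }]
  let current = null;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim(); // drop comments
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty Disallow means "nothing is disallowed", so it adds no rule.
      if ((key === 'allow' || key === 'disallow') && current && value !== '') {
        current.rules.push({ allow: key === 'allow', path: value });
      }
      lastWasAgent = false;
    }
  }
  return groups;
}

function isAllowed(robotsText, userAgent, path) {
  const groups = parseRobots(robotsText);
  const ua = userAgent.toLowerCase();
  // Prefer a group that names this agent; otherwise fall back to "*".
  const group =
    groups.find(g => g.agents.some(a => a && a !== '*' && ua.includes(a))) ||
    groups.find(g => g.agents.includes('*'));
  if (!group) return true;
  // Longest matching prefix wins; Allow wins ties.
  let best = null;
  for (const rule of group.rules) {
    if (!path.startsWith(rule.path)) continue;
    if (
      !best ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}

// Returns an async function that spaces callers at least `intervalMs` apart
// (default matches the 1 request/sec guideline).
function createThrottle(intervalMs = 1000) {
  let next = 0;
  return async function wait() {
    const now = Date.now();
    const delay = Math.max(0, next - now);
    next = Math.max(now, next) + intervalMs;
    if (delay > 0) await new Promise(resolve => setTimeout(resolve, delay));
  };
}

// Possible use before each navigation (Node 18+ provides a global fetch):
// const wait = createThrottle();
// await wait();
// const robots = await fetch(new URL('/robots.txt', url)).then(r => (r.ok ? r.text() : ''));
// if (!isAllowed(robots, 'open-skills-bot/1.0', new URL(url).pathname)) return;
```

Treating a missing or unreadable robots.txt as "allowed" is a design choice in this sketch; a stricter agent could skip the site instead.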
Capabilities
- Search DuckDuckGo and return top-N result links.
- Visit result pages in headless Chrome and extract `title`, `meta description`, `main` text (or best-effort article text), and `canonical` URL.
- Return results as structured JSON for downstream consumption.
Examples
Node.js (Playwright)
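const { chromium } = require('playwright');

// Search DuckDuckGo for `query`, open the top result, and return its
// title, meta description, canonical URL, and main text.
async function ddgSearchAndScrape(query) {
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage({ userAgent: 'open-skills-bot/1.0' });

    // DuckDuckGo search. Assumption: the HTML-only results page is used here
    // because the `.result__title a` selector matches its markup; adjust both
    // if DuckDuckGo changes its pages.
    await page.goto('https://html.duckduckgo.com/html/?q=' + encodeURIComponent(query));
    await page.waitForSelector('.result__title a');

    // Collect the top result URL. Result links may be redirects that carry
    // the real target in a `uddg` query parameter, so unwrap it if present.
    const rawHref = await page.getAttribute('.result__title a', 'href');
    if (!rawHref) return [];
    const resolved = new URL(rawHref, page.url());
    const href = resolved.searchParams.get('uddg') || resolved.href;

    // Visit the result and extract fields. Short timeouts keep a missing
    // element from stalling for the default 30 seconds.
    await page.goto(href, { waitUntil: 'domcontentloaded' });
    const opts = { timeout: 2000 };
    const title = await page.title();
    const description = await page.locator('meta[name="description"]').first().getAttribute('content', opts).catch(() => null);
    const canonical = await page.locator('link[rel="canonical"]').first().getAttribute('href', opts).catch(() => null);
    const article = await page.locator('article, main, #content').first().innerText(opts).catch(() => null);

    return [{ url: href, title, description, canonical, text: article }];
  } finally {
    // Always release the browser, even if navigation or extraction throws.
    await browser.close();
  }
}

// usage
// ddgSearchAndScrape('open-source agent runtimes').then(console.log);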
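The example above visits only the first result, while the Capabilities section mentions top-N links. One way to get there is to collect every result href and clean the list with a small helper. `pickTopLinks` is a hypothetical name, and the `uddg` redirect parameter is an assumption about how DuckDuckGo wraps outbound links that may not hold for every results page.

```javascript
// Hypothetical helper: turn raw result hrefs into at most `n` clean URLs.
function pickTopLinks(hrefs, n, base = 'https://duckduckgo.com/') {
  const seen = new Set();
  const links = [];
  for (const href of hrefs) {
    if (links.length >= n) break;
    if (!href) continue;
    let url;
    try {
      url = new URL(href, base); // also resolves protocol-relative hrefs
    } catch {
      continue; // skip malformed hrefs
    }
    // Assumption: redirect links carry the real target in a `uddg` parameter.
    const target = url.searchParams.get('uddg');
    if (target) {
      try {
        url = new URL(target);
      } catch {
        continue;
      }
    }
    // Keep only web pages and drop duplicates.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    if (seen.has(url.href)) continue;
    seen.add(url.href);
    links.push(url.href);
  }
  return links;
}

// With Playwright, the raw hrefs could be gathered like this:
// const hrefs = await page.locator('.result__title a')
//   .evaluateAll(anchors => anchors.map(a => a.getAttribute('href')));
// const top5 = pickTopLinks(hrefs, 5);
```

Each returned URL can then be visited in turn, with the rate limit applied between navigations.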
Agent prompt (copy/paste)
You are an agent with a web-scraping skill. For any `search:` task, use DuckDuckGo to find relevant pages, then open each page in a headless Chrome instance (Playwright/Puppeteer) and extract `title`, `meta description`, `main text`, and `canonical` URL. Always:
- Check and respect robots.txt
- Rate-limit requests (<=1 req/sec)
- Use a clear `User-Agent` and do not execute arbitrary page JS
Return results as JSON: [{url,title,description,canonical,text}] or `login_required` if a page needs authentication.
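The prompt leaves open how an agent decides that a page needs authentication. A rough sketch of one heuristic follows. The function names, the use of HTTP 401, and the login-path pattern are all assumptions made for illustration, not rules taken from this skill, and real sites will need tuning.

```javascript
// Hypothetical heuristic: does the loaded page appear to sit behind a login?
// Signals used (both assumptions): an HTTP 401 status, or a final URL whose
// path looks like a login route after redirects.
function needsLogin({ status, finalUrl }) {
  if (status === 401) return true;
  let path = '';
  try {
    path = new URL(finalUrl).pathname.toLowerCase();
  } catch {
    return false; // unparseable URL: no evidence of a login wall
  }
  return /\/(login|signin|sign-in)(\/|$)/.test(path);
}

// Return either the extracted fields or the `login_required` marker.
function toResult(fields, signals) {
  return needsLogin(signals) ? 'login_required' : fields;
}

// With Playwright, the signals could be gathered like this:
// const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
// const signals = { status: response ? response.status() : 0, finalUrl: page.url() };
```

Checking the final URL rather than the requested one matters because many sites answer with a 200 status after redirecting to their sign-in page.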
Quick setup
- Node: `npm i playwright` and run `npx playwright install` for browser binaries.
- Python: `pip install playwright` and `playwright install`.
Tips
- Use `page.route` to block large assets (images, fonts) when you only need text.
- Respect site terms and introduce exponential backoff for retries.
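Both tips can be sketched briefly. `blockHeavyAssets`, `backoffDelay`, and `withRetries` are hypothetical helper names, and the default delays (1 second base, 30 second cap, 3 retries) are assumptions chosen for illustration rather than values prescribed by this skill.

```javascript
// Resource types to skip when only text is needed.
const BLOCKED_TYPES = new Set(['image', 'font']);

// Register on a Playwright page before navigating.
async function blockHeavyAssets(page) {
  await page.route('**/*', route =>
    BLOCKED_TYPES.has(route.request().resourceType())
      ? route.abort()
      : route.continue()
  );
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry an async operation, sleeping with exponential backoff between failures.
// `sleep` can be overridden, which is mainly useful in tests.
async function withRetries(fn, { retries = 3, baseMs = 1000, sleep } = {}) {
  const wait = sleep || (ms => new Promise(resolve => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the error
      await wait(backoffDelay(attempt, baseMs));
    }
  }
}

// Possible use:
// await blockHeavyAssets(page);
// await withRetries(() => page.goto(url, { waitUntil: 'domcontentloaded' }));
```

Capping the delay keeps a long run of failures from producing waits of several minutes; adding random jitter is a common refinement that this sketch leaves out.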
See also
- using-youtube-download.md — media-specific scraping and download examples.