Agent skill

gallery-scraper

Bulk download images from login-protected gallery websites using an attached browser session. Use when asked to scrape, download, or save images from authenticated gallery pages, extract full-size images from thumbnails, or batch download from multi-page galleries.

Install this agent skill to your Project

npx add-skill https://github.com/jdrhyne/agent-skills/tree/main/clawdbot/gallery-scraper

SKILL.md

Gallery Scraper

Bulk download images from authenticated gallery websites via browser relay.

Safety Boundaries

Do not access gallery sites or user accounts that the user has not explicitly attached and authorized.
Do not download beyond the selected gallery, profile, or page range without confirmation.
Do not store cookies, tokens, or hidden form values in local output files.
Do not keep retrying blocked downloads indefinitely; surface rate limits or auth failures instead.
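
The scope boundary above can be checked mechanically before a gallery page URL is queued. What follows is a minimal sketch, assuming the user authorized one gallery URL prefix; the function name `in_scope` and the example URLs in the usage note are hypothetical and not part of the skill.

```python
from urllib.parse import urlsplit

def in_scope(url: str, authorized_prefix: str) -> bool:
    """Return True only if url is on the authorized host and under the authorized path."""
    u, a = urlsplit(url), urlsplit(authorized_prefix)
    if u.scheme != a.scheme or u.netloc.lower() != a.netloc.lower():
        return False
    base = a.path.rstrip('/')
    # Accept the path itself or anything below it, so that
    # '/galleries/view/1234' does not match '/galleries/view/123'
    return u.path == base or u.path.startswith(base + '/')
```

For example, with `https://site.example/galleries/view/123` authorized, `?page=2` on the same path passes while `/galleries/view/1234` is rejected and would need confirmation first. Image URLs on a separate CDN host are outside what this check covers.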

Prerequisites

User must have Chrome with the OpenClaw Browser Relay extension installed
User must be logged into the target site
User must attach the browser tab (click relay toolbar button, badge ON)

Workflow

1. Attach Browser Tab

Ask the user to:

Log into the gallery site in Chrome
Navigate to the target gallery/profile page
Click the OpenClaw Browser Relay toolbar button (badge shows ON)

2. Discover Image URL Pattern

Many gallery sites store full-size URLs in data attributes on the thumbnail elements. Common patterns:

javascript

// Extract via browser evaluate
() => {
  // Try common patterns
  const patterns = [
    'img[data-max]',           // data-max attribute
    'img[data-src]',           // lazy-load pattern
    'img[data-full]',          // full-size pattern
    'a[data-lightbox] img',    // lightbox galleries
    '.gallery-item img'        // generic gallery
  ];
  
  for (const sel of patterns) {
    const imgs = document.querySelectorAll(sel);
    if (imgs.length > 0) {
      return {
        selector: sel,
        count: imgs.length,
        sample: imgs[0].outerHTML.substring(0, 200)
      };
    }
  }
  return null;
}

3. Extract Full-Size URLs

Once the pattern is identified, extract all URLs:

javascript

// For data-max pattern (common)
() => Array.from(document.querySelectorAll('img[data-max]'))
  .map(img => img.dataset.max)

// For thumbnail→full conversion (replace path segment)
() => Array.from(document.querySelectorAll('.gallery img'))
  .map(img => img.src.replace('/thumb/', '/full/'))
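
Once the URL list has left the browser, the thumbnail-to-full rewrite can also be applied and de-duplicated on the agent side. This is a sketch only: the `/thumb/` and `/full/` segments are the same illustrative values used above, and `to_full_size` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

def to_full_size(urls, thumb_segment='/thumb/', full_segment='/full/'):
    """Rewrite the thumbnail path segment and drop duplicates, keeping order."""
    seen, out = set(), []
    for url in urls:
        parts = urlsplit(url)
        # Replace only the first occurrence, and only in the path component,
        # so hosts and query strings are left untouched
        path = parts.path.replace(thumb_segment, full_segment, 1)
        full = urlunsplit(parts._replace(path=path))
        if full not in seen:
            seen.add(full)
            out.append(full)
    return out
```

Restricting the replacement to the path avoids corrupting a query string that happens to contain the same segment.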

4. Handle Pagination

Check for multiple pages:

javascript

() => {
  const pagination = document.querySelectorAll('.pagination a, [class*="page"] a');
  return Array.from(pagination).map(a => ({text: a.textContent, href: a.href}));
}

Navigate to each page and collect URLs.
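
The list of `{text, href}` objects returned by the snippet above usually contains duplicates and non-numeric controls such as "Next". A minimal sketch of turning it into an ordered visit queue follows; `page_queue` is a hypothetical helper, and it assumes page links are labeled with their page number, which will not hold on every site.

```python
def page_queue(links, current_url):
    """Return unique page URLs in numeric order, excluding the current page."""
    pages = {}
    for link in links:
        text = link.get('text', '').strip()
        href = link.get('href', '')
        # Skip 'Next', 'Prev' and similar controls that have no page number
        if text.isdecimal() and href and href != current_url:
            pages.setdefault(href, int(text))
    return [href for href, _ in sorted(pages.items(), key=lambda kv: kv[1])]
```

Paginators that collapse the middle pages behind an ellipsis expose only some links, so the queue may need to be rebuilt after each navigation.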

4b. Batch scrape multiple galleries (iframe trick)

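When you need multiple galleries quickly and can’t automate CDP, you can load each gallery in a hidden iframe and extract data-max URLs. This works only when the galleries are same-origin with the attached page and the site allows itself to be framed; otherwise the frame's document is not readable: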

javascript

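async () => {
  const urls = [
    'https://site.example/galleries/view/123',
    'https://site.example/galleries/view/456'
  ];
  const results = [];
  for (const url of urls) {
    const iframe = document.createElement('iframe');
    iframe.style.position = 'fixed';
    iframe.style.left = '-9999px';
    iframe.style.width = '800px';
    iframe.style.height = '600px';
    const id = url.split('/').pop();
    try {
      // Register the load handler before setting src so the event is never missed
      await new Promise((resolve, reject) => {
        const t = setTimeout(() => reject(new Error('timeout load')), 20000);
        iframe.onload = () => { clearTimeout(t); resolve(); };
        iframe.src = url;
        document.body.appendChild(iframe);
      });
      // contentDocument is null when the frame is cross-origin or blocked
      const doc = iframe.contentDocument;
      if (!doc) throw new Error('frame document not accessible');
      const start = Date.now();
      let imgs = [];
      // Poll for up to 20 s in case images are rendered lazily
      while (Date.now() - start < 20000) {
        imgs = Array.from(doc.querySelectorAll('img[data-max]')).map(i => i.dataset.max);
        if (imgs.length) break;
        await new Promise(r => setTimeout(r, 500));
      }
      results.push({ id, urls: imgs });
    } catch (err) {
      // Record the failure and continue with the next gallery
      results.push({ id, urls: [], error: String(err.message || err) });
    } finally {
      // Always remove the iframe, even after a timeout
      iframe.remove();
    }
  }
  return results;
}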

5. Check CDN Access

Test whether the CDN requires authentication or only a Referer header:

bash

# Test direct access
curl -I "CDN_URL" 2>/dev/null | head -3

# Test with Referer
curl -I -H "Referer: https://SITE_DOMAIN/" "CDN_URL" 2>/dev/null | head -3
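
The two status codes from these probes determine how the download step should be configured. A rough decision sketch is shown here; the function name and the returned labels are hypothetical, and the mapping is an assumption about typical CDN behavior rather than a rule.

```python
def classify_cdn_access(direct_status: int, referer_status: int) -> str:
    """Map the two probe status codes to a download strategy label."""
    ok = range(200, 300)
    if direct_status in ok:
        return 'public'         # no extra headers needed
    if referer_status in ok:
        return 'referer'        # send the Referer header on every request
    if 429 in (direct_status, referer_status):
        return 'rate_limited'   # back off before probing again
    if referer_status in (401, 403):
        return 'auth_required'  # Referer alone is not enough; surface this to the user
    return 'unknown'            # e.g. 404, or 405 when HEAD is not supported
```

An `unknown` result is worth re-checking with a normal GET on a single URL, since some servers answer HEAD requests differently.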

6. Bulk Download

Collect the URLs into a text file (one URL per line), then download them in parallel:

bash

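# Create output directory
mkdir -p ~/Downloads/gallery_name

# Download with Referer header (parallel); urls.txt is read from the output directory
cd ~/Downloads/gallery_name
while IFS= read -r url; do
  filename=$(basename "$url")
  # -f makes curl fail on HTTP errors instead of saving the error page as an image
  curl -sf -H "Referer: https://SITE_DOMAIN/" -o "$filename" "$url" &
  # Cap at 8 concurrent jobs (wait -n requires bash 4.3 or newer)
  [ $(jobs -r | wc -l) -ge 8 ] && wait -n
done < urls.txt
wait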

Python ThreadPool fallback (avoids shell quoting + wait -n issues):

python

import os
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

outdir = os.path.expanduser('~/Downloads/gallery_name')
os.makedirs(outdir, exist_ok=True)
headers = {'Referer': 'https://SITE_DOMAIN/', 'User-Agent': 'Mozilla/5.0'}

with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

def download(url):
    filename = os.path.join(outdir, os.path.basename(url))
    # Skip files that were already downloaded in a previous run
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        return
    r = requests.get(url, headers=headers, timeout=60)
    r.raise_for_status()
    with open(filename, 'wb') as out:
        out.write(r.content)

# Keep the futures so failures (403, 429, timeouts) are reported, not silently dropped
with ThreadPoolExecutor(max_workers=8) as ex:
    futures = {ex.submit(download, url): url for url in urls}
    for fut in as_completed(futures):
        try:
            fut.result()
        except Exception as e:
            print(f'FAILED {futures[fut]}: {e}')
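
Taking the basename of the full URL keeps any query string in the filename, and two images in different directories can share a name. The sketch below derives the name from the URL path only and resolves collisions; `safe_filename` and the `taken` set are hypothetical names introduced for illustration.

```python
import os
from urllib.parse import urlsplit, unquote

def safe_filename(url, taken):
    """Derive a local filename from the URL path only, avoiding collisions."""
    # Use the path component so query strings (which may carry tokens) never reach disk
    name = os.path.basename(unquote(urlsplit(url).path))
    if name in ('', '.', '..'):
        name = 'image'
    stem, ext = os.path.splitext(name)
    candidate, n = name, 1
    # Append a counter until the name is unused in this run
    while candidate in taken:
        candidate = f'{stem}_{n}{ext}'
        n += 1
    taken.add(candidate)
    return candidate
```

Dropping the query string is in line with the rule against storing tokens in local output. Note that `taken` tracks only names assigned during the current run, not files already on disk.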

Handling Lock Buttons

Some galleries have "lock" buttons to reveal hidden content. Look for:

javascript

// Find lock/unlock buttons
() => {
  const locks = document.querySelectorAll(
    '[class*="lock"], [class*="unlock"], ' +
    'button[title*="lock"], .premium-unlock'
  );
  return Array.from(locks).map(el => ({
    tag: el.tagName,
    class: el.className,
    text: el.innerText?.substring(0, 30)
  }));
}

Click each lock button before extracting URLs.

Output Organization

Optionally organize by gallery:

bash

# Derive a gallery-specific folder name from the selected URL
mkdir -p "gallery_<id>"
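
One way to derive the `<id>` part is to take the last path segment of the selected gallery URL. This is a sketch under that assumption; sites that put the identifier in a query parameter would need a different rule, and `gallery_folder` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit

def gallery_folder(url):
    """Build 'gallery_<id>' from the last non-empty path segment of the gallery URL."""
    segments = [s for s in urlsplit(url).path.split('/') if s]
    raw = segments[-1] if segments else 'unknown'
    # Keep only characters that are safe in directory names
    gallery_id = re.sub(r'[^A-Za-z0-9._-]', '_', raw)
    return f'gallery_{gallery_id}'
```

With the example URL `https://site.example/galleries/view/123` used earlier, this yields `gallery_123`.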

Troubleshooting

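403 Forbidden: Add the Referer header. If that is not enough, pass session cookies from the attached browser in memory only; do not write them to output files.
Rate limited: Reduce the number of parallel downloads and add delays between requests.
Missing images: Check for JavaScript-loaded content; lazy-loading galleries may need the page scrolled before extraction.
Login required for CDN: Read session cookies via document.cookie (HttpOnly cookies are not visible to page scripts) and use them only for the current run.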
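
The rate-limit and 403 advice above, combined with the rule against retrying indefinitely, can be sketched as a bounded retry loop. The `fetch` callable returning `(status, body)`, the `DownloadBlocked` exception, and the default limits are all assumptions made for illustration.

```python
import time

class DownloadBlocked(Exception):
    """Raised when the server keeps refusing the request."""

def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body); retry 429/5xx with exponential backoff."""
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if 200 <= status < 300:
            return body
        if status in (401, 403):
            # Auth failures do not fix themselves; stop immediately
            raise DownloadBlocked(f'{status} for {url}: check Referer or session')
        if status == 429 or status >= 500:
            # Wait 1x, 2x, 4x ... base_delay, but not after the final attempt
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
            continue
        raise DownloadBlocked(f'unexpected status {status} for {url}')
    raise DownloadBlocked(f'gave up on {url} after {max_attempts} attempts')
```

Passing `sleep` as a parameter keeps the function testable without real delays; the raised exception is the point at which the failure should be reported to the user.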

Maintainer

jdrhyne Core maintainer

Source details

Full Name: jdrhyne/agent-skills
Branch: main
Path in repo: clawdbot/gallery-scraper
Topics: claude-code agent-skills automation mcp ai-agents cursor developer-tools gemini-cli clawdbot github-copilot prompt-engineering openclaw codex prompts agentic-ai llm-agents
