Agent skill
web-scraper-energy
Web scraping workflows for energy data collection from BSEE and BOEM using Scrapy
Install this agent skill to your Project
npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/data/energy/web-scraper-energy
SKILL.md
# Web Scraper Energy
## When to Use This Skill
Use this skill when you need to:
- Scrape BSEE/BOEM websites for data not in APIs
- Collect lease sale results and bid data
- Extract platform and facility information
- Build automated data collection pipelines
- Parse HTML tables into structured data
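As an illustration of the last item, here is a minimal sketch of turning an HTML table into a list of row dictionaries with BeautifulSoup. The function name `parse_html_table` and the sample markup are hypothetical (the sample is made up, not real BOEM data), and real pages will likely need extra handling for nested tables, merged cells, or multiple header rows.

```python
from typing import Dict, List

from bs4 import BeautifulSoup


def parse_html_table(html: str) -> List[Dict[str, str]]:
    """Turn the first <table> in `html` into row dicts keyed by header text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # Treat the first row as the header, whether it uses <th> or <td> cells.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if len(values) != len(headers):
            continue  # skip spacer or malformed rows
        records.append(dict(zip(headers, values)))
    return records


# Made-up sample markup for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Block</th><th>Bidder</th><th>Bid (USD)</th></tr>
  <tr><td>GC 123</td><td>Example Energy LLC</td><td>1,250,000</td></tr>
  <tr><td>MC 456</td><td>Sample Offshore Inc</td><td>980,000</td></tr>
</table>
"""

records = parse_html_table(SAMPLE_HTML)
print(records[0])
```

Values are kept as strings; converting amounts such as `1,250,000` to numbers is left to a later cleaning step.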
## Core Pattern
```python
"""
ABOUTME: Web scraping utilities for energy data collection
ABOUTME: Supports Scrapy spiders and BeautifulSoup parsing
"""
from dataclasses import dataclass
from typing import List, Dict, Optional
import time

from bs4 import BeautifulSoup
import requests


@dataclass
class ScrapingConfig:
    """Configuration for web scraping."""
    base_url: str
    rate_limit_sec: float = 1.0
    max_retries: int = 3
    timeout_sec: int = 30
    user_agent: str = "WorldEnergyData/1.0"
    cache_enabled: bool = True


class BOEMScraper:
    ...  # class body is not shown in this summary; see sub-skills
```
*See sub-skills for full details.*
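Since the `BOEMScraper` body is not shown here, the following is only a sketch of how the `ScrapingConfig` fields (`rate_limit_sec`, `max_retries`, `timeout_sec`, `user_agent`) might drive a polite fetch loop. `PoliteFetcher` and its injected `get`, `sleep`, and `clock` parameters are hypothetical names introduced for illustration, not the skill's actual implementation. The `get` callable is assumed to return page text; in practice it could wrap `requests.get(url, timeout=..., headers=...).text`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScrapingConfig:
    """Configuration for web scraping (fields copied from the skill)."""
    base_url: str
    rate_limit_sec: float = 1.0
    max_retries: int = 3
    timeout_sec: int = 30
    user_agent: str = "WorldEnergyData/1.0"
    cache_enabled: bool = True


class PoliteFetcher:
    """Hypothetical helper: spaces out requests and retries failures."""

    def __init__(
        self,
        config: ScrapingConfig,
        get: Callable[..., str],
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.config = config
        self._get = get
        self._sleep = sleep
        self._clock = clock
        self._last_request: Optional[float] = None

    def _wait_for_slot(self) -> None:
        # Sleep just long enough to keep rate_limit_sec between requests.
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            remaining = self.config.rate_limit_sec - elapsed
            if remaining > 0:
                self._sleep(remaining)
        self._last_request = self._clock()

    def fetch(self, path: str) -> str:
        url = self.config.base_url.rstrip("/") + "/" + path.lstrip("/")
        last_error: Optional[Exception] = None
        for _attempt in range(self.config.max_retries):
            self._wait_for_slot()
            try:
                return self._get(
                    url,
                    timeout=self.config.timeout_sec,
                    headers={"User-Agent": self.config.user_agent},
                )
            except Exception as exc:  # a real scraper would catch narrower errors
                last_error = exc
        raise RuntimeError(
            f"giving up on {url} after {self.config.max_retries} attempts"
        ) from last_error


def stub_get(url: str, **kwargs) -> str:
    """Stand-in for a real HTTP call so the sketch runs offline."""
    return f"<html>fetched {url}</html>"


fetcher = PoliteFetcher(ScrapingConfig(base_url="https://example.invalid"), get=stub_get)
print(fetcher.fetch("/lease-sales"))
```

Injecting `get`, `sleep`, and `clock` keeps the retry and rate-limit logic testable without network access or real delays.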
## YAML Configuration Template
```yaml
# config/input/scraping-config.yaml

metadata:
  feature_name: "energy-scraping"
  created: "2025-01-15"

scraping:
  rate_limit_sec: 1.5
  max_retries: 3
  timeout_sec: 30
  cache_enabled: true
  cache_ttl_hours: 24

targets:
  - name: "lease_sales"
    source: "boem"
    sale_numbers: [257, 258, 259]
    output: "data/lease_sales/"

  - name: "platforms"
    source: "boem"
    areas: ["GC", "WR", "MC"]
    output: "data/platforms/"
```
*See sub-skills for full details.*
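To show how the `targets` entries in this template might be consumed, here is a minimal sketch that validates them and separates the common keys (`name`, `source`, `output`) from target-specific filters such as `sale_numbers` or `areas`. The `Target` dataclass and `parse_targets` function are hypothetical, and the sketch assumes `targets` is a top-level key. The `config` dict stands in for what a YAML loader such as `yaml.safe_load` would be expected to return for the file.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

REQUIRED_KEYS = ("name", "source", "output")


@dataclass
class Target:
    """Hypothetical container for one entry under `targets`."""
    name: str
    source: str
    output: str
    filters: Dict[str, Any] = field(default_factory=dict)


def parse_targets(config: Dict[str, Any]) -> List[Target]:
    """Validate the `targets` list and split off target-specific filters."""
    targets = []
    for index, raw in enumerate(config.get("targets") or []):
        missing = [key for key in REQUIRED_KEYS if key not in raw]
        if missing:
            raise ValueError(f"targets[{index}] is missing keys: {missing}")
        # Anything that is not a common key is treated as a filter.
        filters = {k: v for k, v in raw.items() if k not in REQUIRED_KEYS}
        targets.append(Target(raw["name"], raw["source"], raw["output"], filters))
    return targets


# Stand-in for the parsed YAML template (assumes `targets` is top-level).
config = {
    "scraping": {
        "rate_limit_sec": 1.5,
        "max_retries": 3,
        "timeout_sec": 30,
        "cache_enabled": True,
        "cache_ttl_hours": 24,
    },
    "targets": [
        {"name": "lease_sales", "source": "boem",
         "sale_numbers": [257, 258, 259], "output": "data/lease_sales/"},
        {"name": "platforms", "source": "boem",
         "areas": ["GC", "WR", "MC"], "output": "data/platforms/"},
    ],
}

for target in parse_targets(config):
    print(target.name, target.source, target.filters)
```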
## CLI Usage
```bash
# Scrape lease sale results
python -m worldenergydata.scraper \
    --source boem \
    --type lease-sale \
    --sale 259 \
    --output data/lease_sale_259.csv

# Scrape platform data
python -m worldenergydata.scraper \
    --source boem \
    --type platforms \
    --area GC \
    --output data/gc_platforms.csv
```
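The flags used in these commands could be parsed roughly as follows. This is a sketch with `argparse`, not the actual `worldenergydata.scraper` entry point: the flag names come from the commands shown, while the helper names `build_parser` and `parse_cli`, and the rule that `lease-sale` needs `--sale` and `platforms` needs `--area`, are assumptions made for illustration.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="worldenergydata.scraper")
    parser.add_argument("--source", required=True, help="data source, e.g. boem")
    # Stored as data_type to avoid shadowing the built-in name `type`.
    parser.add_argument("--type", required=True, dest="data_type",
                        choices=["lease-sale", "platforms"])
    parser.add_argument("--sale", type=int, help="lease sale number")
    parser.add_argument("--area", help="area code, e.g. GC")
    parser.add_argument("--output", required=True, help="output CSV path")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed cross-flag rules; the real CLI may behave differently.
    if args.data_type == "lease-sale" and args.sale is None:
        parser.error("--type lease-sale requires --sale")
    if args.data_type == "platforms" and args.area is None:
        parser.error("--type platforms requires --area")
    return args


args = parse_cli([
    "--source", "boem",
    "--type", "lease-sale",
    "--sale", "259",
    "--output", "data/lease_sale_259.csv",
])
print(args.source, args.data_type, args.sale, args.output)
```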
## Web Crawling & MCP Assessment (2026-03-14)
No external MCP or paid service is needed for energy data scraping.
This skill's requests + BeautifulSoup pattern is sufficient for BSEE/BOEM/EIA targets.
For async fetching at scale (WRK-1202 Tier 3), upgrade to httpx (async) + beautifulsoup4.
For JS-rendered pages, use claude-in-chrome browser automation (already available).
See the doc-research-download skill for the full assessment.
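For the async upgrade path, the core idea is bounded concurrency: many pages in flight, but never more than a fixed number at once. The sketch below shows that pattern with `asyncio.Semaphore` and an injected `fetch` coroutine. `fetch_all`, `fake_fetch`, and `max_concurrency` are hypothetical names, and `fake_fetch` is a stand-in so the example runs offline. With httpx, `fetch` would presumably wrap a call on a shared `httpx.AsyncClient`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> Dict[str, str]:
    """Fetch every URL, keeping at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> Tuple[str, str]:
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(pairs)


async def fake_fetch(url: str) -> str:
    """Offline stand-in for a real async HTTP call."""
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


urls = [f"https://example.invalid/page/{n}" for n in range(4)]
pages = asyncio.run(fetch_all(urls, fake_fetch, max_concurrency=2))
print(len(pages))
```

The returned HTML strings can then be handed to the same BeautifulSoup parsing code used in the synchronous path.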
## Sub-Skills
- Best Practices