Agent skill
bright-data
Bright Data proxy and web scraping API. Use when user mentions "Bright Data", "proxy", "web scraping at scale", or data collection.
Install this agent skill to your Project
npx add-skill https://github.com/vm0-ai/vm0-skills/tree/main/bright-data
Bright Data Web Scraper API
Use the Bright Data API via direct curl calls for social media scraping, web data extraction, and account management.
Official docs:
https://docs.brightdata.com/
When to Use
Use this skill when you need to:
- Scrape social media - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
- Extract web data - Posts, profiles, comments, engagement metrics
- Monitor usage - Track bandwidth and request usage
- Manage account - Check status and zones
Prerequisites
- Sign up at Bright Data
- Get your API key from Settings > Users
- Create a Web Scraper dataset in the Control Panel to get your dataset_id
export BRIGHTDATA_TOKEN="your-api-key"
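Every request below reads the token from the environment, so an unset variable only shows up later as an authentication error. A minimal guard, assuming a POSIX-style shell, can catch that earlier. The function name require_token is hypothetical and not part of Bright Data's tooling.

```shell
# Hypothetical helper: fail early with a readable message when the token
# is missing or empty, instead of sending an empty Bearer header.
require_token() {
  if [ -z "${BRIGHTDATA_TOKEN:-}" ]; then
    echo "BRIGHTDATA_TOKEN is not set; export it before calling the API" >&2
    return 1
  fi
}
```

Call it once at the top of a script, for example `require_token || exit 1`.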
Base URL
https://api.brightdata.com
Social Media Scraping
Bright Data supports scraping these social media platforms:
| Platform | Profiles | Posts | Comments | Reels/Videos |
|---|---|---|---|---|
| Twitter/X | ✅ | ✅ | - | - |
| Reddit | - | ✅ | ✅ | - |
| YouTube | ✅ | ✅ | ✅ | - |
| Instagram | ✅ | ✅ | ✅ | ✅ |
| TikTok | ✅ | ✅ | ✅ | - |
| LinkedIn | ✅ | ✅ | - | - |
How to Use
1. Trigger Scraping (Asynchronous)
Trigger a data collection job and get a snapshot_id for later retrieval.
Write to /tmp/brightdata_request.json:
[
{"url": "https://twitter.com/username"},
{"url": "https://twitter.com/username2"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Response:
{
"snapshot_id": "s_m4x7enmven8djfqak"
}
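In a script, the snapshot_id usually needs to be captured for the later progress and download calls. The sketch below wraps the same trigger request in a function and extracts the field with jq, which this document already uses elsewhere. The function name trigger_snapshot is hypothetical, and the response is assumed to have the shape shown above.

```shell
# Hypothetical helper: trigger an async job and print only the snapshot_id.
# Assumes jq is installed and the response looks like {"snapshot_id": "..."}.
trigger_snapshot() {
  local dataset_id="$1" request_file="$2"
  curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=${dataset_id}" \
    -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "@${request_file}" \
    | jq -e -r '.snapshot_id // empty'   # non-zero exit if the field is absent
}
```

Usage: `snapshot_id=$(trigger_snapshot "<dataset-id>" /tmp/brightdata_request.json) || echo "trigger failed" >&2`.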
2. Trigger Scraping (Synchronous)
Get results immediately in the response (for small requests).
Write to /tmp/brightdata_request.json:
[
{"url": "https://www.reddit.com/r/technology/comments/xxxxx"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
3. Monitor Progress
Check the status of a scraping job (replace <snapshot-id> with your actual snapshot ID):
curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)"
Response:
{
"snapshot_id": "s_m4x7enmven8djfqak",
"dataset_id": "gd_xxxxx",
"status": "running"
}
Status values: running, ready, failed
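Because the job runs asynchronously, a script typically polls this endpoint until the status leaves running. The loop below is a minimal sketch of that pattern. The function name wait_for_snapshot, the 10-second default interval, and the 60-attempt cap are illustrative assumptions, not values from Bright Data.

```shell
# Hypothetical helper: poll /progress until the job is ready or failed.
# Returns 0 on ready, 1 on failed, 2 if it gives up. Assumes jq is installed.
wait_for_snapshot() {
  local snapshot_id="$1" interval="${2:-10}" max_attempts="${3:-60}"
  local attempt=0 status
  while [ "$attempt" -lt "$max_attempts" ]; do
    status=$(curl -s "https://api.brightdata.com/datasets/v3/progress/${snapshot_id}" \
      -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" | jq -r '.status')
    case "$status" in
      ready)  return 0 ;;
      failed) echo "snapshot ${snapshot_id} failed" >&2; return 1 ;;
    esac
    # Any other status (for example running) means: wait and check again.
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "gave up waiting for ${snapshot_id} after ${max_attempts} checks" >&2
  return 2
}
```

Usage: `wait_for_snapshot "<snapshot-id>" 15 40`, then download only if it returns 0.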
4. Download Results
Once status is ready, download the collected data (replace <snapshot-id> with your actual snapshot ID):
curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot-id>?format=json" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)"
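For larger snapshots it is often more practical to write the result to a file than to print it. The sketch below uses the same endpoint with curl's standard --fail and -o options, so an HTTP error status produces a non-zero exit code instead of an error body saved as data. The function name download_snapshot is hypothetical.

```shell
# Hypothetical helper: save a snapshot as JSON to a local file.
# --fail makes curl exit non-zero on HTTP status >= 400; -sS stays quiet
# but still prints transport errors.
download_snapshot() {
  local snapshot_id="$1" out_file="$2"
  curl -sS --fail \
    "https://api.brightdata.com/datasets/v3/snapshot/${snapshot_id}?format=json" \
    -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
    -o "$out_file"
}
```

Usage: `download_snapshot "<snapshot-id>" /tmp/brightdata_result.json`.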
5. List Snapshots
Get all your snapshots:
curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" | jq '.[] | {snapshot_id, dataset_id, status}'
6. Cancel Snapshot
Cancel a running job (replace <snapshot-id> with your actual snapshot ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)"
Platform-Specific Examples
Twitter/X - Scrape Profile
Write to /tmp/brightdata_request.json:
[
{"url": "https://twitter.com/elonmusk"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: x_id, profile_name, biography, is_verified, followers, following, profile_image_link
Twitter/X - Scrape Posts
Write to /tmp/brightdata_request.json:
[
{"url": "https://twitter.com/username/status/123456789"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: post_id, text, replies, likes, retweets, views, hashtags, media
Reddit - Scrape Subreddit Posts
Write to /tmp/brightdata_request.json:
[
{"url": "https://www.reddit.com/r/technology", "sort_by": "hot"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Parameters: url, sort_by (new/top/hot)
Returns: post_id, title, description, num_comments, upvotes, date_posted, community
Reddit - Scrape Comments
Write to /tmp/brightdata_request.json:
[
{"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: comment_id, user_posted, comment_text, upvotes, replies
YouTube - Scrape Video Info
Write to /tmp/brightdata_request.json:
[
{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: title, views, likes, num_comments, video_length, transcript, channel_name
YouTube - Search by Keyword
Write to /tmp/brightdata_request.json:
[
{"keyword": "artificial intelligence", "num_of_posts": 50}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
YouTube - Scrape Comments
Write to /tmp/brightdata_request.json:
[
{"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: comment_text, likes, replies, username, date
Instagram - Scrape Profile
Write to /tmp/brightdata_request.json:
[
{"url": "https://www.instagram.com/username"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: followers, post_count, profile_name, is_verified, biography
Instagram - Scrape Posts
Write to /tmp/brightdata_request.json:
[
{
"url": "https://www.instagram.com/username",
"num_of_posts": 20,
"start_date": "01-01-2024",
"end_date": "12-31-2024"
}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Account Management
Check Account Status
curl -s "https://api.brightdata.com/status" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)"
Response:
{
"status": "active",
"customer": "hl_xxxxxxxx",
"can_make_requests": true,
"ip": "x.x.x.x"
}
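A script can use this response as a pre-flight check before triggering jobs. The sketch below assumes the response has the shape shown above and relies on jq -e, which sets the exit status from the filter result. The function name can_make_requests is hypothetical and simply mirrors the response field.

```shell
# Hypothetical helper: succeed only when the account reports that it can
# make requests. Assumes jq is installed.
can_make_requests() {
  curl -s "https://api.brightdata.com/status" \
    -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
    | jq -e '.can_make_requests == true' >/dev/null
}
```

Usage: `can_make_requests || { echo "account cannot make requests" >&2; exit 1; }`.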
Get Active Zones
curl -s "https://api.brightdata.com/zone/get_active_zones" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" | jq '.[] | {name, type}'
Get Bandwidth Usage
curl -s "https://api.brightdata.com/customer/bw" \
-H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)"
Getting Dataset IDs
To use the scraping features, you need a dataset_id:
- Go to Bright Data Control Panel
- Create a new Web Scraper dataset or select an existing one
- Choose the platform (Twitter, Reddit, YouTube, etc.)
- Copy the dataset_id from the dataset settings
Dataset IDs can also be found in the bandwidth usage API response under the data field keys (e.g., v__ds_api_gd_xxxxx where gd_xxxxx is your dataset ID).
Common Parameters
| Parameter | Description | Example |
|---|---|---|
| url | Target URL to scrape | https://twitter.com/user |
| keyword | Search keyword | "artificial intelligence" |
| num_of_posts | Limit number of results | 50 |
| start_date | Filter by date (MM-DD-YYYY) | "01-01-2024" |
| end_date | Filter by date (MM-DD-YYYY) | "12-31-2024" |
| sort_by | Sort order (Reddit) | new, top, hot |
| format | Response format | json, csv |
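The MM-DD-YYYY date format is easy to get wrong when dates come from tools that emit ISO dates (YYYY-MM-DD). A small conversion helper, sketched here with plain shell parameter expansion, can reduce that risk. The function name to_brightdata_date is hypothetical. It checks only the shape of the input, not whether the date is a real calendar date.

```shell
# Hypothetical helper: convert YYYY-MM-DD to the MM-DD-YYYY form used by
# start_date and end_date.
to_brightdata_date() {
  local iso="$1"
  case "$iso" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected YYYY-MM-DD, got: $iso" >&2; return 1 ;;
  esac
  local year="${iso%%-*}"   # text before the first dash
  local rest="${iso#*-}"    # MM-DD, the text after the first dash
  echo "${rest}-${year}"
}
```

Usage: `start_date=$(to_brightdata_date 2024-01-01)` stores `01-01-2024`.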
Rate Limits
- Batch mode: up to 100 concurrent requests
- Maximum input size: 1GB per batch
- Exceeding limits returns a 429 error
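One common way to handle a 429 is to retry with a growing delay. The sketch below shows that generic pattern. This document does not state a recommended retry policy, so the exponential backoff, the default of 5 attempts, and the function name post_with_retry are all assumptions for illustration.

```shell
# Hypothetical helper: POST a request file, retrying only on HTTP 429.
# Prints the response body; succeeds only on a 2xx status.
post_with_retry() {
  local url="$1" request_file="$2" max_attempts="${3:-5}"
  local attempt=1 delay=1 body http_code
  body=$(mktemp)
  while [ "$attempt" -le "$max_attempts" ]; do
    # -o saves the body to a temp file, -w prints just the status code.
    http_code=$(curl -s -o "$body" -w '%{http_code}' -X POST "$url" \
      -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
      -H "Content-Type: application/json" \
      -d "@${request_file}")
    if [ "$http_code" != "429" ]; then
      cat "$body"; rm -f "$body"
      [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]
      return
    fi
    sleep "$delay"
    delay=$((delay * 2))      # 1s, 2s, 4s, ...
    attempt=$((attempt + 1))
  done
  rm -f "$body"
  echo "still rate limited after ${max_attempts} attempts" >&2
  return 1
}
```

Usage: `post_with_retry "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" /tmp/brightdata_request.json`.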
Guidelines
- Create datasets first: Use the Control Panel to create scraper datasets
- Use async for large jobs: Use /trigger for discovery and batch operations
- Use sync for small jobs: Use /scrape for single URL quick lookups
- Check status before download: Poll /progress until status is ready
- Respect rate limits: Don't exceed 100 concurrent requests
- Date format: Use MM-DD-YYYY for date parameters