
bright-data

Bright Data proxy and web scraping API. Use when user mentions "Bright Data", "proxy", "web scraping at scale", or data collection.


Install this agent skill into your project:

npx add-skill https://github.com/vm0-ai/vm0-skills/tree/main/bright-data


Bright Data Web Scraper API

Use the Bright Data API via direct curl calls for social media scraping, web data extraction, and account management.

Official docs: https://docs.brightdata.com/

When to Use

Use this skill when you need to:

Scrape social media - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
Extract web data - Posts, profiles, comments, engagement metrics
Monitor usage - Track bandwidth and request usage
Manage account - Check status and zones

Prerequisites

Sign up at Bright Data
Get your API key from Settings > Users
Create a Web Scraper dataset in the Control Panel to get your dataset_id
Install jq, which some examples below use to filter JSON output

bash

export BRIGHTDATA_TOKEN="your-api-key"
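Every request below reads the key with `printenv BRIGHTDATA_TOKEN`, so a missing export only shows up later as a failed API call. A minimal guard, sketched on the assumption that you define it in the same shell before running the examples; `bd_require_token` is a hypothetical name, not part of any Bright Data tooling:

```shell
# Hypothetical helper (not Bright Data tooling): fail fast when the API key
# has not been exported, instead of sending an empty "Authorization: Bearer ".
bd_require_token() {
  # printenv only sees exported variables, matching how the examples read the key.
  if [ -z "$(printenv BRIGHTDATA_TOKEN)" ]; then
    echo "BRIGHTDATA_TOKEN is not set; run: export BRIGHTDATA_TOKEN=\"your-api-key\"" >&2
    return 1
  fi
}
```

Calling `bd_require_token || exit 1` at the top of a script stops it before any request is sent.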

Base URL

https://api.brightdata.com

Social Media Scraping

Bright Data supports scraping these social media platforms:

Platform	Profiles	Posts	Comments	Reels/Videos
Twitter/X	✅	✅	-	-
Reddit	-	✅	✅	-
YouTube	✅	✅	✅	-
Instagram	✅	✅	✅	✅
TikTok	✅	✅	✅	-
LinkedIn	✅	✅	-	-

How to Use

1. Trigger Scraping (Asynchronous)

Trigger a data collection job and get a snapshot_id for later retrieval.

Write to /tmp/brightdata_request.json:

json

[
  {"url": "https://twitter.com/username"},
  {"url": "https://twitter.com/username2"}
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json

Response:

json

{
  "snapshot_id": "s_m4x7enmven8djfqak"
}
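In a script you usually want only the `snapshot_id` from this response, to pass to the progress and download calls. The sketch below wraps the trigger request above in a hypothetical `bd_trigger` function and extracts the ID with `sed`, assuming the response keeps the `"snapshot_id": "..."` shape shown here. If jq is installed, `jq -r .snapshot_id` is the sturdier choice.

```shell
# Hypothetical helper: POST a request file to /trigger and print only the snapshot_id.
# Usage: bd_trigger <dataset-id> <request-file>
bd_trigger() {
  local dataset_id="$1"
  local request_file="$2"
  local response snapshot_id
  response=$(curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=${dataset_id}" \
    -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
    -H "Content-Type: application/json" \
    -d @"${request_file}")
  # Extract the value of "snapshot_id" with sed so the sketch does not need jq.
  snapshot_id=$(printf '%s\n' "$response" | sed -n 's/.*"snapshot_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  if [ -z "$snapshot_id" ]; then
    echo "No snapshot_id in response: $response" >&2
    return 1
  fi
  printf '%s\n' "$snapshot_id"
}
```

For example, `snapshot_id=$(bd_trigger "<dataset-id>" /tmp/brightdata_request.json)` stores the ID for the monitoring and download steps.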

2. Trigger Scraping (Synchronous)

Get results directly in the response instead of polling for a snapshot. Use this for small requests, such as a single URL.

Write to /tmp/brightdata_request.json:

json

[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx"}
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json
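Both trigger endpoints read the same kind of input: a JSON array with one object per target URL. Rather than editing /tmp/brightdata_request.json by hand, a small generator can write it from arguments. This is a hypothetical helper that only produces the plain `{"url": ...}` form; inputs with extra fields such as `sort_by` or `num_of_posts` need their own keys.

```shell
# Hypothetical helper: write [{"url": ...}, ...] from URL arguments.
# Usage: bd_build_request <output-file> <url> [<url> ...]
bd_build_request() {
  local out="$1"
  shift
  local sep="" url escaped
  {
    printf '['
    for url in "$@"; do
      # Escape backslashes and double quotes so each URL stays a valid JSON string.
      escaped=$(printf '%s' "$url" | sed 's/\\/\\\\/g; s/"/\\"/g')
      printf '%s{"url": "%s"}' "$sep" "$escaped"
      sep=", "
    done
    printf ']\n'
  } > "$out"
}
```

Running `bd_build_request /tmp/brightdata_request.json https://twitter.com/username https://twitter.com/username2` writes the same two-URL input used in the asynchronous example, on a single line.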

3. Monitor Progress

Check the status of a scraping job (replace <snapshot-id> with your actual snapshot ID):

bash

curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)"

Response:

json

{
  "snapshot_id": "s_m4x7enmven8djfqak",
  "dataset_id": "gd_xxxxx",
  "status": "running"
}

Status values: running, ready, failed
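Because `/trigger` returns before the data is collected, a script has to poll this endpoint until the job is no longer running. A minimal polling loop, assuming the status appears as `"status": "..."` in the response as shown above; the function name, the 60-attempt limit, and the 10-second default interval are illustrative choices, not documented values.

```shell
# Hypothetical helper: poll /progress until the snapshot is ready or failed.
# Usage: bd_wait_ready <snapshot-id> [max-attempts] [interval-seconds]
# Exit status: 0 when ready, 1 when failed, 2 when attempts run out.
bd_wait_ready() {
  local snapshot_id="$1"
  local max_attempts="${2:-60}"
  local interval="${3:-10}"
  local attempt=1
  local status
  while [ "$attempt" -le "$max_attempts" ]; do
    # Extract the value of "status" from the progress response with sed.
    status=$(curl -s "https://api.brightdata.com/datasets/v3/progress/${snapshot_id}" \
      -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
      | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    case "$status" in
      ready) return 0 ;;
      failed) echo "Snapshot ${snapshot_id} failed" >&2; return 1 ;;
    esac
    # Any other value (running, or empty after an error response) means keep waiting.
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "Gave up on ${snapshot_id} after ${max_attempts} attempts" >&2
  return 2
}
```

With the defaults, `bd_wait_ready "$snapshot_id"` waits roughly ten minutes before giving up; pass a larger attempt count for big jobs.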

4. Download Results

Once status is ready, download the collected data (replace <snapshot-id> with your actual snapshot ID):

bash

curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot-id>?format=json" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)"
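To keep the results, write them to a file instead of the terminal. The hypothetical `bd_download` below adds curl's `-f` flag so HTTP error statuses (400 and above) make curl exit non-zero, and it accepts only the two formats listed under Common Parameters.

```shell
# Hypothetical helper: save a snapshot to a file.
# Usage: bd_download <snapshot-id> <json|csv> <output-file>
bd_download() {
  local snapshot_id="$1"
  local format="$2"
  local out="$3"
  # Accept only the formats listed under Common Parameters.
  case "$format" in
    json|csv) ;;
    *) echo "Unsupported format: $format (use json or csv)" >&2; return 1 ;;
  esac
  # -f: exit non-zero on HTTP 4xx/5xx; -S: still print curl errors despite -s.
  curl -sSf "https://api.brightdata.com/datasets/v3/snapshot/${snapshot_id}?format=${format}" \
    -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
    -o "$out"
}
```

For example, `bd_download "$snapshot_id" csv results.csv` saves the snapshot as CSV.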

5. List Snapshots

Get all your snapshots:

bash

curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" | jq '.[] | {snapshot_id, dataset_id, status}'

6. Cancel Snapshot

Cancel a running job (replace <snapshot-id> with your actual snapshot ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)"

Platform-Specific Examples

Twitter/X - Scrape Profile

Write to /tmp/brightdata_request.json:

json

[
  {"url": "https://twitter.com/elonmusk"}
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json

Returns: x_id, profile_name, biography, is_verified, followers, following, profile_image_link

Twitter/X - Scrape Posts

Write to /tmp/brightdata_request.json:

json

[
  {"url": "https://twitter.com/username/status/123456789"}
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json

Returns: post_id, text, replies, likes, retweets, views, hashtags, media

Reddit - Scrape Subreddit Posts

Write to /tmp/brightdata_request.json:

json

[
  {"url": "https://www.reddit.com/r/technology", "sort_by": "hot"}
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json

Parameters: url, sort_by (new/top/hot)

Returns: post_id, title, description, num_comments, upvotes, date_posted, community
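A typo in `sort_by` is easy to miss in a hand-written file, so it can help to validate the value locally before triggering a job. A sketch of a hypothetical request writer that accepts only the three values documented above (new, top, hot):

```shell
# Hypothetical helper: write a subreddit request with a validated sort_by.
# Usage: bd_reddit_request <output-file> <subreddit-url> <new|top|hot>
bd_reddit_request() {
  local out="$1"
  local url="$2"
  local sort_by="$3"
  case "$sort_by" in
    new|top|hot) ;;
    *) echo "sort_by must be new, top, or hot (got: $sort_by)" >&2; return 1 ;;
  esac
  # Same shape as the request file above; assumes the URL contains no double quotes.
  printf '[\n  {"url": "%s", "sort_by": "%s"}\n]\n' "$url" "$sort_by" > "$out"
}
```

After `bd_reddit_request /tmp/brightdata_request.json https://www.reddit.com/r/technology top`, run the same trigger command as above.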

Reddit - Scrape Comments

Write to /tmp/brightdata_request.json:

json

[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"}
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json

Returns: comment_id, user_posted, comment_text, upvotes, replies

YouTube - Scrape Video Info

Write to /tmp/brightdata_request.json:

json

[
  {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json

Returns: title, views, likes, num_comments, video_length, transcript, channel_name

YouTube - Search by Keyword

Write to /tmp/brightdata_request.json:

json

[
  {"keyword": "artificial intelligence", "num_of_posts": 50}
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json
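Keyword searches use `keyword` and `num_of_posts` instead of `url`. Assuming the input array can hold several keyword objects, as the asynchronous example does with several URLs, multiple searches can share one `/trigger` call. A hypothetical generator that writes one object per keyword and rejects a `num_of_posts` that is not a positive integer:

```shell
# Hypothetical helper: one {"keyword": ..., "num_of_posts": N} object per keyword.
# Usage: bd_keyword_request <output-file> <num-of-posts> <keyword> [<keyword> ...]
bd_keyword_request() {
  local out="$1"
  local num="$2"
  shift 2
  # Reject empty values, non-digits, and leading zeros (which also rules out 0).
  case "$num" in
    ''|*[!0-9]*|0*) echo "num_of_posts must be a positive integer (got: $num)" >&2; return 1 ;;
  esac
  local sep="" kw escaped
  {
    printf '['
    for kw in "$@"; do
      # Escape backslashes and double quotes so the keyword stays a valid JSON string.
      escaped=$(printf '%s' "$kw" | sed 's/\\/\\\\/g; s/"/\\"/g')
      printf '%s{"keyword": "%s", "num_of_posts": %s}' "$sep" "$escaped" "$num"
      sep=", "
    done
    printf ']\n'
  } > "$out"
}
```

For example, `bd_keyword_request /tmp/brightdata_request.json 50 "artificial intelligence" "machine learning"` writes two searches of 50 results each.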

YouTube - Scrape Comments

Write to /tmp/brightdata_request.json:

json

[
  {"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3}
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json

Returns: comment_text, likes, replies, username, date

Instagram - Scrape Profile

Write to /tmp/brightdata_request.json:

json

[
  {"url": "https://www.instagram.com/username"}
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json

Returns: followers, post_count, profile_name, is_verified, biography

Instagram - Scrape Posts

Write to /tmp/brightdata_request.json:

json

[
  {
    "url": "https://www.instagram.com/username",
    "num_of_posts": 20,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]

Then run (replace <dataset-id> with your actual dataset ID):

bash

curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json
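`start_date` and `end_date` use MM-DD-YYYY, while many tools (including `date +%F`) produce ISO dates in YYYY-MM-DD. A small converter, sketched as a hypothetical `to_brightdata_date` function that uses only shell parameter expansion:

```shell
# Hypothetical helper: convert YYYY-MM-DD to the MM-DD-YYYY form used by
# start_date and end_date.
to_brightdata_date() {
  local iso="$1"
  case "$iso" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "Expected YYYY-MM-DD, got: $iso" >&2; return 1 ;;
  esac
  local year="${iso%%-*}"   # text before the first "-": YYYY
  local rest="${iso#*-}"    # text after the first "-": MM-DD
  printf '%s-%s\n' "$rest" "$year"
}
```

`to_brightdata_date 2024-12-31` prints `12-31-2024`, matching the `end_date` in the request above.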

Account Management

Check Account Status

bash

curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)"

Response:

json

{
  "status": "active",
  "customer": "hl_xxxxxxxx",
  "can_make_requests": true,
  "ip": "x.x.x.x"
}
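The `can_make_requests` field makes a convenient pre-flight check before starting a large job. A hypothetical gate that succeeds only when the field is `true`, matched with `grep` on the assumption that the response keeps the format shown above:

```shell
# Hypothetical helper: exit 0 only if /status reports "can_make_requests": true.
bd_can_make_requests() {
  curl -s "https://api.brightdata.com/status" \
    -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" \
    | grep -Eq '"can_make_requests"[[:space:]]*:[[:space:]]*true'
}
```

A script can then start with `bd_can_make_requests || { echo "Account cannot make requests" >&2; exit 1; }`.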

Get Active Zones

bash

curl -s "https://api.brightdata.com/zone/get_active_zones" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" | jq '.[] | {name, type}'

Get Bandwidth Usage

bash

curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)"

Getting Dataset IDs

To use the scraping features, you need a dataset_id:

Go to the Bright Data Control Panel
Create a new Web Scraper dataset or select an existing one
Choose the platform (Twitter, Reddit, YouTube, etc.)
Copy the dataset_id from the dataset settings

Dataset IDs also appear in the bandwidth usage response (GET /customer/bw) as keys under the data field, in the form v__ds_api_gd_xxxxx, where gd_xxxxx is the dataset ID.
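Reading those keys by hand gets tedious with several datasets. A sketch that scans the bandwidth response for `v__ds_api_gd_...` keys, strips the prefix, and prints each dataset ID once; the function name and the assumption that IDs contain only letters, digits, and underscores are illustrative, not documented.

```shell
# Hypothetical helper: read a /customer/bw response on stdin and print each
# dataset ID once. Assumes keys look like v__ds_api_gd_<letters/digits/_>.
bd_dataset_ids_from_bw() {
  grep -o 'v__ds_api_gd_[A-Za-z0-9_]*' \
    | sed 's/^v__ds_api_//' \
    | sort -u
}
```

Pipe the bandwidth request into it: `curl -s "https://api.brightdata.com/customer/bw" -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)" | bd_dataset_ids_from_bw`.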

Common Parameters

Parameter	Description	Example
`url`	Target URL to scrape	`https://twitter.com/user`
`keyword`	Search keyword	`"artificial intelligence"`
`num_of_posts`	Maximum number of results	`50`
`start_date`	Start of date range (MM-DD-YYYY)	`"01-01-2024"`
`end_date`	End of date range (MM-DD-YYYY)	`"12-31-2024"`
`sort_by`	Sort order (Reddit)	`new`, `top`, `hot`
`format`	Download format (query parameter on /snapshot)	`json`, `csv`

Rate Limits

Batch mode: up to 100 concurrent requests
Maximum input size: 1GB per batch
Exceeding these limits returns an HTTP 429 (Too Many Requests) error
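One way to handle a 429 is to wait and retry with a growing delay instead of failing the whole run. A sketch of exponential backoff for GET requests; `bd_get_with_retry` is a hypothetical name, and the default of 5 retries starting at 1 second is an illustrative choice, not a documented Bright Data policy.

```shell
# Hypothetical helper: GET a Bright Data URL, retrying with exponential backoff
# while the API answers 429. Prints the response body.
# Usage: bd_get_with_retry <url> [max-retries] [initial-delay-seconds]
bd_get_with_retry() {
  local url="$1"
  local max_retries="${2:-5}"
  local delay="${3:-1}"
  local attempt=0
  local code body_file
  body_file=$(mktemp)
  while :; do
    # -w prints only the HTTP status code; the body goes to the temp file.
    code=$(curl -s -o "$body_file" -w '%{http_code}' "$url" \
      -H "Authorization: Bearer $(printenv BRIGHTDATA_TOKEN)")
    if [ "$code" != "429" ] || [ "$attempt" -ge "$max_retries" ]; then
      break
    fi
    sleep "$delay"
    delay=$((delay * 2))   # double the wait after each 429
    attempt=$((attempt + 1))
  done
  cat "$body_file"
  rm -f "$body_file"
  # Succeed only on a 2xx status.
  case "$code" in
    2??) return 0 ;;
    *) echo "HTTP $code from $url" >&2; return 1 ;;
  esac
}
```

For example, `bd_get_with_retry "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>"` retries the progress check if it is rate limited.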

Guidelines

Create datasets first: Use the Control Panel to create scraper datasets
Use async for large jobs: Use /trigger for discovery and batch operations
Use sync for small jobs: Use /scrape for single URL quick lookups
Check status before download: Poll /progress until status is ready
Respect rate limits: Don't exceed 100 concurrent requests
Date format: Use MM-DD-YYYY for date parameters

Maintainer

vm0-ai Core maintainer

Source details

Full Name: vm0-ai/vm0-skills
Branch: main
Path in repo: bright-data
