Agent skill
tavily-extract
Extract clean markdown or text content from specific URLs via the Tavily CLI. Use this skill when the user has one or more URLs and wants their content, says "extract", "grab the content from", "pull the text from", "get the page at", "read this webpage", or needs clean text from web pages. Handles JavaScript-rendered pages, returns LLM-optimized markdown, and supports query-focused chunking for targeted extraction. Can process up to 20 URLs in a single call.
Install this agent skill to your Project
npx add-skill https://github.com/tavily-ai/skills/tree/main/skills/tavily-extract
SKILL.md
tavily extract
Extract clean markdown or text content from one or more URLs.
Before running any command
If tvly is not found on PATH, install it first:
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
Do not skip this step or fall back to other tools.
See tavily-cli for alternative install methods and auth options.
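The PATH check described above can be sketched as a small guard, assuming a POSIX shell. `require_cmd` is a hypothetical helper name, not part of the Tavily CLI. It only reports whether a command is available and leaves the install step to you.

```shell
# Hypothetical helper (not part of tvly): succeeds if the named command is on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "$1 not found on PATH; install it before continuing" >&2
  return 1
}

# Example use before any extract call:
#   require_cmd tvly || exit 1
```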
When to use
- You have a specific URL and want its content
- You need text from JavaScript-rendered pages
- Step 2 in the workflow: search → extract → map → crawl → research
Quick start
# Single URL
tvly extract "https://example.com/article" --json
# Multiple URLs
tvly extract "https://example.com/page1" "https://example.com/page2" --json
# Query-focused extraction (returns relevant chunks only)
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json
# JS-heavy pages
tvly extract "https://app.example.com" --extract-depth advanced --json
# Save to file
tvly extract "https://example.com/article" -o article.md
Options
| Option | Description |
|---|---|
| `--query` | Rerank chunks by relevance to this query |
| `--chunks-per-source` | Chunks per URL (1-5, requires `--query`) |
| `--extract-depth` | `basic` (default) or `advanced` (for JS pages) |
| `--format` | `markdown` (default) or `text` |
| `--include-images` | Include image URLs |
| `--timeout` | Max wait time (1-60 seconds) |
| `-o`, `--output` | Save output to file |
| `--json` | Structured JSON output |
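The constraints in the options list (chunks per URL between 1 and 5 and only with a query, timeout between 1 and 60 seconds) could be checked before a call is made. This is a rough sketch, assuming a POSIX shell. `check_extract_opts` is a hypothetical helper and not something the CLI provides; the CLI may well do its own validation.

```shell
# Hypothetical pre-flight check (not part of tvly): validates the documented
# ranges. Arguments: query, chunks-per-source, timeout; pass "" to leave one unset.
check_extract_opts() {
  query="$1"; chunks="$2"; timeout="$3"

  if [ -n "$chunks" ]; then
    if [ -z "$query" ]; then
      echo "--chunks-per-source requires --query" >&2
      return 1
    fi
    # Non-numeric values make the test fail, which is treated as out of range.
    if ! { [ "$chunks" -ge 1 ] && [ "$chunks" -le 5 ]; } 2>/dev/null; then
      echo "--chunks-per-source must be between 1 and 5" >&2
      return 1
    fi
  fi

  if [ -n "$timeout" ]; then
    if ! { [ "$timeout" -ge 1 ] && [ "$timeout" -le 60 ]; } 2>/dev/null; then
      echo "--timeout must be between 1 and 60 seconds" >&2
      return 1
    fi
  fi
  return 0
}

# Example use:
#   check_extract_opts "authentication API" 3 30 &&
#     tvly extract "https://example.com/docs" --query "authentication API" \
#       --chunks-per-source 3 --timeout 30 --json
```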
Extract depth
| Depth | When to use |
|---|---|
| `basic` | Simple pages, fast; try this first |
| `advanced` | JS-rendered SPAs, dynamic content, tables |
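The "try `basic` first" advice can be wrapped in a small function. This is a minimal sketch, assuming a POSIX shell. `extract_with_fallback` and the `TVLY_BIN` variable are hypothetical names. It also assumes that a failed command or empty output is what "content is missing" looks like; a real check would probably inspect the JSON, whose shape is not described here.

```shell
# Hypothetical wrapper (not part of tvly). TVLY_BIN lets you point at a
# different binary or a stub; it defaults to tvly.
extract_with_fallback() {
  url="$1"
  bin="${TVLY_BIN:-tvly}"

  # Assumption: a failed command or empty output means "content is missing".
  if out="$("$bin" extract "$url" --extract-depth basic --json 2>/dev/null)" \
      && [ -n "$out" ]; then
    printf '%s\n' "$out"
    return 0
  fi

  # Fall back to the slower advanced depth.
  "$bin" extract "$url" --extract-depth advanced --json
}

# Example use:
#   extract_with_fallback "https://app.example.com" > page.json
```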
Tips
- Max 20 URLs per request — batch larger lists into multiple calls.
- Use `--query` + `--chunks-per-source` to get only relevant content instead of full pages.
- Try `basic` first, fall back to `advanced` if content is missing.
- Set `--timeout` for slow pages (up to 60s).
- If search results already contain the content you need (via `--include-raw-content`), skip the extract step.
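Batching a long URL list under the 20-URL limit might look like the sketch below, assuming a POSIX shell with `awk` and one URL per line on standard input. `batch_urls` and the `urls.txt` file name are hypothetical.

```shell
# Hypothetical helper (not part of tvly): reads one URL per line on stdin and
# prints one space-separated batch per line, at most $1 URLs each (default 20).
batch_urls() {
  size="${1:-20}"
  awk -v n="$size" '
    NF == 0 { next }                                  # skip blank lines
    { line = (count % n == 0) ? $0 : line " " $0; count++ }
    count % n == 0 { print line; line = "" }          # batch is full
    END { if (count % n != 0) print line }            # flush the last partial batch
  '
}

# Example use (urls.txt is a placeholder file name):
#   set -f   # disable globbing so ? and * in URLs stay literal
#   batch_urls 20 < urls.txt | while read -r batch; do
#     # Word splitting on $batch is intentional: one argument per URL.
#     tvly extract $batch --json
#   done
```

The helper only groups lines. It does not validate URLs, and it assumes they contain no spaces.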
See also
- tavily-search — find pages when you don't have a URL
- tavily-crawl — extract content from many pages on a site