Web Analyzer MCP
Intelligent web content analysis and summarization via MCP.
A powerful MCP (Model Context Protocol) server for intelligent web content analysis and summarization. Built with FastMCP, this server provides smart web scraping, content extraction, and AI-powered question-answering capabilities.
Features
Core Tools
- url_to_markdown - Extract and summarize key web page content
  - Analyzes content importance using custom algorithms
  - Removes ads, navigation, and irrelevant content
  - Keeps only essential information (tables, images, key text)
  - Outputs structured markdown optimized for analysis
- web_content_qna - AI-powered Q&A about web content
  - Extracts relevant content sections from web pages
  - Uses intelligent chunking and relevance matching
  - Answers questions using OpenAI GPT models
Key Features
- Smart Content Ranking: Algorithm-based content importance scoring
- Essential Content Only: Removes clutter, keeps what matters
- Multi-IDE Support: Works with Claude Desktop, Cursor, VS Code, PyCharm
- Flexible Models: Choose from GPT-3.5, GPT-4, GPT-4 Turbo, or GPT-5
Installation
Prerequisites
- uv (Python package manager)
- Chrome/Chromium browser (for Selenium)
- OpenAI API key (for Q&A functionality)
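A quick preflight check can confirm the three prerequisites before starting the server. The sketch below is not part of the project. The function name `check_prerequisites` and the list of Chrome executable names are assumptions for illustration, and a browser installed outside `PATH` (common on macOS and Windows) will be reported as missing even though Selenium may still find it.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Assumed executable names; real installs may use different ones.
CHROME_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def check_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, bool]:
    """Report which of the three prerequisites appear to be satisfied."""
    return {
        # uv must be on PATH to run `uv run mcp-webanalyzer`.
        "uv": which("uv") is not None,
        # Any one of the candidate browser executables is enough.
        "chrome": any(which(name) is not None for name in CHROME_CANDIDATES),
        # The key is only needed for the Q&A tool.
        "openai_api_key": bool(env.get("OPENAI_API_KEY", "").strip()),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.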
Quick Start with uv (Recommended)
# Clone the repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp
# Run directly with uv (auto-installs dependencies)
uv run mcp-webanalyzer
Installing via Smithery
To install web-analyzer-mcp for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @kimdonghwi94/web-analyzer-mcp --client claude
IDE/Editor Integration
For Claude Desktop, add the following to your claude_desktop_config.json file. See the Claude Desktop MCP documentation for more details.
{
"mcpServers": {
"web-analyzer": {
"command": "uv",
"args": [
"--directory",
"/path/to/web-analyzer-mcp",
"run",
"mcp-webanalyzer"
],
"env": {
"OPENAI_API_KEY": "your_openai_api_key_here",
"OPENAI_MODEL": "gpt-4"
}
}
}
}
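If the config file already lists other servers, the entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge using only the standard library. The helper names (`web_analyzer_entry`, `add_server`, `update_config_file`) are hypothetical, and the path to the config file is left to the caller because it differs by platform.

```python
import json
from pathlib import Path


def web_analyzer_entry(project_dir: str, api_key: str, model: str = "gpt-4") -> dict:
    """Build the "web-analyzer" server entry shown in the JSON above."""
    return {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "mcp-webanalyzer"],
        "env": {"OPENAI_API_KEY": api_key, "OPENAI_MODEL": model},
    }


def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with the entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry  # replaces an existing entry with the same name
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, entry: dict) -> None:
    """Read path if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_server(config, name, entry)
    path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
```

The same entry shape is used by the Cursor and JetBrains configurations, so the helper applies to those files as well.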
Add the server using the Claude Code CLI:
claude mcp add web-analyzer -e OPENAI_API_KEY=your_api_key_here -e OPENAI_MODEL=gpt-4 -- uv --directory /path/to/web-analyzer-mcp run mcp-webanalyzer
Add to your Cursor settings (File > Preferences > Settings > Extensions > MCP):
{
"mcpServers": {
"web-analyzer": {
"command": "uv",
"args": [
"--directory",
"/path/to/web-analyzer-mcp",
"run",
"mcp-webanalyzer"
],
"env": {
"OPENAI_API_KEY": "your_openai_api_key_here",
"OPENAI_MODEL": "gpt-4"
}
}
}
}
See JetBrains AI Assistant Documentation for more details.
- In JetBrains IDEs go to Settings → Tools → AI Assistant → Model Context Protocol (MCP)
- Click + Add
- Click on Command in the top-left corner of the dialog and select the As JSON option from the list
- Add this configuration and click OK:
{
"mcpServers": {
"web-analyzer": {
"command": "uv",
"args": [
"--directory",
"/path/to/web-analyzer-mcp",
"run",
"mcp-webanalyzer"
],
"env": {
"OPENAI_API_KEY": "your_openai_api_key_here",
"OPENAI_MODEL": "gpt-4"
}
}
}
}
Tool Descriptions
url_to_markdown
Converts web pages to clean markdown format with essential content extraction.
Parameters:
- url (string): The web page URL to analyze
Returns: Clean markdown content with structured data preservation
web_content_qna
Answers questions about web page content using intelligent content analysis.
Parameters:
- url (string): The web page URL to analyze
- question (string): Question about the page content
Returns: AI-generated answer based on page content
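For readers curious what a client actually sends, MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below only builds those request messages for the two tools described above. The helper name `tool_call_request` and the example URL and question are illustrative assumptions, and in practice an MCP client such as Claude Desktop or Cursor constructs and sends these messages for you after its initialization handshake.

```python
import json
from itertools import count

# Each JSON-RPC request needs an id that is unique within the session.
_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# url_to_markdown takes a single "url" argument.
markdown_request = tool_call_request(
    "url_to_markdown", {"url": "https://example.com"}
)

# web_content_qna takes "url" and "question".
qna_request = tool_call_request(
    "web_content_qna",
    {"url": "https://example.com", "question": "What is this page about?"},
)

print(markdown_request)
print(qna_request)
```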
Architecture
Content Extraction Pipeline
- URL Validation - Ensures proper URL format
- HTML Fetching - Uses Selenium for dynamic content
- Content Parsing - BeautifulSoup for HTML processing
- Element Scoring - Custom algorithm ranks content importance
- Content Filtering - Removes duplicates and low-value content
- Markdown Conversion - Structured output generation
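The validation, scoring, filtering, and conversion steps above can be sketched in a few lines. This is not the project's algorithm, which is not documented here. The `Block` type, the tag weights, the link-density penalty, and the threshold are all assumptions chosen to illustrate one common way of ranking page elements, and the fetching and parsing steps (Selenium and BeautifulSoup in the real pipeline) are assumed to have already produced the list of blocks.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Block:
    tag: str         # element name, e.g. "p", "h2", "table", "nav"
    text: str        # visible text of the element
    link_chars: int  # number of characters that sit inside <a> tags


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical weights, picked for illustration only.
TAG_BONUS = {"h1": 3.0, "h2": 2.0, "table": 2.0, "p": 1.0}
BOILERPLATE_TAGS = {"nav", "footer", "aside"}


def score(block: Block) -> float:
    """Longer text scores higher; link-heavy or boilerplate blocks score lower."""
    if block.tag in BOILERPLATE_TAGS or not block.text.strip():
        return 0.0
    link_density = min(1.0, block.link_chars / len(block.text))
    words = len(block.text.split())
    return words * (1.0 - link_density) * TAG_BONUS.get(block.tag, 0.5)


def select_blocks(blocks: list[Block], threshold: float = 5.0) -> list[Block]:
    """Drop duplicates and low-scoring blocks, keeping document order."""
    seen: set[str] = set()
    kept = []
    for block in blocks:
        key = " ".join(block.text.split()).lower()  # whitespace-insensitive
        if key in seen or score(block) < threshold:
            continue
        seen.add(key)
        kept.append(block)
    return kept


def to_markdown(blocks: list[Block]) -> str:
    """Render h1/h2 with '#' prefixes and everything else as paragraphs."""
    lines = []
    for block in blocks:
        if block.tag in ("h1", "h2"):
            lines.append("#" * int(block.tag[1]) + " " + block.text.strip())
        else:
            lines.append(block.text.strip())
    return "\n\n".join(lines)
```

Penalizing link density is what pushes navigation menus and "click here" fragments below the threshold, while headings get a bonus so that short titles survive.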
Q&A Processing Pipeline
- Content Chunking - Intelligent text segmentation
- Relevance Scoring - Matches content to questions
- Context Selection - Picks most relevant chunks
- Answer Generation - OpenAI GPT integration
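The first three Q&A steps can be illustrated with plain word overlap. This is a simplified sketch under stated assumptions, not the logic in rag_processor.py: the function names, the sentence-based chunking, and the overlap score are all hypothetical stand-ins, and the final call to an OpenAI model is left out so the example runs without an API key.

```python
import re


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Group sentences into chunks of about max_words words.

    A single sentence longer than max_words is kept whole.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(chunk: str, question: str) -> float:
    """Fraction of the question's distinct words that appear in the chunk."""
    q = tokenize(question)
    return len(q & tokenize(chunk)) / len(q) if q else 0.0


def select_context(chunks: list[str], question: str, top_k: int = 3) -> list[str]:
    """Return the top_k most relevant chunks, best first."""
    ranked = sorted(chunks, key=lambda c: relevance(c, question), reverse=True)
    return ranked[:top_k]


def build_prompt(context: list[str], question: str) -> str:
    """Assemble the text that would be sent to the chat model."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

Sending only the selected chunks, rather than the whole page, keeps the prompt small, which matters for both cost and the model's context limit.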
Project Structure
web-analyzer-mcp/
├── web_analyzer_mcp/        # Main Python package
│   ├── __init__.py          # Package initialization
│   ├── server.py            # FastMCP server with tools
│   ├── web_extractor.py     # Web content extraction engine
│   └── rag_processor.py     # RAG-based Q&A processor
├── scripts/                 # Build and utility scripts
│   └── build.js             # Node.js build script
├── README.md                # English documentation
├── README.ko.md             # Korean documentation
├── package.json             # npm configuration and scripts
├── pyproject.toml           # Python package configuration
├── .env.example             # Environment variables template
└── dist-info.json           # Build information (generated)
Development
Modern Development with uv
# Clone repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp
# Development commands
uv run mcp-webanalyzer # Start development server
uv run python -m pytest # Run tests
uv run ruff check . # Lint code
uv run ruff format . # Format code
uv sync # Sync dependencies
# Install development dependencies
uv add --dev pytest ruff mypy
# Create production build
npm run build
Alternative: Traditional Python Development
# Setup Python environment (if not using uv)
pip install -e ".[dev]" # Quoted so shells such as zsh do not expand the brackets
# Development commands
python -m web_analyzer_mcp.server # Start server
python -m pytest tests/ # Run tests
python -m ruff check . # Lint code
python -m ruff format . # Format code
python -m mypy web_analyzer_mcp/ # Type checking
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Roadmap
- Support for more content types (PDFs, videos)
- Multi-language content extraction
- Custom extraction rules
- Caching for frequently accessed content
- Webhook support for real-time updates
Limitations
- Requires Chrome/Chromium for JavaScript-heavy sites
- OpenAI API key needed for Q&A functionality
- Rate limited to prevent abuse
- Some sites may block automated access
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Create an issue for bug reports or feature requests
- Contribute to discussions in the GitHub repository
- Check the documentation for detailed guides
Acknowledgments
- Built with FastMCP framework
- Inspired by HtmlRAG techniques for web content processing
- Thanks to the MCP community for feedback and contributions
Made with ❤️ for the MCP community