Web Analyzer MCP

Intelligent web content analysis and summarization via MCP.

Web Analyzer MCP is an MCP-compliant server for web content analysis and summarization. Built on FastMCP, it fetches pages with Selenium, extracts the essential content with BeautifulSoup, and answers questions about that content using OpenAI models. It integrates with developer tools such as Claude Desktop, Cursor, VS Code, and PyCharm, returns structured markdown, and lets you select the OpenAI model through the OPENAI_MODEL environment variable.

Key Features

Extracts and summarizes key web page content to markdown

Removes ads, navigation, and irrelevant page elements

AI-powered question-answering about web content

Content importance scoring and ranking algorithms

Structured output optimized for analysis

Supports multiple large language models (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-5)

Integration with popular IDEs like VS Code, PyCharm, Cursor, and Claude Desktop

Easy installation using uv and Smithery

Customizable configuration via environment variables

Automated relevance matching and intelligent content chunking

Use Cases

Summarizing long-form web articles for quick reference

Extracting structured data from news sites or blogs

Answering user questions based on specific web content

Filtering essential content from cluttered web pages

Enhancing AI-enabled IDEs with contextual web information

Assisting researchers in gathering summarized insights from various sources

Powering personal knowledge management workflows

Supporting fact-checking by extracting and querying source material

Improving web-based data pipelines with clean content extraction

Automating markdown report generation from online resources

README

🔍 Web Analyzer MCP

A powerful MCP (Model Context Protocol) server for intelligent web content analysis and summarization. Built with FastMCP, this server provides smart web scraping, content extraction, and AI-powered question-answering capabilities.

✨ Features

🎯 Core Tools

url_to_markdown - Extract and summarize key web page content
- Analyzes content importance using custom algorithms
- Removes ads, navigation, and irrelevant content
- Keeps only essential information (tables, images, key text)
- Outputs structured markdown optimized for analysis
web_content_qna - AI-powered Q&A about web content
- Extracts relevant content sections from web pages
- Uses intelligent chunking and relevance matching
- Answers questions using OpenAI GPT models

🚀 Key Features

Smart Content Ranking: Algorithm-based content importance scoring
Essential Content Only: Removes clutter, keeps what matters
Multi-IDE Support: Works with Claude Desktop, Cursor, VS Code, PyCharm
Flexible Models: Choose from GPT-3.5, GPT-4, GPT-4 Turbo, or GPT-5

📦 Installation

Prerequisites

uv (Python package manager)
Chrome/Chromium browser (for Selenium)
OpenAI API key (for Q&A functionality)

🚀 Quick Start with uv (Recommended)

bash

# Clone the repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp

# Run directly with uv (auto-installs dependencies)
uv run mcp-webanalyzer

Installing via Smithery

To install web-analyzer-mcp for Claude Desktop automatically via Smithery:

bash

npx -y @smithery/cli install @kimdonghwi94/web-analyzer-mcp --client claude

IDE/Editor Integration

For Claude Desktop, add the following to your `claude_desktop_config.json` file. See the Claude Desktop MCP documentation for more details.

json

{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run", 
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}

Add the server using Claude Code CLI:

bash

claude mcp add web-analyzer -e OPENAI_API_KEY=your_api_key_here -e OPENAI_MODEL=gpt-4 -- uv --directory /path/to/web-analyzer-mcp run mcp-webanalyzer

Add to your Cursor settings (File > Preferences > Settings > Extensions > MCP):

json

{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run", 
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}

For JetBrains IDEs such as PyCharm, follow the steps below. See the JetBrains AI Assistant documentation for more details.

In JetBrains IDEs go to Settings → Tools → AI Assistant → Model Context Protocol (MCP)
Click + Add
Click on Command in the top-left corner of the dialog and select the As JSON option from the list
Add this configuration and click OK:

json

{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run", 
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}
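Each configuration above passes the same two environment variables, OPENAI_API_KEY and OPENAI_MODEL. As a rough illustration of how a server process could read them, here is a minimal sketch. The `Settings` class, the `load_settings` helper, and the `gpt-4` fallback are assumptions made for this example and are not taken from the project source.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings holder; not part of the project's code."""
    api_key: str
    model: str


def load_settings(env=None, default_model="gpt-4"):
    """Read OPENAI_API_KEY and OPENAI_MODEL from a mapping (os.environ by default).

    The "gpt-4" fallback is an assumption borrowed from the sample configs.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; Q&A needs an OpenAI API key")
    # Fall back to the default when OPENAI_MODEL is missing or blank.
    model = env.get("OPENAI_MODEL", "").strip() or default_model
    return Settings(api_key=api_key, model=model)


# Usage with an explicit mapping, so the example does not depend on your shell.
settings = load_settings({"OPENAI_API_KEY": "your_openai_api_key_here"})
print(settings.model)
```

Failing fast on a missing key keeps the error close to the configuration mistake, instead of surfacing later as a failed API call.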

🎛️ Tool Descriptions

`url_to_markdown`

Converts web pages to clean markdown format with essential content extraction.

Parameters:

url (string): The web page URL to analyze

Returns: Clean markdown content with structured data preservation

`web_content_qna`

Answers questions about web page content using intelligent content analysis.

Parameters:

url (string): The web page URL to analyze
question (string): Question about the page content

Returns: AI-generated answer based on page content
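MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests. The sketch below only builds the request payloads for the two tools described above, using the parameter names listed here; it does not start the server or perform the initialization handshake that a real client must complete first. The helper names `tool_call_request` and `to_stdio_line` are made up for this example.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tool_call_request(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def to_stdio_line(message):
    """Serialize one message for a stdio transport: one JSON object per line."""
    return json.dumps(message, separators=(",", ":")) + "\n"


markdown_req = tool_call_request("url_to_markdown", {"url": "https://example.com"})
qna_req = tool_call_request(
    "web_content_qna",
    {"url": "https://example.com", "question": "What is this page about?"},
)
print(to_stdio_line(markdown_req), end="")
print(to_stdio_line(qna_req), end="")
```

In practice an MCP-aware client such as Claude Desktop or Cursor builds and sends these messages for you; the payloads are shown only to make the tool parameters concrete.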

🏗️ Architecture

Content Extraction Pipeline

URL Validation - Ensures proper URL format
HTML Fetching - Uses Selenium for dynamic content
Content Parsing - BeautifulSoup for HTML processing
Element Scoring - Custom algorithm ranks content importance
Content Filtering - Removes duplicates and low-value content
Markdown Conversion - Structured output generation
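The steps above can be sketched with the standard library alone. This is an illustrative approximation, not the project's algorithm: it skips the Selenium fetching step, uses `html.parser` in place of BeautifulSoup, and the scoring rule (word count weighted down by link density, with a bonus for headings) is an assumption chosen for the example. All function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
BLOCK_TAGS = {"p", "h1", "h2", "h3", "li"}


def is_valid_url(url):
    """URL validation: accept only absolute http(s) URLs with a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class BlockCollector(HTMLParser):
    """Content parsing: collect (tag, text, link_chars) per block element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0  # > 0 while inside nav, script, footer, ...
        self._current = None  # [tag, text_parts, link_chars] of the open block
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS and not self._skip_depth:
            self._current = [tag, [], 0]
        elif tag == "a":
            self._in_link = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._in_link = False
        elif self._current is not None and tag == self._current[0]:
            text = " ".join("".join(self._current[1]).split())
            if text:
                self.blocks.append((tag, text, self._current[2]))
            self._current = None

    def handle_data(self, data):
        if self._current is not None and not self._skip_depth:
            self._current[1].append(data)
            if self._in_link:
                self._current[2] += len(data.strip())


def score(tag, text, link_chars):
    """Element scoring: more words score higher, link-heavy text lower."""
    link_density = min(1.0, link_chars / len(text))
    bonus = 2.0 if tag.startswith("h") else 1.0
    return len(text.split()) * (1.0 - link_density) * bonus


def to_markdown(html, min_score=3.0):
    """Filtering and markdown conversion over the collected blocks."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    seen, lines = set(), []
    for tag, text, link_chars in parser.blocks:
        if text in seen or score(tag, text, link_chars) < min_score:
            continue  # drop duplicates and low-value blocks
        seen.add(text)
        if tag.startswith("h"):
            lines.append("#" * int(tag[1]) + " " + text)
        elif tag == "li":
            lines.append("- " + text)
        else:
            lines.append(text)
    return "\n\n".join(lines)


SAMPLE = """
<html><body>
<nav><ul><li><a href="/">Home</a></li></ul></nav>
<h1>Release notes for version two</h1>
<p>The new release adds faster parsing and a smaller memory footprint.</p>
<p><a href="/ads">Click here for deals</a></p>
<p>The new release adds faster parsing and a smaller memory footprint.</p>
<footer><p>Copyright notice and legal text for the site</p></footer>
</body></html>
"""

if is_valid_url("https://example.com/notes"):
    print(to_markdown(SAMPLE))
```

On the sample page, the navigation, the link-only paragraph, the repeated paragraph, and the footer are all dropped, leaving the heading and one paragraph.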

Q&A Processing Pipeline

Content Chunking - Intelligent text segmentation
Relevance Scoring - Matches content to questions
Context Selection - Picks most relevant chunks
Answer Generation - OpenAI GPT integration
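The first three Q&A steps can be illustrated with a small, dependency-free sketch. The word-window chunking and keyword-overlap scoring below are simple stand-ins chosen for this example; the project's actual chunking and relevance logic may differ. The final step is shown only as prompt construction, with no OpenAI call, and every name here is hypothetical.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "of", "what", "which", "how",
             "does", "do", "to", "in", "on", "and"}


def chunk_text(text, max_words=60, overlap=10):
    """Content chunking: split text into overlapping word windows."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, stop, step)]


def tokenize(text):
    """Lowercase alphanumeric terms as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(chunk, question):
    """Relevance scoring: fraction of question terms found in the chunk."""
    q_terms = tokenize(question) - STOPWORDS
    if not q_terms:
        return 0.0
    return len(q_terms & tokenize(chunk)) / len(q_terms)


def select_context(chunks, question, top_k=2):
    """Context selection: keep the top_k matching chunks in page order."""
    scores = [relevance(chunk, question) for chunk in chunks]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    keep = sorted(i for i in ranked[:top_k] if scores[i] > 0)
    return [chunks[i] for i in keep]


def build_prompt(context, question):
    """Answer generation input: the text a chat model would be given."""
    joined = "\n---\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )


chunks = [
    "Selenium loads dynamic pages",
    "BeautifulSoup parses the HTML",
    "The license is MIT",
]
question = "Which tool parses HTML?"
print(build_prompt(select_context(chunks, question, top_k=1), question))
```

Sending only the best-matching chunks keeps the prompt short, which matters when a page is much longer than the model's context window.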

🏗️ Project Structure

web-analyzer-mcp/
├── web_analyzer_mcp/          # Main Python package
│   ├── __init__.py           # Package initialization
│   ├── server.py             # FastMCP server with tools
│   ├── web_extractor.py      # Web content extraction engine
│   └── rag_processor.py      # RAG-based Q&A processor
├── scripts/                   # Build and utility scripts
│   └── build.js              # Node.js build script
├── README.md                 # English documentation
├── README.ko.md              # Korean documentation
├── package.json              # npm configuration and scripts
├── pyproject.toml            # Python package configuration
├── .env.example              # Environment variables template
└── dist-info.json            # Build information (generated)

🛠️ Development

Modern Development with uv

bash

# Clone repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp

# Development commands
uv run mcp-webanalyzer     # Start development server
uv run python -m pytest   # Run tests
uv run ruff check .        # Lint code
uv run ruff format .       # Format code
uv sync                    # Sync dependencies

# Install development dependencies
uv add --dev pytest ruff mypy

# Create production build
npm run build

Alternative: Traditional Python Development

bash

# Setup Python environment (if not using uv)
pip install -e ".[dev]"   # quotes keep shells such as zsh from expanding the brackets

# Development commands
python -m web_analyzer_mcp.server  # Start server
python -m pytest tests/            # Run tests
python -m ruff check .             # Lint code
python -m ruff format .            # Format code
python -m mypy web_analyzer_mcp/   # Type checking

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📋 Roadmap

Support for more content types (PDFs, videos)
Multi-language content extraction
Custom extraction rules
Caching for frequently accessed content
Webhook support for real-time updates

⚠️ Limitations

Requires Chrome/Chromium for JavaScript-heavy sites
OpenAI API key needed for Q&A functionality
Rate limited to prevent abuse
Some sites may block automated access

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋‍♂️ Support

Create an issue for bug reports or feature requests
Contribute to discussions in the GitHub repository
Check the documentation for detailed guides

🌟 Acknowledgments

Built with FastMCP framework
Inspired by HtmlRAG techniques for web content processing
Thanks to the MCP community for feedback and contributions

Made with ❤️ for the MCP community

Repository Owner: kimdonghwi94

Repository Details

Language: Python
Default Branch: master
Size: 145 KB
Contributors: 3
License: MIT License
MCP Verified: Nov 12, 2025
Programming Languages: Python (100%)
