OpenZIM MCP Server

Transforms ZIM archives into intelligent, structured knowledge engines for LLMs.

OpenZIM MCP Server provides structured, intelligent access to ZIM-format knowledge bases, enabling large language models to efficiently search, navigate, and understand content in offline archives. Two operating modes support both advanced and simpler LLM integrations. It features namespace-based navigation, context-aware discovery, intelligent search, and relationship mapping to make knowledge extraction more efficient.

Key Features

Dual mode support for simple and advanced LLMs

Smart navigation by namespace (articles, metadata, media)

Context-aware discovery of article structure and metadata

Intelligent and relevance-ranked search

Caching and pagination for performance on large archives

Mapping of internal and external content relationships

Secure and modern server implementation

Structured API for LLM integration

Autocomplete search suggestions

Optimized offline knowledge retrieval

Use Cases

Enabling offline research assistants for knowledge retrieval

Powering LLM-based knowledge chatbots with structured data

Building intelligent Q&A systems for offline ZIM archives

Performing content analysis and summarization of large knowledge bases

Supporting educational tools with offline encyclopedia access

Facilitating advanced search applications for knowledge repositories

Integrating structured offline sources into enterprise AI workflows

Developing AI-powered metadata extraction tools

Enhancing link and relationship analysis in knowledge graphs

Accelerating content discovery in resource-constrained environments

🎯 Now with Dual Mode Support! Choose between Full mode (15 specialized tools) or Simple mode (1 intelligent natural language tool) to match your LLM's capabilities.

🧠 Built for LLM Intelligence

OpenZIM MCP transforms static ZIM archives into dynamic knowledge engines for Large Language Models. Unlike basic file readers, this tool provides intelligent, structured access that LLMs need to effectively navigate and understand vast knowledge repositories.

🚀 Why LLMs Love OpenZIM MCP:

Smart Navigation: Browse by namespace (articles, metadata, media) instead of blind searching
Context-Aware Discovery: Get article structure, relationships, and metadata for deeper understanding
Intelligent Search: Advanced filtering, auto-complete suggestions, and relevance-ranked results
Performance Optimized: Cached operations and pagination prevent timeouts on massive archives
Relationship Mapping: Extract internal/external links to understand content connections

Whether you're building a research assistant, knowledge chatbot, or content analysis system, OpenZIM MCP gives your LLM the structured access patterns it needs to unlock the full potential of offline knowledge archives. No more fumbling through raw text dumps! 🎯

OpenZIM MCP is a modern, secure, and high-performance MCP (Model Context Protocol) server that enables AI models to access and search ZIM format knowledge bases offline.

ZIM (Zeno IMproved) is an open file format developed by the openZIM project, designed specifically for offline storage and access to website content. The format supports high compression rates using Zstandard compression (default since 2021) and enables fast full-text searching, making it ideal for storing entire Wikipedia content and other large reference materials in relatively compact files. The openZIM project is sponsored by Wikimedia CH and supported by the Wikimedia Foundation, ensuring the format's continued development and adoption for offline knowledge access, especially in environments without reliable internet connectivity.

✨ Features

🎯 Dual Mode Support: Choose between Full mode (15 specialized tools) or Simple mode (1 intelligent natural language tool)
🔒 Security First: Comprehensive input validation and path traversal protection
⚡ High Performance: Intelligent caching and optimized ZIM file operations
🧠 Smart Retrieval: Automatic fallback from direct access to search-based retrieval for reliable entry access
🧪 Well Tested: 90%+ test coverage with comprehensive test suite
🏗️ Modern Architecture: Modular design with dependency injection
📝 Type Safe: Full type annotations throughout the codebase
🔧 Configurable: Flexible configuration with validation
📊 Observable: Structured logging and health monitoring

🚀 Quick Start

Installation

bash

# Install from PyPI (recommended)
pip install openzim-mcp

Development Installation

For contributors and developers:

bash

# Clone the repository
git clone https://github.com/cameronrye/openzim-mcp.git
cd openzim-mcp

# Install dependencies
uv sync

# Install development dependencies
uv sync --dev

Prepare ZIM Files

Download ZIM files (e.g., Wikipedia, Wiktionary, etc.) from the Kiwix Library and place them in a directory:

bash

mkdir ~/zim-files
# Download ZIM files to ~/zim-files/

Running the Server

bash

# Full mode (default) - All 15 specialized tools
openzim-mcp /path/to/zim/files
python -m openzim_mcp /path/to/zim/files

# Simple mode - 1 intelligent natural language tool
openzim-mcp --mode simple /path/to/zim/files
python -m openzim_mcp --mode simple /path/to/zim/files

# For development (from source)
uv run python -m openzim_mcp /path/to/zim/files
uv run python -m openzim_mcp --mode simple /path/to/zim/files

# Or using make (development)
make run ZIM_DIR=/path/to/zim/files

Tool Modes

OpenZIM MCP supports two modes:

Full Mode (default): Exposes all 15 specialized MCP tools for maximum control
Simple Mode: Provides 1 intelligent tool (zim_query) that accepts natural language queries

See Simple Mode Guide for detailed information.

MCP Configuration

Full Mode (default):

json

{
  "openzim-mcp": {
    "command": "openzim-mcp",
    "args": ["/path/to/zim/files"]
  }
}

Simple Mode:

json

{
  "openzim-mcp-simple": {
    "command": "openzim-mcp",
    "args": ["--mode", "simple", "/path/to/zim/files"]
  }
}

Alternative configuration using Python module:

json

{
  "openzim-mcp": {
    "command": "python",
    "args": [
      "-m",
      "openzim_mcp",
      "/path/to/zim/files"
    ]
  }
}

For development (from source):

json

{
  "openzim-mcp": {
    "command": "uv",
    "args": [
      "--directory",
      "/path/to/openzim-mcp",
      "run",
      "python",
      "-m",
      "openzim_mcp",
      "/path/to/zim/files"
    ]
  }
}
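The JSON fragments above are server entries only. Desktop clients such as Claude Desktop expect them to be nested under a top-level mcpServers key, so check your client's documentation for its exact file layout. The sketch below assumes that layout and builds the Full or Simple mode entry programmatically. build_mcp_config is a hypothetical helper and is not part of openzim-mcp.

```python
import json


def build_mcp_config(zim_dir: str, simple: bool = False) -> dict:
    """Build a client config entry for openzim-mcp (hypothetical helper).

    Assumes the client nests server entries under "mcpServers", as
    Claude Desktop's config file does.
    """
    name = "openzim-mcp-simple" if simple else "openzim-mcp"
    # Simple mode is selected with the --mode flag, placed before the ZIM directory.
    args = ["--mode", "simple", zim_dir] if simple else [zim_dir]
    return {"mcpServers": {name: {"command": "openzim-mcp", "args": args}}}


if __name__ == "__main__":
    print(json.dumps(build_mcp_config("/path/to/zim/files", simple=True), indent=2))
```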

🛠️ Development

Running Tests

bash

# Run all tests
make test

# Run tests with coverage
make test-cov

# Run specific test file
uv run pytest tests/test_security.py -v

# Run tests with ZIM test data (comprehensive testing)
make test-with-zim-data

# Run integration tests only
make test-integration

# Run tests that require ZIM test data
make test-requires-zim-data

ZIM Test Data Integration

OpenZIM MCP integrates with the official zim-testing-suite for comprehensive testing with real ZIM files:

bash

# Download essential test files (basic testing)
make download-test-data

# Download all test files (comprehensive testing)
make download-test-data-all

# List available test files
make list-test-data

# Clean downloaded test data
make clean-test-data

The test data includes:

Basic files: Small ZIM files for essential testing
Real content: Actual Wikipedia/Wikibooks content for integration testing
Invalid files: Malformed ZIM files for error handling testing
Special cases: Embedded content, split files, and edge cases

Test files are automatically organized by category and priority level.

Code Quality

bash

# Format code
make format

# Run linting
make lint

# Type checking
make type-check

# Run all checks
make check

Project Structure

text

openzim-mcp/
├── openzim_mcp/             # Main package
│   ├── __init__.py        # Package initialization
│   ├── __main__.py        # Module entry point
│   ├── main.py            # Main entry point
│   ├── server.py          # MCP server implementation
│   ├── config.py          # Configuration management
│   ├── security.py        # Security and validation
│   ├── cache.py           # Caching functionality
│   ├── content_processor.py # Content processing
│   ├── zim_operations.py  # ZIM file operations
│   ├── exceptions.py      # Custom exceptions
│   └── constants.py       # Application constants
├── tests/                 # Test suite
├── pyproject.toml        # Project configuration
├── Makefile              # Development commands
└── README.md             # This file

📚 API Reference

Available Tools

list_zim_files - List all ZIM files in allowed directories

No parameters required.

search_zim_file - Search within ZIM file content

Required parameters:

zim_file_path (string): Path to the ZIM file
query (string): Search query term

Optional parameters:

limit (integer, default: 10): Maximum number of results to return
offset (integer, default: 0): Starting offset for results (for pagination)

get_zim_entry - Get detailed content of a specific entry in a ZIM file

Required parameters:

zim_file_path (string): Path to the ZIM file
entry_path (string): Entry path, e.g., 'A/Some_Article'

Optional parameters:

max_content_length (integer, default: 100000, minimum: 1000): Maximum length of returned content

Smart Retrieval Features:

Automatic Fallback: If direct path access fails, automatically searches for the entry and uses the exact path found
Path Mapping Cache: Caches successful path mappings for improved performance on repeated access
Enhanced Error Guidance: Provides clear guidance when entries cannot be found, suggesting alternative approaches
Transparent Operation: Works seamlessly regardless of path encoding differences (spaces vs underscores, URL encoding, etc.)

get_zim_metadata - Get ZIM file metadata from M namespace entries

Required parameters:

zim_file_path (string): Path to the ZIM file

Returns: JSON string containing ZIM metadata including entry counts, archive information, and metadata entries like title, description, language, creator, etc.

get_main_page - Get the main page entry from W namespace

Required parameters:

zim_file_path (string): Path to the ZIM file

Returns: Main page content or information about the main page entry.

list_namespaces - List available namespaces and their entry counts

Required parameters:

zim_file_path (string): Path to the ZIM file

Returns: JSON string containing namespace information with entry counts, descriptions, and sample entries for each namespace (C, M, W, X, etc.).

browse_namespace - Browse entries in a specific namespace with pagination

Required parameters:

zim_file_path (string): Path to the ZIM file
namespace (string): Namespace to browse (C, M, W, X, A, I, etc.)

Optional parameters:

limit (integer, default: 50, range: 1-200): Maximum number of entries to return
offset (integer, default: 0): Starting offset for pagination

Returns: JSON string containing namespace entries with titles, content previews, and pagination information.

search_with_filters - Search within ZIM file content with advanced filters

Required parameters:

zim_file_path (string): Path to the ZIM file
query (string): Search query term

Optional parameters:

namespace (string): Optional namespace filter (C, M, W, X, etc.)
content_type (string): Optional content type filter (text/html, text/plain, etc.)
limit (integer, default: 10, range: 1-100): Maximum number of results to return
offset (integer, default: 0): Starting offset for pagination

Returns: Filtered search results with namespace and content type information.

get_search_suggestions - Get search suggestions and auto-complete

Required parameters:

zim_file_path (string): Path to the ZIM file
partial_query (string): Partial search query (minimum 2 characters)

Optional parameters:

limit (integer, default: 10, range: 1-50): Maximum number of suggestions to return

Returns: JSON string containing search suggestions based on article titles and content.

get_article_structure - Extract article structure and metadata

Required parameters:

zim_file_path (string): Path to the ZIM file
entry_path (string): Entry path, e.g., 'C/Some_Article'

Returns: JSON string containing article structure including headings, sections, metadata, and word count.

extract_article_links - Extract internal and external links from an article

Required parameters:

zim_file_path (string): Path to the ZIM file
entry_path (string): Entry path, e.g., 'C/Some_Article'

Returns: JSON string containing categorized links (internal, external, media) with titles and metadata.
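The following sketch illustrates one way links might be split into internal, external, and media categories, using Python's standard html.parser. The classification rule is an assumption made for illustration: anything with a URL scheme or host counts as external, and image sources count as media. The server's actual rules may differ, and LinkCollector and categorize_links are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Group <a href> targets as internal/external and <img src> targets as media."""

    def __init__(self) -> None:
        super().__init__()
        self.links = {"internal": [], "external": [], "media": []}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.links["media"].append(attributes["src"])
        elif tag == "a" and attributes.get("href"):
            href = attributes["href"]
            parsed = urlparse(href)
            # A scheme (https:, mailto:) or host (//example.org) means the target
            # lives outside the archive; relative paths are treated as internal.
            kind = "external" if parsed.scheme or parsed.netloc else "internal"
            self.links[kind].append(href)


def categorize_links(html: str) -> dict:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = ('<p><a href="Evolution">Evolution</a> '
          '<a href="https://example.org/">site</a> <img src="I/myoglobin.png"></p>')
print(categorize_links(sample))
```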

Examples

Listing ZIM files

json

{
  "name": "list_zim_files"
}

Response:

plain

Found 1 ZIM files in 1 directories:

[
  {
    "name": "wikipedia_en_100_2025-08.zim",
    "path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "directory": "C:\\zim",
    "size": "310.77 MB",
    "modified": "2025-09-11T10:20:50.148427"
  }
]

Searching ZIM files

json

{
  "name": "search_zim_file",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "query": "biology",
    "limit": 3
  }
}

Response:

plain

Found 51 matches for "biology", showing 1-3:

## 1. Taxonomy (biology)
Path: Taxonomy_(biology)
Snippet: #  Taxonomy (biology) Part of a series on
---
Evolutionary biology
Darwin's finches by John Gould

  * Index
  * Introduction
  * [Main](Evolution "Evolution")
  * Outline

## 2. Protein
Path: Protein
Snippet: #  Protein A representation of the 3D structure of the protein myoglobin showing turquoise α-helices. This protein was the first to have its structure solved by X-ray crystallography. Toward the right-center among the coils, a prosthetic group called a heme group (shown in gray) with a bound oxygen molecule (red).

## 3. Ant
Path: Ant
Snippet: #  Ant Ants
Temporal range: Late Aptian – Present
---
Fire ants
[Scientific classification](Taxonomy_\(biology\) "Taxonomy \(biology\)")
Kingdom:  | [Animalia](Animal "Animal")
Phylum:  | [Arthropoda](Arthropod "Arthropod")
Class:  | [Insecta](Insect "Insect")
Order:  | Hymenoptera
Infraorder:  | Aculeata
Superfamily:  |
Latreille, 1809[1]
Family:  |
Latreille, 1809
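search_zim_file returns Markdown-style plain text rather than JSON. If a client needs structured results, a small parser can extract the match count, titles, and paths. This sketch assumes the output layout shown in the example above and may need adjusting if that format changes. parse_search_output is a hypothetical helper.

```python
import re

# Patterns assume the layout of the example output above.
HEADER_RE = re.compile(r'^Found (\d+) matches for "(.*)", showing (\d+)-(\d+):', re.MULTILINE)
RESULT_RE = re.compile(r"^## \d+\. (?P<title>.+)\nPath: (?P<path>.+)$", re.MULTILINE)


def parse_search_output(text: str) -> dict:
    """Extract the total match count and title/path pairs from search text."""
    header = HEADER_RE.search(text)
    return {
        "total": int(header.group(1)) if header else None,
        "results": [m.groupdict() for m in RESULT_RE.finditer(text)],
    }


sample_output = (
    'Found 51 matches for "biology", showing 1-3:\n\n'
    "## 1. Taxonomy (biology)\nPath: Taxonomy_(biology)\nSnippet: ...\n\n"
    "## 2. Protein\nPath: Protein\nSnippet: ...\n"
)
print(parse_search_output(sample_output))
```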

Getting ZIM entries

json

{
  "name": "get_zim_entry",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "entry_path": "Protein",
    "max_content_length": 1500
  }
}

Response:

plain

# Protein

Path: Protein
Type: text/html
## Content

#  Protein

A representation of the 3D structure of the protein myoglobin showing turquoise α-helices. This protein was the first to have its structure solved by X-ray crystallography. Toward the right-center among the coils, a prosthetic group called a heme group (shown in gray) with a bound oxygen molecule (red).

**Proteins** are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in protein folding into a specific 3D structure that determines its activity.

A linear chain of amino acid residues is called a polypeptide. A protein contains at least one long polypeptide. Short polypeptides, containing less than 20–30 residues, are rarely considered to be proteins and are commonly called peptides.

... [Content truncated, total of 56,202 characters, only showing first 1,500 characters] ...

Smart Retrieval in Action

Example: Automatic path resolution

json

{
  "name": "get_zim_entry",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "entry_path": "A/Test Article"
  }
}

Response (showing smart retrieval working):

plain

# Test Article

Requested Path: A/Test Article
Actual Path: A/Test_Article
Type: text/html

## Content

# Test Article

This article demonstrates the smart retrieval system automatically handling
path encoding differences. The system tried "A/Test Article" directly,
then automatically searched and found "A/Test_Article".

... [Content continues] ...

get_server_health - Get server health and statistics

No parameters required.

Returns:

Server status and performance metrics
Cache statistics
Configuration information
Instance tracking information
Conflict detection results

Example Response:

json

{
  "status": "healthy",
  "server_name": "openzim-mcp",
  "allowed_directories": 1,
  "cache": {
    "enabled": true,
    "size": 1,
    "max_size": 100,
    "ttl_seconds": 3600
  },
  "instance_tracking": {
    "active_instances": 1,
    "conflicts_detected": 0
  }
}

get_server_configuration - Get detailed server configuration

No parameters required.

Returns: Comprehensive server configuration including diagnostics, validation results, and conflict detection.

Example Response:

json

{
  "configuration": {
    "server_name": "openzim-mcp",
    "allowed_directories": ["/path/to/zim/files"],
    "cache_enabled": true,
    "config_hash": "abc123...",
    "server_pid": 12345
  },
  "diagnostics": {
    "validation_status": "healthy",
    "conflicts_detected": [],
    "warnings": [],
    "recommendations": []
  }
}

diagnose_server_state - Comprehensive server diagnostics

No parameters required.

Returns: Detailed diagnostic information including instance conflicts, configuration validation, file accessibility checks, and actionable recommendations.

Example Response:

json

{
  "status": "healthy",
  "server_info": {
    "pid": 12345,
    "server_name": "openzim-mcp",
    "config_hash": "abc123..."
  },
  "conflicts": [],
  "issues": [],
  "recommendations": ["Server appears to be running normally"],
  "environment_checks": {
    "directories_accessible": true,
    "cache_functional": true
  }
}

resolve_server_conflicts - Identify and resolve server conflicts

No parameters required.

Returns: Results of conflict resolution including cleanup actions and recommendations.

Example Response:

json

{
  "status": "success",
  "cleanup_results": {
    "stale_instances_removed": 2
  },
  "conflicts_found": [],
  "actions_taken": ["Removed 2 stale instance files"],
  "recommendations": ["No active conflicts detected"]
}

Additional Search Examples

Computer-related search:

json

{
  "name": "search_zim_file",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "query": "computer",
    "limit": 2
  }
}

Response:

plain

Found 39 matches for "computer", showing 1-2:

## 1. Video game
Path: Video_game
Snippet: #  Video game First-generation _Pong_ console at the Computerspielemuseum Berlin
---
Platforms

## 2. Protein
Path: Protein
Snippet: #  Protein A representation of the 3D structure of the protein myoglobin showing turquoise α-helices. This protein was the first to have its structure solved by X-ray crystallography. Toward the right-center among the coils, a prosthetic group called a heme group (shown in gray) with a bound oxygen molecule (red).

Getting detailed content:

json

{
  "name": "get_zim_entry",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "entry_path": "Evolution",
    "max_content_length": 1500
  }
}

Response:

plain

# Evolution

Path: Evolution
Type: text/html
## Content

#  Evolution

Part of the Biology series on
---
****
Mechanisms and processes

  * Adaptation
  * Genetic drift
  * Gene flow
  * History of life
  * Maladaptation
  * Mutation
  * Natural selection
  * Neutral theory
  * Population genetics
  * Speciation

... [Content truncated, total of 110,237 characters, only showing first 1,500 characters] ...

🎯 Advanced Knowledge Retrieval Examples

Getting ZIM metadata:

json

{
  "name": "get_zim_metadata",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim"
  }
}

Response:

json

{
  "entry_count": 100000,
  "all_entry_count": 120000,
  "article_count": 80000,
  "media_count": 20000,
  "metadata_entries": {
    "Title": "Wikipedia (English)",
    "Description": "Wikipedia articles in English",
    "Language": "eng",
    "Creator": "Kiwix",
    "Date": "2025-08-15"
  }
}

Browsing a namespace:

json

{
  "name": "browse_namespace",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "namespace": "C",
    "limit": 5,
    "offset": 0
  }
}

Response:

json

{
  "namespace": "C",
  "total_in_namespace": 80000,
  "offset": 0,
  "limit": 5,
  "returned_count": 5,
  "has_more": true,
  "entries": [
    {
      "path": "C/Biology",
      "title": "Biology",
      "content_type": "text/html",
      "preview": "Biology is the scientific study of life..."
    }
  ]
}
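Because browse_namespace reports has_more and returned_count, a client can page through an entire namespace by advancing offset. The sketch below takes a fetch_page callable (hypothetical). In practice it would wrap the actual MCP tool call and return the parsed JSON response. Here it is replaced by an in-memory stand-in shaped like the response above.

```python
from typing import Callable, Iterator


def iter_namespace(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every entry, advancing offset until the server reports has_more = false."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page["entries"]
        # Stop when nothing more remains, or defensively if a page came back empty.
        if not page.get("has_more") or page["returned_count"] == 0:
            return
        offset += page["returned_count"]


# In-memory stand-in for the browse_namespace tool (hypothetical data).
_FAKE_ENTRIES = [{"path": f"C/Article_{i}"} for i in range(12)]


def fake_fetch(offset: int, limit: int) -> dict:
    chunk = _FAKE_ENTRIES[offset:offset + limit]
    return {
        "entries": chunk,
        "returned_count": len(chunk),
        "has_more": offset + len(chunk) < len(_FAKE_ENTRIES),
    }


print(sum(1 for _ in iter_namespace(fake_fetch, limit=5)))  # 12 entries over 3 pages
```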

Filtered search:

json

{
  "name": "search_with_filters",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "query": "evolution",
    "namespace": "C",
    "content_type": "text/html",
    "limit": 3
  }
}

Getting article structure:

json

{
  "name": "get_article_structure",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "entry_path": "C/Evolution"
  }
}

Response:

json

{
  "title": "Evolution",
  "path": "C/Evolution",
  "content_type": "text/html",
  "headings": [
    {"level": 1, "text": "Evolution", "id": "evolution"},
    {"level": 2, "text": "History", "id": "history"},
    {"level": 2, "text": "Mechanisms", "id": "mechanisms"}
  ],
  "sections": [
    {
      "title": "Evolution",
      "level": 1,
      "content_preview": "Evolution is the change in heritable traits...",
      "word_count": 150
    }
  ],
  "word_count": 5000
}
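A headings list like the one above could be derived from an article's HTML with the standard library's HTMLParser. This minimal sketch assumes that headings are plain h1 to h6 elements with optional id attributes. It is not the server's implementation, and HeadingCollector and extract_headings are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect <h1>-<h6> headings as {"level", "text", "id"} dicts, in document order."""

    LEVELS = {f"h{n}": n for n in range(1, 7)}

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[dict] = []
        self._current: dict | None = None

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._current = {"level": self.LEVELS[tag], "text": "", "id": dict(attrs).get("id")}

    def handle_data(self, data):
        # Accumulate text, including text inside nested inline tags such as <i>.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._current is not None:
            self._current["text"] = " ".join(self._current["text"].split())
            self.headings.append(self._current)
            self._current = None


def extract_headings(html: str) -> list[dict]:
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings
```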

Getting search suggestions:

json

{
  "name": "get_search_suggestions",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "partial_query": "bio",
    "limit": 5
  }
}

Response:

json

{
  "partial_query": "bio",
  "suggestions": [
    {"text": "Biology", "path": "C/Biology", "type": "title_start_match"},
    {"text": "Biochemistry", "path": "C/Biochemistry", "type": "title_start_match"},
    {"text": "Biodiversity", "path": "C/Biodiversity", "type": "title_start_match"}
  ],
  "count": 3
}
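The title_start_match type in the response above points to prefix matching on article titles. The simplified sketch below assumes that behavior and applies the documented 2-character minimum and the limit parameter. suggest and its titles mapping (title to entry path) are hypothetical, and the real tool also draws on content, which this sketch ignores.

```python
from __future__ import annotations


def suggest(titles: dict[str, str], partial_query: str, limit: int = 10) -> dict:
    """Case-insensitive prefix match of partial_query against article titles."""
    needle = partial_query.strip()
    if len(needle) < 2:
        raise ValueError("partial_query must be at least 2 characters")
    needle = needle.casefold()
    matches = [
        {"text": title, "path": path, "type": "title_start_match"}
        for title, path in sorted(titles.items())
        if title.casefold().startswith(needle)
    ][:limit]
    return {"partial_query": partial_query, "suggestions": matches, "count": len(matches)}


titles = {
    "Biology": "C/Biology",
    "Biochemistry": "C/Biochemistry",
    "Biodiversity": "C/Biodiversity",
    "Evolution": "C/Evolution",
}
print(suggest(titles, "bio", limit=5))
```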

🔧 Server Management and Diagnostics Examples

Getting server health:

json

{
  "name": "get_server_health"
}

Response:

json

{
  "status": "healthy",
  "server_name": "openzim-mcp",
  "uptime_info": {
    "process_id": 12345,
    "started_at": "2025-09-14T10:30:00"
  },
  "cache_performance": {
    "enabled": true,
    "size": 15,
    "max_size": 100,
    "hit_rate": 0.85
  },
  "instance_tracking": {
    "active_instances": 1,
    "conflicts_detected": 0
  }
}

Diagnosing server state:

json

{
  "name": "diagnose_server_state"
}

Response:

json

{
  "status": "healthy",
  "server_info": {
    "pid": 12345,
    "server_name": "openzim-mcp",
    "config_hash": "abc123def456..."
  },
  "conflicts": [],
  "issues": [],
  "recommendations": ["Server appears to be running normally. No issues detected."],
  "environment_checks": {
    "directories_accessible": true,
    "cache_functional": true,
    "zim_files_found": 5
  }
}

Resolving server conflicts:

json

{
  "name": "resolve_server_conflicts"
}

Response:

json

{
  "status": "success",
  "cleanup_results": {
    "stale_instances_removed": 2,
    "files_cleaned": ["/home/user/.openzim_mcp_instances/server_99999.json"]
  },
  "conflicts_found": [],
  "actions_taken": ["Removed 2 stale instance files"],
  "recommendations": ["No active conflicts detected after cleanup"]
}

🎯 ZIM Entry Retrieval Best Practices

Smart Retrieval System

OpenZIM MCP implements an intelligent entry retrieval system that automatically handles path encoding inconsistencies common in ZIM files:

How It Works:

Direct Access First: Attempts to retrieve the entry using the provided path exactly as given
Automatic Fallback: If direct access fails, automatically searches for the entry using various search terms
Path Mapping Cache: Caches successful path mappings to improve performance for repeated access
Enhanced Error Guidance: Provides clear guidance when entries cannot be found

Benefits for LLM Users:

Transparent Operation: No need to understand ZIM path encoding complexities
Single Tool Call: No need to search first to discover the exact entry path
Reliable Results: Consistent success across different path formats (spaces vs underscores, URL encoding, etc.)
Performance Optimized: Cached mappings improve repeated access speed

Example Scenarios Handled Automatically:

A/Test Article → A/Test_Article (space to underscore conversion)
C/Café → C/Caf%C3%A9 (URL encoding differences)
A/Some-Page → A/Some_Page (hyphen to underscore conversion)

Usage Recommendations

For Direct Entry Access:

json

{
  "name": "get_zim_entry",
  "arguments": {
    "zim_file_path": "/path/to/file.zim",
    "entry_path": "A/Article_Name"
  }
}

When Entry Not Found: The system will automatically provide guidance:

Entry not found: 'A/Article_Name'.
The entry path may not exist in this ZIM file.
Try using search_zim_file() to find available entries,
or browse_namespace() to explore the file structure.

⚠️ Important Notes and Limitations

Content Length Requirements

The max_content_length parameter for get_zim_entry must be at least 1000 characters
Content longer than the specified limit will be truncated with a note showing the total character count
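The two rules above can be expressed as a small function. The truncation note mirrors the text shown in the get_zim_entry examples. truncate_content is a hypothetical helper, not the server's code.

```python
MIN_CONTENT_LENGTH = 1000  # documented minimum for max_content_length


def truncate_content(text: str, max_content_length: int = 100_000) -> str:
    """Enforce the minimum limit and append a truncation note when content is cut."""
    if max_content_length < MIN_CONTENT_LENGTH:
        raise ValueError(f"max_content_length must be at least {MIN_CONTENT_LENGTH}")
    if len(text) <= max_content_length:
        return text
    note = (
        f"... [Content truncated, total of {len(text):,} characters, "
        f"only showing first {max_content_length:,} characters] ..."
    )
    return f"{text[:max_content_length]}\n\n{note}"


print(truncate_content("x" * 56_202, 1500)[-90:])
```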

Search Behavior

Search results may include articles that contain the search terms in various contexts
Results are ranked by relevance but may not always be directly related to the primary meaning of the search term
Search snippets provide a preview of the content but may not show the exact location where the search term appears

File Format Support

Currently supports ZIM files (Zeno IMproved format)
Tested with Wikipedia ZIM files (e.g., wikipedia_en_100_2025-08.zim)
File paths must be properly escaped in JSON (use \\ for Windows paths)
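Instead of escaping backslashes by hand, a script can let a JSON serializer handle them. This sketch builds a payload in the same {"name", "arguments"} shape that the examples in this README use. tool_call is a hypothetical helper.

```python
import json

# Raw string: the Python value contains single backslashes.
WINDOWS_PATH = r"C:\zim\wikipedia_en_100_2025-08.zim"


def tool_call(name: str, **arguments) -> str:
    """Serialize a tool call; json.dumps doubles each backslash automatically."""
    payload = {"name": name}
    if arguments:
        payload["arguments"] = arguments
    return json.dumps(payload, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(tool_call("get_zim_metadata", zim_file_path=WINDOWS_PATH))
```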

🔄 Multi-Server Instance Management

OpenZIM MCP includes advanced multi-server instance tracking and conflict detection to ensure reliable operation when multiple server instances are running.

Instance Tracking Features

Automatic Instance Registration: Each server instance is automatically registered with a unique process ID and configuration hash
Conflict Detection: Detects when multiple servers with different configurations are accessing the same directories
Stale Instance Cleanup: Automatically identifies and cleans up orphaned instance files from terminated processes
Configuration Validation: Ensures all server instances use compatible configurations

Conflict Types

Configuration Mismatch: Multiple servers with different settings accessing the same directories
Multiple Instances: Multiple servers running simultaneously (may cause confusion)
Stale Instances: Orphaned instance files from terminated processes
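A configuration mismatch can be detected by hashing each instance's configuration and comparing instances that share a directory. The sketch below assumes instance records with pid, allowed_directories, and config_hash fields, loosely modeled on the diagnostic output above. The server's actual instance-file format and hashing scheme are not documented here, so config_hash and find_conflicts are hypothetical.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Stable SHA-256 of a configuration dict; key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_conflicts(me: dict, others: list) -> list:
    """Report other instances that share a directory with `me` but use a different config."""
    conflicts = []
    my_dirs = set(me["allowed_directories"])
    for other in others:
        if other["pid"] == me["pid"]:
            continue  # skip our own registration
        shared = my_dirs & set(other["allowed_directories"])
        if shared and other["config_hash"] != me["config_hash"]:
            conflicts.append({
                "type": "configuration_mismatch",
                "pid": other["pid"],
                "directories": sorted(shared),
            })
    return conflicts
```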

Automatic Conflict Warnings

OpenZIM MCP automatically includes conflict warnings in search results and file listings when issues are detected:

plain

🔍 **Server Conflict Detected**
⚠️ Configuration mismatch with server PID 12345. Search results may be inconsistent.
💡 Use 'resolve_server_conflicts()' to fix these issues.

Best Practices

Use diagnose_server_state() regularly to check for conflicts
Run resolve_server_conflicts() to clean up stale instances
Ensure all server instances use the same configuration when accessing shared directories
Monitor server health with get_server_health() for instance tracking information

🔧 Configuration

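OpenZIM MCP supports configuration through environment variables with the OPENZIM_MCP_ prefix. Nested settings use a double underscore (__) between the section and the key (for example, OPENZIM_MCP_CACHE__MAX_SIZE):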

bash

# Cache configuration
export OPENZIM_MCP_CACHE__ENABLED=true
export OPENZIM_MCP_CACHE__MAX_SIZE=200
export OPENZIM_MCP_CACHE__TTL_SECONDS=7200

# Content configuration
export OPENZIM_MCP_CONTENT__MAX_CONTENT_LENGTH=200000
export OPENZIM_MCP_CONTENT__SNIPPET_LENGTH=2000
export OPENZIM_MCP_CONTENT__DEFAULT_SEARCH_LIMIT=20

# Logging configuration
export OPENZIM_MCP_LOGGING__LEVEL=DEBUG
export OPENZIM_MCP_LOGGING__FORMAT="%(asctime)s - %(name)s - %(levelname)s - %(message)s"

# Server configuration
export OPENZIM_MCP_SERVER_NAME=my_openzim_mcp_server

Configuration Options

Setting	Default	Description
`OPENZIM_MCP_CACHE__ENABLED`	`true`	Enable/disable caching
`OPENZIM_MCP_CACHE__MAX_SIZE`	`100`	Maximum cache entries
`OPENZIM_MCP_CACHE__TTL_SECONDS`	`3600`	Cache TTL in seconds
`OPENZIM_MCP_CONTENT__MAX_CONTENT_LENGTH`	`100000`	Max content length
`OPENZIM_MCP_CONTENT__SNIPPET_LENGTH`	`1000`	Max snippet length
`OPENZIM_MCP_CONTENT__DEFAULT_SEARCH_LIMIT`	`10`	Default search result limit
`OPENZIM_MCP_LOGGING__LEVEL`	`INFO`	Logging level
`OPENZIM_MCP_LOGGING__FORMAT`	`%(asctime)s - %(name)s - %(levelname)s - %(message)s`	Log message format
`OPENZIM_MCP_SERVER_NAME`	`openzim-mcp`	Server instance name
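The double-underscore naming maps each variable onto a section and a key. This sketch shows that mapping on a plain dictionary; in practice you would pass os.environ. It returns raw strings, whereas the server's own settings loader presumably also validates values and converts types. parse_env is hypothetical.

```python
from collections.abc import Mapping


def parse_env(environ: Mapping, prefix: str = "OPENZIM_MCP_") -> dict:
    """Turn OPENZIM_MCP_SECTION__KEY=value pairs into a nested dict of raw strings."""
    settings: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated variables such as PATH
        parts = name[len(prefix):].lower().split("__")
        node = settings
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # descend into (or create) the section
        node[parts[-1]] = value
    return settings


# Usage in practice: import os; parse_env(os.environ)
```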

🔒 Security Features

Path Traversal Protection: Secure path validation prevents access outside allowed directories
Input Sanitization: All user inputs are validated and sanitized
Resource Management: Proper cleanup of ZIM archive resources
Error Handling: Sanitized error messages prevent information disclosure
Type Safety: Full type annotations prevent type-related vulnerabilities
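Path traversal protection generally means resolving the requested path, including .. segments and symlinks, before checking that it still lies inside an allowed directory. The following is a minimal sketch of that check, assuming Python 3.9+ for Path.is_relative_to. validate_path is a hypothetical helper, not the server's API.

```python
from pathlib import Path


def validate_path(requested: str, allowed_dirs: list) -> Path:
    """Return the resolved path if it lies inside an allowed directory, else raise."""
    # resolve() collapses ".." and follows symlinks, so tricks like
    # "/srv/zim/../etc/passwd" are checked against their real location.
    candidate = Path(requested).expanduser().resolve()
    for allowed in allowed_dirs:
        root = Path(allowed).expanduser().resolve()
        if candidate.is_relative_to(root):
            return candidate
    raise PermissionError(f"Access denied: {requested!r} is outside the allowed directories")
```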

🚀 Performance Features

Intelligent Caching: LRU cache with TTL for frequently accessed content
Resource Pooling: Efficient ZIM archive management
Optimized Content Processing: Fast HTML to text conversion
Lazy Loading: Components initialized only when needed
Memory Management: Proper cleanup and resource management
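An LRU cache with a TTL evicts the least recently used entry when it is full and treats entries older than the TTL as missing. The sketch below defaults to the documented maximum size (100) and TTL (3600 seconds). The class name and the injectable clock, which makes expiry testable, are assumptions and do not reflect the server's implementation.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable


class TTLLRUCache:
    """Least-recently-used cache whose entries also expire after ttl_seconds."""

    def __init__(self, max_size: int = 100, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._clock = clock
        self._data: "OrderedDict[Hashable, tuple[float, Any]]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        stored_at, value = item
        if self._clock() - stored_at > self.ttl:
            del self._data[key]  # expired: drop it and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```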

🧪 Testing

The project includes comprehensive testing with 90%+ coverage using both mock data and real ZIM files:

Test Categories

Unit Tests: Individual component testing with mocks
Integration Tests: End-to-end functionality testing with real ZIM files
Security Tests: Path traversal and input validation testing
Performance Tests: Cache and resource management testing
Format Compatibility: Testing with various ZIM file formats and versions
Error Handling: Testing with invalid and malformed ZIM files

Test Infrastructure

OpenZIM MCP uses a hybrid testing approach:

Mock-based tests: Fast unit tests using mocked libzim components
Real ZIM file tests: Integration tests using official zim-testing-suite files
Automatic test data management: Download and organize test files as needed

Test Data Sources

Built-in test data: Basic test files included in the repository
zim-testing-suite integration: Official test files from the OpenZIM project
Environment variable support: ZIM_TEST_DATA_DIR for custom test data locations

bash

# Run tests with coverage report
make test-cov

# View coverage report (macOS; on Linux use xdg-open)
open htmlcov/index.html

# Run comprehensive tests with real ZIM files
make test-with-zim-data

Test Markers

Tests are organized with pytest markers:

@pytest.mark.requires_zim_data: Tests requiring ZIM test data files
@pytest.mark.integration: Integration tests
@pytest.mark.slow: Long-running tests

📈 Monitoring

OpenZIM MCP provides built-in monitoring capabilities:

Health Checks: Server health and status monitoring
Cache Metrics: Cache hit rates and performance statistics
Structured Logging: JSON-formatted logs for easy parsing
Error Tracking: Comprehensive error logging and tracking

🔄 Versioning

This project uses Semantic Versioning with automated version management through release-please.

Automated Releases

Version bumps and releases are automated based on Conventional Commits:

feat: - New features (minor version bump)
fix: - Bug fixes (patch version bump)
feat!: or BREAKING CHANGE: - Breaking changes (major version bump)
perf: - Performance improvements (patch version bump)
docs:, style:, refactor:, test:, chore: - No version bump
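The rules above can be sketched as a function that maps each commit message to a bump level and then picks the largest level across a release. This sketch only illustrates the listed rules. release-please's own implementation may handle additional cases, and bump_for and release_bump are hypothetical names.

```python
import re

# <type>[optional scope][!]: <description>
_HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?: .+")
_ORDER = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def bump_for(message: str) -> str:
    """Return 'major', 'minor', 'patch' or 'none' for one commit message."""
    header, _, rest = message.partition("\n")
    match = _HEADER.match(header)
    if not match:
        return "none"
    if match.group("bang") or "BREAKING CHANGE:" in rest:
        return "major"
    return {"feat": "minor", "fix": "patch", "perf": "patch"}.get(match.group("type"), "none")


def release_bump(messages) -> str:
    """The release takes the largest bump among its commits."""
    return max((bump_for(m) for m in messages), key=_ORDER.__getitem__, default="none")
```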

Release Process

The project uses an improved, consolidated release system with automatic validation:

Automatic (Recommended): Push conventional commits → Release Please creates PR → Merge PR → Automatic release
Manual: Use GitHub Actions UI for direct control over releases
Emergency: Push tags directly for critical fixes

Key Features:

✅ Zero-touch releases from main branch
✅ Automatic version synchronization validation
✅ Comprehensive testing before every release
✅ Improved error handling and rollback capabilities
✅ Branch protection prevents broken releases
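A version synchronization check can be as small as comparing two files. In manifest mode, release-please records the current version in `.release-please-manifest.json`; the sketch below compares that entry with the `version` line in `pyproject.toml`. That this project keeps its version in exactly those two places is an assumption, `versions_in_sync` is a hypothetical name, and the regex reads only the first `version = "..."` line, which is a simplification of real TOML parsing.

```python
import json
import re
import sys
from pathlib import Path

# Simplification: grabs the first `version = "..."` line instead of
# parsing TOML properly.
VERSION_LINE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def versions_in_sync(pyproject_text: str, manifest_text: str, package_path: str = ".") -> bool:
    """Compare pyproject.toml's version with release-please's manifest entry."""
    match = VERSION_LINE.search(pyproject_text)
    if match is None:
        raise ValueError('no version = "..." line found in pyproject.toml')
    manifest = json.loads(manifest_text)  # maps package path to version, e.g. {".": "1.2.3"}
    return match.group(1) == manifest.get(package_path)


if __name__ == "__main__":
    ok = versions_in_sync(
        Path("pyproject.toml").read_text(encoding="utf-8"),
        Path(".release-please-manifest.json").read_text(encoding="utf-8"),
    )
    print("versions in sync" if ok else "version mismatch")
    sys.exit(0 if ok else 1)
```

Exiting non-zero on a mismatch lets a CI step fail the release job before anything is published.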

For detailed instructions, see the Release Process Guide.

Commit Message Format

```
<type>[optional scope]: <description>

[optional body]

[optional footer(s)]
```

Examples:

```
feat: add search suggestions endpoint
fix: resolve path traversal vulnerability
feat!: change API response format
docs: update installation instructions
```
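To catch malformed messages before they reach the release tooling, a local git `commit-msg` hook can check the header format. Git invokes this hook with the path to the proposed commit message file as its single argument. The sketch below is hypothetical (the project may enforce this differently or not at all), and the allowed types are only the ones named in this README.

```python
#!/usr/bin/env python3
"""commit-msg hook sketch: reject headers that are not Conventional Commits."""
from __future__ import annotations

import re
import sys
from pathlib import Path

# Types named in this README; extend the tuple if the project accepts more.
ALLOWED_TYPES = ("feat", "fix", "perf", "docs", "style", "refactor", "test", "chore")
HEADER = re.compile(r"^(?:" + "|".join(ALLOWED_TYPES) + r")(?:\([\w./-]+\))?!?: \S.*$")


def check_header(header: str) -> bool:
    """True if header matches <type>[optional scope][!]: <description>."""
    return HEADER.match(header) is not None


def main(argv: list[str]) -> int:
    # Git calls commit-msg with the path of the message file as its only argument.
    text = Path(argv[1]).read_text(encoding="utf-8")
    # Skip comment and blank lines; the first remaining line is the header.
    lines = [line for line in text.splitlines() if line.strip() and not line.startswith("#")]
    header = lines[0] if lines else ""
    if check_header(header):
        return 0
    print(f"commit-msg: not a Conventional Commit header: {header!r}", file=sys.stderr)
    return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

To try it, save the script as `.git/hooks/commit-msg` and make it executable (`chmod +x .git/hooks/commit-msg`); a non-zero exit status makes git abort the commit.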

🤝 Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Make your changes
4. Run tests (make check)
5. Use conventional commit messages (git commit -m 'feat: add amazing feature')
6. Push to the branch (git push origin feature/amazing-feature)
7. Open a Pull Request

Development Guidelines

Follow PEP 8 style guidelines
Add type hints to all functions
Write tests for new functionality
Update documentation as needed
Use conventional commit messages for automatic versioning
Ensure all tests pass before submitting

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Kiwix for the ZIM format and libzim library
MCP for the Model Context Protocol
The open-source community for the excellent libraries used in this project

Repository Owner

cameronrye

Repository Details

Language: Python
Default Branch: main
Size: 448 KB
Contributors: 3
License: MIT License
MCP Verified: Nov 12, 2025

Programming Languages

Python: 83.95%
HTML: 7.49%
CSS: 5.08%
JavaScript: 2.51%
Makefile: 0.98%

Topics

kiwix mcp mcp-server openzim zim

Related MCPs

Discover similar Model Context Protocol servers

OpenStreetMap MCP Server

Enhancing LLMs with geospatial and location-based capabilities via the Model Context Protocol.

OpenStreetMap MCP Server enables large language models to interact with rich geospatial data and location-based services through a standardized protocol. It provides APIs and tools for address geocoding, reverse geocoding, points of interest search, route directions, and neighborhood analysis. The server exposes location-related resources and tools, making it compatible with MCP hosts for seamless LLM integration.

⭐ 134
MCP
jagan-shanmugam/open-streetmap-mcp

Zettelkasten MCP Server

A Zettelkasten-based knowledge management system implementing the Model Context Protocol.

Zettelkasten MCP Server provides an implementation of the Zettelkasten note-taking methodology, enriched with bidirectional linking, semantic relationships, and categorization of notes. It enables creation, exploration, and synthesis of atomic knowledge using MCP for AI-assisted workflows. The system integrates with clients such as Claude and supports markdown, advanced search, and a structured prompt framework for large language models. The dual storage architecture and synchronous operation model ensure flexibility and reliability for managing personal or collaborative knowledge bases.

⭐ 114
MCP
entanglr/zettelkasten-mcp

OpenAI WebSearch MCP Server

Intelligent web search with OpenAI reasoning model support, fully MCP-compatible.

OpenAI WebSearch MCP Server provides advanced web search functionality integrated with OpenAI's latest reasoning models, such as gpt-5 and o3-series. It features full compatibility with the Model Context Protocol, enabling easy integration into AI assistants that require up-to-date information and contextual awareness. Built with flexible configuration options, smart reasoning effort controls, and support for location-based search customization. Suitable for environments such as Claude Desktop, Cursor, and automated research workflows.

⭐ 75
MCP
ConechoAI/openai-websearch-mcp

Memory MCP

A Model Context Protocol server for managing LLM conversation memories with intelligent context window caching.

Memory MCP provides a Model Context Protocol (MCP) server for logging, retrieving, and managing memories from large language model (LLM) conversations. It offers features such as context window caching, relevance scoring, and tag-based context retrieval, leveraging MongoDB for persistent storage. The system is designed to efficiently archive, score, and summarize conversational context, supporting external orchestration and advanced memory management tools. This enables seamless handling of conversation history and dynamic context for enhanced LLM applications.

⭐ 10
MCP
JamesANZ/memory-mcp

Kibela MCP Server

MCP server for seamless LLM integration with Kibela knowledge management.

Kibela MCP Server enables integration of Large Language Models (LLMs) with the Kibela note-sharing platform via the Model Context Protocol. It provides search, retrieval, and management of Kibela notes, users, groups, and folders, exposing these capabilities in a standardized MCP interface. The implementation utilizes Kibela's GraphQL API and supports configuration through environment variables and Docker. Designed for interoperability with tools like Cursor, it streamlines access and manipulation of organizational knowledge by AI systems.

⭐ 7
MCP
kiwamizamurai/mcp-kibela-server

ZoomEye MCP Server

Real-time cyberspace asset intelligence for AI assistants via Model Context Protocol.

ZoomEye MCP Server implements the Model Context Protocol (MCP) to provide network asset intelligence to AI assistants and development tools. It enables querying of global internet assets through ZoomEye's cyber asset search engine using structured parameters and dorks. The server includes features like caching, error handling, and compatibility with leading MCP environments, supporting real-time cyber asset data integration for various AI and developer platforms.

⭐ 50
MCP
zoomeye-ai/mcp_zoomeye
