omniparser-autogui-mcp

Automated GUI analysis and interaction via the Model Context Protocol.

omniparser-autogui-mcp is an MCP server that uses OmniParser to analyze on-screen content and automatically operate the GUI. It integrates with clients such as Claude Desktop and is configured through environment variables in the client's server definition. It has been confirmed to work on Windows, and OmniParser processing can be delegated to a separate device. Environment variables control backend loading, target window selection, and the transport (stdio by default, or SSE).

Key Features

Acts as a Model Context Protocol server
Automates GUI operations based on screen analysis
Utilizes OmniParser for advanced on-screen content recognition
Supports integration with Claude Desktop and similar clients
Configurable via environment variables for backend control
Optional SSE communication protocol
Ability to designate specific target windows
External OmniParser server support
Confirmed on Windows (installation notes also cover other platforms)
Optional OmniParser model settings (SOM_MODEL_PATH, CAPTION_MODEL_NAME, and others)

Use Cases

Automating repetitive GUI tasks based on screen content
Providing contextual screen data to AI models for enhanced interaction
Integrating screen analysis into desktop automation workflows
Operating legacy GUI applications programmatically
Assisting accessibility tools with real-time screen parsing
Remote screen operation by routing OmniParser through networked devices
Custom window targeting for focused automation tasks
Enhancing virtual assistant capabilities with visual context
Interfacing with applications lacking API access by driving their GUIs
Testing UI applications via programmatic GUI interactions

README

A Japanese version of this README is also available.

This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
It has been confirmed to work on Windows.

License notes

This project is under the MIT license, excluding submodules and subpackages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has its own license (see the OmniParser repository for details).

Installation

  1. Run the following commands:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py

(On platforms other than Windows, use export instead of set.)
(To use langchain_example.py, run uv sync --extra langchain instead of uv sync.)
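The set/export difference only matters because OCR_LANG is set in the shell before running download_models.py. As a platform-neutral alternative, a Python sketch like the one below passes OCR_LANG directly to the child process instead. It assumes uv is on PATH and that repo_dir points at the cloned directory; the function names are hypothetical and not part of the project.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def build_download_command(ocr_lang: str = "en") -> Tuple[List[str], Dict[str, str]]:
    """Return the command and environment for the model download step."""
    # Copy the current environment and add OCR_LANG, so neither the
    # Windows `set` nor the POSIX `export` shell syntax is needed.
    env = dict(os.environ)
    env["OCR_LANG"] = ocr_lang
    return ["uv", "run", "download_models.py"], env


def download_models(ocr_lang: str = "en", repo_dir: str = ".") -> None:
    """Run `uv sync`, then the model download script, inside repo_dir."""
    subprocess.run(["uv", "sync"], cwd=repo_dir, check=True)
    cmd, env = build_download_command(ocr_lang)
    # check=True raises CalledProcessError if the download script fails.
    subprocess.run(cmd, cwd=repo_dir, env=env, check=True)
```

Calling download_models(repo_dir="path/to/omniparser-autogui-mcp") performs the same steps as the commands above without modifying the parent shell's environment.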

  2. Add the following to your claude_desktop_config.json:
claude_desktop_config.json
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}

(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
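If you prefer not to edit claude_desktop_config.json by hand, the same entry can be merged in programmatically. This is a sketch, not a tool shipped with the project: the function names are hypothetical, and you must supply the actual location of the config file on your system and the path of your clone.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def add_server_entry(config: dict, cloned_path: str, ocr_lang: str = "en",
                     extra_env: Optional[Dict[str, str]] = None) -> dict:
    """Add the omniparser_autogui_mcp entry to config (in place) and return it."""
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang}
    # Optional settings such as TARGET_WINDOW_NAME go into the same env block.
    env.update(extra_env or {})
    # Keep any servers that are already configured.
    servers = config.setdefault("mcpServers", {})
    servers["omniparser_autogui_mcp"] = {
        "command": "uv",
        "args": ["--directory", cloned_path, "run", "omniparser-autogui-mcp"],
        "env": env,
    }
    return config


def update_config_file(config_path: str, cloned_path: str, **kwargs) -> None:
    """Read a Claude Desktop config file, add the entry, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    add_server_entry(config, cloned_path, **kwargs)
    # json.dumps escapes backslashes, so a Windows path such as
    # D:\CLONED_PATH is written as "D:\\CLONED_PATH", as in the example above.
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Generating the JSON this way avoids hand-escaping Windows backslashes and leaves other mcpServers entries untouched.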

The env block also accepts the following settings:

  • OMNI_PARSER_BACKEND_LOAD
    Set to 1 if the server does not work with other clients (such as LibreChat).

  • TARGET_WINDOW_NAME
    The name of the window to operate on.
    If not specified, the server operates on the entire screen.

  • OMNI_PARSER_SERVER
    To run OmniParser processing on another device, specify that server's address and port, such as 127.0.0.1:8000.
    The server can be started with uv run omniparserserver.

  • SSE_HOST, SSE_PORT
    If specified, communication will be done via SSE instead of stdio.

  • SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
    These are for OmniParser configuration.
    Usually, they are not necessary.

Usage Examples

  • Search for "MCP server" in the on-screen browser.

and similar on-screen tasks.

Repository Owner

NON906 (User)

Repository Details

Language: Python
Default Branch: master
Size: 178 KB
Contributors: 1
License: MIT License
MCP Verified: Nov 12, 2025

Programming Languages

Python: 100%

Related MCPs

Discover similar Model Context Protocol servers

  • ScreenMonitorMCP v2

    Real-time screen monitoring and visual analysis for AI assistants via MCP.

    ScreenMonitorMCP v2 is a Model Context Protocol (MCP) server enabling AI assistants to capture, analyze, and interact with screen content in real time. It supports instant screenshots, live streaming, advanced vision-based analysis, and provides performance monitoring across Windows, macOS, and Linux. Integration with clients like Claude Desktop is streamlined, offering easy configuration and broad compatibility. The tool leverages AI vision models to provide intelligent insights into screen content and system health.

    • 64
    • MCP
    • inkbytefo/ScreenMonitorMCP
  • OpenAI MCP Server

    Bridge between Claude and OpenAI models via the Model Context Protocol (MCP).

    OpenAI MCP Server enables direct querying of OpenAI language models from Claude via the Model Context Protocol (MCP). It provides a configurable Python server that exposes OpenAI APIs as MCP endpoints. The server is designed for seamless integration, requiring simple configuration updates and environment variable setup. Automated testing is supported to verify connectivity and response from the OpenAI API.

    • 77
    • MCP
    • pierrebrunelle/mcp-server-openai
  • mcp-cli

    A command-line inspector and client for the Model Context Protocol

    mcp-cli is a command-line interface tool designed to interact with Model Context Protocol (MCP) servers. It allows users to run and connect to MCP servers from various sources, inspect available tools, resources, and prompts, and execute commands non-interactively or interactively. The tool supports OAuth for various server types, making integration and automation seamless for developers working with MCP-compliant servers.

    • 391
    • MCP
    • wong2/mcp-cli
  • MCP Manager for Claude Desktop

    A desktop app to manage Model Context Protocol (MCP) servers for Claude Desktop on macOS.

    MCP Manager for Claude Desktop provides a user-friendly interface to manage Model Context Protocol (MCP) servers, enabling Claude to access private data, APIs, and local or remote services securely from a macOS desktop. It facilitates rapid configuration and integration with a wide variety of MCP servers, including productivity tools, databases, and web APIs. The app runs locally to ensure data privacy and streamlines connecting Claude to new sources through simple environment and server settings management.

    • 270
    • MCP
    • zueai/mcp-manager
  • ws-mcp

    WebSocket bridge for MCP stdio servers.

    ws-mcp wraps Model Context Protocol (MCP) stdio servers with a WebSocket interface, enabling seamless integration with web-based clients and tools. It allows users to configure and launch multiple MCP servers via a flexible configuration file or command-line arguments. The tool is designed to be compatible with services such as wcgw, fetch, and other MCP-compliant servers, providing standardized access to system operations, HTTP requests, and more. Integration with tools like Kibitz enables broader applications in model interaction workflows.

    • 19
    • MCP
    • nick1udwig/ws-mcp
  • Offorte MCP Server

    Bridge AI agents with Offorte proposal automation via the Model Context Protocol.

    Offorte MCP Server enables external AI models to create and send proposals through Offorte by implementing the Model Context Protocol. It facilitates automation workflows between AI agents and Offorte's proposal engine, supporting seamless integration with chat interfaces and autonomous systems. The server provides a suite of tools for managing contacts, proposals, templates, and automation sets, streamlining the proposal creation and delivery process via standardized context handling. Designed for extensibility and real-world automation, it leverages Offorte's public API to empower intelligent business proposals.

    • 4
    • MCP
    • offorte/offorte-mcp-server