ScreenPilot

Empower LLMs with full device control through screen automation.

Stars: 50
Forks: 8
Watchers: 50
Issues: 2

ScreenPilot provides an MCP server interface to enable large language models to interact with and control graphical user interfaces on a device. It offers a comprehensive toolkit for screen capture, mouse control, keyboard input, scrolling, element detection, and action sequencing. The toolkit is suitable for automation, education, and experimentation, allowing AI agents to perform complex operations on a user’s device.

Key Features

Screen capture and analysis
Mouse movement and click control
Keyboard input and hotkey simulation
Configurable action sequences
Element existence detection on screen
Automated scrolling functionalities
Integration with Claude AI desktop
Extensible MCP server setup
Support for complex interaction chains
Easy local environment installation

Use Cases

Automating routine desktop tasks via LLMs
Educational tools for demonstrating user interface interactions
Testing and prototyping GUI applications
Enabling hands-free device control for accessibility
Workflow automation in multi-step desktop scenarios
Interactive demonstrations and tutorials by AI agents
Remote device control through LLM-powered agents
Automated application configuration and setup
Monitoring and responding to screen changes
Scripting complex desktop actions programmatically

README

ScreenPilot

An MCP server that lets an LLM take full control of your device by providing a screen automation toolkit for controlling and interacting with graphical user interfaces. Good for automation, education, and having fun.

Main Features

  • 📷 Screen capture and analysis
  • 🖱️ Mouse control (clicking, positioning)
  • ⌨️ Keyboard input (typing, key presses, hotkeys)

Watch the demo:

https://github.com/user-attachments/assets/c18380c0-b3dd-4b7c-925d-28ef205ca11f

Installation

  1. Install Python 3.12.
  2. Clone the repository:
    git clone https://github.com/Mtehabsim/ScreenPilot.git
  3. Create a virtual environment:
    python -m venv venv
  4. Activate the environment (Windows):
    venv\Scripts\activate
  5. Install the required packages:
    pip install -r requirements.txt
  6. Open Claude AI Desktop.
  7. Go to File → Settings → Developer → Edit Config.
  8. Open the config file and paste the following:
    {
        "mcpServers": {
            "device-controll": {
                "command": "pathToEnv\\venv\\Scripts\\python.exe",
                "args": [
                    "pathToProject\\ScreenPilot\\main.py"
                ]
            }
        }
    }
  9. Replace "pathToEnv\\venv\\Scripts\\python.exe" with the full path to your python.exe, and "pathToProject\\ScreenPilot\\main.py" with the full path to your main.py file. Keep the backslashes doubled, as JSON requires.
  10. Save the config file.
  11. Exit Claude AI Desktop completely via File → Exit so that it picks up the new config on the next start.
  12. Reopen Claude AI Desktop and enjoy ScreenPilot.
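
Hand-editing Windows paths inside JSON is error-prone because every backslash has to be doubled. As a minimal sketch (not part of ScreenPilot itself), the config entry can be generated with Python's standard library, which handles the escaping. The helper name `build_config` and the example paths are hypothetical; the server key `device-controll` is copied from the config in the installation steps.

```python
import json
from pathlib import PureWindowsPath


def build_config(python_exe: str, main_py: str,
                 server_name: str = "device-controll") -> str:
    """Return the mcpServers config as JSON text.

    json.dumps escapes the backslashes in the Windows paths, so the
    result can be pasted into the config file as-is.
    """
    config = {
        "mcpServers": {
            server_name: {
                "command": str(PureWindowsPath(python_exe)),
                "args": [str(PureWindowsPath(main_py))],
            }
        }
    }
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    # Hypothetical example paths; substitute the real locations
    # of your virtual environment and your ScreenPilot checkout.
    print(build_config(r"C:\tools\venv\Scripts\python.exe",
                       r"C:\src\ScreenPilot\main.py"))
```

Running the script prints a JSON document in the same shape as the one in the installation steps, with the paths already escaped.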

Available Tools

  • Screen Capture: Take screenshots and get screen information
  • Mouse Control: Move the mouse and perform clicks
  • Keyboard Actions: Type text, press keys, and use hotkey combinations
  • Scrolling: Scroll in different directions and to specific positions
  • Element Detection: Check if elements exist on screen and wait for them to appear
  • Action Sequences: Perform multiple actions in sequence
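
To illustrate how element detection with waiting and action sequences might fit together, here is a rough sketch under stated assumptions. It does not reproduce ScreenPilot's actual tool names or signatures, which the README does not list: `wait_until`, `RecordingBackend`, `run_sequence`, and the step dictionary layout are all hypothetical. The backend only records the requested actions, so the example runs without touching the real mouse or keyboard.

```python
import time
from typing import Any, Callable, Dict, List


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class RecordingBackend:
    """Hypothetical stand-in for a real input backend; it only logs calls."""

    def __init__(self) -> None:
        self.log: List[tuple] = []

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.log.append(("type_text", text))

    def hotkey(self, *keys: str) -> None:
        self.log.append(("hotkey", *keys))

    def scroll(self, amount: int) -> None:
        self.log.append(("scroll", amount))


def run_sequence(backend: RecordingBackend,
                 steps: List[Dict[str, Any]]) -> int:
    """Run steps in order and stop at the first unknown action.

    Returns the number of steps that were executed.
    """
    handlers = {
        "click": lambda s: backend.click(s["x"], s["y"]),
        "type_text": lambda s: backend.type_text(s["text"]),
        "hotkey": lambda s: backend.hotkey(*s["keys"]),
        "scroll": lambda s: backend.scroll(s["amount"]),
    }
    done = 0
    for step in steps:
        handler = handlers.get(step.get("action"))
        if handler is None:
            break  # refuse to continue past a step we do not understand
        handler(step)
        done += 1
    return done


if __name__ == "__main__":
    backend = RecordingBackend()
    steps = [
        {"action": "click", "x": 100, "y": 200},
        {"action": "type_text", "text": "hello"},
        {"action": "hotkey", "keys": ["ctrl", "s"]},
    ]
    print(run_sequence(backend, steps), backend.log)
```

Stopping at the first unknown step is a deliberate choice in this sketch: when an agent controls a real device, a partially understood sequence is safer to abort than to continue.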

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Repository Owner

Mtehabsim (User)

Repository Details

Language: Python
Default Branch: main
Size: 9,658 KB
Contributors: 4
MCP Verified: Nov 12, 2025

Programming Languages

Python: 100%

Tags

Topics: automation, mcp-server

Related MCPs

Discover similar Model Context Protocol servers

  • ScreenMonitorMCP v2

    Real-time screen monitoring and visual analysis for AI assistants via MCP.

    ScreenMonitorMCP v2 is a Model Context Protocol (MCP) server enabling AI assistants to capture, analyze, and interact with screen content in real time. It supports instant screenshots, live streaming, advanced vision-based analysis, and provides performance monitoring across Windows, macOS, and Linux. Integration with clients like Claude Desktop is streamlined, offering easy configuration and broad compatibility. The tool leverages AI vision models to provide intelligent insights into screen content and system health.

    • 64
    • MCP
    • inkbytefo/ScreenMonitorMCP
  • omniparser-autogui-mcp

    Automated GUI analysis and interaction via the Model Context Protocol.

    omniparser-autogui-mcp is an MCP server that leverages OmniParser to analyze on-screen content and perform automated GUI operations. It integrates with clients such as Claude Desktop and can be configured via a detailed environment setup. The tool supports Windows and can delegate OmniParser processing to external devices, offering flexibility for complex contexts. Multiple environment variables allow customization of backend processing, target window selection, and communication methods, including SSE.

    • 58
    • MCP
    • NON906/omniparser-autogui-mcp
  • MCP Manager for Claude Desktop

    A desktop app to manage Model Context Protocol (MCP) servers for Claude Desktop on macOS.

    MCP Manager for Claude Desktop provides a user-friendly interface to manage Model Context Protocol (MCP) servers, enabling Claude to access private data, APIs, and local or remote services securely from a macOS desktop. It facilitates rapid configuration and integration with a wide variety of MCP servers, including productivity tools, databases, and web APIs. The app runs locally to ensure data privacy and streamlines connecting Claude to new sources through simple environment and server settings management.

    • 270
    • MCP
    • zueai/mcp-manager
  • OpenAI MCP Server

    Bridge between Claude and OpenAI models using the MCP protocol.

    OpenAI MCP Server enables direct querying of OpenAI language models from Claude via the Model Context Protocol (MCP). It provides a configurable Python server that exposes OpenAI APIs as MCP endpoints. The server is designed for seamless integration, requiring simple configuration updates and environment variable setup. Automated testing is supported to verify connectivity and response from the OpenAI API.

    • 77
    • MCP
    • pierrebrunelle/mcp-server-openai
  • interactive-mcp

    Enable interactive, local communication between LLMs and users via MCP.

    interactive-mcp implements a Model Context Protocol (MCP) server in Node.js/TypeScript, allowing Large Language Models (LLMs) to interact directly with users on their local machine. It exposes tools for requesting user input, sending notifications, and managing persistent command-line chat sessions, facilitating real-time communication. Designed for integration with clients like Claude Desktop and VS Code, it operates locally to access OS-level notifications and command prompts. The project is suited for interactive workflows where LLMs require user involvement or confirmation.

    • 313
    • MCP
    • ttommyth/interactive-mcp
  • Notion MCP Server

    Enable LLMs to interact with Notion using the Model Context Protocol.

    Notion MCP Server allows large language models to interface with Notion workspaces through a Model Context Protocol server, supporting both data retrieval and editing capabilities. It includes experimental Markdown conversion to optimize token usage for more efficient communication with LLMs. The server can be configured with environment variables and controlled for specific tool access. Integration with applications like Claude Desktop is supported for seamless automation.

    • 834
    • MCP
    • suekou/mcp-notion-server