ScreenPilot

Empower LLMs with full device control through screen automation.

ScreenPilot provides an MCP server interface to enable large language models to interact with and control graphical user interfaces on a device. It offers a comprehensive toolkit for screen capture, mouse control, keyboard input, scrolling, element detection, and action sequencing. The toolkit is suitable for automation, education, and experimentation, allowing AI agents to perform complex operations on a user’s device.

Key Features

Screen capture and analysis

Mouse movement and click control

Keyboard input and hotkey simulation

Configurable action sequences

Element existence detection on screen

Automated scrolling functionalities

Integration with Claude AI desktop

Extensible MCP server setup

Support for complex interaction chains

Easy local environment installation

Use Cases

Automating routine desktop tasks via LLMs

Educational tools for demonstrating user interface interactions

Testing and prototyping GUI applications

Enabling hands-free device control for accessibility

Workflow automation in multi-step desktop scenarios

Interactive demonstrations and tutorials by AI agents

Remote device control through LLM-powered agents

Automated application configuration and setup

Monitoring and responding to screen changes

Scripting complex desktop actions programmatically

README

ScreenPilot

An MCP server that lets an LLM take full control of your device by providing a screen automation toolkit for controlling and interacting with graphical user interfaces. Good for automation, education, and having fun.

Main Features

📷 Screen capture and analysis
🖱️ Mouse control (clicking, positioning)
⌨️ Keyboard input (typing, key presses, hotkeys)

watch demo

https://github.com/user-attachments/assets/c18380c0-b3dd-4b7c-925d-28ef205ca11f

Installation

Install Python 3.12.

Clone the repository:

```bash
git clone https://github.com/Mtehabsim/ScreenPilot.git
```

Create a virtual environment:

```bash
python -m venv venv
```

Activate the environment (the path shown is the Windows form):

```bash
venv\Scripts\activate
```

Install the required packages:

```bash
pip install -r requirements.txt
```

Open Claude AI Desktop and go to File → Settings → Developer → Edit Config. Open the config file and paste this:

```json
{
    "mcpServers": {
        "device-controll": {
            "command": "pathToEnv\\venv\\Scripts\\python.exe",
            "args": [
                "pathToProject\\ScreenPilot\\main.py"
            ]
        }
    }
}
```

Replace "pathToEnv\\venv\\Scripts\\python.exe" with the full path to your python.exe, and "pathToProject\\ScreenPilot\\main.py" with the full path to your main.py file. Keep the backslashes doubled, as JSON strings require.
Save the config file.
In Claude AI Desktop, go to File → Exit to close it completely.
Reopen Claude AI Desktop so it loads the new config, and enjoy ScreenPilot.
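
Hand-escaping Windows backslashes in JSON is an easy place to slip, so the config entry can also be generated. The following is a minimal sketch, not part of ScreenPilot itself: the `build_config` helper and the example `C:\Users\me\...` paths are assumptions for illustration, and the `device-controll` key is copied from the config shown in the installation steps.

```python
import json
from pathlib import PureWindowsPath


def build_config(python_exe: str, main_py: str) -> dict:
    """Return the mcpServers entry with both paths stored as plain strings."""
    return {
        "mcpServers": {
            "device-controll": {
                "command": str(PureWindowsPath(python_exe)),
                "args": [str(PureWindowsPath(main_py))],
            }
        }
    }


# Hypothetical example locations; substitute your own absolute paths.
config = build_config(
    r"C:\Users\me\ScreenPilot\venv\Scripts\python.exe",
    r"C:\Users\me\ScreenPilot\main.py",
)

# json.dumps doubles each backslash, which is what the config file needs.
print(json.dumps(config, indent=4))
```

The printed text can be pasted into the Claude Desktop config file in place of the template.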

Available Tools

Screen Capture: Take screenshots and get screen information
Mouse Control: Move the mouse and perform clicks
Keyboard Actions: Type text, press keys, and use hotkey combinations
Scrolling: Scroll in different directions and to specific positions
Element Detection: Check if elements exist on screen and wait for them to appear
Action Sequences: Perform multiple actions in sequence
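
To make the action-sequence and wait-for-element ideas concrete, here is a rough sketch of how such a dispatcher could look. It is not ScreenPilot's implementation: the action names (`move`, `click`, `type`), the `RecordingBackend` class, and the `wait_for` and `run_sequence` helpers are all hypothetical, and the backend only records calls instead of driving real input, so the example is safe to run anywhere.

```python
import time
from typing import Any, Callable

# Hypothetical action format; ScreenPilot's real tool names and parameters may differ.
Action = dict[str, Any]


class RecordingBackend:
    """Stand-in for a real input library: records calls instead of performing them."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def move(self, x: int, y: int) -> None:
        self.log.append(f"move({x}, {y})")

    def click(self) -> None:
        self.log.append("click()")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def wait_for(check: Callable[[], bool], timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def run_sequence(backend: RecordingBackend, actions: list[Action]) -> None:
    """Execute actions in order; an unknown action type raises ValueError."""
    for action in actions:
        kind = action["type"]
        if kind == "move":
            backend.move(action["x"], action["y"])
        elif kind == "click":
            backend.click()
        elif kind == "type":
            backend.type_text(action["text"])
        else:
            raise ValueError(f"unknown action type: {kind}")


backend = RecordingBackend()
run_sequence(backend, [
    {"type": "move", "x": 100, "y": 200},
    {"type": "click"},
    {"type": "type", "text": "hello"},
])
print(backend.log)  # ['move(100, 200)', 'click()', "type('hello')"]

# Element detection would pass a screen check here; this stand-in checks the log.
appeared = wait_for(lambda: len(backend.log) == 3, timeout=1.0)
```

Keeping the backend behind a small interface like this means the same sequence can be replayed against a recorder for testing or against a real input library when driving the desktop.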

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Repository Owner

Mtehabsim

User

Repository Details

Language Python

Default Branch main

Size 9,658 KB

Contributors 4

MCP Verified Nov 12, 2025

Programming Languages

Python

100%

Topics

automation mcp-server

Related MCPs

Discover similar Model Context Protocol servers

ScreenMonitorMCP v2

Real-time screen monitoring and visual analysis for AI assistants via MCP.

ScreenMonitorMCP v2 is a Model Context Protocol (MCP) server enabling AI assistants to capture, analyze, and interact with screen content in real time. It supports instant screenshots, live streaming, advanced vision-based analysis, and provides performance monitoring across Windows, macOS, and Linux. Integration with clients like Claude Desktop is streamlined, offering easy configuration and broad compatibility. The tool leverages AI vision models to provide intelligent insights into screen content and system health.

⭐ 64
MCP
inkbytefo/ScreenMonitorMCP

omniparser-autogui-mcp

Automated GUI analysis and interaction via the Model Context Protocol.

omniparser-autogui-mcp is an MCP server that leverages OmniParser to analyze on-screen content and perform automated GUI operations. It integrates with clients such as Claude Desktop and can be configured via a detailed environment setup. The tool supports Windows and can delegate OmniParser processing to external devices, offering flexibility for complex contexts. Multiple environment variables allow customization of backend processing, target window selection, and communication methods, including SSE.

⭐ 58
MCP
NON906/omniparser-autogui-mcp

MCP Manager for Claude Desktop

A desktop app to manage Model Context Protocol (MCP) servers for Claude Desktop on macOS.

MCP Manager for Claude Desktop provides a user-friendly interface to manage Model Context Protocol (MCP) servers, enabling Claude to access private data, APIs, and local or remote services securely from a macOS desktop. It facilitates rapid configuration and integration with a wide variety of MCP servers, including productivity tools, databases, and web APIs. The app runs locally to ensure data privacy and streamlines connecting Claude to new sources through simple environment and server settings management.

⭐ 270
MCP
zueai/mcp-manager

OpenAI MCP Server

Bridge between Claude and OpenAI models using the MCP protocol.

OpenAI MCP Server enables direct querying of OpenAI language models from Claude via the Model Context Protocol (MCP). It provides a configurable Python server that exposes OpenAI APIs as MCP endpoints. The server is designed for seamless integration, requiring simple configuration updates and environment variable setup. Automated testing is supported to verify connectivity and response from the OpenAI API.

⭐ 77
MCP
pierrebrunelle/mcp-server-openai

interactive-mcp

Enable interactive, local communication between LLMs and users via MCP.

interactive-mcp implements a Model Context Protocol (MCP) server in Node.js/TypeScript, allowing Large Language Models (LLMs) to interact directly with users on their local machine. It exposes tools for requesting user input, sending notifications, and managing persistent command-line chat sessions, facilitating real-time communication. Designed for integration with clients like Claude Desktop and VS Code, it operates locally to access OS-level notifications and command prompts. The project is suited for interactive workflows where LLMs require user involvement or confirmation.

⭐ 313
MCP
ttommyth/interactive-mcp

Notion MCP Server

Enable LLMs to interact with Notion using the Model Context Protocol.

Notion MCP Server allows large language models to interface with Notion workspaces through a Model Context Protocol server, supporting both data retrieval and editing capabilities. It includes experimental Markdown conversion to optimize token usage for more efficient communication with LLMs. The server can be configured with environment variables and controlled for specific tool access. Integration with applications like Claude Desktop is supported for seamless automation.

⭐ 834
MCP
suekou/mcp-notion-server
