Agent skill

desktop-control

Advanced desktop automation with mouse, keyboard, and screen control

Stars 1,878
Forks 294

Install this agent skill to your Project

npx add-skill https://github.com/LeoYeAI/openclaw-master-skills/tree/main/skills/desktop-control

SKILL.md

Desktop Control Skill

The most advanced desktop automation skill for OpenClaw. Provides pixel-perfect mouse control, lightning-fast keyboard input, screen capture, window management, and clipboard operations.

🎯 Features

Mouse Control

  • Absolute positioning - Move to exact coordinates
  • Relative movement - Move from current position
  • Smooth movement - Natural, human-like mouse paths
  • Click types - Left, right, middle, double, triple clicks
  • Drag & drop - Drag from point A to point B
  • Scroll - Vertical and horizontal scrolling
  • Position tracking - Get current mouse coordinates

Keyboard Control

  • Text typing - Fast, accurate text input
  • Hotkeys - Execute keyboard shortcuts (Ctrl+C, Win+R, etc.)
  • Special keys - Enter, Tab, Escape, Arrow keys, F-keys
  • Key combinations - Multi-key press combinations
  • Hold & release - Manual key state control
  • Typing speed - Configurable WPM (instant to human-like)

Screen Operations

  • Screenshot - Capture entire screen or regions
  • Image recognition - Find elements on screen (via OpenCV)
  • Color detection - Get pixel colors at coordinates
  • Multi-monitor - Support for multiple displays

Window Management

  • Window list - Get all open windows
  • Activate window - Bring window to front
  • Window info - Get position, size, title
  • Minimize/Maximize - Control window states

Safety Features

  • Failsafe - Move mouse to corner to abort
  • Pause control - Emergency stop mechanism
  • Approval mode - Require confirmation for actions
  • Bounds checking - Prevent out-of-screen operations
  • Logging - Track all automation actions

🚀 Quick Start

Installation

First, install required dependencies:

bash
pip install pyautogui pillow opencv-python pygetwindow

Basic Usage

python
from skills.desktop_control import DesktopController

# Initialize controller
dc = DesktopController(failsafe=True)

# Mouse operations
dc.move_mouse(500, 300)  # Move to coordinates
dc.click()  # Left click at current position
dc.click(100, 200, button="right")  # Right click at position

# Keyboard operations
dc.type_text("Hello from OpenClaw!")
dc.hotkey("ctrl", "c")  # Copy
dc.press("enter")

# Screen operations
screenshot = dc.screenshot()
position = dc.get_mouse_position()

📋 Complete API Reference

Mouse Functions

move_mouse(x, y, duration=0, smooth=True)

Move mouse to absolute screen coordinates.

Parameters:

  • x (int): X coordinate (pixels from left)
  • y (int): Y coordinate (pixels from top)
  • duration (float): Movement time in seconds (0 = instant, 0.5 = smooth)
  • smooth (bool): Use bezier curve for natural movement

Example:

python
# Instant movement
dc.move_mouse(1000, 500)

# Smooth 1-second movement
dc.move_mouse(1000, 500, duration=1.0)

move_relative(x_offset, y_offset, duration=0)

Move mouse relative to current position.

Parameters:

  • x_offset (int): Pixels to move horizontally (positive = right)
  • y_offset (int): Pixels to move vertically (positive = down)
  • duration (float): Movement time in seconds

Example:

python
# Move 100px right, 50px down
dc.move_relative(100, 50, duration=0.3)

click(x=None, y=None, button='left', clicks=1, interval=0.1)

Perform mouse click.

Parameters:

  • x, y (int, optional): Coordinates to click (None = current position)
  • button (str): 'left', 'right', 'middle'
  • clicks (int): Number of clicks (1 = single, 2 = double)
  • interval (float): Delay between multiple clicks

Example:

python
# Simple left click
dc.click()

# Double-click at specific position
dc.click(500, 300, clicks=2)

# Right-click
dc.click(button='right')

drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')

Drag and drop operation.

Parameters:

  • start_x, start_y (int): Starting coordinates
  • end_x, end_y (int): Ending coordinates
  • duration (float): Drag duration
  • button (str): Mouse button to use

Example:

python
# Drag file from desktop to folder
dc.drag(100, 100, 500, 500, duration=1.0)

scroll(clicks, direction='vertical', x=None, y=None)

Scroll mouse wheel.

Parameters:

  • clicks (int): Scroll amount (positive = up/left, negative = down/right)
  • direction (str): 'vertical' or 'horizontal'
  • x, y (int, optional): Position to scroll at

Example:

python
# Scroll down 5 clicks
dc.scroll(-5)

# Scroll up 10 clicks
dc.scroll(10)

# Horizontal scroll
dc.scroll(5, direction='horizontal')

get_mouse_position()

Get current mouse coordinates.

Returns: (x, y) tuple

Example:

python
x, y = dc.get_mouse_position()
print(f"Mouse is at: {x}, {y}")

Keyboard Functions

type_text(text, interval=0, wpm=None)

Type text with configurable speed.

Parameters:

  • text (str): Text to type
  • interval (float): Delay between keystrokes (0 = instant)
  • wpm (int, optional): Words per minute (overrides interval)

Example:

python
# Instant typing
dc.type_text("Hello World")

# Human-like typing at 60 WPM
dc.type_text("Hello World", wpm=60)

# Slow typing with 0.1s between keys
dc.type_text("Hello World", interval=0.1)

press(key, presses=1, interval=0.1)

Press and release a key.

Parameters:

  • key (str): Key name (see Key Names section)
  • presses (int): Number of times to press
  • interval (float): Delay between presses

Example:

python
# Press Enter
dc.press('enter')

# Press Space 3 times
dc.press('space', presses=3)

# Press Down arrow
dc.press('down')

hotkey(*keys, interval=0.05)

Execute keyboard shortcut.

Parameters:

  • *keys (str): Keys to press together
  • interval (float): Delay between key presses

Example:

python
# Copy (Ctrl+C)
dc.hotkey('ctrl', 'c')

# Paste (Ctrl+V)
dc.hotkey('ctrl', 'v')

# Open Run dialog (Win+R)
dc.hotkey('win', 'r')

# Save (Ctrl+S)
dc.hotkey('ctrl', 's')

# Select All (Ctrl+A)
dc.hotkey('ctrl', 'a')

key_down(key) / key_up(key)

Manually control key state.

Example:

python
# Hold Shift
dc.key_down('shift')
dc.type_text("hello")  # Types "HELLO"
dc.key_up('shift')

# Hold Ctrl and click (for multi-select)
dc.key_down('ctrl')
dc.click(100, 100)
dc.click(200, 100)
dc.key_up('ctrl')

Screen Functions

screenshot(region=None, filename=None)

Capture screen or region.

Parameters:

  • region (tuple, optional): (left, top, width, height) for partial capture
  • filename (str, optional): Path to save image

Returns: PIL Image object

Example:

python
# Full screen
img = dc.screenshot()

# Save to file
dc.screenshot(filename="screenshot.png")

# Capture specific region
img = dc.screenshot(region=(100, 100, 500, 300))

get_pixel_color(x, y)

Get color of pixel at coordinates.

Returns: RGB tuple (r, g, b)

Example:

python
r, g, b = dc.get_pixel_color(500, 300)
print(f"Color at (500, 300): RGB({r}, {g}, {b})")

find_on_screen(image_path, confidence=0.8)

Find image on screen (requires OpenCV).

Parameters:

  • image_path (str): Path to template image
  • confidence (float): Match threshold (0-1)

Returns: (x, y, width, height) or None

Example:

python
# Find button on screen
location = dc.find_on_screen("button.png")
if location:
    x, y, w, h = location
    # Click center of found image
    dc.click(x + w//2, y + h//2)

get_screen_size()

Get screen resolution.

Returns: (width, height) tuple

Example:

python
width, height = dc.get_screen_size()
print(f"Screen: {width}x{height}")

Window Functions

get_all_windows()

List all open windows.

Returns: List of window titles

Example:

python
windows = dc.get_all_windows()
for title in windows:
    print(f"Window: {title}")

activate_window(title_substring)

Bring window to front by title.

Parameters:

  • title_substring (str): Part of window title to match

Example:

python
# Activate Chrome
dc.activate_window("Chrome")

# Activate VS Code
dc.activate_window("Visual Studio Code")

get_active_window()

Get currently focused window.

Returns: Window title (str)

Example:

python
active = dc.get_active_window()
print(f"Active window: {active}")

Clipboard Functions

copy_to_clipboard(text)

Copy text to clipboard.

Example:

python
dc.copy_to_clipboard("Hello from OpenClaw!")

get_from_clipboard()

Get text from clipboard.

Returns: str

Example:

python
text = dc.get_from_clipboard()
print(f"Clipboard: {text}")

⌨️ Key Names Reference

Alphabet Keys

'a' through 'z'

Number Keys

'0' through '9'

Function Keys

'f1' through 'f24'

Special Keys

  • 'enter' / 'return'
  • 'esc' / 'escape'
  • 'space' / 'spacebar'
  • 'tab'
  • 'backspace'
  • 'delete' / 'del'
  • 'insert'
  • 'home'
  • 'end'
  • 'pageup' / 'pgup'
  • 'pagedown' / 'pgdn'

Arrow Keys

  • 'up' / 'down' / 'left' / 'right'

Modifier Keys

  • 'ctrl' / 'control'
  • 'shift'
  • 'alt'
  • 'win' / 'winleft' / 'winright'
  • 'cmd' / 'command' (Mac)

Lock Keys

  • 'capslock'
  • 'numlock'
  • 'scrolllock'

Punctuation

  • '.' / ',' / '?' / '!' / ';' / ':'
  • '[' / ']' / '{' / '}'
  • '(' / ')'
  • '+' / '-' / '*' / '/' / '='

🛡️ Safety Features

Failsafe Mode

Move mouse to any corner of the screen to abort all automation.

python
# Enable failsafe (enabled by default)
dc = DesktopController(failsafe=True)

Pause Control

python
# Pause all automation for 2 seconds
dc.pause(2.0)

# Check if automation is safe to proceed
if dc.is_safe():
    dc.click(500, 500)

Approval Mode

Require user confirmation before actions:

python
dc = DesktopController(require_approval=True)

# This will ask for confirmation
dc.click(500, 500)  # Prompt: "Allow click at (500, 500)? [y/n]"

🎨 Advanced Examples

Example 1: Automated Form Filling

python
dc = DesktopController()

# Click name field
dc.click(300, 200)
dc.type_text("John Doe", wpm=80)

# Tab to next field
dc.press('tab')
dc.type_text("john@example.com", wpm=80)

# Tab to password
dc.press('tab')
dc.type_text("SecurePassword123", wpm=60)

# Submit form
dc.press('enter')

Example 2: Screenshot Region and Save

python
# Capture specific area
region = (100, 100, 800, 600)  # left, top, width, height
img = dc.screenshot(region=region)

# Save with timestamp
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
img.save(f"capture_{timestamp}.png")

Example 3: Multi-File Selection

python
# Hold Ctrl and click multiple files
dc.key_down('ctrl')
dc.click(100, 200)  # First file
dc.click(100, 250)  # Second file
dc.click(100, 300)  # Third file
dc.key_up('ctrl')

# Copy selected files
dc.hotkey('ctrl', 'c')

Example 4: Window Automation

python
# Activate Calculator
dc.activate_window("Calculator")
time.sleep(0.5)

# Type calculation
dc.type_text("5+3=", interval=0.2)
time.sleep(0.5)

# Take screenshot of result
dc.screenshot(filename="calculation_result.png")

Example 5: Drag & Drop File

python
# Drag file from source to destination
dc.drag(
    start_x=200, start_y=300,  # File location
    end_x=800, end_y=500,       # Folder location
    duration=1.0                 # Smooth 1-second drag
)

⚡ Performance Tips

  1. Use instant movements for speed: duration=0
  2. Batch operations instead of individual calls
  3. Cache screen positions instead of recalculating
  4. Disable failsafe for maximum performance (use with caution)
  5. Use hotkeys instead of menu navigation

⚠️ Important Notes

  • Screen coordinates start at (0, 0) in top-left corner
  • Multi-monitor setups may have negative coordinates for secondary displays
  • Windows DPI scaling may affect coordinate accuracy
  • Failsafe corners are: (0,0), (width-1, 0), (0, height-1), (width-1, height-1)
  • Some applications may block simulated input (games, secure apps)

🔧 Troubleshooting

Mouse not moving to correct position

  • Check DPI scaling settings
  • Verify screen resolution matches expectations
  • Use get_screen_size() to confirm dimensions

Keyboard input not working

  • Ensure target application has focus
  • Some apps require admin privileges
  • Try increasing interval for reliability

Failsafe triggering accidentally

  • Increase screen border tolerance
  • Move mouse away from corners during normal use
  • Disable if needed: DesktopController(failsafe=False)

Permission errors

  • Run Python with administrator privileges for some operations
  • Some secure applications block automation

📦 Dependencies

  • PyAutoGUI - Core automation engine
  • Pillow - Image processing
  • OpenCV (optional) - Image recognition
  • PyGetWindow - Window management

Install all:

bash
pip install pyautogui pillow opencv-python pygetwindow

Built for OpenClaw - The ultimate desktop automation companion 🦞

Expand your agent's capabilities with these related and highly-rated skills.

LeoYeAI/openclaw-master-skills

audit-website

Audit websites for SEO, performance, security, technical, content, and 15 other issue cateories with 230+ rules using the squirrelscan CLI. Returns LLM-optimized reports with health scores, broken links, meta tag analysis, and actionable recommendations. Use to discover and asses website or webapp issues and health.

1,878 294
Explore
LeoYeAI/openclaw-master-skills

firecrawl

Web search and scraping via Firecrawl API. Use when you need to search the web, scrape websites (including JS-heavy pages), crawl entire sites, or extract structured data from web pages. Requires FIRECRAWL_API_KEY environment variable.

1,878 294
Explore
LeoYeAI/openclaw-master-skills

computer-use

Full desktop computer use for headless Linux servers. Xvfb + XFCE virtual desktop with xdotool automation. 17 actions (click, type, scroll, screenshot, drag, etc). Unlike OpenClaw's browser tool, operates at the X11 level so websites cannot detect automation. Includes VNC for live viewing.

1,878 294
Explore
LeoYeAI/openclaw-master-skills

social-media-analyzer

Social media campaign analysis and performance tracking. Calculates engagement rates, ROI, and benchmarks across platforms. Use for analyzing social media performance, calculating engagement rate, measuring campaign ROI, comparing platform metrics, or benchmarking against industry standards.

1,878 294
Explore
LeoYeAI/openclaw-master-skills

business-growth-skills

4 production-ready business and growth skills: customer success manager with health scoring and churn prediction, sales engineer with RFP analysis, revenue operations with pipeline and GTM metrics, and contract & proposal writer. Python tools included (all stdlib-only). Works with Claude Code, Codex CLI, and OpenClaw.

1,878 294
Explore
LeoYeAI/openclaw-master-skills

contract-and-proposal-writer

Contract & Proposal Writer

1,878 294
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results