Agent skill

gui-automation

Use when you need to visually interact with a GUI — test buttons, fill forms, verify visual layouts, fuzz web pages, automate user flows, take screenshots, or perform end-to-end QA on any application. Works on cloud VMs, Docker containers, local machines, and sandboxes. Install: pip install cua.

View SKILL.md on GitHub Repository

Stars 13,450

Forks 833

Install this agent skill to your Project

npx add-skill https://github.com/trycua/cua/tree/main/skills/gui-automation

SKILL.md

GUI Automation

CUA gives you eyes and hands on a real computer: see the screen, move the mouse, click, type, drag, and manage windows — like a human at the keyboard.

Use this skill for visual interaction that can't be done via shell or API.

Setup

bash

cua --version          # check install; if missing: pip install cua

# Connect to target (pick one)
cua do switch cloud my-vm
cua do switch docker my-container
cua do-host-consent && cua do switch host   # local machine (one-time consent)

ANTHROPIC_API_KEY is optional. With it, cua do snapshot returns an AI-annotated screen with element coordinates. Without it, use screenshot and read the image yourself.

Workflow

Look → Act → Verify — repeat until done, then share:

bash

cua do screenshot          # look
cua do click 450 280       # act
cua do screenshot          # verify
cua trajectory share       # share replay link with user

Re-screenshot after every UI change — coordinates go stale when the screen changes.

Scenarios

Click a button

bash

cua do screenshot
cua do click 450 280
cua do screenshot

Fill a form

bash

cua do screenshot
cua do click 400 200 && cua do type "Jane Doe"
cua do key tab            && cua do type "jane@example.com"
cua do key tab            && cua do type "SecureP@ss123"
cua do click 400 500
cua do screenshot

File upload dialog

bash

cua do click 350 400       # "Choose File"
cua do type "/home/user/report.pdf"
cua do key enter
cua do screenshot

Zoom in for precision clicks (host or small targets)

When clicking small or dense UI elements — especially on the host machine — zoom into the target window first. Coordinates become window-relative and screenshots show only that window, giving you higher effective resolution.

bash

cua do zoom "Google Chrome"   # crop to Chrome window; coords are now window-relative
cua do screenshot              # zoomed view — easier to locate small elements
cua do click 112 44            # precise click on a small tab or button
cua do screenshot              # verify
cua do unzoom                  # restore full-screen coords when done
cua do screenshot              # back to full desktop view

Use zoom any time click accuracy is uncertain. unzoom before switching windows or when you need to see the full desktop again.

Drag and drop

bash

cua do window ls               # list open windows
cua do drag 150 300 650 400    # source → destination
cua do screenshot

Fuzz a form

bash

cua do screenshot
cua do click 400 200
cua do type "<script>alert(1)</script>"
cua do key tab && cua do type "'; DROP TABLE users; --"
cua do key tab && cua do type "AAAAAAAAAAAAAAAAAAAAAAA"
cua do click 400 500
cua do screenshot              # check for errors, crashes, unexpected behavior

Trajectory

Every action is auto-recorded to ~/.cua/trajectories/{machine}/{session}/.

bash

cua trajectory share           # upload and get shareable HTTPS link (always do this at end)
cua trajectory ls              # list sessions
cua trajectory export          # generate HTML report
cua do --no-record click 100 200   # disable recording for a single action

Tell the user: "Here is the trajectory of my session: {url}"

Quick Reference

Action	Command
Connect to target	`cua do switch <provider> [name]`
Screenshot	`cua do screenshot`
AI-annotated screen	`cua do snapshot ["instructions"]`
Click	`cua do click <x> <y> [left\|right\|middle]`
Double-click	`cua do dclick <x> <y>`
Type text	`cua do type "text"`
Press key	`cua do key <key>`
Hotkey	`cua do hotkey <combo>` (e.g. `ctrl+c`)
Scroll	`cua do scroll <direction> [amount]`
Drag	`cua do drag <x1> <y1> <x2> <y2>`
Move cursor	`cua do move <x> <y>`
Shell command	`cua do shell "command"`
Open URL/file	`cua do open <url\|path>`
List windows	`cua do window ls [app]`
Focus window	`cua do window focus <id>`
Zoom to window	`cua do zoom "App Name"`
Unzoom	`cua do unzoom`
Share trajectory	`cua trajectory share`

Providers

Provider	Example
`cloud`	`cua do switch cloud my-vm`
`cloudv2`	`cua do switch cloudv2 my-vm`
`docker`	`cua do switch docker my-container`
`lume`	`cua do switch lume my-vm`
`lumier`	`cua do switch lumier my-vm`
`winsandbox`	`cua do switch winsandbox`
`host`	`cua do switch host`

See references/command-reference.md for full argument syntax.

Maintainer

trycua Core maintainer

Source details

Full Name: trycua/cua
Branch: main
Path in repo: skills/gui-automation
License: MIT License
Topics: agent ai-agent manus hacktoberfest swift apple macos windows desktop-automation operator computer-use computer-use-agent cua virtualization containerization lume virtualization-framework windows-sandbox

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

zebbern/claude-code-guide

Ethical Hacking Methodology

This skill should be used when the user asks to "learn ethical hacking", "understand penetration testing lifecycle", "perform reconnaissance", "conduct security scanning", "exploit vulnerabilities", or "write penetration test reports". It provides comprehensive ethical hacking methodology and techniques.

3,752 348

Explore

zebbern/claude-code-guide

Red Team Tools and Methodology

This skill should be used when the user asks to "follow red team methodology", "perform bug bounty hunting", "automate reconnaissance", "hunt for XSS vulnerabilities", "enumerate subdomains", or needs security researcher techniques and tool configurations from top bug bounty hunters.

3,752 348

Explore

zebbern/claude-code-guide

Shodan Reconnaissance and Pentesting

This skill should be used when the user asks to "search for exposed devices on the internet," "perform Shodan reconnaissance," "find vulnerable services using Shodan," "scan IP ranges with Shodan," or "discover IoT devices and open ports." It provides comprehensive guidance for using Shodan's search engine, CLI, and API for penetration testing reconnaissance.

3,752 348

Explore

zebbern/claude-code-guide

SMTP Penetration Testing

This skill should be used when the user asks to "perform SMTP penetration testing", "enumerate email users", "test for open mail relays", "grab SMTP banners", "brute force email credentials", or "assess mail server security". It provides comprehensive techniques for testing SMTP server security.

3,752 348

Explore

zebbern/claude-code-guide

Linux Privilege Escalation

This skill should be used when the user asks to "escalate privileges on Linux", "find privesc vectors on Linux systems", "exploit sudo misconfigurations", "abuse SUID binaries", "exploit cron jobs for root access", "enumerate Linux systems for privilege escalation", or "gain root access from low-privilege shell". It provides comprehensive techniques for identifying and exploiting privilege escalation paths on Linux systems.

3,752 348

Explore

zebbern/claude-code-guide

Wireshark Network Traffic Analysis

This skill should be used when the user asks to "analyze network traffic with Wireshark", "capture packets for troubleshooting", "filter PCAP files", "follow TCP/UDP streams", "detect network anomalies", "investigate suspicious traffic", or "perform protocol analysis". It provides comprehensive techniques for network packet capture, filtering, and analysis using Wireshark.

3,752 348

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

GUI Automation

Setup

Workflow

Scenarios

Click a button

Fill a form

File upload dialog

Zoom in for precision clicks (host or small targets)

Drag and drop

Fuzz a form

Trajectory

Quick Reference

Providers

Recommended Agent Skills

Ethical Hacking Methodology

Red Team Tools and Methodology

Shodan Reconnaissance and Pentesting

SMTP Penetration Testing

Linux Privilege Escalation

Wireshark Network Traffic Analysis