UI-TARS-desktop

UI-TARS-desktop

Multimodal AI Agent Desktop Stack for Automated GUI Interaction

18,276
Stars
1,732
Forks
18,276
Watchers
300
Issues
UI-TARS-desktop provides a native desktop application for AI-driven GUI automation, powered by vision-language models. It supports both local and remote computer/browser operators, enabling sophisticated control via natural language. Built on the Model Context Protocol (MCP), it integrates real-world tools and supports out-of-the-box CLI and Web UI workflows. The project enables human-like task completion with features such as visual recognition, context-driven event streaming, and hybrid browser automation.

Key Features

Multimodal AI agent supporting vision and language inputs
Native desktop application with cross-platform support
Local and remote computer/browser operator capabilities
Event stream-driven context engineering
Hybrid browser automation (GUI agent, DOM, and strategy blending)
One-click CLI and Web UI for agent deployment
Integration with real-world tools via MCP servers
Precise mouse and keyboard automation
Screenshot and visual recognition
Private local processing and real-time feedback display

Use Cases

Automating software configuration tasks via natural language
Remote desktop or browser operations for IT support
Automated data extraction or website navigation
Booking flights, hotels, or organizing travel logistics
Generating charts and data visualizations from web content
Compiling transportation or usage guides
Automated QA and regression testing of GUIs
Accessibility assistance for users with disabilities
Multistep workflow orchestration across applications
Community-driven expansion with custom tool integrations

README

Introduction

English | 简体中文

TARS* is a Multimodal AI Agent stack, currently shipping two projects: Agent TARS and UI-TARS-desktop:

Table of Contents

News

  • [2025-06-25] We released a Agent TARS Beta and Agent TARS CLI - Introducing Agent TARS Beta, a multimodal AI agent that aims to explore a work form that is closer to human-like task completion through rich multimodal capabilities (such as GUI Agent, Vision) and seamless integration with various real-world tools.
  • [2025-06-12] - 🎁 We are thrilled to announce the release of UI-TARS Desktop v0.2.0! This update introduces two powerful new features: Remote Computer Operator and Remote Browser Operator—both completely free. No configuration required: simply click to remotely control any computer or browser, and experience a new level of convenience and intelligence.
  • [2025-04-17] - 🎉 We're thrilled to announce the release of new UI-TARS Desktop application v0.1.0, featuring a redesigned Agent UI. The application enhances the computer using experience, introduces new browser operation features, and supports the advanced UI-TARS-1.5 model for improved performance and precise control.
  • [2025-02-20] - 📦 Introduced UI TARS SDK, is a powerful cross-platform toolkit for building GUI automation agents.
  • [2025-01-23] - 🚀 We updated the Cloud Deployment section in the 中文版: GUI模型部署教程 with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.

Agent TARS

Agent TARS is a general multimodal AI Agent stack, it brings the power of GUI Agent and Vision into your terminal, computer, browser and product. It primarily ships with a CLI and Web UI for usage. It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.

Showcase

Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline

https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8

For more use cases, please check out #842.

Core Features

  • 🖱️ One-Click Out-of-the-box CLI - Supports both headful Web UI and headless server) execution.
  • 🌐 Hybrid Browser Agent - Control browsers using GUI Agent, DOM, or a hybrid strategy.
  • 🔄 Event Stream - Protocol-driven Event Stream drives Context Engineering and Agent UI.
  • 🧰 MCP Integration - The kernel is built on MCP and also supports mounting MCP Servers to connect to real-world tools.

Quick Start

bash
# Luanch with `npx`.
npx @agent-tars/cli@latest

# Install globally, required Node.js >= 22
npm install @agent-tars/cli@latest -g

# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key

Visit the comprehensive Quick Start guide for detailed setup instructions.

Documentation

🌟 Explore Agent TARS Universe 🌟

UI-TARS Desktop

UI-TARS Desktop is a native GUI agent for your local computer, driven by UI-TARS and Seed-1.5-VL/1.6 series models.

Showcase

Instruction Local Operator Remote Operator
Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting.
Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub?

Features

  • 🤖 Natural language control powered by Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/MacOS/Browser)
  • 🔄 Real-time feedback and status display
  • 🔐 Private and secure - fully local processing

Quick Start

See Quick Start

Contributing

See CONTRIBUTING.md.

License

This project is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:

BibTeX
@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}

Star History

Star History Chart

Repository Owner

bytedance
bytedance

Organization

Repository Details

Language TypeScript
Default Branch main
Size 183,950 KB
Contributors 30
License Apache License 2.0
MCP Verified Sep 5, 2025

Programming Languages

TypeScript
93.18%
MDX
3.71%
JavaScript
1.29%
CSS
1.28%
Less
0.26%
HTML
0.18%
Shell
0.05%
Dockerfile
0.05%

Topics

agent agent-tars browser-use computer-use gui-agent gui-operator mcp mcp-server multimodal tars ui-tars vision vlm

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

We respect your privacy. Unsubscribe at any time.

Related MCPs

Discover similar Model Context Protocol servers

  • 1mcp-app/agent

    1mcp-app/agent

    A unified server that aggregates and manages multiple Model Context Protocol servers.

    1MCP Agent provides a single, unified interface that aggregates multiple Model Context Protocol (MCP) servers, enabling seamless integration and management of external tools for AI assistants. It acts as a proxy, managing server configuration, authentication, health monitoring, and dynamic server control with features like asynchronous loading, tag-based filtering, and advanced security options. Compatible with popular AI development environments, it simplifies setup by reducing redundant server instances and resource usage. Users can configure, monitor, and scale model tool integrations across various AI clients through easy CLI commands or Docker deployment.

    • 96
    • MCP
    • 1mcp-app/agent
  • mcp

    mcp

    Universal remote MCP server connecting AI clients to productivity tools.

    WayStation MCP acts as a remote Model Context Protocol (MCP) server, enabling seamless integration between AI clients like Claude or Cursor and a wide range of productivity applications, such as Notion, Monday, Airtable, Jira, and more. It supports multiple secure connection transports and offers both general and user-specific preauthenticated endpoints. The platform emphasizes ease of integration, OAuth2-based authentication, and broad app compatibility. Users can manage their integrations through a user dashboard, simplifying complex workflow automations for AI-powered productivity.

    • 27
    • MCP
    • waystation-ai/mcp
  • mcpmcp-server

    mcpmcp-server

    Seamlessly discover, set up, and integrate MCP servers with AI clients.

    mcpmcp-server enables users to discover, configure, and connect MCP servers with preferred clients, optimizing AI integration into daily workflows. It supports streamlined setup via JSON configuration, ensuring compatibility with various platforms such as Claude Desktop on macOS. The project simplifies the connection process between AI clients and remote Model Context Protocol servers. Users are directed to an associated homepage for further platform-specific guidance.

    • 17
    • MCP
    • glenngillen/mcpmcp-server
  • Didn't find tool you were looking for?

    Be as detailed as possible for better results