MCP Tools Reference

automata-agent exposes all ui-automata capabilities to Claude as MCP tools. Each tool covers a category of operations; within a tool, an action parameter selects the specific operation. This keeps the tool list short and the schema self-documenting.
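
As a rough illustration of the tool-plus-action pattern, the sketch below builds an MCP `tools/call` request. The `tools/call` envelope follows the MCP specification; passing the operation under an argument key named `action` is an assumption based on the description above, not a confirmed schema.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids only need to be unique per connection


def tool_call(tool: str, action: str, **arguments) -> dict:
    """Build an MCP `tools/call` request that selects one action of one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": tool,
            # Assumption: the action selector is passed under the key "action".
            "arguments": {"action": action, **arguments},
        },
    }


request = tool_call("workflow", "list")
print(json.dumps(request, indent=2))
```

Any extra keyword arguments are merged into `arguments` next to `action`, so one helper can cover every tool on this page.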

Workflow Tools

`workflow`

Manage the workflow lifecycle.

Action	Description
`list`	List available workflow files
`status`	Get the status of the currently running workflow
`cancel`	Cancel the running workflow
`list_runs`	List recent run logs with elapsed time
`lint`	Validate a workflow YAML string without executing it

`start_workflow`

Run a workflow to completion and return its outputs. Accepts either a file path or an inline YAML string, together with a params map. Streams phase progress notifications back to the client while running.

This is the primary way an agent executes a predefined automation — hand it a workflow from the library and let the executor handle all the UI interaction.
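
A minimal sketch of assembling `start_workflow` arguments, assuming the workflow source is given under a `file` or `yaml` key and the parameters under `params`. Those key names and the example path are hypothetical; check the tool schema for the real ones.

```python
from typing import Any, Optional


def start_workflow_args(
    *,
    path: Optional[str] = None,
    yaml_text: Optional[str] = None,
    params: Optional[dict] = None,
) -> dict:
    """Build arguments for `start_workflow` from a path OR inline YAML."""
    # The tool accepts one source or the other, so reject both/neither early.
    if (path is None) == (yaml_text is None):
        raise ValueError("provide exactly one of path or yaml_text")
    args: dict[str, Any] = {"params": dict(params or {})}
    if path is not None:
        args["file"] = path  # hypothetical key name
    else:
        args["yaml"] = yaml_text  # hypothetical key name
    return args


# Hypothetical workflow path and parameter, for illustration only.
args = start_workflow_args(path="workflows/example.yaml", params={"query": "hello"})
print(args)
```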

`run_actions`

Execute a list of UI automation steps directly, without a workflow file. Each step has an intent, an action, and an expect condition — the same structure as a workflow step. The executor runs them against a live window and returns the results.

Useful for one-off interactions or for an agent composing its own steps on the fly rather than calling a pre-built workflow.
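
The step structure can be sketched as follows. Only the three field names `intent`, `action`, and `expect` come from the description above; the contents of `action` and `expect`, the `steps` key, and the placeholder selectors are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class Step:
    intent: str             # human-readable purpose of the step
    action: dict[str, Any]  # what to do (inner shape is an assumption)
    expect: dict[str, Any]  # condition that must hold afterwards (assumption)


def run_actions_args(steps: list[Step]) -> dict[str, Any]:
    """Serialize a non-empty list of steps into `run_actions` arguments."""
    if not steps:
        raise ValueError("run_actions needs at least one step")
    return {"steps": [asdict(step) for step in steps]}  # "steps" key is hypothetical


steps = [
    Step(
        intent="Open the Save dialog",
        action={"type": "click", "selector": "<selector for the Save button>"},
        expect={"type": "visible", "selector": "<selector for the Save dialog>"},
    ),
]
payload = run_actions_args(steps)
print(payload)
```

Pairing every action with an `expect` condition lets the executor confirm each step took effect before moving on, which is the same contract a workflow file provides.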

UI Inspection

`desktop`

Inspect the Windows UI element tree.

Action	Description
`list_windows`	List all top-level windows with HWND, title, process, and bounds
`element_tree`	Dump the full UIA element tree of a window; accepts an optional `selector` to filter to matched subtrees
`find_elements`	Run a live selector query against a window and return all matches with role, name, and bounds

This is the primary tool for an agent exploring an unfamiliar application before writing selectors or deciding how to interact with it. It works alongside `vision`: `desktop` gives the UIA structure, and `vision` gives the visual layout.
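
One way to chain these actions is to pick a window from a `list_windows` result and then request its element tree. The sketch assumes each window entry is a dict with `hwnd`, `title`, `process`, and `bounds` keys (mirroring the fields listed above); the exact response shape, the sample data, and the argument names are assumptions.

```python
from typing import Optional


def pick_window(windows: list[dict], title_contains: str) -> Optional[dict]:
    """Return the first window whose title contains the text, ignoring case."""
    needle = title_contains.lower()
    for window in windows:
        if needle in window["title"].lower():
            return window
    return None


def element_tree_args(hwnd: int, selector: Optional[str] = None) -> dict:
    """Build `desktop` arguments for `element_tree`, optionally filtered."""
    args = {"action": "element_tree", "hwnd": hwnd}  # "hwnd" key is hypothetical
    if selector is not None:
        args["selector"] = selector
    return args


# Made-up sample data standing in for a `list_windows` response.
windows = [
    {"hwnd": 1001, "title": "Untitled - Notepad", "process": "notepad.exe",
     "bounds": {"x": 0, "y": 0, "width": 800, "height": 600}},
    {"hwnd": 1002, "title": "Calculator", "process": "calc.exe",
     "bounds": {"x": 100, "y": 100, "width": 320, "height": 480}},
]
target = pick_window(windows, "notepad")
print(element_tree_args(target["hwnd"]))
```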

`vision`

OCR and visual layout capture.

Action	Description
`window_layout`	Capture visible text and UI regions for a specific window
`screen_layout`	Full-screen capture
`window_probe`	Hover-probe mode — discovers interactable elements not exposed via UIA by simulating mouse movement

Use `vision` when UIA gives incomplete information, for example with custom-rendered controls or applications that draw their own widgets. Coordinates returned by `vision` are in screen space and can be passed directly to the `input` tool's `mouse_click` action.
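
As a sketch of that hand-off, the helper below turns a region into a click at its center. It assumes a region is reported as `x`, `y`, `width`, and `height` in screen pixels and that `mouse_click` takes `x` and `y` arguments; both shapes are hypothetical. The `button` values are the ones listed for `mouse_click`.

```python
BUTTONS = ("left", "right", "middle", "double", "triple")


def region_center(region: dict) -> tuple[int, int]:
    """Center point of a screen-space rectangle, using integer pixels."""
    return (
        region["x"] + region["width"] // 2,
        region["y"] + region["height"] // 2,
    )


def click_args(region: dict, button: str = "left") -> dict:
    """Build `input` arguments that click the middle of a region."""
    if button not in BUTTONS:
        raise ValueError(f"unknown button: {button!r}")
    x, y = region_center(region)
    return {"action": "mouse_click", "x": x, "y": y, "button": button}


# Made-up region standing in for one entry of a `window_layout` result.
region = {"x": 100, "y": 200, "width": 80, "height": 20}
print(click_args(region))
```

Because the coordinates are already in screen space, no window-relative offset is applied.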

Application Management

`app`

Launch and manage applications and windows.

Action	Description
`list_installed`	List installed applications
`launch`	Launch an application; `wait` strategy: `new_pid`, `new_window`, or `match_any`
`list_taskbar`	List windows pinned or open in the taskbar
`focus`	Bring a window to the foreground by HWND
`show_desktop`	Minimize all windows and show the desktop
`list_task_view_windows`	List windows visible in Task View
`activate_task_view_window`	Switch to a window via Task View
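
A small sketch of building `launch` arguments while checking the `wait` strategy against the three documented values. The `target` key and the example program name are hypothetical.

```python
WAIT_STRATEGIES = ("new_pid", "new_window", "match_any")


def launch_args(target: str, wait: str = "new_window") -> dict:
    """Build `app` arguments for `launch` with a validated wait strategy."""
    if wait not in WAIT_STRATEGIES:
        raise ValueError(f"wait must be one of {WAIT_STRATEGIES}, got {wait!r}")
    # "target" is a hypothetical key naming the application to start.
    return {"action": "launch", "target": target, "wait": wait}


print(launch_args("notepad.exe", wait="new_pid"))
```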

`window`

Manipulate a specific window by HWND.

Action	Description
`minimize`	Minimize the window
`maximize`	Maximize the window
`restore`	Restore to normal size
`close`	Send WM_CLOSE
`set_bounds`	Reposition and resize the window
`screenshot`	Capture a screenshot of the window

Get the HWND from the `desktop` tool's `list_windows` action.
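
For example, `set_bounds` could be used to tile two windows side by side. The sketch assumes the bounds are passed as `x`, `y`, `width`, and `height` next to an `hwnd` key, and the HWND values are made up.

```python
def side_by_side(hwnd_left: int, hwnd_right: int,
                 screen_width: int, screen_height: int) -> list[dict]:
    """Build two `set_bounds` calls that split the screen into halves."""
    half = screen_width // 2
    return [
        {"action": "set_bounds", "hwnd": hwnd_left,
         "x": 0, "y": 0, "width": half, "height": screen_height},
        # The right window takes the remainder so odd widths leave no gap.
        {"action": "set_bounds", "hwnd": hwnd_right,
         "x": half, "y": 0, "width": screen_width - half, "height": screen_height},
    ]


for call in side_by_side(1001, 1002, 1920, 1080):
    print(call)
```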

Input

`input`

Raw mouse and keyboard input. Operates at the OS level — works on any window regardless of UIA support.

Action	Description
`mouse_move`	Move the cursor to screen coordinates
`mouse_click`	Click at coordinates; `button`: `left`, `right`, `middle`, `double`, `triple`
`mouse_drag`	Click and drag from one point to another
`mouse_scroll`	Scroll at coordinates
`key_press`	Send a key or chord with modifier syntax (`{ctrl}v`, `{alt}{F4}`)
`type_text`	Type a string as keystrokes
`get_cursor_pos`	Return the current cursor position
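
The modifier syntax for `key_press` can be generated with a small helper. The rule used here (modifiers and named keys wrapped in braces, single characters left bare) is inferred from the two examples `{ctrl}v` and `{alt}{F4}` and may not cover the full syntax.

```python
def chord(*modifiers: str, key: str) -> str:
    """Format a key chord such as {ctrl}v or {alt}{F4}."""
    if not key:
        raise ValueError("key must not be empty")
    prefix = "".join(f"{{{modifier}}}" for modifier in modifiers)
    # Assumption: multi-character names (F4, enter, ...) need braces,
    # while a single character is sent as-is.
    suffix = key if len(key) == 1 else f"{{{key}}}"
    return prefix + suffix


print(chord("ctrl", key="v"))
print(chord("alt", key="F4"))
```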

`clipboard`

Read or write the Windows clipboard.

Action	Description
`read`	Return the current clipboard text
`write`	Set the clipboard to a string

Useful for extracting text that is easier to copy than to read via UIA, or for injecting data into an application via paste.
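
Both patterns can be sketched as short sequences of tool calls. The `text` and `keys` argument names are hypothetical; the chords follow the `key_press` modifier syntax.

```python
def paste_text(text: str) -> list[tuple[str, dict]]:
    """Inject text: put it on the clipboard, then paste with Ctrl+V."""
    return [
        ("clipboard", {"action": "write", "text": text}),
        ("input", {"action": "key_press", "keys": "{ctrl}v"}),
    ]


def copy_all_text() -> list[tuple[str, dict]]:
    """Extract text: select all, copy with Ctrl+C, then read the clipboard."""
    return [
        ("input", {"action": "key_press", "keys": "{ctrl}a"}),
        ("input", {"action": "key_press", "keys": "{ctrl}c"}),
        ("clipboard", {"action": "read"}),
    ]


for tool, arguments in paste_text("hello"):
    print(tool, arguments)
```

Each tuple is a tool name plus its arguments, to be sent in order against the focused window.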

Browser

`browser`

Control Microsoft Edge via the Chrome DevTools Protocol.

Action	Description
`ensure`	Start Edge with a CDP debug port (must call first)
`tabs`	List open tabs with id, title, and URL
`navigate`	Navigate a tab to a URL
`eval`	Evaluate JavaScript in a tab and return the result
`dom`	Read the DOM tree of a tab
`screenshot`	Capture a screenshot of a tab
`open`	Open a new tab
`activate`	Switch to a tab by id
`close`	Close a tab by id

Use this tool for web-based workflows where UIA is insufficient: Edge's UIA tree does not expose full page content, whereas CDP gives direct access to the page DOM and JavaScript runtime.
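
Since `ensure` must be called first, a client can guard its call sequence with a small helper like the one below. The `tab_id` and `expression` argument names and the placeholder tab id are hypothetical.

```python
def with_ensure(calls: list[dict]) -> list[dict]:
    """Return the calls with an `ensure` action guaranteed to come first."""
    if calls and calls[0].get("action") == "ensure":
        return list(calls)
    return [{"action": "ensure"}, *calls]


def eval_args(tab_id: str, expression: str) -> dict:
    """Build `browser` arguments that evaluate JavaScript in a tab."""
    return {"action": "eval", "tab_id": tab_id, "expression": expression}


plan = with_ensure([
    {"action": "tabs"},
    eval_args("<tab id from tabs>", "document.title"),
])
for call in plan:
    print(call)
```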

Filesystem

`file`

Full filesystem access.

Action	Description
`list`	List files in a directory
`read`	Read a file's full contents
`read_lines`	Read a specific line range from a large file
`write`	Write (create or overwrite) a file
`append`	Append to a file
`copy`	Copy a file
`move`	Move or rename a file
`delete`	Delete a file
`mkdir`	Create a directory
`rmdir`	Remove a directory
`stat`	Get file metadata (size, modified time, etc.)
`glob`	Search with a glob pattern
`checksum`	Compute a file checksum

Handles binary files (base64-encoded) and large text files (via `read_lines`).
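
The two cases can be sketched on the client side as follows. The sketch assumes `read_lines` takes a 1-based inclusive line range and that binary content arrives as a base64 string; the range convention is an assumption.

```python
import base64
from typing import Iterator


def line_ranges(total_lines: int, chunk: int) -> Iterator[tuple[int, int]]:
    """Yield (start, end) ranges that cover a file in fixed-size chunks."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    start = 1  # assumption: line numbers are 1-based and inclusive
    while start <= total_lines:
        end = min(start + chunk - 1, total_lines)
        yield (start, end)
        start = end + 1


def decode_binary(content_b64: str) -> bytes:
    """Decode the base64 text of a binary file back into raw bytes."""
    return base64.b64decode(content_b64)


print(list(line_ranges(250, 100)))
print(decode_binary(base64.b64encode(b"\x89PNG").decode("ascii")))
```

Reading a large log in chunks keeps each tool response small, while the base64 step is only needed for files that are not text.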

System

`system`

Program execution, process management, and system information.

Action	Description
`exec`	Run a program directly (no shell); pass program and arguments as an array
`ping`	Check connectivity to the agent
`whoami`	Return the current Windows username
`ipconfig`	Return network adapter configuration
`route_table`	Return the IP routing table
`get_path`	Return the current `PATH` environment variable
`list_processes`	List running processes
`kill_process`	Terminate a process by PID

`cmd.exe` and `powershell.exe` can be passed as the program to `exec` for shell-style invocation.
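
A sketch of both styles of `exec` call. The `program` and `args` key names are hypothetical; `/c` and `-NoProfile -Command` are standard `cmd.exe` and `powershell.exe` switches.

```python
def exec_args(program: str, *args: str) -> dict:
    """Build `system` arguments that run a program directly, without a shell."""
    # Arguments stay a list, so no quoting or escaping is applied to them.
    return {"action": "exec", "program": program, "args": list(args)}


def cmd_shell(command_line: str) -> dict:
    """Run a command line through cmd.exe when shell features are needed."""
    return exec_args("cmd.exe", "/c", command_line)


def powershell(command: str) -> dict:
    """Run a PowerShell command without loading the user profile."""
    return exec_args("powershell.exe", "-NoProfile", "-Command", command)


print(exec_args("ipconfig", "/all"))
print(cmd_shell("dir /b"))
```

Passing the argument list directly avoids shell quoting problems, so the shell wrappers are only worth using for pipes, redirection, or built-in commands.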

Library

`resources`

Browse the embedded workflow library.

Action	Description
`list`	List available workflows with their descriptions and parameters
`read`	Read a specific workflow file by path

An agent can use this to discover what pre-built workflows are available before deciding how to approach a task.
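
As a sketch of that discovery step, an agent could check which parameters it still needs before running a workflow it found with `list`. The entry shape (a dict with `path`, `description`, and a `parameters` list of names) and the sample entry are hypothetical.

```python
def missing_params(entry: dict, supplied: dict) -> list[str]:
    """Return the workflow parameters that have not been supplied yet."""
    return [name for name in entry.get("parameters", []) if name not in supplied]


# Made-up entry standing in for one item of a `list` response.
entry = {
    "path": "workflows/example.yaml",
    "description": "Example workflow",
    "parameters": ["query", "output_path"],
}
print(missing_params(entry, {"query": "hello"}))
```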
