MCP Tools Reference

automata-agent exposes all ui-automata capabilities to Claude as MCP tools. Each tool covers a category of operations; within a tool, an action parameter selects the specific operation. This keeps the tool list short and the schema self-documenting.
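The tool-plus-action pattern can be sketched as a small helper that builds an MCP `tools/call` request. This is a minimal sketch, not the agent's actual schema: the `tools/call` envelope follows the MCP specification, but the literal argument key `action` and the helper name `tool_call` are assumptions made for illustration.

```python
import json

def tool_call(tool, action, request_id=1, **arguments):
    """Build an MCP tools/call JSON-RPC request for one tool action.

    Assumption: the operation is selected by an argument named "action".
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": {"action": action, **arguments}},
    }

# One tool name, many operations: only the action value changes.
request = tool_call("workflow", "list")
print(json.dumps(request, indent=2))
```

In practice an MCP client library would build this envelope for you; the sketch only shows where the tool name and the action end up.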

Workflow Tools

workflow

Manage the workflow lifecycle.

| Action | Description |
| --- | --- |
| list | List available workflow files |
| status | Get the status of the currently running workflow |
| cancel | Cancel the running workflow |
| list_runs | List recent run logs with elapsed time |
| lint | Validate a workflow YAML string without executing it |

start_workflow

Run a workflow to completion and return its outputs. Accepts either a file path or an inline YAML string, plus a params map. Streams phase progress notifications back to the client while running.

This is the primary way an agent executes a predefined automation — hand it a workflow from the library and let the executor handle all the UI interaction.
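A start_workflow call takes one workflow source and a params map. The sketch below builds such an argument set and rejects ambiguous input. The key names `path`, `yaml`, and `params`, the helper name, and the example file are all hypothetical; check the tool's published schema for the real names.

```python
def start_workflow_args(path=None, yaml_text=None, params=None):
    """Return arguments for start_workflow (key names are assumptions)."""
    # Exactly one workflow source must be given: a file path or inline YAML.
    if (path is None) == (yaml_text is None):
        raise ValueError("provide exactly one of path or yaml_text")
    source = {"path": path} if path is not None else {"yaml": yaml_text}
    return {**source, "params": dict(params or {})}

# Hypothetical workflow file and parameter, for illustration only.
args = start_workflow_args(path="workflows/example.yaml", params={"customer": "ACME"})
print(args)
```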

run_actions

Execute a list of UI automation steps directly, without a workflow file. Each step has an intent, an action, and an expect condition — the same structure as a workflow step. The executor runs them against a live window and returns the results.

Useful for one-off interactions or for an agent composing its own steps on the fly rather than calling a pre-built workflow.
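Because every step carries an intent, an action, and an expect condition, an agent composing steps on the fly can check that shape before submitting them. The following is a minimal sketch: the three key names come from the description above, while the placeholder values and the helper `missing_keys` are illustrative and do not reflect the real action or condition syntax.

```python
REQUIRED_KEYS = ("intent", "action", "expect")

def missing_keys(step):
    """Return the required keys that a step dict lacks."""
    return [key for key in REQUIRED_KEYS if key not in step]

# Placeholder values: the real action and expect syntax is not shown here.
steps = [
    {"intent": "open the Save dialog",
     "action": "<placeholder action>",
     "expect": "<placeholder condition>"},
    {"intent": "confirm the save",
     "action": "<placeholder action>"},  # expect is missing on purpose
]

problems = {i: missing_keys(s) for i, s in enumerate(steps) if missing_keys(s)}
print(problems)  # {1: ['expect']}
```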

UI Inspection

desktop

Inspect the Windows UI element tree.

| Action | Description |
| --- | --- |
| list_windows | List all top-level windows with HWND, title, process, and bounds |
| element_tree | Dump the full UIA element tree of a window; accepts an optional selector to filter to matched subtrees |
| find_elements | Run a live selector query against a window and return all matches with role, name, and bounds |

The primary tool for an agent exploring an unfamiliar application before writing selectors or deciding how to interact with it. Works alongside vision: desktop gives the UIA structure; vision gives the visual layout.
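A typical exploration starts with list_windows, picks a window, and then requests its element tree. The sketch below shows only the selection step on the client side. The field names `hwnd`, `title`, and `process` mirror the columns named above but their exact spelling is an assumption, and the window list is an illustrative sample rather than captured output.

```python
def find_window(windows, title_substring):
    """Return the first window whose title contains the substring (case-insensitive)."""
    needle = title_substring.lower()
    for window in windows:
        if needle in window["title"].lower():
            return window
    return None

# Illustrative sample only, not real tool output.
windows = [
    {"hwnd": 66052, "title": "Untitled - Notepad", "process": "notepad.exe"},
    {"hwnd": 131590, "title": "Calculator", "process": "CalculatorApp.exe"},
]

target = find_window(windows, "notepad")
# Hypothetical argument names for the follow-up element_tree call.
tree_args = {"action": "element_tree", "hwnd": target["hwnd"]}
print(tree_args)
```

The same HWND can then be reused for any tool that addresses a window by handle.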

vision

OCR and visual layout capture.

| Action | Description |
| --- | --- |
| window_layout | Capture visible text and UI regions for a specific window |
| screen_layout | Full-screen capture |
| window_probe | Hover-probe mode: discovers interactable elements not exposed via UIA by simulating mouse movement |

Use vision when UIA gives incomplete information — custom-rendered controls or applications that draw their own widgets. Coordinates returned by vision are in screen space and can be passed directly to input mouse_click.
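Since vision coordinates are already in screen space, turning a detected region into a click is a matter of choosing a point inside it. The sketch below clicks the region's center. The region shape (`x`, `y`, `width`, `height`) and the argument names `x`, `y`, and `button` are assumptions; the sample region is illustrative.

```python
def center(region):
    """Return the integer center point of a screen-space rectangle."""
    return (region["x"] + region["width"] // 2,
            region["y"] + region["height"] // 2)

# Illustrative region, as vision might report for a custom-drawn button.
region = {"x": 100, "y": 200, "width": 80, "height": 24}

x, y = center(region)
click_args = {"action": "mouse_click", "x": x, "y": y, "button": "left"}
print((x, y))  # (140, 212)
```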

Application Management

app

Launch and manage applications and windows.

| Action | Description |
| --- | --- |
| list_installed | List installed applications |
| launch | Launch an application; wait strategy: new_pid, new_window, or match_any |
| list_taskbar | List windows pinned or open in the taskbar |
| focus | Bring a window to the foreground by HWND |
| show_desktop | Minimize all windows and show the desktop |
| list_task_view_windows | List windows visible in Task View |
| activate_task_view_window | Switch to a window via Task View |

window

Manipulate a specific window by HWND.

| Action | Description |
| --- | --- |
| minimize | Minimize the window |
| maximize | Maximize the window |
| restore | Restore to normal size |
| close | Send WM_CLOSE |
| set_bounds | Reposition and resize the window |
| screenshot | Capture a screenshot of the window |

Get the HWND from desktop list_windows.

Input

input

Raw mouse and keyboard input. Operates at the OS level — works on any window regardless of UIA support.

| Action | Description |
| --- | --- |
| mouse_move | Move the cursor to screen coordinates |
| mouse_click | Click at coordinates; button: left, right, middle, double, triple |
| mouse_drag | Click and drag from one point to another |
| mouse_scroll | Scroll at coordinates |
| key_press | Send a key or chord with modifier syntax ({ctrl}v, {alt}{F4}) |
| type_text | Type a string as keystrokes |
| get_cursor_pos | Return the current cursor position |
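To make the key_press modifier syntax concrete, here is one way a chord string such as {ctrl}v could be split into modifiers and keys. This is a sketch of the notation only, not the agent's parser: the modifier set (in particular `shift` and `win`) is an assumption, as is treating any other braced name as a named key.

```python
import re

# A token is either a braced name like {ctrl} or a single literal character.
TOKEN = re.compile(r"\{([^{}]+)\}|(.)", re.DOTALL)
MODIFIERS = {"ctrl", "alt", "shift", "win"}  # assumed set, for illustration

def parse_chord(chord):
    """Split a chord string into (modifiers, keys)."""
    modifiers, keys = [], []
    for match in TOKEN.finditer(chord):
        name, char = match.group(1), match.group(2)
        if name is not None and name.lower() in MODIFIERS:
            modifiers.append(name.lower())
        else:
            # Braced non-modifiers (e.g. F4) and bare characters are keys.
            keys.append(name if name is not None else char)
    return modifiers, keys

print(parse_chord("{ctrl}v"))    # (['ctrl'], ['v'])
print(parse_chord("{alt}{F4}"))  # (['alt'], ['F4'])
```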

clipboard

Read or write the Windows clipboard.

| Action | Description |
| --- | --- |
| read | Return the current clipboard text |
| write | Set the clipboard to a string |

Useful for extracting text that is easier to copy than to read via UIA, or for injecting data into an application via paste.
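Injecting data via paste combines two tools: write the text to the clipboard, then send the paste chord to the focused window. The sketch below only assembles that call sequence. The argument names `text` and `key` and the helper `paste_text` are hypothetical.

```python
def paste_text(text):
    """Return the (tool, arguments) pairs that paste text into the focused window."""
    return [
        ("clipboard", {"action": "write", "text": text}),
        ("input", {"action": "key_press", "key": "{ctrl}v"}),
    ]

for tool, arguments in paste_text("Invoice 1042"):
    print(tool, arguments)
```

Pasting is usually faster than type_text for long strings, at the cost of overwriting whatever the clipboard held before.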

Browser

browser

Control Microsoft Edge via the Chrome DevTools Protocol.

| Action | Description |
| --- | --- |
| ensure | Start Edge with a CDP debug port (must call first) |
| tabs | List open tabs with id, title, and URL |
| navigate | Navigate a tab to a URL |
| eval | Evaluate JavaScript in a tab and return the result |
| dom | Read the DOM tree of a tab |
| screenshot | Capture a screenshot of a tab |
| open | Open a new tab |
| activate | Switch to a tab by id |
| close | Close a tab by id |

Use browser for web-based workflows where UIA is insufficient: Edge's UIA tree does not expose full page content, while CDP gives direct access to the page DOM and JavaScript runtime.
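A common sequence is ensure, then tabs, then eval against the tab of interest. The sketch below covers the client-side step between tabs and eval: choosing a tab by URL and building the eval arguments. The field names `id`, `title`, and `url` follow the tabs description above; the argument names `id` and `expression` are assumptions, and the tab list is an illustrative sample.

```python
def find_tab(tabs, url_prefix):
    """Return the first tab whose URL starts with the given prefix, or None."""
    return next((tab for tab in tabs if tab["url"].startswith(url_prefix)), None)

# Illustrative sample only; real ids come from the tabs action,
# which is called after ensure has started Edge with a debug port.
tabs = [
    {"id": "B2", "title": "New tab", "url": "edge://newtab/"},
    {"id": "A1", "title": "Example Domain", "url": "https://example.com/"},
]

tab = find_tab(tabs, "https://example.com")
# document.title is standard DOM; the surrounding keys are hypothetical.
eval_args = {"action": "eval", "id": tab["id"], "expression": "document.title"}
print(eval_args)
```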

Filesystem

file

Full filesystem access.

| Action | Description |
| --- | --- |
| list | List files in a directory |
| read | Read a file's full contents |
| read_lines | Read a specific line range from a large file |
| write | Write (create or overwrite) a file |
| append | Append to a file |
| copy | Copy a file |
| move | Move or rename a file |
| delete | Delete a file |
| mkdir | Create a directory |
| rmdir | Remove a directory |
| stat | Get file metadata (size, modified time, etc.) |
| glob | Search with a glob pattern |
| checksum | Compute a file checksum |

Handles binary files (base64-encoded) and large text files (via read_lines).
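Two client-side helpers illustrate these conventions: base64-encoding binary content before a write, and splitting a large file into line ranges for successive read_lines calls. This is a sketch under assumptions: it treats line ranges as 1-based and inclusive, which the text above does not state, and both helper names are hypothetical.

```python
import base64

def encode_binary(data):
    """Encode raw bytes as an ASCII base64 string for transport."""
    return base64.b64encode(data).decode("ascii")

def line_ranges(total_lines, chunk):
    """Split 1..total_lines into inclusive (start, end) ranges of at most chunk lines."""
    return [(start, min(start + chunk - 1, total_lines))
            for start in range(1, total_lines + 1, chunk)]

print(encode_binary(b"\x89PNG"))  # iVBORw==
print(line_ranges(250, 100))      # [(1, 100), (101, 200), (201, 250)]
```

Reading in ranges keeps each response small, so the agent can stop as soon as it finds the lines it needs.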

System

system

Program execution, process management, and system information.

| Action | Description |
| --- | --- |
| exec | Run a program directly (no shell); pass program and arguments as an array |
| ping | Check connectivity to the agent |
| whoami | Return the current Windows username |
| ipconfig | Return network adapter configuration |
| route_table | Return the IP routing table |
| get_path | Return the current PATH environment variable |
| list_processes | List running processes |
| kill_processes | Terminate a process by PID |

cmd.exe and powershell.exe can be passed as the program to exec for shell-style invocation.
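The difference between direct and shell-style execution shows up in how the argument array is built. In the sketch below, the key `program` follows the wording of the exec description, while `args` and the helper name are assumptions. The `/c` switch is the standard cmd.exe option for running a command string and then exiting.

```python
def exec_args(program, *args):
    """Build exec arguments: a program plus its arguments as an array."""
    return {"action": "exec", "program": program, "args": list(args)}

# Direct invocation: no shell, so each argument is passed verbatim.
direct = exec_args("ipconfig", "/all")

# Shell-style invocation: pipes need a shell, so cmd.exe is the program
# and the whole pipeline is a single argument after /c.
shell = exec_args("cmd.exe", "/c", "dir C:\\Temp | findstr log")

print(direct)
print(shell)
```

Passing arguments as an array avoids quoting problems in the direct case; shell features such as pipes and redirection are only available through the second form.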

Library

resources

Browse the embedded workflow library.

| Action | Description |
| --- | --- |
| list | List available workflows with their descriptions and parameters |
| read | Read a specific workflow file by path |

An agent can use this to discover what pre-built workflows are available before deciding how to approach a task.
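Discovery can be as simple as filtering the list result by keyword before choosing a workflow to run. The sketch below assumes each entry carries `path`, `description`, and `params` fields, which mirrors the list description above but is not a confirmed response shape; the library entries are illustrative, not real workflows.

```python
def search_library(workflows, keyword):
    """Return workflows whose description mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [w for w in workflows if needle in w["description"].lower()]

# Illustrative entries only; real ones come from the resources list action.
library = [
    {"path": "notepad/save_as.yaml",
     "description": "Save the current Notepad document under a new name",
     "params": ["filename"]},
    {"path": "explorer/open_folder.yaml",
     "description": "Open a folder in File Explorer",
     "params": ["folder"]},
]

matches = search_library(library, "notepad")
print([w["path"] for w in matches])  # ['notepad/save_as.yaml']
```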