Skip to main content

Actions in Depth

Actions are the verbs of a workflow. Each step executes exactly one action and then waits for its expect condition. This page covers every action type with practical examples drawn from real workflows.

For a quick reference of all action types, see Actions.

Mouse Actions

Click

Left-clicks the centre of the matched element.

- intent: click the Save button
action:
type: Click
scope: toolbar
selector: ">> [role=button][name=Save]"
expect:
type: DialogPresent
scope: main_window

DoubleClick

Double-clicks the element. Use for opening files, list items, or anything that requires a double-click to activate.

- intent: open the Downloads folder
action:
type: DoubleClick
scope: explorer
selector: ">> [role=list item][name=Downloads]"
expect:
type: ElementFound
scope: explorer
selector: ">> [role=list item][name=Documents]"

Hover

Moves the mouse to the element without clicking. Use to trigger hover menus, tooltips, or reveal hidden controls (such as close buttons on tabs).

- intent: hover tab to reveal close button
action:
type: Hover
scope: tab_list
selector: "> [role='tab item']"
expect:
type: ElementFound
scope: tab_list
selector: "> [role='tab item'] > [role=button][name^=Close]"

ClickAt

Clicks at a fractional position within the element's bounding box. x_pct and y_pct are 0.0–1.0. kind is one of left, double, triple, right, middle.

- intent: right-click the top-left of the canvas
action:
type: ClickAt
scope: canvas_panel
selector: ">> [role=document]"
x_pct: 0.1
y_pct: 0.1
kind: right

Use triple to select all text in an edit field when SetValue is not an option:

- intent: select all text and replace
action:
type: ClickAt
scope: dialog
selector: ">> [role=edit][name=Filename]"
x_pct: 0.5
y_pct: 0.5
kind: triple

ScrollIntoView

Scrolls ancestor containers until the element is within their visible viewport, then stops. Sends wheel events to each scrollable ancestor.

- intent: scroll to the Advanced section
action:
type: ScrollIntoView
scope: settings_panel
selector: ">> [role=group][name=Advanced]"
expect:
type: ElementVisible
scope: settings_panel
selector: ">> [role=group][name=Advanced]"
caution

Do not use ScrollIntoView for items in WinUI / UWP scrollable lists — wheel events trigger elastic scroll and the list snaps back. Use Invoke instead.

Keyboard Actions

TypeText

Types a string into the focused element. The text is sent as keystrokes. Supports {param.*} and {output.*} substitution.

- intent: type the search term
action:
type: TypeText
scope: search_bar
selector: ">> [role=edit]"
text: "{param.search_term}"
expect:
type: ElementHasText
scope: search_bar
selector: ">> [role=edit]"
pattern:
contains: "{param.search_term}"

TypeText appends to whatever is already in the field. To replace existing content, use SetValue instead.

PressKey

Sends a key or key chord. Accepts virtual key names and modifier combinations.

- intent: confirm with Enter
action:
type: PressKey
scope: main_window
selector: "*"
key: "{ENTER}"
expect:
type: DialogAbsent
scope: main_window

Common keys: {ENTER}, {TAB}, {ESC}, {BACKSPACE}, {DELETE}, {F5}, {UP}, {DOWN}, {HOME}, {END}

Modifier combinations: {CTRL+A}, {CTRL+S}, {CTRL+Z}, {ALT+F4}, {SHIFT+TAB}

Value Actions

SetValue

Sets the value of an edit field or combo box directly via UIA's ValuePattern, without simulating keystrokes. Unlike TypeText, it replaces the entire content in one operation.

- intent: set the output path
action:
type: SetValue
scope: save_dialog
selector: ">> [role=edit][name='File name:']"
value: "{output.full_path}"
expect:
type: ElementHasText
scope: save_dialog
selector: ">> [role=edit][name='File name:']"
pattern:
contains: "{output.full_path}"

Prefer SetValue over TypeText when you need to replace an existing value, or when the field processes each keystroke individually (e.g. search fields with live filtering).

UIA Pattern Actions

Invoke

Calls UIA's IInvokePattern::Invoke() on the element directly — no mouse click, no bounding rect required.

Use Invoke instead of Click for elements that are scrolled out of view. Off-screen elements in virtualised lists report a degenerate bounding box of (0, 0, 1, 1), which causes Click to fire at the wrong screen position. Invoke bypasses the bounding box entirely.

- intent: navigate to About page
action:
type: Invoke
scope: settings
selector: ">> [id=SettingsPageAbout_New]"
expect:
type: ElementFound
scope: settings
selector: ">> [name=Device Specifications]"

Invoke falls back to Click when the element does not support InvokePattern.

Focus

Gives keyboard focus to the element without clicking or invoking it.

- intent: focus the search field
action:
type: Focus
scope: toolbar
selector: ">> [role=edit][name=Search]"
expect:
type: Always

Window Actions

ActivateWindow

Brings the anchor's window to the foreground and restores it if minimized. Use this when a workflow needs to interact with a window that may have lost focus.

- intent: bring editor back to foreground
action:
type: ActivateWindow
scope: editor_window
expect:
type: WindowWithState
anchor: editor_window
state: active

MinimizeWindow

Minimizes the anchor's window.

- intent: minimize the file picker
action:
type: MinimizeWindow
scope: file_picker
expect:
type: Always

CloseWindow

Closes the anchor's window via its close button (sends WM_CLOSE). If closing triggers an unsaved-changes dialog, the dialog appears and the next step handles it.

- intent: close Notepad (triggers unsaved-changes dialog)
action:
type: CloseWindow
scope: notepad
expect:
type: DialogPresent
scope: notepad

Dialog Helpers

DismissDialog

Finds the first dialog child of the scope window and closes it. A shortcut for the common "close whatever dialog is open" case.

- intent: dismiss any open dialog
action:
type: DismissDialog
scope: main_window
expect:
type: DialogAbsent
scope: main_window

ClickForegroundButton

Clicks a button by name in the current OS foreground window. Useful in recovery handlers where an unexpected dialog has stolen focus and you do not know its anchor.

actions:
- type: ClickForegroundButton
name: OK

ClickForeground

Like ClickForegroundButton but matches any element type, not just buttons.

Data Actions

Extract

Reads an attribute from a matched element and stores it under a key. See Extracting Data for full details.

- intent: read the current font size
action:
type: Extract
key: font_size
scope: font_panel
selector: ">> [role='combo box'][name='Size'] > [role=edit]"
attribute: text
expect:
type: EvalCondition
expr: "output.font_size != ''"

Eval

Computes an expression and stores the result. Supports arithmetic, string concatenation, comparisons, and a range of built-in functions. See Expressions for the full syntax.

- intent: compute next font size cycling from 12 to 36
action:
type: Eval
key: new_size
expr: "(font_size + 8) % 24 + 12"
expect:
type: Always

Use the optional output: field to also push the result into the workflow output buffer (for use in subflow outputs or later {output.*} substitutions):

- intent: build and publish the output path
action:
type: Eval
key: full_path
expr: "param.output_dir + output.filename"
output: full_path
expect:
type: Always

WriteOutput

Writes all values stored under a key in the output buffer to a file, one CSV-quoted line each. Creates or truncates the file. path supports {output.*} substitution.

- intent: save extracted rows to CSV
action:
type: WriteOutput
key: rows
path: "{param.output_dir}\\results.csv"
expect:
type: FileExists
path: "{param.output_dir}\\results.csv"

System Actions

Exec

Runs an external process and waits for it to exit. Resolves command via PATH. If key is set, stdout is captured line-by-line into that output key.

- intent: run the post-processing script
action:
type: Exec
command: python
args:
- "{workflow.dir}\\process.py"
- "--input"
- "{output.export_path}"
expect:
type: ExecSucceeded

MoveFile

Moves or renames a file. Creates the destination directory if needed. Fails if the destination already exists.

- intent: move the exported file to the archive folder
action:
type: MoveFile
source: "{output.export_path}"
destination: "{param.archive_dir}\\{output.filename}"
expect:
type: FileExists
path: "{param.archive_dir}\\{output.filename}"

Flow Actions

NoOp

Performs no action but still evaluates its expect condition. Use it to wait for a state without touching the UI.

- intent: wait for the progress dialog to disappear
action:
type: NoOp
expect:
type: DialogAbsent
scope: main_window
timeout: 60s

Sleep

Pauses for a fixed duration before the expect condition is evaluated. Use sparingly — prefer polling with NoOp whenever the wait duration is unpredictable.

- intent: wait for tooltip to appear after hover
action:
type: Sleep
duration: 300ms
expect:
type: ElementFound
scope: main_window
selector: ">> [role=tooltip]"

run_actions: Trying Actions Interactively

The run_actions MCP tool lets an agent execute a sequence of steps against a live window without creating a workflow file. It is the fastest way to test a selector, verify an action works, or prototype a phase before committing it to YAML.

All the same action types are available. The agent specifies the target window (by hwnd, title, process, or PID) and a list of steps in the same format as a workflow phase.