UI Automata
Windows Desktop Control and Automation for AI Agents
A declarative workflow engine that gives AI agents structured,
observable, repeatable control over any Windows GUI app.
Declarative
Declarative YAML workflows that are robust by design: expected outcomes on every step, automatic recovery from bad states, and no fragile sleep timers.
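To illustrate the idea of replacing fixed sleep timers with expected outcomes, here is a minimal Python sketch. It is not UI Automata's engine or API: `run_step`, its parameters, and the simulated dialog state are all hypothetical names chosen for the example. The step runs an action, polls until an expected condition holds, and calls a recovery hook before retrying if the deadline passes.

```python
import time
from typing import Callable, Optional


def run_step(action: Callable[[], None],
             expected: Callable[[], bool],
             recover: Optional[Callable[[], None]] = None,
             timeout: float = 2.0,
             poll: float = 0.05,
             retries: int = 1) -> bool:
    """Run `action`, then wait until `expected()` is true.

    Hypothetical helper: instead of sleeping for a fixed time, it polls the
    expected outcome up to `timeout` seconds. On timeout it calls `recover`
    (if given) and retries the action up to `retries` more times.
    """
    for attempt in range(retries + 1):
        action()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if expected():
                return True          # outcome observed, step succeeded
            time.sleep(poll)         # short poll interval, not a fixed wait
        if recover is not None and attempt < retries:
            recover()                # try to get back to a known state
    return False


# Simulated flaky UI: the dialog only opens on the second click.
state = {"clicks": 0, "dialog_open": False}


def click_open() -> None:
    state["clicks"] += 1
    if state["clicks"] >= 2:
        state["dialog_open"] = True


ok = run_step(click_open,
              expected=lambda: state["dialog_open"],
              recover=lambda: None,   # placeholder recovery action
              timeout=0.2)
```

In this simulation the first click times out, the recovery hook runs, and the retry succeeds, so the step finishes as soon as the outcome is observed rather than after a guessed delay.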
Cross-Framework Coverage
Works across all UI surfaces: Win32, MFC, WPF, UWP, WinUI 3, terminal windows, and web browsers. All access modes are supported: UIA, DOM, and pure vision.
Agent-Native
Explore live UI element trees interactively, author workflows with schema guidance and linting, and drive everything through rich MCP tools designed for AI agents.
Industrial-Grade Apps
Automate complex, professional-grade applications with deep, dense UI hierarchies. Reliable automation for the apps that matter in real workflows.
UI Automata vs Computer Use
| | UI Automata | Computer Use (Cowork) |
|---|---|---|
| Approach | UIA elements + DOM query + vision | Screenshot only |
| Reliability | Deterministic — same selector works across runs | May vary across runs |
| Speed | Sub-second per step | Round-trip to inference API per step |
| Cost | Low (runs locally, no per-step inference) | High (every step consumes tokens) |
| Vision | On-device, used as fallback | Cloud inference, primary approach |
| Platform | Windows (all frameworks) | macOS-first, limited Windows |
| Model dependency | Any agent, any model | Locked to Claude |
| Browser automation | CDP (structured page access) | Screenshot of browser |
| Trace | Structured log with a detailed action record per step | Sequence of screenshots |