Computer-use agents — the model that clicks

A computer-use agent sees the screen and controls the UI. It sounds like sci-fi; in practice it is fragile automation over pixels, forms and badly labelled buttons.

What it is

A computer-use agent receives a screenshot or UI tree, decides where to click or what to type, and performs the action through a browser or desktop. It is not the same as an API integration: a UI is designed for humans, so it offers no stable contract for a program to rely on.
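The observe, decide, act cycle described above can be sketched in a few lines. This is a minimal illustration, not a real agent: `FakeBrowser`, `Action` and `scripted_policy` are hypothetical stand-ins. A real system would replace `FakeBrowser` with a browser or desktop driver and `scripted_policy` with a model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str          # "click", "type" or "done"
    target: str = ""   # hypothetical element label
    text: str = ""     # text to enter for "type" actions


class FakeBrowser:
    """Hypothetical stand-in for a real browser or desktop driver."""

    def __init__(self) -> None:
        self.fields: Dict[str, str] = {}
        self.submitted = False

    def observe(self) -> dict:
        # A real agent would return a screenshot or UI tree here.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def perform(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "Submit":
            self.submitted = True


def scripted_policy(observation: dict) -> Action:
    """Placeholder for the model: picks the next action from the observation."""
    if "email" not in observation["fields"]:
        return Action("type", "email", "user@example.com")
    if not observation["submitted"]:
        return Action("click", "Submit")
    return Action("done")


def run_agent(env: FakeBrowser,
              policy: Callable[[dict], Action],
              max_steps: int = 10) -> List[Action]:
    """Observe, decide, act, and stop on "done" or after max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(env.observe())
        if action.kind == "done":
            break
        env.perform(action)
        history.append(action)
    return history
```

The `max_steps` cap is a design choice worth keeping in any real loop: an agent that misreads the screen can otherwise repeat the same action indefinitely.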

Why it is tempting

Many tools have poor APIs, internal apps are old, and people work through browsers anyway. An agent that can fill a form, download a report or compare screens can route around years of integration debt.

Why it is dangerous

UIs change, buttons look alike, modals cover pages, and the model can click a destructive control. Computer-use agents need confirmation before irreversible actions, sandboxes, limited accounts, and no access to anything outside the task.
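Two of these safeguards, confirmations and task scoping, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the keyword list, the host name `reports.internal.example`, and the function names. A keyword match on button labels is a crude heuristic and would not replace sandboxing or limited accounts.

```python
from typing import Callable
from urllib.parse import urlparse

# Illustrative keyword list; a real deployment would need something stricter.
DESTRUCTIVE_WORDS = ("delete", "remove", "pay", "send")

# Hypothetical allowlist: the only host this task is permitted to touch.
ALLOWED_HOSTS = {"reports.internal.example"}


def is_destructive(label: str) -> bool:
    """Flag labels that suggest an irreversible action."""
    lowered = label.lower()
    return any(word in lowered for word in DESTRUCTIVE_WORDS)


def host_allowed(url: str) -> bool:
    """Check that the page belongs to the task's allowed hosts."""
    return urlparse(url).hostname in ALLOWED_HOSTS


def guard_click(url: str, label: str, confirm: Callable[[str], bool]) -> bool:
    """Return True only if the click may proceed.

    Clicks outside the allowlist are refused outright; destructive-looking
    clicks go ahead only if the confirm callback (e.g. a human prompt) agrees.
    """
    if not host_allowed(url):
        return False
    if is_destructive(label):
        return confirm(label)
    return True
```

In use, `confirm` would typically ask a person, for example `guard_click(url, label, confirm=lambda l: input(f"Allow '{l}'? [y/N] ") == "y")`.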

What to remember

Computer-use is a great fallback, not an ideal integration layer. If an API exists, use the API. If it does not, expect fragility and log every click.
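Logging every click can be as simple as appending one JSON line per action. The sketch below assumes a hypothetical record layout (`ts`, `step`, `kind`, `target`, `url`); the field names are illustrative, not a standard.

```python
import io
import json
import time
from typing import TextIO


def log_action(stream: TextIO, step: int, kind: str, target: str, url: str) -> dict:
    """Append one JSON line describing an agent action and return the record."""
    record = {
        "ts": time.time(),   # wall-clock timestamp of the action
        "step": step,        # position of the action within the run
        "kind": kind,        # e.g. "click" or "type"
        "target": target,    # label of the element acted on
        "url": url,          # page the action happened on
    }
    stream.write(json.dumps(record) + "\n")
    return record


# Usage: an in-memory buffer here; a real run would open a file in append mode.
buffer = io.StringIO()
log_action(buffer, 1, "click", "Submit", "https://app.example/form")
```

One line per action keeps the log easy to replay or grep when a run goes wrong.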