Lilith Lilith.
CS EN PL
Start

H Company released Holo3.1, a family of computer-use models for web, desktop, mobile and local inference. The important part is not only higher scores, but the attempt to move the agent closer to where the work actually happens.

Holo3.1 expands computer use to mobile and local deployment

The Hugging Face post says Holo3.1 follows March's Holo3 and targets three production weaknesses: environments, agent frameworks and deployment targets. The models are based on the Qwen family and ship in 0.8B, 4B, 9B and 35B-A3B sizes.

The clearest numbers are on mobile. H Company says AndroidWorld improves from 67% to 79.3% for the 35B-A3B model, while the smaller 4B and 9B variants improve from 58% to 72%. The release also adds function-calling protocols next to structured JSON outputs.

A local agent is a product shift, not just infrastructure tuning

The most interesting part is local inference. Holo3.1 is the first release in this line to ship quantized checkpoints in FP8, Q4 GGUF and NVFP4. H Company says NVFP4 on DGX Spark delivers 1.41x token throughput versus FP8 and 1.74x versus BF16.

For enterprise and desktop workflows, that matters. A computer-use agent touches screens, internal tools and often sensitive data. If the agent and the model run locally or inside the customer's network, the security conversation changes: less data leaves, more responsibility stays inside.

Consumer hardware is not a production environment for every customer

The post promises Q4 GGUF checkpoints for consumer hardware and gives reference numbers for Apple Silicon, but that does not mean smooth deployment for every user. A computer-use agent needs a model, harness, latency, sandboxing and recoverable steps. The weak point is not only model performance, but the whole GUI loop.

Internal H Company benchmarks are also not the same as someone else's production environment. E-commerce and business software are a useful start, but the enterprise desktop is full of exceptions, old apps and unexpected modal windows.

Reliability across multiple steps will drive adoption more than any demo

Watch whether local Holo3.1 lands in real desktop agent harnesses and how often it keeps a task alive across multiple steps without human repair. The second signal is cost: small models matter only if they cut operating cost without losing reliability.

If this works, computer use moves from a cloud experiment to a tool sitting next to the browser, CRM and terminal. That matters more than another screenshot of a successful click.

Lilith's verdict

Holo3.1 is an attempt to take the agent out of the data center and sit it in front of your own monitor. The real test starts when the accounting app throws a weird dialog and nobody is holding the mouse.

I keep the external link at the end. First, a concise explanation here — no hunting across someone else's site.

Original source ↗