#Hugging Face | Lilith AI

Radar · 2026-06-15

Holo3.1 pushes computer-use agents from cloud demos to local machines

H Company released Holo3.1, a family of computer-use models for web, desktop, mobile and local inference. The important part is not only the higher scores but also the attempt to move the agent closer to where the work actually happens.

Radar · 2026-06-15

Small models show that agentic demos run on boring infrastructure

Hugging Face published a Build Small Hackathon field report about Thousand Token Wood v2, a simulation where four characters run on four different small models. The key lesson for agent systems: serving, JSON repair, secret-data firewalls and bounded memory matter more than poetic prompting.
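
The summary does not say how the field report implements JSON repair. The following is a minimal sketch that assumes three typical small-model failure modes: markdown fences around the object, chatty text before or after it, and trailing commas. `repair_json` is a hypothetical helper, not code from the report.

```python
import json
import re
from typing import Any


def repair_json(raw: str) -> Any:
    """Best-effort repair of a small model's JSON reply (hypothetical helper).

    Assumed failure modes: markdown code fences, extra prose around the
    object, and trailing commas before a closing brace or bracket.
    Raises ValueError if the text still does not parse.
    """
    # Drop markdown code fences (three backticks, optionally followed by "json").
    text = re.sub(r"`{3}(?:json)?", "", raw)
    # Keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    text = text[start : end + 1]
    # Remove trailing commas before "}" or "]" (naive: also rewrites string contents).
    text = re.sub(r",\s*([}\]])", r"\1", text)
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"unrepairable JSON: {exc}") from exc


if __name__ == "__main__":
    reply = 'Sure, here you go:\n```json\n{"action": "move", "target": "oak",}\n```'
    print(repair_json(reply))  # {'action': 'move', 'target': 'oak'}
```

The trailing-comma rule is deliberately naive: it would also rewrite a `,}` inside a string value. A more robust setup would pair a tolerant parser with a retry prompt that asks the model to resend valid JSON.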

Radar · 2026-06-09

Voice agents break on bilingual calls before they break in polished demos

ServiceNow AI published an ASR benchmark for code-switched speech in enterprise scenarios and tested seven systems. The uncomfortable point is simple: in voice agents, transcription errors propagate through the whole workflow, so bilingual speech is not a minor UX detail.

Radar · 2026-06-03

Reachy Mini gets MCP tools from Hugging Face Spaces

Hugging Face shows Reachy Mini calling MCP tools hosted in public Spaces. The interesting part is not the weather answer itself but the split between the robot body and capabilities that can be shared and updated outside the app.

Radar · 2026-04-15

VAKRA benchmark reveals where agents actually fail: tool selection, arguments, multi-step planning

IBM Research published VAKRA: an agent benchmark with 8,000+ real APIs across 62 domains. It evaluates full execution trajectories, not just final answers. Results show where systems break: tool selection, argument specification, and multi-source queries with policy constraints.
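
A final-answer check only compares the agent's output. A trajectory check walks the tool calls step by step and can name the point where the run went wrong. The sketch below is not VAKRA's scoring code. `ToolCall`, `first_failure` and the tool names are hypothetical. It also assumes a single exact-match reference trajectory, which is a simplification, since a task can often be solved by more than one valid sequence of calls.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ToolCall:
    """One step of an agent trajectory (hypothetical structure)."""
    tool: str
    args: dict = field(default_factory=dict)


def first_failure(expected: List[ToolCall], actual: List[ToolCall]) -> Optional[str]:
    """Return the first divergence from the reference trajectory, or None."""
    for step, (exp, act) in enumerate(zip(expected, actual)):
        # Wrong tool chosen at this step.
        if act.tool != exp.tool:
            return f"step {step}: tool selection ({act.tool!r} instead of {exp.tool!r})"
        # Right tool, wrong arguments.
        if act.args != exp.args:
            return f"step {step}: argument specification for {exp.tool!r}"
    # All shared steps match, so compare lengths.
    if len(actual) < len(expected):
        return f"step {len(actual)}: trajectory stopped early"
    if len(actual) > len(expected):
        return f"step {len(expected)}: extra calls"
    return None


if __name__ == "__main__":
    reference = [
        ToolCall("search_flights", {"origin": "PAR", "dest": "NYC"}),
        ToolCall("book_flight", {"flight_id": "AF22"}),
    ]
    run = [
        ToolCall("search_flights", {"origin": "PAR", "dest": "NYC"}),
        ToolCall("book_flight", {"flight_id": "AF23"}),
    ]
    print(first_failure(reference, run))  # step 1: argument specification for 'book_flight'
```

Stopping at the first divergence keeps the report short. Scoring every step separately would instead show how much of a long trajectory was still correct.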