#Agents | Lilith AI

From Radar

Radar · 2026-06-16

Anthropic paused Agent SDK billing after agents hit the price list

Anthropic paused its June 15 plan to move Claude Agent SDK, claude -p and some third-party agent use into a separate credit pool. Teams running automations get a short reprieve, not a settled answer on long-running agent costs.

Read →

Radar · 2026-06-16

Android 17 turns Pixel into Gemini’s showroom

Google released Android 17 and Wear OS 7 first for Pixel devices, alongside a Pixel Drop with Gemini Omni, Lyria 3 and translation features for the Pixel 10a. The bigger signal is not the OS update itself, but Google using Android as a distribution layer for AI models on the device.

Read →

Radar · 2026-06-15

Thirteen words on Reddit can poison an AI answer

Research described by 404 Media says a 13 word snippet of retrieved text from sites such as Reddit, Wikipedia, Quora or Facebook can push AI agents toward spam or scam output. For AI search, that turns SEO into a prompt injection and user-generated content moderation problem.

Read →

Radar · 2026-06-15

Holo3.1 pushes computer-use agents from cloud demos to local machines

H Company released Holo3.1, a family of computer-use models for web, desktop, mobile and local inference. The important part is not only higher scores, but the attempt to move the agent closer to where the work actually happens.

Read →

Radar · 2026-06-15

Uber puts a price tag on coding agents: $1,500 per tool each month

Uber is limiting monthly token spend to $1,500 per employee for each agentic coding tool, according to Bloomberg via Simon Willison. Coding agents are becoming a budget line item.

Read →

Radar · 2026-06-15

Google gives enterprise RAG a guard who knows when not to answer

Google introduced an agentic RAG system for Gemini Enterprise Agent Platform that checks whether it has enough context before answering. For companies, that brake matters more than another polished retrieval layer.

Read →

Radar · 2026-06-15

Simon Willison shows why an agent sandbox cannot be just another Python process

Simon Willison released the alpha package micropython-wasm and a Datasette Agent plugin that runs Python inside a WebAssembly sandbox. The important part is not the demo, but the boundary between a useful agent and code that can break its host application.

Read →

Radar · 2026-06-14

DOX: a tiny AGENTS.md trick for the big agent-context problem

Agent Zero released DOX, a tiny self-documenting AGENTS.md framework where agents maintain a hierarchy of local instructions before and after code edits.

Read →

Radar · 2026-06-13

Apple brings AI photo editing into Photos and reopens the old fight over photographic reality

The Verge tried the AI photo editing tools in iOS 27 and describes Reframe, Extend and Clean Up as the iPhone's first serious native set. Apple keeps them relatively restrained, which is exactly why they can reach a much broader audience.

Read →

Radar · 2026-06-10

Niteshift raises $7 million to make AI coding agents less sticky

Niteshift, founded by former Datadog engineers, raised a $7 million seed round led by Greylock and is selling infrastructure for AI coding agents. Its bet is not another autocomplete, but the ability to switch between GPT, Claude and open source models when the model provider becomes a competitor.

Read →

Radar · 2026-06-09

Agent cost is no longer a footnote. It is an engineering expense

Simon Willison shows how he manually added pricing for Claude Fable 5 in AgentsView and immediately saw the cost of local coding agents by project. The small trick points to a bigger shift: AI coding is starting to look like infrastructure consumption, not an app subscription.

Read →

Radar · 2026-06-09

Voice agents break on bilingual calls before they break in polished demos

ServiceNow AI published an ASR benchmark for code-switched speech in enterprise scenarios and tested seven systems. The uncomfortable point is simple: in voice agents, transcription errors propagate through the whole workflow, so bilingual speech is not a minor UX detail.

Read →

Radar · 2026-06-07

datasette-agent-edit tackles the boring part of agents: safe text edits

Simon Willison released datasette-agent-edit 0.1a0, a base plugin for Datasette Agent with view, str_replace and insert tools. It is not a flashy AI demo. It is the layer that decides whether an agent can edit text without casually breaking the file.

Read →

Radar · 2026-06-03

Reachy Mini gets MCP tools from Hugging Face Spaces

Hugging Face shows Reachy Mini calling MCP tools hosted in public Spaces. The interesting part is not a weather answer, but the split between the robot body and capabilities that can be shared and updated outside the app.

Read →

Radar · 2026-06-02

GitHub is preparing for a world where agents write commits at scale

The Latent Space interview with Kyle Daigle frames GitHub as a platform under pressure from agentic coding. The point is not another Copilot feature, but whether infrastructure built for human pace can absorb software produced by machines.

Read →

Radar · 2026-06-01

Search should not be a button. It should be programmable infrastructure for agents

Perplexity describes Search as Code: an architecture where an agent does not call one monolithic search engine, but assembles a retrieval pipeline as code. The point is not a nicer search API. It is control over how evidence is found, filtered and verified.

Read →

Radar · 2026-06-01

Video generation is moving from clip output to canvas agent

Latent Space frames xAI Grok Imagine, through an interview with Ethan He, as a move from one shot video generation toward video agents. The thesis will be proven less by demo quality than by whether the system can iterate through a whole creative task.

Read →

Radar · 2026-05-28

Async agents receive a spec, work in an isolated VM and leave a pull request in the repository by morning

A Latent Space discussion with Cognition and OpenInspect frames coding agents as asynchronous workers: spec-to-PR workflows, full VMs, agent memory, and situations where a PM ships a code change without a developer. The shift is from synchronous chat to delegating an entire work cycle.

Read →

Radar · 2026-05-28

Data Formulator 0.7 tries to rebuild enterprise data analytics around AI agents

Microsoft Research released Data Formulator 0.7, an analytics workspace where AI agents assist with exploration, transformation and visualization of enterprise data. The key question is whether the agent handles messy, permissioned data outside the demo.

Read →

Radar · 2026-05-27

SQLite draws a line: no agentic code, yes reproducible bugs

SQLite added an AGENTS.md file with a blunt rule for people pointing AI agents at the codebase: agentic code is not accepted, but high-quality reproducible bug reports can be useful. A small file, but a big signal for critical open source maintenance.

Read →

Radar · 2026-05-27

ITBench-AA: frontier models score below 50 % on Kubernetes SRE diagnostics

IBM Research and Artificial Analysis released the first benchmark for enterprise IT agents in a realistic Kubernetes environment on 27 May 2026. The top model (Claude Opus 4.7) reached 47 %. No frontier model exceeded 50 %.

Read →

Radar · 2026-05-27

Codex helps build self-improving tax agents

OpenAI, Thrive Holdings and Crete built Tax AI for more than 30 accounting firms. The pilot processed 7,000 returns, saves about one third of practitioner time and improved sharply within six weeks through a feedback loop powered by Codex.

Read →

Radar · 2026-05-27

Warp bets on an open-source agentic terminal with GPT-5.5

Warp is positioning the terminal as an agentic development environment rather than a command line wrapper. By open sourcing its client with OpenAI as a founding sponsor and leaning on GPT-5.5, it wants developers to set objectives and review outcomes while agents plan, code, test and open pull requests.

Read →

Radar · 2026-05-26

Interconnects maps the next phase of model competition

Nathan Lambert writes about Gemini Flash 3.5, Mythos, agent tools and the tension between open and closed models in his May outlook.

Read →

Radar · 2026-05-26

Copilot Cowork turns user permissions into a file exfiltration path via prompt injection

PromptArmor researchers demonstrated an attack chain in which Microsoft Copilot Cowork can help exfiltrate Microsoft 365 files through prompt injection. This is not only a product bug, but a warning for any agentic system with delegated permissions.

Read →

Radar · 2026-05-26

LWiAI #246: one week, four fronts at once. Google I/O, agents, lawyers, safety

LWiAI Podcast episode 246 from 26 May 2026 is a map, not a single thesis. Google I/O, coding agents, legal pressure around OpenAI and safety research landed in the same week and sketch four simultaneous pressures on the AI market.

Read →

Radar · 2026-05-22

AI Snake Oil asks: did Google agents really build an OS for $916, or was it a carefully lit demo?

AI Snake Oil examines the claim that Google AI agents built an operating system for $916. The key point: agentic announcements need a different type of verification than chat benchmarks, because a big goal and a few steps in a demo environment are easy to inflate.

Read →

Radar · 2026-05-22

Gartner names OpenAI a Leader in enterprise coding agents

OpenAI says Gartner named Codex a Leader in enterprise AI coding agents. For companies, this is mainly a procurement and governance signal, not proof of technical superiority.

Read →

Radar · 2026-05-21

MagenticLite combines small models, orchestration and local file access into one workflow without a frontier model

Microsoft Research describes MagenticLite, MagenticBrain and Fara1.5 as an agentic system optimized for small models that connects browser and local file system in a single workflow. The direction is practical: not one expensive model for everything, but orchestration of specialized components.

Read →

Radar · 2026-05-20

OpenAI moves Education for Countries toward national AI programs in education

OpenAI is moving Education for Countries toward national AI education programs. This is not only about ChatGPT access, but about shaping infrastructure, training, and operating habits around AI in the public sector.

Read →

From the Glossary

Glossary

Agent infrastructure — the boring layer agents need to work

An agent is not just a model with a task. In production it needs identity, permissions, inboxes, tools, memory, audit, telemetry and clear boundaries. Without infrastructure, autonomy is just a pretty demo with risk attached.

Read →

Glossary

Agents — when an LLM gets hands and memory

An LLM with tool use, a loop, and memory. Lots of marketing, few definitions. Here's the plain version.

Read →

Glossary

Async agents — work that does not live in chat

An agent that takes a task, runs outside the conversation, and returns a finished artifact. Powerful for long workflows, dangerous without state, limits and review.

Read →

Glossary

Agent safety and sandboxing

An agent with tools is a tiny machine for consequences. Sandboxes, approvals, least privilege and audit logs are not enterprise decoration; they are brakes before the fire.

Read →

Glossary

Coding agents — when the model touches the repo

Claude Code, Codex and friends are not magical juniors. They are a fast loop: read code, edit, run tests, repair fallout. Useful, but only with guardrails.

Read →

Glossary

Computer-use agents — the model that clicks

A computer-use agent sees the screen and controls the UI. It sounds like sci-fi; in practice it is fragile automation over pixels, forms and badly labelled buttons.

Read →

Glossary

Evals and benchmarks — measurement instead of vibes

A benchmark is not truth carved in stone. It is an instrument with error bars. Without it, though, you are only guessing whether a model or agent works.

Read →

Glossary

Koog and Kotlin AI agents — what it is and what it is for

Koog is JetBrains’ framework for building AI agents in Kotlin and Java. It focuses on practical architecture: strategies, tools, memory, tracing, long context and JVM production integration.

Read →

Glossary

Physical AI — when an agent reaches into the world

Physical AI connects models, robots, simulation and actions in the real environment. It is not about a cute robot demo, but about who carries the risk when a model starts moving things.

Read →

Glossary

Prompt injection — hostile instructions in your context

Prompt injection is not a party-trick jailbreak. It is a boundary problem: the model reads untrusted text and may confuse it for instructions. With agents, it burns twice as hot.

Read →

Glossary

Tool use — when a model calls tools

Tool use is the moment an LLM stops merely answering and starts calling APIs, running commands, reading files or touching databases. Useful, sharp and dangerous.

Read →