Tag
#Agents
From Radar
Radar · 2026-06-16
Anthropic paused Agent SDK billing after agents hit the price list
Anthropic paused its June 15 plan to move Claude Agent SDK, claude -p and some third-party agent use into a separate credit pool. Teams running automations get a short reprieve, not a settled answer on long-running agent costs.
Radar · 2026-06-16
Android 17 turns Pixel into Gemini’s showroom
Google released Android 17 and Wear OS 7 first for Pixel devices, alongside a Pixel Drop with Gemini Omni, Lyria 3 and translation features for the Pixel 10a. The bigger signal is not the OS update itself, but Google using Android as a distribution layer for AI models on the device.
Radar · 2026-06-15
Thirteen words on Reddit can poison an AI answer
Research described by 404 Media says a 13-word snippet of retrieved text from sites such as Reddit, Wikipedia, Quora or Facebook can push AI agents toward spam or scam output. For AI search, that turns SEO into a problem of prompt injection and user-generated content moderation.
Radar · 2026-06-15
Holo3.1 pushes computer-use agents from cloud demos to local machines
H Company released Holo3.1, a family of computer-use models for web, desktop, mobile and local inference. The important part is not only higher scores, but the attempt to move the agent closer to where the work actually happens.
Radar · 2026-06-15
Uber puts a price tag on coding agents: $1,500 per employee, per tool, each month
Uber is limiting monthly token spend to $1,500 per employee for each agentic coding tool, according to Bloomberg via Simon Willison. Coding agents are becoming a budget line item.
Radar · 2026-06-15
Google gives enterprise RAG a guard who knows when not to answer
Google introduced an agentic RAG system for Gemini Enterprise Agent Platform that checks whether it has enough context before answering. For companies, that brake matters more than another polished retrieval layer.
Radar · 2026-06-15
Simon Willison shows why an agent sandbox cannot be just another Python process
Simon Willison released the alpha package micropython-wasm and a Datasette Agent plugin that runs Python inside a WebAssembly sandbox. The important part is not the demo, but the boundary between a useful agent and code that can break its host application.
Radar · 2026-06-14
DOX: a tiny AGENTS.md trick for the big agent-context problem
Agent Zero released DOX, a tiny self-documenting AGENTS.md framework where agents maintain a hierarchy of local instructions before and after code edits.
Radar · 2026-06-13
Apple brings AI photo editing into Photos and reopens the old fight over photographic reality
The Verge tried the AI photo editing tools in iOS 27 and describes Reframe, Extend and Clean Up as the iPhone's first serious native set. Apple keeps them relatively restrained, which is exactly why they can reach a much broader audience.
Radar · 2026-06-10
Niteshift raises $7 million to make AI coding agents less sticky
Niteshift, founded by former Datadog engineers, raised a $7 million seed round led by Greylock and is selling infrastructure for AI coding agents. Its bet is not another autocomplete, but the ability to switch between GPT, Claude and open-source models when the model provider becomes a competitor.
Radar · 2026-06-09
Agent cost is no longer a footnote. It is an engineering expense
Simon Willison shows how he manually added pricing for Claude Fable 5 in AgentsView and immediately saw the cost of local coding agents by project. The small trick points to a bigger shift: AI coding is starting to look like infrastructure consumption, not an app subscription.
Radar · 2026-06-09
Voice agents break on bilingual calls before they break in polished demos
ServiceNow AI published an ASR benchmark for code-switched speech in enterprise scenarios and tested seven systems. The uncomfortable point is simple: in voice agents, transcription errors propagate through the whole workflow, so bilingual speech is not a minor UX detail.
Radar · 2026-06-07
datasette-agent-edit tackles the boring part of agents: safe text edits
Simon Willison released datasette-agent-edit 0.1a0, a base plugin for Datasette Agent with view, str_replace and insert tools. It is not a flashy AI demo. It is the layer that decides whether an agent can edit text without casually breaking the file.
Radar · 2026-06-03
Reachy Mini gets MCP tools from Hugging Face Spaces
Hugging Face shows Reachy Mini calling MCP tools hosted in public Spaces. The interesting part is not a weather answer, but the split between the robot body and capabilities that can be shared and updated outside the app.
Radar · 2026-06-02
GitHub is preparing for a world where agents write commits at scale
The Latent Space interview with Kyle Daigle frames GitHub as a platform under pressure from agentic coding. The point is not another Copilot feature, but whether infrastructure built for human pace can absorb software produced by machines.
Radar · 2026-06-01
Search should not be a button. It should be programmable infrastructure for agents
Perplexity describes Search as Code: an architecture where an agent does not call one monolithic search engine, but assembles a retrieval pipeline as code. The point is not a nicer search API. It is control over how evidence is found, filtered and verified.
Radar · 2026-06-01
Video generation is moving from clip output to canvas agent
Latent Space frames xAI Grok Imagine, through an interview with Ethan He, as a move from one-shot video generation toward video agents. The thesis will be proven less by demo quality than by whether the system can iterate through a whole creative task.
Radar · 2026-05-28
Async agents receive a spec, work in an isolated VM and leave a pull request in the repository by morning
A Latent Space discussion with Cognition and OpenInspect frames coding agents as asynchronous workers: spec-to-PR workflows, full VMs, agent memory, and situations where a PM ships a code change without a developer. The shift is from synchronous chat to delegating an entire work cycle.
Radar · 2026-05-28
Data Formulator 0.7 tries to rebuild enterprise data analytics around AI agents
Microsoft Research released Data Formulator 0.7, an analytics workspace where AI agents assist with exploration, transformation and visualization of enterprise data. The key question is whether the agent handles messy, permissioned data outside the demo.
Radar · 2026-05-27
SQLite draws a line: no agentic code, yes reproducible bugs
SQLite added an AGENTS.md file with a blunt rule for people pointing AI agents at the codebase: agentic code is not accepted, but high-quality reproducible bug reports can be useful. A small file, but a big signal for critical open source maintenance.
Radar · 2026-05-27
ITBench-AA: frontier models score below 50% on Kubernetes SRE diagnostics
IBM Research and Artificial Analysis released the first benchmark for enterprise IT agents in a realistic Kubernetes environment on 27 May 2026. The top model (Claude Opus 4.7) reached 47%. No frontier model exceeded 50%.
Radar · 2026-05-27
Codex helps build self-improving tax agents
OpenAI, Thrive Holdings and Crete built Tax AI for more than 30 accounting firms. The pilot processed 7,000 returns, saved about one third of practitioner time and improved sharply within six weeks through a feedback loop powered by Codex.
Radar · 2026-05-27
Warp bets on an open-source agentic terminal with GPT-5.5
Warp is positioning the terminal as an agentic development environment rather than a command-line wrapper. By open-sourcing its client with OpenAI as a founding sponsor and leaning on GPT-5.5, it wants developers to set objectives and review outcomes while agents plan, code, test and open pull requests.
Radar · 2026-05-26
Interconnects maps the next phase of model competition
Nathan Lambert writes about Gemini Flash 3.5, Mythos, agent tools and the tension between open and closed models in his May outlook.
Radar · 2026-05-26
Copilot Cowork turns user permissions into a file exfiltration path via prompt injection
PromptArmor researchers demonstrated an attack chain in which Microsoft Copilot Cowork can help exfiltrate Microsoft 365 files through prompt injection. This is not only a product bug, but a warning for any agentic system with delegated permissions.
Radar · 2026-05-26
LWiAI #246: one week, four fronts at once. Google I/O, agents, lawyers, safety
LWiAI Podcast episode 246 from 26 May 2026 is a map, not a single thesis. Google I/O, coding agents, legal pressure around OpenAI and safety research landed in the same week and sketch four simultaneous pressures on the AI market.
Radar · 2026-05-22
AI Snake Oil asks: did Google agents really build an OS for $916, or was it a carefully lit demo?
AI Snake Oil examines the claim that Google AI agents built an operating system for $916. The key point: agentic announcements need a different type of verification than chat benchmarks, because a big goal and a few steps in a demo environment are easy to inflate.
Radar · 2026-05-22
Gartner names OpenAI a Leader in enterprise coding agents
OpenAI says Gartner named Codex a Leader in enterprise AI coding agents. For companies, this is mainly a procurement and governance signal, not proof of technical superiority.
Radar · 2026-05-21
MagenticLite combines small models, orchestration and local file access into one workflow without a frontier model
Microsoft Research describes MagenticLite, MagenticBrain and Fara1.5 as an agentic system optimized for small models that connects browser and local file system in a single workflow. The direction is practical: not one expensive model for everything, but orchestration of specialized components.
Radar · 2026-05-20
OpenAI moves Education for Countries toward national AI programs in education
OpenAI is moving Education for Countries toward national AI education programs. This is not only about ChatGPT access, but about shaping infrastructure, training, and operating habits around AI in the public sector.
From the Glossary
Glossary
Agent infrastructure — the boring layer agents need to work
An agent is not just a model with a task. In production it needs identity, permissions, inboxes, tools, memory, audit, telemetry and clear boundaries. Without infrastructure, autonomy is just a pretty demo with risk attached.
Glossary
Agents — when an LLM gets hands and memory
An LLM with tool use, a loop, and memory. Lots of marketing, few definitions. Here's the plain version.
Glossary
Async agents — work that does not live in chat
An agent that takes a task, runs outside the conversation, and returns a finished artifact. Powerful for long workflows, dangerous without state, limits and review.
Glossary
Agent safety and sandboxing
An agent with tools is a tiny machine for consequences. Sandboxes, approvals, least privilege and audit logs are not enterprise decoration; they are brakes before the fire.
Glossary
Coding agents — when the model touches the repo
Claude Code, Codex and friends are not magical juniors. They are a fast loop: read code, edit, run tests, repair fallout. Useful, but only with guardrails.
Glossary
Computer-use agents — the model that clicks
A computer-use agent sees the screen and controls the UI. It sounds like sci-fi; in practice it is fragile automation over pixels, forms and badly labelled buttons.
Glossary
Evals and benchmarks — measurement instead of vibes
A benchmark is not truth carved in stone. It is an instrument with error bars. Without it, though, you are only guessing whether a model or agent works.
Glossary
Koog and Kotlin AI agents — what it is and what it is for
Koog is JetBrains’ framework for building AI agents in Kotlin and Java. It focuses on practical architecture: strategies, tools, memory, tracing, long context and JVM production integration.
Glossary
Physical AI — when an agent reaches into the world
Physical AI connects models, robots, simulation and actions in the real environment. It is not about a cute robot demo, but about who carries the risk when a model starts moving things.
Glossary
Prompt injection — hostile instructions in your context
Prompt injection is not a party-trick jailbreak. It is a boundary problem: the model reads untrusted text and may mistake it for instructions. With agents, it burns twice as hot.
Glossary
Tool use — when a model calls tools
Tool use is the moment an LLM stops merely answering and starts calling APIs, running commands, reading files or touching databases. Useful, sharp and dangerous.