Radar | Lilith AI

2026-05-27

00:00 · source ↗

Warp bets on an open-source agentic terminal with GPT-5.5

Warp is positioning the terminal as an agentic development environment rather than a command-line wrapper. By open-sourcing its client with OpenAI as a founding sponsor and leaning on GPT-5.5, it wants developers to set objectives and review outcomes while agents plan, code, test and open pull requests.

This is more than another AI terminal pitch. If Warp can combine an open-source client with permissions, memory, remote execution and observable pull request workflows, the terminal can become a control plane for teams of agents. The hard part is familiar across agentic coding: trust, reproducibility and review quality matter more than the volume of code an agent can generate.

#agents #openai #models

2026-05-26

15:39 · source ↗

Interconnects maps the next phase of model competition

Nathan Lambert writes about Gemini Flash 3.5, Mythos, agent tools and the tension between open and closed models in his May outlook.

Lambert's piece is less a prediction than a checklist. Anyone waiting for one winning model is standing in front of a board where every arrow points in a different direction.

#agents #models #open-source

15:36 · source ↗

Copilot Cowork turns user permissions into a file exfiltration path via prompt injection

PromptArmor researchers demonstrated an attack chain in which Microsoft Copilot Cowork can help exfiltrate Microsoft 365 files through prompt injection. This is not only a product bug, but a warning for any agentic system with delegated permissions.

An agent with Graph access is like an employee holding a general power of attorney: it can open the door even when it thinks it is only sending a harmless recap message.

#agents #simonwillison #commentary

05:10 · source ↗

LWiAI #246: one week, four fronts at once. Google I/O, agents, lawyers, safety

LWiAI Podcast episode 246 from 26 May 2026 is a map, not a single thesis. Google I/O, coding agents, legal pressure around OpenAI and safety research landed in the same week and sketch four simultaneous pressures on the AI market.

This link is not an article to inflate into a grand thesis. It is a radar map of the week: models up front, agents behind them, lawyers at the door and safety people with a hand on the brake.

#agents #openai #models #google #newsletter #roundup

00:00 · source ↗

Anthropic appoints KiYoung Choi to lead Korea before Seoul launch

Anthropic appointed KiYoung Choi as Representative Director of Korea before opening its Seoul office, reflecting unusually strong Claude usage in the country.

This is more than a local hire. Anthropic is signaling that Korea is not just remote demand for Claude, but a market where enterprise deals, government relationships, research links and developer adoption need people on the ground.

#research #models #anthropic

2026-05-25

00:00 · source ↗

Anthropic’s Chris Olah warns the Vatican about frontier AI incentives

Pope Leo XIV released the encyclical Magnifica humanitas on safeguarding the human person in the age of AI. At the Vatican City presentation, Anthropic co-founder Chris Olah warned that frontier AI labs face incentives that can conflict with the public good.

The strongest part of Olah's intervention is institutional rather than technical: even sincere researchers work inside a race shaped by commerce, prestige and geopolitics. If AI is to serve people, trusting labs is not enough. It needs sustained outside pressure, a language of human dignity and the courage to ask who carries the costs and who receives the gains.

#research #models #anthropic

2026-05-22

22:24 · source ↗

AI Snake Oil asks: did Google agents really build an OS for $916, or was it a carefully lit demo?

AI Snake Oil examines the claim that Google AI agents built an operating system for $916. The key point: agentic announcements need a different type of verification than chat benchmarks, because a big goal and a few steps in a demo environment are easy to inflate.

When an agent supposedly builds an operating system for the price of a good dinner, the first reaction should not be admiration. It should be checking the receipt, the exact task and who held the hammer in a controlled environment.

#agents #evals #google #hype

00:00 · source ↗

Gartner names OpenAI a Leader in enterprise coding agents

OpenAI says Gartner named Codex a Leader in enterprise AI coding agents. For companies, this is mainly a procurement and governance signal, not proof of technical superiority.

Gartner does not say which agent writes the best code. It says which vendor is easier to defend in front of procurement, security and leadership. For Codex in enterprise, that can matter as much as new features.

#agents #openai #coding

2026-05-21

17:00 · source ↗

MagenticLite combines small models, orchestration and local file access into one workflow without a frontier model

Microsoft Research describes MagenticLite, MagenticBrain and Fara1.5 as an agentic system optimized for small models that connects browser and local file system in a single workflow. The direction is practical: not one expensive model for everything, but orchestration of specialized components.

The future of agents may not look like one giant cloud brain. It may look like a system of specialized components where each part has clear responsibility and the work does not disappear into the log of a single remote server.

#agents #tool-use #microsoft #small-models

2026-05-20

00:00 · source ↗

OpenAI moves Education for Countries toward national AI programs in education

OpenAI is moving Education for Countries toward national AI education programs. This is not only about ChatGPT access, but about shaping infrastructure, training, and operating habits around AI in the public sector.

Education for Countries is not another classroom aid. The real question is who sets the default rules, habits, and dependencies for a generation of students, teachers, and public workers. Whoever gets the keys to the classrooms today will shape the digital literacy of the public sector for a decade.

#agents #openai

2026-05-18

10:00 · source ↗

OpenAI and Dell bring Codex on-prem: enterprise wants an agent near its data, not in the cloud

OpenAI and Dell want to bring Codex closer to enterprise data, hybrid infrastructure, and on-prem environments. Less flashy than a demo, much more important for enterprise adoption.

Enterprise does not want an agent that is smart only inside an isolated chat. It wants an agent that understands internal systems, sees the right data, and leaves an audit trail. Without that, it is a nice presentation, not an operational layer.

#agents #openai #coding

2026-05-14

19:44 · source ↗

AgentMail gives AI agents their own email inbox as a first-class identity

AgentMail provides real email inbox infrastructure for AI agents: inbox creation, sending, receiving, threads, attachments, webhooks, WebSockets, search, custom domains and MCP integration. The company announced a $6M seed round led by General Catalyst with Y Combinator participation.

This is the boring infrastructure agents need before autonomy becomes useful: an inbox, an audit trail, and a durable identity. Practical, and slightly unnerving.

#agents #mcp #infrastructure #ai

20:30 · source ↗

Sea deploys Codex to 87% of the team and treats agents as organizational change, not a plugin

Sea Limited is deploying Codex across engineering, with OpenAI citing 87% weekly active users. The interview with Shopee's David Chen is not just about faster coding. It frames agents as a layer over complex codebases, CI/CD, tests, and system design.

The important shift here is from autocomplete to operational agent. Sea talks about complexity, tests, system design, and work inside a large engineering organization. If it works, this is not an editor plugin. It changes how an engineering org absorbs work.

#openai #coding

13:00 · source ↗

Codex in mobile ChatGPT: the agent stops being a window on a laptop

Codex is moving into the ChatGPT mobile app, not as a travel toy, but as a control layer for long-running work inside real development environments.

This is a quietly big shift. The agent stops being a window on a laptop and becomes a work process you step into when it needs judgment, permission, or a course correction. If you have a real sandbox and audit trail, this makes sense. If you do not, it just adds a button on top of the chaos.

#openai #coding

2026-05-13

16:15 · source ↗

“11 AI agents” is an empty metric

Simon Willison highlighted Boris Mann's point that saying "11 AI agents" is meaningless by itself. It says about as much as counting spreadsheets or browser tabs. The useful questions are outcomes, responsibility boundaries, workflow, handoff, observability, failure handling, permissions and human review.

A useful antidote to agent marketing that treats count as sophistication. Eleven agents can be a deliberate system, or eleven places where context can get lost. Without clear boundaries, auditability, permissions and human review, it is closer to an inventory than an architecture.

#agents #ai #agent-definitions

02:47 · source ↗

Fine-tuning is not dying. It is just no longer the default answer

Latent Space uses the pullback of part of OpenAI's fine-tuning API as a useful reality check: for most AI products today the first step is not tuning weights but better evaluation, context, retrieval, tool use and workflow. Fine-tuning remains a strong tool, just not a universal fix for a poorly designed system.

Fine-tuning is not dying. The comfortable sentence "we'll tune that" is. Without evals, quality data and a clear reason to touch model weights, fine-tuning often just preserves the mess in a more expensive form. A scalpel, yes. A hammer for every problem, no.

#openai #models #fine-tuning #ai-engineering

2026-05-12

15:00 · source ↗

Codex moves into finance: reporting and variance bridges without manual drudgery

OpenAI Academy positions Codex for finance teams: MBRs, reporting packs, variance bridges, model checks, and planning scenarios from working inputs. Less flashy than an app-generation demo, but more practical: an agent layer over repeated analytical prep work.

This is exactly the kind of enterprise AI that does not look like fireworks, but can save real hours. Finance does not need an agent pretending to be the CFO. It needs something that can go through spreadsheets, explain variance, find broken links, and leave the final judgment to a human.

#openai #models #coding

00:00 · source ↗

Parameter Golf shows how coding agents change the pace of research iteration

OpenAI published lessons from Parameter Golf: more than 1,000 participants, over 2,000 submissions, a 16 MB artifact limit, and 10 minutes of training on 8x H100. The important part is not only model compression. AI coding agents changed the tempo of research iteration.

Parameter Golf is a small format with a large warning label. Agents make weird ideas cheaper to test, which is wonderful for research. The same speed also produces elegant nonsense, overfit tricks, and a fake feeling of breakthrough. Strong evals win. Without them, you just drown faster.
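
To get a feel for how tight those limits are, here is a rough back-of-the-envelope sketch. It is not taken from OpenAI's write-up. It assumes "16 MB" means 16,000,000 bytes, that the whole artifact is spent on uncompressed weights with no code or metadata overhead, and that all 8 GPUs run for the full 10 minutes. The names `ARTIFACT_LIMIT_BYTES`, `max_params` and `gpu_minutes` are made up for illustration.

```python
# Assumption: decimal megabytes; the actual rules may count bytes differently.
ARTIFACT_LIMIT_BYTES = 16 * 1000 * 1000


def max_params(bits_per_param: int, budget_bytes: int = ARTIFACT_LIMIT_BYTES) -> int:
    """Upper bound on parameter count if the entire budget were raw weights."""
    return budget_bytes * 8 // bits_per_param


def gpu_minutes(num_gpus: int, minutes: int) -> int:
    """Total accelerator time if every GPU runs for the whole window."""
    return num_gpus * minutes


for bits in (16, 8, 4):
    print(f"{bits}-bit weights: at most {max_params(bits):,} parameters")

print(f"training budget: {gpu_minutes(8, 10)} GPU-minutes")
```

Under these assumptions the ceiling is 8 million parameters at 16 bits, 16 million at 8 bits and 32 million at 4 bits, trained in 80 GPU-minutes. A budget that small makes every experiment cheap to run, which is why fast agent-driven iteration fits the format.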

#agents #research #openai #models

2026-05-11

02:41 · source ↗

CodexBar unifies limit tracking for 29 AI coding tools in one icon

CodexBar is an open-source macOS menu-bar app that unifies limit tracking, credits, reset windows, and incident status across 29 AI coding providers including Codex, Claude, Cursor, Gemini, Copilot and OpenRouter.

Not another shiny AI editor. CodexBar is a thermometer for the subscription chaos developers created for themselves. If you juggle several coding agents, visible limits are productivity infrastructure, not decoration. That a separate app had to be built for this says everything about how fragmented the current AI stack is.

#tool-use #ai #coding #open-source

19:48 · source ↗

An AI coding agent that does not cut maintenance costs is just expensive technical debt

James Shore states the uncomfortable math of coding agents: if an agent doubles output but the cost of maintaining each unit of code stays flat, the team has not gained speed. It has doubled the rate at which it takes on technical debt.

A team with 3x more pull requests that cannot keep up with review is not 3x more productive. It is 3x more indebted. An agent that does not reduce maintenance is just a faster way to dig the hole.
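
The arithmetic can be sketched with a toy model. This is an illustration, not Shore's own formula. It assumes every shipped unit of code costs a fixed number of maintenance minutes per period for as long as it exists, and the function name `maintenance_minutes` and all figures are hypothetical.

```python
def maintenance_minutes(periods: int, units_per_period: int, minutes_per_unit: int) -> int:
    """Recurring maintenance load per period after shipping for `periods` periods.

    Toy assumption: each shipped unit costs `minutes_per_unit` in every later period.
    """
    codebase_units = periods * units_per_period
    return codebase_units * minutes_per_unit


# Hypothetical numbers: 12 periods, 10 units shipped per period, 6 minutes upkeep per unit.
baseline = maintenance_minutes(12, 10, 6)

# The agent doubles output, but per-unit maintenance cost is unchanged.
doubled_output = maintenance_minutes(12, 20, 6)

# The agent doubles output AND halves per-unit maintenance cost.
doubled_and_cheaper = maintenance_minutes(12, 20, 3)

print(baseline, doubled_output, doubled_and_cheaper)  # 720 1440 720
```

In this model, doubling output only leaves the recurring burden where it was if the per-unit maintenance cost is halved at the same time. Otherwise the burden grows in step with the output.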

#agents #ai #models #coding #simonwillison #commentary