#coding | Lilith AI

Radar · 2026-06-15

Uber puts a price tag on coding agents: $1,500 per employee per tool each month

Uber is limiting monthly token spend to $1,500 per employee for each agentic coding tool, according to Bloomberg via Simon Willison. Coding agents are becoming a budget line item.
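
The cap applies to each employee and each tool separately. An engineer who uses two agents therefore has two $1,500 budgets, not one shared budget. Below is a minimal sketch of that bookkeeping. It assumes spend is recorded in dollars per employee, tool and calendar month. SpendTracker and its methods are hypothetical and do not describe Uber's actual system:

```python
from collections import defaultdict
from decimal import Decimal

# Figure from the report: $1,500 per employee, per agentic coding tool, per month.
MONTHLY_CAP_USD = Decimal("1500")


class SpendTracker:
    """Hypothetical ledger keyed by (employee, tool, month)."""

    def __init__(self, cap: Decimal = MONTHLY_CAP_USD) -> None:
        self.cap = cap
        # (employee_id, tool, "YYYY-MM") -> dollars spent so far; missing keys start at 0.
        self.spent: defaultdict[tuple[str, str, str], Decimal] = defaultdict(Decimal)

    def try_charge(self, employee: str, tool: str, month: str, cost: Decimal) -> bool:
        """Record the charge only if it keeps this tool's monthly spend within the cap."""
        key = (employee, tool, month)
        if self.spent[key] + cost > self.cap:
            return False
        self.spent[key] += cost
        return True

    def remaining(self, employee: str, tool: str, month: str) -> Decimal:
        return self.cap - self.spent[(employee, tool, month)]


tracker = SpendTracker()
print(tracker.try_charge("alice", "codex", "2026-06", Decimal("1400")))       # True
print(tracker.try_charge("alice", "codex", "2026-06", Decimal("200")))        # False: would reach $1,600
print(tracker.try_charge("alice", "claude-code", "2026-06", Decimal("200")))  # True: separate budget
print(tracker.remaining("alice", "codex", "2026-06"))                         # 100
```

The ledger is keyed on the full (employee, tool, month) tuple, and that is what makes the limit per tool. Dropping the tool from the key would turn it into a single $1,500 cap shared across all agents.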

Radar · 2026-06-10

OpenAI is using Oracle Cloud to solve procurement, not demos

OpenAI is offering its models and Codex to Oracle Cloud customers through existing cloud commitments. For enterprise teams, the interesting part is not the endpoint but how AI fits into the contracts, governance and billing they already have.

Radar · 2026-06-03

Wasmer shows Codex as leverage for small teams, not a magic compiler

OpenAI says Wasmer used Codex to build Edge.js in two weeks instead of an estimated year, accelerating development 10x to 20x. The stronger point is not the number but the shift in the developer's role: less typing, more steering of risky model work.

Radar · 2026-05-28

Async agents receive a spec, work in an isolated VM and leave a pull request in the repository by morning

A Latent Space discussion with Cognition and OpenInspect frames coding agents as asynchronous workers: spec-to-PR workflows, full VMs, agent memory, and situations where a PM ships a code change without a developer. The shift is from synchronous chat to delegating an entire work cycle.

Radar · 2026-05-27

Codex helps build self-improving tax agents

OpenAI, Thrive Holdings and Crete built Tax AI for more than 30 accounting firms. The pilot processed 7,000 returns, saved practitioners about one third of their time and improved sharply within six weeks through a feedback loop powered by Codex.

Radar · 2026-05-22

Gartner names OpenAI a Leader in enterprise coding agents

OpenAI says Gartner named Codex a Leader in enterprise AI coding agents. For companies, this is mainly a procurement and governance signal, not proof of technical superiority.

Radar · 2026-05-18

OpenAI and Dell bring Codex on-prem: enterprises want an agent near their data, not in the cloud

OpenAI and Dell want to bring Codex closer to enterprise data, hybrid infrastructure, and on-prem environments. Less flashy than a demo, much more important for enterprise adoption.

Radar · 2026-05-14

Sea reaches 87% weekly active Codex use and treats agents as organizational change, not a plugin

Sea Limited is deploying Codex across engineering, with OpenAI citing 87% weekly active users. The interview with Shopee's David Chen is not just about faster coding. It frames agents as a layer over complex codebases, CI/CD, tests, and system design.

Radar · 2026-05-14

Codex in mobile ChatGPT: the agent stops being a window on a laptop

Codex is moving into the ChatGPT mobile app, not as a travel toy, but as a control layer for long-running work inside real development environments.

Radar · 2026-05-12

Codex moves into finance: reporting and variance bridges without manual drudgery

OpenAI Academy positions Codex for finance teams: monthly business reviews (MBRs), reporting packs, variance bridges, model checks, and planning scenarios from working inputs. Less flashy than an app-generation demo, but more practical: an agent layer over repeated analytical prep work.

Radar · 2026-05-11

CodexBar unifies limit tracking for 29 AI coding tools in one icon

CodexBar is an open-source macOS menu-bar app that unifies limit tracking, credits, reset windows, and incident status across 29 AI coding providers including Codex, Claude, Cursor, Gemini, Copilot and OpenRouter.
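
At its core, a menu-bar tracker like this compares usage against a limit inside a window that resets. Below is a minimal sketch. It assumes every provider can be normalized to a used/limit pair with a reset timestamp. UsageWindow and tightest are hypothetical names, not CodexBar's code, and real providers report quotas in different units and formats:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class UsageWindow:
    """Hypothetical snapshot of one provider's rate-limit window."""
    provider: str
    used: float          # units consumed in the current window (requests, tokens, credits)
    limit: float         # units allowed per window
    resets_at: datetime

    def fraction_used(self) -> float:
        # A zero limit is treated as fully used.
        return self.used / self.limit if self.limit else 1.0

    def time_to_reset(self, now: datetime) -> timedelta:
        # Never report a negative duration once the window has already rolled over.
        return max(self.resets_at - now, timedelta(0))


def tightest(windows: list[UsageWindow]) -> UsageWindow:
    """Return the window closest to its limit, the one a single icon would surface."""
    return max(windows, key=lambda w: w.fraction_used())


now = datetime.now(timezone.utc)
windows = [
    UsageWindow("codex", used=40, limit=100, resets_at=now + timedelta(hours=2)),
    UsageWindow("claude", used=90, limit=100, resets_at=now + timedelta(hours=5)),
]
worst = tightest(windows)
print(worst.provider, f"{worst.fraction_used():.0%}", worst.time_to_reset(now))
# prints: claude 90% 5:00:00
```

Surfacing only the window closest to exhaustion is one plausible design for a single icon. A fuller tool would also show credits and incident status, which this sketch leaves out.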

Radar · 2026-05-11

An AI coding agent that does not cut maintenance costs is just expensive technical debt

James Shore spells out the uncomfortable math of coding agents: if an agent doubles output but the maintenance cost of each unit of code stays the same, the team has not gained speed; it has doubled its technical debt burden.
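
The argument is plain proportionality. Below is a toy model. It assumes maintenance effort is proportional to the amount of code a team owns, at a fixed cost per unit. The numbers are illustrative and are not Shore's:

```python
def maintenance_load(units_shipped: int, cost_per_unit: float) -> float:
    """Ongoing maintenance effort, assuming it is proportional to the code owned."""
    return units_shipped * cost_per_unit


baseline = maintenance_load(units_shipped=100, cost_per_unit=1.0)
# Agent doubles output, but each unit costs as much to maintain as before.
doubled_flat = maintenance_load(units_shipped=200, cost_per_unit=1.0)
# Agent doubles output and also halves the maintenance cost of each unit.
doubled_cheaper = maintenance_load(units_shipped=200, cost_per_unit=0.5)

print(doubled_flat / baseline)     # 2.0: twice the maintenance burden
print(doubled_cheaper / baseline)  # 1.0: burden held steady while output doubled
```

Only the second case, where each unit of code gets cheaper to maintain, turns doubled output into a net gain.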

Radar · 2026-05-08

Codex gets a safety architecture, not just a README disclaimer

OpenAI details how Codex runs in isolated environments: per-repo sandboxes, network restrictions, approval gates, and agent-native telemetry for safe enterprise adoption.
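
OpenAI's description names the layers but not a policy format. The following therefore only illustrates how a sandbox boundary, a network allowlist and an approval gate can fit together. Action, gate, REPO_ROOT and ALLOWED_HOSTS are hypothetical and are not Codex's configuration:

```python
import posixpath
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"      # pause until a human approves
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    kind: str        # hypothetical categories: "read", "write", "network", "shell"
    target: str      # file path, URL, or command line


# Hypothetical per-repo policy: writes must stay inside the sandboxed checkout,
# and outbound network access is limited to an allowlist of hosts.
REPO_ROOT = "/workspace/repo"
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}


def gate(action: Action) -> Decision:
    if action.kind == "read":
        # Reads are allowed; the sandbox itself limits what is visible.
        return Decision.ALLOW
    if action.kind == "write":
        # Normalize first so "../" cannot escape the repo root
        # (a real sandbox would also resolve symlinks).
        path = posixpath.normpath(action.target)
        inside = path == REPO_ROOT or path.startswith(REPO_ROOT + "/")
        return Decision.ALLOW if inside else Decision.DENY
    if action.kind == "network":
        host = urlparse(action.target).hostname or ""
        return Decision.ALLOW if host in ALLOWED_HOSTS else Decision.ASK
    # Shell commands and anything unrecognized go to the approval gate.
    return Decision.ASK


for a in [
    Action("write", "/workspace/repo/src/app.py"),        # allow
    Action("write", "/workspace/repo/../../etc/passwd"),  # deny
    Action("network", "https://example.com/data.json"),   # ask
]:
    print(a.kind, a.target, "->", gate(a).value)
```

Unrecognized actions default to ASK rather than ALLOW, which is the conservative choice. A missing rule then pauses the agent instead of silently widening its permissions.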

Radar · 2026-05-07

Mozilla fixed hundreds of Firefox bugs with Claude Mythos. AI security report quality just shifted.

Simon Willison described how Mozilla used early access to Claude Mythos Preview to systematically find and fix Firefox vulnerabilities. In April 2026 the number of fixed security bugs jumped to 423, compared to the usual 20 to 30 per month. The key shift: AI security reports stopped being noise and started being usable input.

Radar · 2026-05-06

AlphaEvolve finds in days algorithms that took teams months, and shows production numbers

DeepMind introduced AlphaEvolve as a Gemini-powered evolutionary loop that automatically discovers better algorithms. Concrete production results: 30% fewer errors in genomics, 20% lower write amplification for Spanner, Klarna doubled transformer training speed.
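
AlphaEvolve runs an evolutionary loop in which an LLM proposes program changes and automated evaluators score them. The skeleton below keeps only that loop structure: propose, evaluate, select. As a deliberate simplification, random Gaussian mutation of numbers stands in for the Gemini proposer, and a toy distance function stands in for real evaluators. This is not DeepMind's code:

```python
import random
from typing import Callable

Candidate = list[float]


def evolve(
    score: Callable[[Candidate], float],
    propose: Callable[[Candidate, random.Random], Candidate],
    seed_candidate: Candidate,
    generations: int = 200,
    population_size: int = 20,
    seed: int = 0,
) -> Candidate:
    """Keep the best-scoring candidates, ask the proposer to vary them, repeat."""
    rng = random.Random(seed)
    population = [seed_candidate]
    for _ in range(generations):
        # Proposal: in AlphaEvolve an LLM rewrites programs; here we perturb numbers.
        parents = [rng.choice(population) for _ in range(population_size)]
        children = [propose(p, rng) for p in parents]
        # Evaluation and selection: keep the top candidates (higher score is better).
        population = sorted(population + children, key=score, reverse=True)[:population_size]
    return population[0]


def gaussian_mutation(c: Candidate, rng: random.Random) -> Candidate:
    return [x + rng.gauss(0.0, 0.1) for x in c]


def toy_score(c: Candidate) -> float:
    # Toy objective: negative squared distance to the point (3, -2).
    return -((c[0] - 3.0) ** 2 + (c[1] + 2.0) ** 2)


best = evolve(toy_score, gaussian_mutation, seed_candidate=[0.0, 0.0])
print([round(x, 2) for x in best])
```

The previous population stays in the selection pool (elitism). That guarantees the best score never gets worse from one generation to the next.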

Radar · 2026-05-06

SubQ review: great numbers, but still a test of benchmark faith

Fello AI reviews SubQ's claims: a 12M-token context window, 52x faster prefill than FlashAttention on 1M tokens, and frontier-class benchmark positioning. The numbers are striking enough to need independent verification before they change architecture decisions.

Radar · 2026-05-01

Coding agents leave the IDE: Codex and Claude show what comes after programming

Latent Space AINews observes a shift it calls "breaking containment": coding agents like Codex and Claude are no longer just tools for writing code but are expanding into knowledge work and creative workflows more broadly.

Radar · 2026-01-20

Cisco deployed Codex for enterprise defect fixes, but hard numbers are still missing

Cisco and OpenAI describe deploying Codex as an agent in enterprise engineering workflows: build automation, defect fixes, and a shift toward agent-native development.

Radar · 2025-12-18

GPT-5.2-Codex targets long-horizon refactors; the proof will be independent production tests

GPT-5.2-Codex targets long-horizon coding tasks that span large contexts: large-scale code transformations, security fixes, and multi-file consistency.

Radar · 2025-11-19

GPT-5.1-Codex-Max system card is worth reading, but trust it only as far as it is specific about its limits

The GPT-5.1-Codex-Max system card describes two safety layers: at the model level, safety training and prompt injection protection; at the product level, sandboxing with configurable network access.

Radar · 2025-11-06

Async coding agents as research threads: fire a task, get a pull request back

Simon Willison describes a fire-and-forget workflow with Claude Code, Codex and other coding agents: you pose a research question, and the agent works on a server and files a pull request. The code serves as proof of feasibility, not just text.
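
Fire-and-forget is ordinary asynchronous task dispatch: start several long-running jobs, do something else, and collect results as each one finishes. Below is a minimal asyncio sketch. run_agent is a hypothetical stand-in that sleeps instead of calling a real agent service, and PullRequest is a placeholder record, not any vendor's API:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PullRequest:
    question: str
    branch: str


async def run_agent(question: str, branch: str, seconds: float) -> PullRequest:
    """Hypothetical stand-in for a remote agent; a real one runs for minutes or hours."""
    await asyncio.sleep(seconds)
    return PullRequest(question=question, branch=branch)


async def main() -> list[PullRequest]:
    # Hypothetical research questions with simulated run times in seconds.
    questions = {
        "Can the search index handle this corpus fast enough?": 0.03,
        "Does the parser handle malformed UTF-8?": 0.01,
    }
    # Fire: start every task without waiting on any of them.
    tasks = [
        asyncio.create_task(run_agent(q, f"research/{i}", delay))
        for i, (q, delay) in enumerate(questions.items())
    ]
    # The human is free to do other work here.
    # Collect later: pull requests arrive in the order the agents finish.
    results = []
    for finished in asyncio.as_completed(tasks):
        pr = await finished
        print(f"PR ready on {pr.branch}: {pr.question}")
        results.append(pr)
    return results


if __name__ == "__main__":
    asyncio.run(main())
```

asyncio.as_completed returns results in finishing order. The quickest question therefore produces the first pull request, regardless of the order in which the tasks were submitted.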

Radar · 2025-10-20

Claude Code for web: an asynchronous coding agent in a sandbox, without your laptop

Simon Willison tested Claude Code for web: Anthropic wrapped the local Claude Code experience in a hosted sandbox and made it usable from web and mobile. The important shift is not a more capable model, but a workflow change: coding agents become more valuable when they can run asynchronously and safely away from your laptop.

Radar · 2025-09-16

Latent Space: Greg Brockman on GPT-5 and Codex as the agentic layer of software development

Latent Space published a belated episode with Greg Brockman on GPT-5 and Codex, plus editorial takes on the GPT-5-Codex model combination. This is a podcast episode and pointer, not a standalone analytical essay.

Radar · 2025-07-02

Jack Morris goes against the current: information theory, not agents or benchmarks

Latent Space profiles Jack Morris, a PhD student who is deliberately not working on agents, benchmarks or VS Code forks. He studies the information-theoretic foundations of language models: embeddings, latent space and compression. This is a podcast interview and pointer.
