AlphaEvolve finds algorithms in days where teams spent months, and reports production numbers | Radar

AlphaEvolve is not a chatbot or a code generator. It is an evolutionary loop: Gemini proposes algorithms, an automated evaluator tests them, and better versions replace worse ones. DeepMind deployed it on problems where a small algorithmic improvement produces large production savings.

AlphaEvolve searches the algorithm space where a chatbot would only write code

Google DeepMind introduced the system in May 2025. AlphaEvolve pairs Gemini, which proposes candidate solutions, with an automated evaluator that scores each proposal and feeds the results back into the loop. The output is not a description of an algorithm but runnable code that can be deployed.
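The propose, evaluate and replace cycle described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not DeepMind's implementation: the hypothetical `propose` function stands in for the Gemini call by randomly perturbing a pair of numbers, and the hypothetical `evaluate` function scores a made-up objective instead of real code.

```python
import random


def evaluate(candidate):
    """Automated evaluator (toy assumption): higher is better, optimum at (3, -1)."""
    x, y = candidate
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)


def propose(parent, rng):
    """Stand-in for the model call: return a randomly perturbed copy of the parent."""
    return tuple(v + rng.gauss(0.0, 0.5) for v in parent)


def evolve(generations=200, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [
        (rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(population_size)
    ]
    # Keep (score, candidate) pairs sorted from best to worst.
    scored = sorted(((evaluate(c), c) for c in population), reverse=True)
    for _ in range(generations):
        # Pick a parent from the better half of the population.
        parent = rng.choice(scored[: population_size // 2])[1]
        child = propose(parent, rng)
        child_score = evaluate(child)
        # Better versions replace worse ones: the child displaces the weakest member.
        if child_score > scored[-1][0]:
            scored[-1] = (child_score, child)
            scored.sort(reverse=True)
    return scored[0]


best_score, best_candidate = evolve()
print(best_score, best_candidate)
```

Because a child only ever replaces the weakest member, the best score in the population never gets worse from one generation to the next. The quality of the search depends entirely on how well `evaluate` reflects what is actually wanted.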

DeepMind reports concrete, measurable production results. In genomics (DeepConsensus), the system reduced variant detection error rates by 30 %. For Google Spanner, it cut write amplification by 20 %. A compiler optimization reduced the software storage footprint by roughly 9 %. Finding an optimal cache replacement policy for Spanner took two days, where previous development had required months. For AC Optimal Power Flow, the share of feasible solutions rose from 14 % to 88 %.

Partners report results as well: Klarna reports doubling transformer training speed. FM Logistic reports a 10.4 % improvement in routing efficiency, saving over 15,000 km annually. WPP cites a 10 % accuracy gain in campaign optimization.

This approach has a different logic than a standard AI assistant

A standard LLM helps a programmer write code. AlphaEvolve skips the programmer and searches the space of possible algorithms directly. It is not an assistant; it is an automatic optimizer with an evaluator built in.

For domains with a clear evaluation function, this is a significant shift. Planning, databases, compilers, numerical methods and research pipelines are areas where a small algorithmic improvement yields disproportionately large savings. Because access is currently limited to enterprise channels via Google Cloud, the numbers from Klarna, FM Logistic and WPP are the first realistic test outside the lab.

All numbers come from DeepMind or their partners, not from independent verifiers

All the numbers cited above come from DeepMind materials or from partners such as Klarna, FM Logistic and WPP. This is not independent verification. The evolutionary loop works well on problems with a clear, automatable evaluation function. Where evaluation cannot be automated (code safety, readability, edge cases, business logic), the system has nothing to guide it.
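To make "a clear, automatable evaluation function" concrete, here is a minimal sketch of what such an evaluator could look like for a cache replacement policy. Everything in it is an assumption for illustration: the hypothetical `hit_rate` function, the two example policies and the short trace are invented here and say nothing about how Spanner or AlphaEvolve actually score candidates.

```python
from collections import OrderedDict


def hit_rate(policy, trace, capacity):
    """Replay a trace of keys through a cache and return the fraction of hits.

    `policy(cache)` receives an OrderedDict ordered from least to most
    recently used and returns the key to evict. Assumes capacity >= 1.
    """
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                del cache[policy(cache)]
            cache[key] = True
    return hits / len(trace) if trace else 0.0


def evict_lru(cache):
    """Evict the least recently used key."""
    return next(iter(cache))


def evict_mru(cache):
    """Evict the most recently used key."""
    return next(reversed(cache))


trace = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1, 2]
print(hit_rate(evict_lru, trace, capacity=3))  # 6 hits out of 11 accesses
print(hit_rate(evict_mru, trace, capacity=3))  # 3 hits out of 11 accesses
```

A single number per candidate is what lets a loop rank policies without a human in it. Nothing in this score captures readability, safety or business rules, which is exactly the gap described above.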

Access is currently limited to enterprise customers through a Google Cloud partnership. No public API or open-source version has been announced.

Independent reproduction outside Google infrastructure is the key signal to watch

The signal to watch is independent reproduction: whether results on Spanner, DeepConsensus or routing hold outside a Google-controlled environment. The second signal is whether the evolutionary approach generalises to domains where the evaluator is not trivial to write, and whether results survive third-party audit.

Lilith's verdict

AlphaEvolve does not help a programmer write code. It searches the solution space and returns runnable code. The first team to point it at a problem they did not know could be automated gains an asymmetric edge.