OpenAI is building Jalapeño because inference is now the electricity bill | Radar

OpenAI and Broadcom introduced Jalapeño, OpenAI's first custom Intelligence Processor for LLM inference. OpenAI says the chip went from initial design to tape-out in nine months, is already running ML workloads in the lab, and is slated for initial deployment by the end of 2026.

Jalapeño targets inference, not a universal GPU race

The chip is designed for running already-trained models, the phase where ChatGPT, Codex and the API generate responses for users. OpenAI claims better performance per watt than current state-of-the-art alternatives, but the technical report and independent benchmarks are still pending.

Broadcom provides silicon implementation and networking technology, while Celestica handles boards, racks and system integration. OpenAI plans gigawatt-scale deployment with Microsoft and other data center partners over multiple generations.

Custom silicon is leverage over the price of every answer

Training gets the headlines, but inference is the recurring bill. Every ChatGPT prompt and every agentic step in Codex creates costs across power, memory, networking and capacity. If OpenAI moves part of that stack onto its own silicon, it gains more room to shape price, latency and availability without depending entirely on the GPU market.

That follows the same strategic path as Google TPUs, AWS Trainium and Inferentia, and Microsoft Maia. The difference is that OpenAI sells models as both product and infrastructure. A custom inference chip is therefore not just a data center saving. It is part of product margin and reliability.

Without benchmarks, this is still a promise wrapped in a wafer

The weak spot is measurement. Performance per watt sounds useful, but without disclosed workloads, baselines, batch sizes, latency targets and deployment cost, it remains a vendor claim. Nvidia's defense is also more than hardware: its advantage includes software, supply chain and years of production optimization.
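To make the measurement problem concrete, here is a minimal sketch of how a performance-per-watt comparison could be set up so that it only compares like with like. Everything in it is an assumption for illustration: the `Measurement` fields, the function names and the numbers are hypothetical and are not figures from OpenAI, Broadcom or Nvidia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One benchmark run. All fields are hypothetical illustration values."""
    workload: str            # model and prompt mix used for the run
    batch_size: int
    p99_latency_ms: float    # tail latency observed during the run
    tokens_per_second: float
    avg_power_watts: float


def tokens_per_joule(m: Measurement) -> float:
    # 1 watt = 1 joule per second, so tokens/s divided by watts gives tokens/joule.
    return m.tokens_per_second / m.avg_power_watts


def efficiency_ratio(candidate: Measurement, baseline: Measurement,
                     latency_budget_ms: float) -> float:
    """Return candidate/baseline tokens per joule, refusing unfair comparisons."""
    if candidate.workload != baseline.workload:
        raise ValueError("different workloads are not comparable")
    if candidate.batch_size != baseline.batch_size:
        raise ValueError("different batch sizes are not comparable")
    for m in (candidate, baseline):
        if m.p99_latency_ms > latency_budget_ms:
            raise ValueError("run exceeds the latency budget")
    return tokens_per_joule(candidate) / tokens_per_joule(baseline)


# Made-up numbers, chosen only to show the arithmetic.
baseline = Measurement("chat-mix", 8, 180.0, 4000.0, 1000.0)
candidate = Measurement("chat-mix", 8, 150.0, 3000.0, 500.0)
ratio = efficiency_ratio(candidate, baseline, latency_budget_ms=200.0)
print(f"{ratio:.2f}x tokens per joule")
```

Requiring identical workload and batch size is only one possible fairness rule. The point is that a single efficiency ratio says little until those conditions are published alongside it.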

Jalapeño therefore does not mean OpenAI is done with Nvidia. The more plausible reading is a hedge: move part of inference onto a custom architecture and reduce the risk that every product expansion becomes a negotiation over accelerator availability.

The technical report will separate strategy from capacity theater

The next test is simple. OpenAI needs to show measurable cost per token, latency and reliability on real models, not only an internal GPT-5.3-Codex-Spark example.
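As a rough illustration of what "cost per token" would have to account for, the sketch below combines energy cost and amortized hardware cost into a price per million generated tokens. The function name, its parameters and the example inputs are all hypothetical. It also ignores networking, cooling, staffing and software, so it is a lower bound on a real bill, not a model of OpenAI's actual costs.

```python
def cost_per_million_tokens(tokens_per_second: float,
                            power_kw: float,
                            electricity_usd_per_kwh: float,
                            hardware_usd: float,
                            lifetime_years: float,
                            utilization: float) -> float:
    """Energy plus amortized hardware cost per one million generated tokens."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    hours_per_year = 24 * 365
    # Hardware is paid for every hour; spread the purchase price over its lifetime.
    hardware_usd_per_hour = hardware_usd / (lifetime_years * hours_per_year)
    # Simplifying assumption: power is drawn continuously, even when idle.
    energy_usd_per_hour = power_kw * electricity_usd_per_kwh
    # Only the utilized fraction of each hour produces tokens.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    total_usd_per_hour = hardware_usd_per_hour + energy_usd_per_hour
    return total_usd_per_hour / tokens_per_hour * 1_000_000


# Invented inputs, chosen only to exercise the formula.
example = cost_per_million_tokens(
    tokens_per_second=5000.0,
    power_kw=1.0,
    electricity_usd_per_kwh=0.10,
    hardware_usd=87_600.0,
    lifetime_years=5.0,
    utilization=0.5,
)
print(f"${example:.4f} per million tokens")
```

In this toy setup the amortized hardware term (2.00 USD per hour) dwarfs the energy term (0.10 USD per hour), and halving utilization doubles the cost per token. That is why a technical report would need to publish real utilization and latency figures alongside any efficiency number.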

If Jalapeño runs at scale and lowers the cost of interactive products, it becomes the start of real vertical integration. If it remains limited to selected workloads, it can still help capacity, but the strategic myth around it will deflate quickly.

Lilith's verdict

Jalapeño is the bill placed in the middle of the table after an expensive dinner with Nvidia. OpenAI is not leaving the restaurant yet. It is finally calculating what it would cost to cook in its own kitchen.