Jalapeño moves OpenAI from models into its own silicon | Radar

OpenAI and Broadcom have unveiled Jalapeño, OpenAI's first custom inference chip for running LLMs. The company is extending its stack from products and models into silicon, the layer where the cost of every query is ultimately decided.

OpenAI built this chip for inference bills, not for a lab trophy

Jalapeño is described as an accelerator designed for inference on current and future LLMs. Engineering samples are already running ML workloads, including GPT-5.3-Codex-Spark, in the lab at target frequency and power.

The companies say the chip moved from design to production in nine months and that early testing shows substantially better performance per watt than the current state of the art. Final numbers are not yet public; a technical performance report is promised in the coming months.

The plan is to deploy at gigawatt scale starting at the end of 2026, with Microsoft and other partners. Broadcom provides silicon implementation, networking and connectivity, while Celestica helps with boards, racks and system integration.

The real product is cheaper ChatGPT runtime

The point of a custom chip is not whether OpenAI can produce an elegant piece of hardware. The point is that inference for products like ChatGPT is a repeated cost attached to every prompt, answer and agent loop.

Google has TPUs, Amazon has Trainium and Meta keeps pushing its own infrastructure. OpenAI is following the hyperscaler logic: if one accelerator supplier controls your capacity and roadmap, it also has a hand on your margins. A custom inference chip is leverage over price and over which models are economical to serve at scale.

For developer and enterprise teams, this will not change the API overnight. If the performance-per-watt claim holds, it may later show up as faster model availability, cheaper agent runs or more aggressive product limits.

Public benchmarks are the missing ingredient

The obvious gap is that no concrete benchmark package is public. Performance per watt is exactly the metric that matters, but for now it is mostly a company claim. Jalapeño also targets inference only; it is not a full replacement for training infrastructure.
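To see why performance per watt maps so directly onto the compute bill, here is a rough Python sketch of the arithmetic. Every number in it is a hypothetical placeholder, not a Jalapeño or competitor figure, and the function name is made up for illustration. It counts only electricity and ignores hardware, cooling, networking and utilization.

```python
def energy_cost_per_million_tokens(
    tokens_per_second: float, watts: float, usd_per_kwh: float
) -> float:
    """Electricity cost in USD to generate one million tokens."""
    # Watts are joules per second, so this is energy spent per token.
    joules_per_token = watts / tokens_per_second
    # 1 kWh = 3,600,000 joules.
    kwh_per_million_tokens = joules_per_token * 1_000_000 / 3_600_000
    return kwh_per_million_tokens * usd_per_kwh


# Hypothetical accelerators: same power draw, the second has twice the throughput.
baseline = energy_cost_per_million_tokens(
    tokens_per_second=1000.0, watts=1000.0, usd_per_kwh=0.10
)
improved = energy_cost_per_million_tokens(
    tokens_per_second=2000.0, watts=1000.0, usd_per_kwh=0.10
)

print(f"baseline: ${baseline:.4f} per million tokens")
print(f"improved: ${improved:.4f} per million tokens")
```

Under these assumptions, doubling tokens per second at the same power draw halves the energy cost per token. That ratio, multiplied across every prompt, is what the promised technical report will need to substantiate.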

Integration is the other risk. A chip can look good in the lab, but the economics depend on racks, networking, software, yield, maintenance and live ChatGPT traffic. Silicon alone does not pay the compute bill.

Pricing, latency and 2026 deployment volume will settle the story

The signals to watch are the technical report, the first real partner deployments and any changes in OpenAI API pricing or limits. That is where Jalapeño becomes either a cost reducer or another strategic card in negotiations with Nvidia.

If OpenAI starts serving coding models more cheaply and at larger volume through its own chips, that will say more than any full-stack slogan.

Lilith's verdict

Jalapeño is the agent-era invoice landing on Sam Altman's desk: if you want to hand out billions of tokens a day, every watt becomes a coin you either keep or burn.