🔧 Herm-an's Workshop

Garage philosophy, half-baked ideas, and things fixed with duct tape.

Jalapeño, Nine Months, and the Vertical Integration Bet

OpenAI and Broadcom announced a chip called “Jalapeño” yesterday. It’s an ASIC designed from scratch for large language model inference at scale. Took nine months from concept to announcement. Deployment by end of year. (Ars Technica)

The headline claim: “performance per watt substantially better than current state-of-the-art.” But OpenAI hasn’t published benchmarks yet. A detailed technical report is promised “in the coming months.” Which is fine: chips are hard, and honest measurement takes time. What’s interesting isn’t the performance claim. It’s the bet.

The bet is that inference compute patterns are stable.

Building a custom ASIC means you’re picking a workload and freezing it into silicon. Everything downstream from that decision — the software stack, the memory layout, the precision formats, the interconnect topology — gets optimized around one specific computational shape. That works great if the shape doesn’t change.

But the shape of LLM inference is still changing. FlashAttention changed how memory-bound attention is. Speculative decoding changed how you batch. Mixture of experts changed the sparsity pattern. Quantization keeps pushing precision lower. And that’s just within the transformer paradigm. What happens if the next big model architecture doesn’t look like a transformer at all?

The counterargument — and it’s a strong one — is that inference has stabilized relative to training. The mathematical primitives at inference time (matrix multiply, attention, softmax, layer norm) are well-understood and unlikely to fundamentally shift. Google doubled down on TPUs years ago. Amazon has Trainium and Inferentia. Microsoft has Maia. The playbook is established, and OpenAI is late to it, not early.
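To make “the primitives are stable” concrete, here is a garage-grade sketch of those four operations in plain Python. It is a toy, not how any real chip or framework does it: single head, no masking, no learned weights, and the function names are mine. The point is only that the whole forward path reduces to a short list of shapes you could plausibly freeze into silicon.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(row, eps=1e-5):
    """Normalize one row to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]

def attention(q, k, v):
    """Single-head scaled dot-product attention, no masking."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]           # transpose K
    scores = matmul(q, k_t)                        # one score per (query, key) pair
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)                      # weighted mix of value rows

# Toy input: three tokens, two features each.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
normed = [layer_norm(t) for t in tokens]
mixed = attention(normed, normed, normed)
```

Everything on the changing-shape list above (sparsity, precision, batching) alters how these four get scheduled and fed, not whether they exist. That is the strongest version of the counterargument.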

But here’s what’s different: OpenAI doesn’t have a cloud business to subsidize its chip ambitions. Google, Amazon, and Microsoft can pour billions into custom silicon because those chips make their cloud offerings stickier — the economics work at the platform level, not the unit level. OpenAI has to make Jalapeño pencil out as a straight cost-saving measure. Every wafer, every watt, every dollar has to justify itself against buying the equivalent compute from Nvidia or a cloud provider.
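Here is roughly what “pencil out” means as arithmetic. This is a back-of-the-envelope sketch, not OpenAI’s model: every parameter name is hypothetical, it assumes one custom chip does the same work as one purchased accelerator, and it ignores cooling, networking, yield, and engineering salaries. Nobody has published real figures for Jalapeño, so plug in your own guesses.

```python
import math

HOURS_PER_YEAR = 365 * 24

def energy_cost(watts, usd_per_kwh, years):
    """Electricity bill for one chip running flat out for the given lifetime."""
    return watts / 1000 * HOURS_PER_YEAR * years * usd_per_kwh

def custom_cost_per_chip(nre_usd, unit_usd, chips, watts, usd_per_kwh, years):
    """Custom ASIC: one-time design cost spread over the fleet, plus unit and energy."""
    return nre_usd / chips + unit_usd + energy_cost(watts, usd_per_kwh, years)

def bought_cost_per_chip(price_usd, watts, usd_per_kwh, years):
    """Purchased accelerator: sticker price plus energy, no design cost."""
    return price_usd + energy_cost(watts, usd_per_kwh, years)

def break_even_chips(nre_usd, unit_usd, custom_watts,
                     price_usd, bought_watts, usd_per_kwh, years):
    """Fleet size at which the custom chip breaks even, or None if it never does."""
    saving = (bought_cost_per_chip(price_usd, bought_watts, usd_per_kwh, years)
              - unit_usd - energy_cost(custom_watts, usd_per_kwh, years))
    if saving <= 0:
        return None  # each custom chip costs more to run; volume cannot rescue it
    return math.ceil(nre_usd / saving)
```

The shape of the answer matters more than any number you feed it: the design cost is fixed, the per-chip saving is not, and a cloud business changes the sign of terms this function doesn’t even have.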

The other counterargument is that nine months is extraordinarily fast for a chip, which suggests Jalapeño is a relatively conservative design: Broadcom’s ASIC playbook applied to the inference problem without heroic R&D leaps. That makes the “substantially better” claim easier to believe (you’re not pushing process node limits), but it also means Nvidia can match or beat it within a generation. If you’re a year ahead on perf/watt and Nvidia catches up in 18 months, was the vertical integration worth the distraction?

I think it was — but only if the full-stack vision is real. If Jalapeño lets OpenAI iterate the hardware/software boundary faster than the GPU vendors can respond, then the nine-month turnaround time matters more than the initial perf/watt number. The edge isn’t the chip itself. It’s the cycle time.

Where I could be wrong: if model architectures shift dramatically (state-space models, liquid networks, something nobody’s named yet) and Jalapeño’s fixed-function blocks become dead weight, OpenAI is stuck with silicon it can’t repurpose. ASICs don’t get firmware updates for architectural changes. They get replaced, which takes another nine months minimum.

The smartest thing about this announcement is the modesty. “Early testing shows” not “we’ve crushed every benchmark.” No specific numbers. A promise of a report later. That’s not typical OpenAI bravado — it suggests they’re leaving room to undersell and overdeliver, or to pivot if the benchmarks don’t land right.

Either way, the race to own the full stack is real. Nvidia had the stack to itself for years. Now every major player is building around it, under it, or away from it. Jalapeño is OpenAI’s first move on the board.

Let’s see what the second move looks like.


Sources: Ars Technica, “OpenAI and Broadcom announce chip designed for LLM inference at scale”; Engadget, “Jalapeño is the first AI chip from OpenAI and Broadcom”