Touching the Elephant — TPUs
About this week
What if the real moat in AI isn’t the model, but the chip? Should AI run on custom silicon or on general-purpose chips? This article is a deep dive into the history and architecture of Google’s Tensor Processing Units (TPUs), the custom silicon that now powers the world’s most advanced AI models. Reed Oliver traces how Google moved away from general-purpose chips to build dedicated, “monastic” hardware optimised for one thing: matrix multiplication at scale. The piece explores why Google bet on custom hardware when GPUs were already dominant, what makes TPUs architecturally different, and how that decision reshaped the broader AI industry, from training costs to the competitive landscape. Optional quiet reading starts at 7 pm; discussion begins at 8 pm.
Reading
Interactive: weight-stationary systolic array
A weight-stationary systolic array multiplying two 3×3 matrices step by step. Weights sit parked in the grid while inputs march in as diagonal wavefronts and partial sums cascade down the columns. Visualization generated by Claude.
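The step-by-step behaviour described above can be sketched as a small cycle-level simulation. This is a minimal sketch under stated assumptions, not Google's design or the code behind the interactive: it assumes the second matrix W is the stationary operand (processing element (k, n) keeps W[k][n] for the whole run), that rows of X stream in from the left with row k delayed by k cycles to form the diagonal wavefronts, and that each PE adds its product to the partial sum arriving from above before passing it down. The names `systolic_matmul` and `reference_matmul` are hypothetical, chosen for illustration.

```python
def systolic_matmul(X, W):
    """Cycle-level sketch of a weight-stationary systolic array computing X @ W.

    Assumption (illustrative only): PE(k, n) permanently holds W[k][n];
    activations flow left-to-right along rows, partial sums flow top-to-bottom.
    """
    M, K, N = len(X), len(W), len(W[0])
    assert all(len(row) == K for row in X), "inner dimensions must match"

    # Pipeline registers, latched at the end of every cycle.
    x_reg = [[0] * N for _ in range(K)]  # activation currently held by each PE
    p_reg = [[0] * N for _ in range(K)]  # partial sum leaving each PE downward
    Y = [[0] * N for _ in range(M)]

    # The last result, Y[M-1][N-1], leaves the array at cycle (M-1)+(K-1)+(N-1).
    for t in range(M + K + N - 2):
        new_x = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    # Skewed feed (the diagonal wavefront): X[m][k] enters row k
                    # at cycle m + k; idle slots are fed zeros.
                    m = t - k
                    a = X[m][k] if 0 <= m < M else 0
                else:
                    a = x_reg[k][n - 1]  # activation passed from the left neighbour
                psum_in = p_reg[k - 1][n] if k > 0 else 0  # partial sum from above
                new_x[k][n] = a
                new_p[k][n] = psum_in + a * W[k][n]  # the weight never moves
        x_reg, p_reg = new_x, new_p

        # A finished sum drops out of the bottom row: Y[m][n] at cycle m + (K-1) + n.
        for n in range(N):
            m = t - (K - 1) - n
            if 0 <= m < M:
                Y[m][n] = p_reg[K - 1][n]
    return Y


def reference_matmul(X, W):
    """Plain triple-loop X @ W, used only to check the simulation."""
    return [[sum(X[m][k] * W[k][n] for k in range(len(W)))
             for n in range(len(W[0]))]
            for m in range(len(X))]


X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
W = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
Y = systolic_matmul(X, W)
assert Y == reference_matmul(X, W)
print(Y)  # [[10, 2, 5], [22, 5, 14], [34, 8, 23]]
```

In this model a 3×3 by 3×3 product finishes in M + K + N - 2 = 7 cycles. The weight-stationary choice shows up in the inner loop: W[k][n] is only ever read in place, while activations and partial sums are the only values that travel, and only between neighbouring PEs.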
Taalas’s demo
Taalas’s demo drew a strong reaction from the group; it is a clear illustration of the design choices Google made when building the TPU.