How Google Makes Custom Cloud Chips That Power Apple AI And Gemini

At this sprawling lab at Google's Silicon Valley headquarters, these racks and racks of servers aren't running workloads for Google Cloud's millions of customers, or for YouTube, or for the world's most dominant search engine. Here, for example, is the very first Trillium system that we built, a full Trillium system with 256 chips in it, four racks. And what is Trillium? Trillium is our latest-generation TPU. It'll be public later this year. Instead, they're running tests on Google's very own microchips, Tensor Processing Units, that help power it all: search, and of course video, YouTube, ads. Everything Google does has been powered in many ways by its own homegrown TPUs. Now, TPUs are used to train AI models like Google's own chatbot Gemini, and, in some big news, Apple's AI too. Apple, actually, we found out yesterday, disclosed in a paper that they're using Google-made chips. The world sort of has this fundamental belief that all
AI large language models are being trained on Nvidia. But Google took its own path here. And yet, despite being the birthplace of some
foundational concepts behind generative AI, many say Google's fallen behind in the AI race. But it was the first major cloud provider to do custom
AI chips. It was ten years ago, almost to the day, when we decided that to meet the needs of our users in terms of a particular application, voice recognition at the time, we needed to design custom hardware. In the years since, Amazon, Microsoft and Meta have
started making their own AI chips too. Here we're turning on the chips and the boards for the first time, making sure they're working properly to specification, debugging any issues that might come up, that sort of thing. And no media has been inside here before. First time? Yep. We went to Google headquarters for an exclusive look
inside the chip lab and sat down with its top executive to ask why and how. Google's betting big on the expensive, complex business of custom chips. It all started in 2014, when a group at Google calculated that in order to launch upcoming voice recognition features, Google would need to double the number of computers in its data centers.
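The capacity math behind that 2014 conclusion is simple back-of-envelope arithmetic. Here is a minimal sketch in Python; every number in it (user count, compute cost per second of audio, fleet capacity) is an illustrative assumption chosen to reproduce the shape of the story, not Google's actual data:

```python
# Back-of-envelope sketch of a 2014-style capacity estimate.
# Every figure below is an illustrative assumption, not Google's real data.

users = 1_000_000_000          # assumed daily users of voice features
seconds_per_user = 30          # assumed speech per user per day
gflops_per_audio_second = 10   # assumed cost to recognize 1 s of audio (GFLOPs)

# Daily compute the new feature would demand, in GFLOPs.
new_demand = users * seconds_per_user * gflops_per_audio_second

# Assumed compute the existing fleet already delivers per day (GFLOPs).
existing_fleet = 3e11

# On general-purpose hardware, the fleet would roughly have to double...
growth_factor = 1 + new_demand / existing_fleet
print(f"fleet must grow {growth_factor:.0f}x on general-purpose hardware")

# ...but an accelerator ~100x more efficient at this one workload shrinks
# the extra hardware to about 1% of the fleet.
tpu_speedup = 100
extra_fraction = (new_demand / tpu_speedup) / existing_fleet
print(f"extra hardware with a 100x accelerator: {extra_fraction:.0%} of fleet")
```

The same arithmetic is what makes the factor-of-100 efficiency claim quoted below so consequential: it turns a fleet-doubling problem into a roughly one-percent expansion.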

Amin Vahdat, now the head of custom cloud chips,
started at Google four years before that. A number of leads at the company asked the question, what would happen if Google users wanted to interact with Google via voice for just 30 seconds a day? And how much compute power would we need to support our users? We realized that we could build custom hardware, not general-purpose hardware, but custom hardware, Tensor Processing Units in this case, to support that much, much more efficiently. In fact, a factor of 100 more efficiently than it would have been otherwise. What is a Tensor Processing Unit? And did you guys coin that term? We did. We coined the term Tensor Processing Unit. We believe that it was certainly the first large-scale
hardware accelerator for AI applications. There's a whole gamut of qualification and validation tests we do on, you know, power, thermals, functionality. You're really trying to make sure the design has enough margin so that it's going to operate well, you know, at volume, at scale. Principal engineer Andy Swing, who has left Google since our visit, was there for the first launch. There's actually four chips inside there. Actually, two of those are connected to a host machine that has CPUs in it. And then all these colorful cables are actually
linking together all of the Trillium chips to work as one large supercomputer. Google data centers still rely heavily on chip giants like Intel and AMD for Central Processing Units, or CPUs, and on Nvidia for Graphics Processing Units, or GPUs. Google makes a different category of chips called ASICs, application-specific integrated circuits, which are more efficient because they're built for a single purpose. Google's best known for its AI-focused ASIC, the TPU. But it also makes ASICs to power YouTube, called VCUs, Video Coding Units. And just like Apple, Google also makes custom chips for its devices. The G4 powers the new, fully AI-enabled Pixel 9, and the new A1 powers the Pixel Buds Pro 2. But the TPU is what set Google apart, because when it launched in 2015, it was the first of its kind. So the AI cloud era has completely reordered the way
companies are seen. And this silicon differentiation, the TPU itself, may be one of the biggest reasons that Google went from the third cloud to being seen truly on parity, and in some eyes, maybe even ahead of the other two clouds for its AI prowess.

Amazon Web Services announced its first cloud AI chip, Inferentia, in 2018, three years after Google's came out. Microsoft's first custom AI chip, Maia, wasn't announced until the end of 2023. In order to stay differentiated, to stay competitive, to stay ahead of the market, and to not become overly dependent on any supply chain partner or provider, they needed to do more, build more in-house. According to Newman's team's research, Google TPUs dominate among custom cloud AI chips, with 58% of the market share, and Amazon comes in second at 21%. In 2017, a group of eight Google researchers wrote the
now famous paper that invented the transformer, the underpinnings of today's generative AI craze. The invention, Vahdat says, was made possible by TPUs. The transformer computation is expensive, and if we were living in a world where it had to run on general-purpose compute, maybe we wouldn't have imagined it. Maybe no one would have imagined it. But it was really the availability of TPUs that allowed us to think, not only could we design algorithms like this, but we could run them
efficiently at scale. Still, Google has faced criticism for some botched product releases in the current rat race of generative AI, and its chatbot Gemini came out more than a year after OpenAI's ChatGPT. Dozens and dozens of customers are leveraging Gemini every day, including some of the most familiar names out there, whether it's Deutsche Bank, Estée Lauder, McDonald's, and many, many others that are household names. Was Gemini trained on TPUs? Gemini was trained, and is served externally, entirely on
TPUs. Back in 2018, Google expanded the focus of TPUs from inference to training AI models. Version two was actually a pod that connected 256 TPUs together. Now version five is in production, which connects almost 9,000 chips together. The real magic of this TPU system is that you actually can interconnect everything over fiber optics dynamically. And so you can build as small or as large a system as you want. With version two in 2018, Google also made its TPUs available to third parties, alongside market-leading chips like Nvidia's GPUs, which are still used by most cloud customers. If you're using GPUs, they're more programmable, they're more flexible, but they've been in tight supply. The AI boom has sent Nvidia's stock through the roof, catapulting the chipmaker to a $3 trillion market cap in June, surpassing Google's parent company Alphabet, and jockeying with Apple and Microsoft for position as the world's most valuable public company.

Being candid, these specialty AI accelerators aren't nearly as flexible or as powerful as Nvidia's platform, and that is what the market is also waiting to see: can anyone play in that space? Now that we know Apple's using Google's TPUs to train its AI, the real test will come as it rolls out those full AI capabilities on iPhones and Macs next year. They were renting chips from Google for about two bucks an hour, times a gazillion chips, to train their AI models. So they didn't even need Nvidia. All the market pull is coming from Nvidia, but longer term, people are just going to want to do AI things. And when they want to just do AI things, they may be just as happy to do it on a TPU or do it on another homegrown piece of AI-dedicated silicon. But developing alternatives to Nvidia's hugely
powerful, and expensive, chips is no small feat. It's expensive. You need a lot of scale. And so it's not something that everybody can do. But these hyperscalers, they've got the scale and the money and the resources to go down that path. But the process is so complex and costly that even the Googles of the world can't do it alone. Since the very first TPU ten years ago, Google's partnered with Broadcom, a chip developer that also helps Meta design its AI chips. Broadcom says it's spent more than $3 billion on R&D to make these partnerships happen. AI chips, they're very complex. There's lots of things on there. So Google brings the compute. Broadcom does all the peripheral stuff. They do the I/O and the SerDes and all of the different pieces that go around that compute. They also do the packaging. Then the final design is sent off to be manufactured at a fabrication plant, or fab, primarily those owned by the world's largest chipmaker, Taiwan Semiconductor Manufacturing Company, which makes some 92% of the world's most advanced semiconductors. Do you have any safeguards in place should the worst
happen in the geopolitical sphere between China and Taiwan? Yeah, it's an important question. And it's certainly something that we prepare for and
we think about as well. But we're hopeful that actually it's not something
that we're going to have to trigger. I think the entire world is at the same risk.

It's not unique to Google. It's not unique to Amazon. It's not unique to Apple. It's not unique to Nvidia. If Taiwan is not given the appropriate support, if it deals with unexpected end-of-day circumstances, it is not only going to set back any one of these companies, it's going to set back the
whole world. That's why the White House is handing out $52 billion in CHIPS Act funding to companies building fabs in the US, with the biggest portions going to Intel, TSMC and Samsung, so far. Intel and TSMC are putting a lot of their own money into this as well. I'm heartened to see that. But I mean, it's going to take a long time to duplicate. So let's hope that it doesn't need to be duplicated. Risk aside, Google just made another big chip move,
announcing its first general-purpose CPU, Axion, will be available by the end of the year. Now we're able to bring in that last piece of the puzzle, the CPU. And so a lot of our internal services, whether it's BigQuery, whether it's Spanner, YouTube, advertising and more, are running on Axion. But Google is late to the CPU game. Amazon launched its processor, Graviton, in 2018. Alibaba launched its own server chip in 2021, and Microsoft announced its CPU in November. Why didn't you do it sooner? Our focus has been on where we can deliver the most value for our customers, and there it has been, starting with the TPU, our video coding units, our networking. We really thought that the time was now, starting a couple of years ago. Again, these things are a number of years in the making, to really bring our expertise to bear on the Arm CPUs. I don't fault Google for pacing out the launch of Axion in a more delayed fashion. It's not as differentiated. To me, it is more of a supply game. It's more of a margin and vertical integration game for the company.

Whereas the TPU was truly differentiated. Six generations, ten years of experience. All these processors from non-chipmakers, including Google's, are made possible by Arm chip architecture, a more customizable, power-efficient alternative that's been gaining traction over the traditional x86 model from Intel and AMD. Power efficiency is crucial, because by 2027, AI servers are projected to use up as much power every year as a small country. With TPUs, the ability to customize greatly boosts power efficiency. This is our second-generation optical circuit switch. So our large TPU supercomputers are actually optically interconnected. It allows us to dynamically link together collections of TPU chips to custom-tailor the dimensions to the job that's running. This is developed all in-house by us. Power is a huge thing now, and you know, anything you can do to try to improve efficiency, lower costs and save power, I think you're going to do. Google's latest environmental report showed emissions rose nearly 50% from 2019 to 2023, partly due to data center growth for powering AI. Without having the efficiency of these chips, the numbers could have wound up in a very different place. We remain committed to actually driving these numbers, in terms of carbon emissions from our infrastructure, 24/7, driving it towards zero. Training and running AI also takes a massive amount of
water to keep the servers cool so they can run 24/7. That's why with the third generation of TPU, Google started using direct-to-chip cooling, a new way to cool servers that uses far less water, and that's also being used by Nvidia's latest Blackwell GPUs. We have four chips, and these are our liquid cooling lines that come in. There's essentially a cold plate here that has little fins in it, and it picks up the heat from the chip, puts it into the water, and that comes back out. Despite challenges from geopolitics to power and water, Google is committed not only to its generative AI tools, but to making its own chips to handle the massive compute required by the craze. I've never seen anything like this, and no sign of it
slowing down quite yet. I think it's fair to say that we really can't predict
what's going to be coming as an industry in the next five years, and hardware is going to play a really
important part there.