Via TechCrunch
-----
It’s no secret
that Google has developed its own custom chips to accelerate its
machine learning algorithms. The company first revealed those chips,
called Tensor Processing Units (TPUs), at its I/O developer conference
back in May 2016, but it never went into much detail about them, saying only that they were optimized for the company's own TensorFlow machine-learning framework. Today, for the first time, it's sharing more details and benchmarks about the project.
If you’re a chip designer, you can find all the gory details of how the TPU works in Google’s paper.
The numbers that matter most here, though, are that based on Google’s
own benchmarks (and it’s worth keeping in mind that this is Google
evaluating its own chip), the TPUs are on average 15x to 30x faster in
executing Google’s regular machine learning workloads than a standard
GPU/CPU combination (in this case, Intel Haswell processors and Nvidia
K80 GPUs). And because power consumption matters in a data center, the TPUs also deliver 30x to 80x higher TeraOps/Watt (numbers that will likely improve further as the chips move to faster memory).
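To make the TeraOps/Watt comparison concrete, here is a quick back-of-the-envelope calculation in Python. The absolute throughput and power figures below are hypothetical, not from Google's paper; only the resulting ratio is chosen to fall inside the range Google reports:

```python
# Hypothetical figures, for illustration only (not from Google's paper).
gpu_teraops, gpu_watts = 3.0, 300.0   # assumed GPU: 3 TeraOps at 300 W
tpu_teraops, tpu_watts = 45.0, 100.0  # assumed TPU: 45 TeraOps at 100 W

# Performance per watt is simply throughput divided by power draw.
gpu_perf_per_watt = gpu_teraops / gpu_watts
tpu_perf_per_watt = tpu_teraops / tpu_watts

# The headline figure is the ratio between the two chips.
ratio = tpu_perf_per_watt / gpu_perf_per_watt
print(f"TPU delivers about {ratio:.0f}x more TeraOps/Watt")
```

With these assumed numbers the ratio works out to roughly 45x, squarely within the 30x to 80x range the paper claims.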
It’s worth noting that these numbers are about running machine-learning models in production (inference), by the way, not about training the models in the first place.
Google also notes that while most chip architects optimize for convolutional neural networks (a specific type of neural network that works well for image recognition, for example), those networks account for only about 5 percent of Google's own data center workload; the majority of its applications instead use multi-layer perceptrons.
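For readers unfamiliar with the distinction, a multi-layer perceptron is just a stack of fully connected layers with a nonlinearity between them. Below is a minimal, illustrative forward pass (i.e. inference, the workload the TPU targets) in plain Python; the tiny weights are arbitrary, and a real workload would of course run on optimized matrix hardware rather than Python lists:

```python
# Minimal multi-layer perceptron inference (forward pass only).

def matvec(w, x):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def relu(v):
    """Element-wise rectified linear activation."""
    return [max(0.0, vi) for vi in v]

def mlp_forward(x, layers):
    """Run x through a stack of (weights, biases) layers with ReLU
    between layers (no activation after the final layer)."""
    for i, (w, b) in enumerate(layers):
        x = [s + bi for s, bi in zip(matvec(w, x), b)]
        if i < len(layers) - 1:
            x = relu(x)
    return x

# Toy 2-layer MLP: 3 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.2]),                               # output layer
]
print(mlp_forward([1.0, 2.0, 3.0], layers))
```

The key point for the TPU discussion is that inference like this is dominated by dense matrix-vector products, which is exactly the kind of operation a custom ASIC can accelerate.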
Google says it started looking into how it could use GPUs, FPGAs and
custom ASICs (which is essentially what the TPUs are) in its data centers back in 2006. At the time, though, few applications could really benefit from this kind of specialized hardware, because most of the heavy workloads could simply run on the excess capacity that was already available in the data center
anyway. “The conversation changed in 2013 when we projected that DNNs
could become so popular that they might double computation demands on
our data centers, which would be very expensive to satisfy with
conventional CPUs,” the authors of Google’s paper write. “Thus, we
started a high-priority project to quickly produce a custom ASIC for
inference (and bought off-the-shelf GPUs for training).” The goal here,
Google’s researchers say, “was to improve cost-performance by 10x over
GPUs.”
Google isn’t likely to make the TPUs available outside of its own
cloud, but the company notes that it expects that others will take what
it has learned and “build successors that will raise the bar even
higher.”