What the heck are TPUs?

I recently became curious about TPUs, the specialised hardware for training Machine- and Deep-Learning models, where TPU stands for Tensor Processing Unit. This fancy chip can deliver big speed-ups for anyone aiming to massively parallelise AI tasks such as training, fine-tuning, and inference.

In this blog post, I will touch on what a TPU is, why it could be useful for AI applications compared to GPUs, and briefly discuss the associated opportunity costs.

What’s a TPU?

The simplest way of putting it is that a TPU is a circuit chip specially designed for neural-network machine learning: it is optimised for tensor operations and able to process a high volume of low-precision computations.
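
To make “tensor operations” concrete: the bread-and-butter computation a TPU’s matrix unit accelerates is the dense matrix multiply at the heart of every neural-network layer. A toy illustration in plain NumPy (the shapes here are made up for the example):

    import numpy as np

    # A fully connected layer is essentially one big matrix multiply:
    # (batch x features) @ (features x units) -> (batch x units)
    activations = np.random.rand(128, 512).astype(np.float32)
    weights = np.random.rand(512, 256).astype(np.float32)
    outputs = activations @ weights  # a model runs thousands of these per training step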

TPUs were originally designed by Google in 2015 and used privately in its own data centres; they were made available for rent through the Google Cloud Platform in 2018. At present, only a small version of the chip, the Edge TPU, can be bought commercially. TPUs have underpinned well-known applications such as Google Translate, Street View, and Google Photos, and they powered AlphaGo in its match against Go world champion Lee Sedol. They are also accessible through Kaggle and Colab.

GPUs vs TPUs

  • Scope of applications
    GPUs are general-purpose hardware units popular in the AI community; however, they are used for many applications beyond AI, such as scientific simulation, video encoding, graphics rendering (video games), and crypto-mining. TPUs, in contrast, are purpose-built processors designed to accelerate training, fine-tuning, and inference of AI models involving deep learning, neural networks, or Large Language Models (LLMs).
  • Parallelisation and throughput
    GPUs are known to perform massive parallelisation and provide high throughput, while TPUs claim to push both considerably further for tensor-heavy workloads.
  • Precision
    Another feature that sets TPUs apart is their native support for reduced precision (bfloat16), which accelerates training without sacrificing model accuracy. GPUs have traditionally used single- (float32) and double-precision (float64) floating point, although recent GPUs also offer reduced-precision modes.
    According to Google’s website, bfloat16 is a precision format designed specifically for training AI models: it keeps float32’s 8 exponent bits, and hence its range, but truncates the mantissa to 7 bits (see the sketch after this list).
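
To illustrate the range-versus-precision trade-off, here is a minimal sketch in PyTorch (no TPU needed; any recent PyTorch install supports these dtypes):

    import torch

    x = torch.tensor([1e38, 1.0009765625], dtype=torch.float32)

    # float16 trades range for precision: 1e38 overflows to inf,
    # but the fine detail of 1.0009765625 survives.
    print(x.to(torch.float16))

    # bfloat16 keeps float32's 8 exponent bits, so 1e38 survives,
    # but with only 7 mantissa bits, 1.0009765625 rounds to 1.0.
    print(x.to(torch.bfloat16))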

Opportunity cost

  • You need to learn TensorFlow to make the most of it
    TensorFlow is a well-known machine- and deep-learning framework developed by Google, built on symbolic maths libraries. It has a steep learning curve, since you need to pick up various abstract concepts, and judging by forums this is the main reason many people feel discouraged from trying it and opt instead for more user-friendly packages such as Keras or PyTorch (OPIG’s favourite, see screenshot below). However, Google provides a tutorial on interfacing TPUs with PyTorch, and the basic TensorFlow/Keras setup is short (see the sketch after this list).
  • You can only rent TPUs, and it’s expensive
    While searching the internet, I realised that you cannot physically buy the Google-developed TPUs and plug them into your own server, in case you expected that. Instead, you can only pay for their use in the cloud, with a wide range of performance and price options, where available in your region. However, if you’re doing research, you can get free TPU time ($300 of credits) via the TPU Research Cloud (TRC) programme, BUT you’ll be expected to share your research with the TRC programme via publications or open-source code, as well as provide detailed feedback to Google.
  • It doesn’t seem ideal for development
    TPUs are a reasonable option mainly for the final, long training run of your model. During the development phase, a GPU seems the better choice for finding sensible parameters for your model (unless you work for OpenAI or Google, in which case you may have TPUs to burn on experiments).
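
For a sense of what using a TPU actually looks like, here is a minimal sketch of the canonical TensorFlow/Keras setup on Colab. It assumes a TPU runtime is attached, and the tiny model is just a placeholder, not a definitive recipe:

    import tensorflow as tf

    # On Colab, an empty address resolves the runtime's attached TPU.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)

    # TPUStrategy replicates the model across the TPU's cores.
    strategy = tf.distribute.TPUStrategy(resolver)

    with strategy.scope():
        # Any Keras model built inside the scope runs on the TPU.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
            tf.keras.layers.Dense(10),
        ])
        model.compile(optimizer='adam',
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

After this, model.fit works exactly as it would on a CPU or GPU; the strategy handles sharding each batch across the TPU cores.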

The bottom line

TPUs are powerful hardware that can massively speed up ML/DL training, fine-tuning, and inference. However, you might still want to use GPUs during the development phase, when you run experiments to find sensible parameters before the final training. TPUs are expensive, but Google can provide free TPU computing time for your research, provided you share the results with Google. Finally, to make the most of them you need to learn TensorFlow; if you’re not familiar with it, Google provides notebooks showing how to use TPUs from more user-friendly frameworks such as Keras and PyTorch. Just google them.
