The Most ReLU-iable Activation Function?

The Rectified Linear Unit (ReLU) activation function was first used in 1975, but its use exploded after Nair & Hinton employed it in their 2010 paper on Restricted Boltzmann Machines. ReLU and its derivative are fast to compute, and it has dominated deep neural networks for years. Its main drawback is the so-called dead ReLU problem, where a neuron that receives sufficiently negative inputs ends up with a gradient that is always zero, so it stops learning. To rectify this (har har), modified versions have been proposed, including leaky ReLU, GELU and SiLU, in which the gradient for x < 0 is not always zero.
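
For concreteness, here's a minimal NumPy sketch (just the standard textbook definitions, not anything from the paper) of ReLU and leaky ReLU with their gradients; the hard zero for x < 0 is exactly what lets a neuron go "dead".

```python
import numpy as np

def relu(x):
    # max(0, x): cheap to compute, but the output is exactly 0 for x < 0
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 0 everywhere for negative inputs -- a neuron whose
    # pre-activations are always negative gets no learning signal at all
    # (the "dead ReLU" problem).
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU keeps a small slope alpha for x < 0 ...
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # ... so the gradient never vanishes entirely.
    return np.where(x > 0, 1.0, alpha)
```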

A 2020 paper by Naizat et al., which builds on ideas set out in a 2014 Google Brain blog post, seeks to explain why ReLU and its variants generally seem to work better for classification problems than sigmoidal functions such as tanh and sigmoid.

The authors use Betti numbers to measure, in effect, the topological complexity of a point cloud: point clouds with higher Betti numbers can be thought of as harder to separate into their classes (see the figure below, taken from the paper).

Figure from Naizat et al. showing examples of topologies and their Betti numbers
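
If you want to get a feel for Betti numbers yourself, here is a minimal sketch using the ripser package to estimate them for a point cloud via persistent homology at a single hand-picked scale (my choice of tool and scale, not necessarily the paper's procedure): a circle of points should give b0 = 1 (one connected component) and b1 = 1 (one loop), while a filled blob should give b1 = 0.

```python
import numpy as np
from ripser import ripser

def betti_numbers(points, scale, maxdim=1):
    # Count persistent-homology features alive at `scale`: dimension 0
    # counts connected components, dimension 1 counts loops.
    diagrams = ripser(points, maxdim=maxdim)['dgms']
    return [int(np.sum((d[:, 0] <= scale) & (d[:, 1] > scale))) for d in diagrams]

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 300)

circle = np.column_stack([np.cos(theta), np.sin(theta)])  # points on a loop
blob = rng.uniform(-1, 1, (300, 2))                       # filled square, no loop

print(betti_numbers(circle, scale=0.3))  # expect roughly [1, 1]
print(betti_numbers(blob, scale=0.3))    # expect roughly [1, 0]
```

The usual caveat with persistent homology applies: Betti numbers of a finite sample are only defined relative to the scale at which you consider nearby points connected, so the 0.3 above is an arbitrary illustrative choice.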

Their main conclusions are that:

  1. “Neural networks operate by changing topology, transforming a topologically complicated data set into a topologically simple one as it passes through the layers”
  2. “The reduction in Betti numbers is significantly faster for ReLU activation compared to hyperbolic tangent activation as the former defines nonhomeomorphic maps that change topology, whereas the latter defines homeomorphic maps that preserve topology”
  3. “[A] shallow network operates mainly through changing geometry and changes topology only in its final layers, a deep one spreads topological changes more evenly across all layers”

Figure from Naizat et al. showing how the topology of labelled points in the input space changes as they progress through the ReLU-activated neural network
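
To get a rough feel for this at home, the hypothetical sketch below (PyTorch plus ripser; my own toy setup, not the paper's code or architecture) trains a small ReLU network on two concentric circles and prints estimated Betti numbers of one class's image after each ReLU layer. With luck, the loop (b1 = 1) present in the input disappears somewhere along the way.

```python
import numpy as np
import torch
import torch.nn as nn
from ripser import ripser
from sklearn.datasets import make_circles

def betti_numbers(points, scale, maxdim=1):
    # Same helper as in the earlier snippet: persistent-homology features
    # alive at `scale` (dimension 0 = components, dimension 1 = loops).
    diagrams = ripser(points, maxdim=maxdim)['dgms']
    return [int(np.sum((d[:, 0] <= scale) & (d[:, 1] > scale))) for d in diagrams]

# Two concentric circles: a topologically "entangled" toy classification task.
X, y = make_circles(n_samples=400, factor=0.4, noise=0.03, random_state=0)
X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

# A small ReLU network; width, depth and training budget are arbitrary choices.
layers = [nn.Linear(2, 16), nn.ReLU(),
          nn.Linear(16, 16), nn.ReLU(),
          nn.Linear(16, 2), nn.ReLU(),
          nn.Linear(2, 1)]
model = nn.Sequential(*layers)

optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(2000):
    optimiser.zero_grad()
    loss_fn(model(X_t), y_t).backward()
    optimiser.step()

# Push one of the two classes through the trained network and report the
# (scale-dependent) Betti number estimates of its image after each ReLU.
# A single fixed scale across layers is crude, since the representations
# change scale too -- this is purely illustrative.
with torch.no_grad():
    h = X_t[torch.from_numpy(y == 0)]
    print('input   ', betti_numbers(h.numpy(), scale=0.3))
    for i, layer in enumerate(layers[:-1]):
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            print(f'layer {i:2d}', betti_numbers(h.numpy(), scale=0.3))
```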

Perhaps this explains the ongoing relevance of ReLU and its cousins, or perhaps not. It’s an interesting paper either way.

This is my final blopig post (and it’s only 9 months late). So long, and thanks for all the fish!
