A short intro to machine precision and how to beat it

Most people who’ve ever sat through a “Rounding Error is Bad” lecture will be familiar with the following example:

> (0.1+0.1+0.1) == 0.3
FALSE

The reason this is so unsettling is that most of the time we think about numbers in base-10. This means we use ten digits \{0, 1, \dots, 9\}, and we perform arithmetic based on this ten-digit notation. This doesn’t always matter much for pen-and-paper maths, but it’s an integral part of how we think about more complex operations and, in particular, how we think about accuracy. We see 0.1 as a finite decimal fraction, and so it’s only natural that we should be able to do accurate sums with it. And if we can do simple arithmetic, then surely computers can too? In this blog post I’m going to briefly explain what causes rounding errors such as the one above, and how we might get away with going beyond machine precision.

Take a number x \in [0, 1), say x=1/3. The decimal representation of x is of the form x=\sum_{i=1}^{\infty} a_i \times 10^{-i}. The a_i \in \{0, 1, \dots, 9\} here are the digits that go after the radix point. In the case of x=1/3 these are all equal, a_i=3, so x=0.333\dots _{10}. Some numbers, such as our favourite x, don’t have a finite decimal expansion. Others, such as 0.3, do, meaning that after some i \in \mathbb{N}, all subsequent digits a_{i+j}=0. When we talk about rounding errors and accuracy, what we actually mean is that we only care about the first few digits, say i\leq 5, and we’re happy to approximate x\approx \sum_{i=1}^{5} a_i \times 10^{-i}=0.33333, potentially rounding up at the last digit.
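
We can ask for exactly this five-digit approximation using base R’s round (I’m assuming an R session here, which is what the snippets in this post look like):

> round(1/3, 5)
0.33333
> round(2/3, 5)
0.66667

The second call shows the rounding up at the last digit in action: the sixth digit of 2/3 is a 6, so the fifth gets bumped from 6 to 7.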

Computers, on the other hand, store numbers in base-2 rather than base-10, which means that they use a different series expansion x=\sum_{i=1}^{\infty} b_i \times 2^{-i}, b_i \in \{0, 1\}, to represent the same number. Our favourite number x is actually stored as 0.010101\dots _{2} rather than 0.3333333\dots _{10}, despite the fact it appears as the latter on a computer screen. Crucially, arithmetic is done in base-2 and, since only a finite number of binary digits are stored (i\leq 52 for a standard double-precision float, which is what you get by default these days), rounding errors also occur in base-2.
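
You can catch a glimpse of the stored base-2 value by printing more decimal digits than the default display shows; sprintf is base R, and the digits below belong to the double closest to 1/3:

> sprintf("%.20f", 1/3)
"0.33333333333333331483"

The trailing digits aren’t noise: they are the decimal rendering of the nearest base-2 number the machine can actually store.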

All numbers with a finite binary expansion, such as 0.25_{10}=0\times 1/2+1\times 1/4=0.01_{2}, also have a finite decimal expansion, meaning we can do accurate arithmetic with them in both systems. However, the reverse isn’t true, which is what causes the issue with 0.1+0.1+0.1\neq 0.3. In binary, the nice and tidy 0.1_{10} becomes the infinitely repeating 0.00011001100\dots _{2}. We observe the rounding error because, unlike us, the computer has to truncate that infinite sum after finitely many digits.
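
Printing both sides of the offending comparison with the same sprintf trick makes the failure look a lot less mysterious:

> sprintf("%.20f", 0.1+0.1+0.1)
"0.30000000000000004441"
> sprintf("%.20f", 0.3)
"0.29999999999999998890"

The truncation errors on the two sides don’t match, so the equality test rightly fails.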

While it’s not possible to do infinite sums with finite resources, there is a way to go beyond machine precision if you want to, at least for rational x=p/q, where p, q \in \mathbb{N}. In the example above, the issue comes from dividing by 10 on each side of the (in)equality. Luckily for us, we can avoid doing so. Integer arithmetic is easy in any base, and so

> (1+1+1) == 3 
TRUE

Shocking, I know. On a more serious note, it is possible to write an algorithm that calculates the binary expansion of x=p/q using only integer arithmetic. The usual binary expansion routine looks like this (here as a small R function):

# Compute the first maxIter binary digits of x in [0, 1).
binary_digits <- function(x, maxIter) {
  b <- integer(0)
  i <- 1
  while (x > 0 && i <= maxIter) {
    if (2 * x >= 1) {
      b[i] <- 1
    } else {
      b[i] <- 0
    }
    x <- 2 * x - b[i]  # strip off the digit we just recorded
    i <- i + 1
  }
  b
}
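
Run on 0.1, this reproduces the expansion quoted above, with one caveat: it only ever sees the double actually stored for 0.1, so past the 52-odd stored bits its output turns to zeros while the true expansion of 1/10 keeps repeating:

> binary_digits(0.1, 12)
0 0 0 1 1 0 0 1 1 0 0 1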

Problems arise wherever we compute something non-integer: the loop condition x > 0, the comparison 2 * x >= 1, and the update x = 2 * x - b[i] all involve the non-integer x. However, writing x = p/q, we can rewrite all three, shifting the division by q to the other side of each inequality or assignment:

# Compute the first maxIter binary digits of x = p/q, with 0 <= p < q.
binary_digits_rational <- function(p, q, maxIter) {
  b <- integer(0)
  i <- 1
  while (p > 0 && i <= maxIter) {
    if (2 * p >= q) {
      b[i] <- 1
    } else {
      b[i] <- 0
    }
    p <- 2 * p - b[i] * q  # still integer arithmetic throughout
    i <- i + 1
  }
  b
}
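
As a sanity check, here are the first twelve binary digits of our troublesome 0.1:

> binary_digits_rational(1, 10, 12)
0 0 0 1 1 0 0 1 1 0 0 1

which is exactly the repeating pattern 0.00011001100\dots _{2} quoted earlier, and unlike the floating-point version, this one would keep producing correct digits for as long as we ask.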

Provided we’re not dealing with monstrously large integers (i.e. as long as we can safely double p without overflowing), the function above computes p/q to arbitrary precision, set by maxIter. So we can beat machine precision for rationals! And the combination of arbitrarily accurate rationals and arbitrarily accurate series approximations (think Riemann zeta function, for example) means we can also get the occasional arbitrarily accurate irrational.

To sum up, rounding errors are annoying, partly because it’s not always intuitive when and how they happen. As a general rule the best way to avoid them is to make your computer do as little work as possible, and to avoid non-integer calculations whenever you can. But you already knew that, didn’t you?

This post was partially inspired by the undergraduate course on Simulation and Statistical Programming lectured by Prof Julien Berestycki and Prof Robin Evans. It was also inspired by my former maths teacher who used to mark us down for doing more work than necessary even when our solutions were correct. He had a point.
