Should scientists learn C++?

Conventional wisdom dictates that compiled languages are slow to develop in and can be slow to compile, but are fast to run. Interpreted languages are easy to use and do not require compilation, but have sluggish performance. Like most people in scientific computing, I learned C++ and Python as my first two languages; I use Python every day, but when, if ever, would I use C++?

Speed

How much faster is C++ than Python? A rule of thumb is that pure Python is between 10 and 100 times slower than pure C++. The main reason is that a C++ compiler translates the whole program into optimised machine code before it runs, whereas the standard Python interpreter compiles code to bytecode and then executes that bytecode one instruction at a time, which is far slower. On top of this, unlike C++, Python is dynamically typed, so the interpreter must check the type of every object at run time, which further hinders performance.
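As a rough illustration of what dynamic typing means in practice, here is a minimal sketch. The function name `add` and the values passed to it are made up for this example. The same line of code does three different things depending on the types it receives, and a type mismatch is only discovered when the line actually runs.

```python
def add(a, b):
    # The meaning of "+" is decided only when this line executes,
    # by inspecting the types of a and b at that moment.
    return a + b


int_sum = add(2, 3)          # integer addition
float_sum = add(2.5, 0.5)    # floating-point addition
text = add("py", "thon")     # string concatenation

# A mismatch is not caught ahead of time; it surfaces as a run-time error.
try:
    add(1, "one")
    mismatch_error = None
except TypeError as exc:
    mismatch_error = type(exc).__name__
```

A C++ compiler would resolve each of these cases, and reject the last one, before the program ever ran. Python repeats the lookup every time the function is called.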

The difference in speed depends on what you are doing and how you are doing it. Nested loops, for instance, benefit from compiler optimisations such as loop unrolling, and compiled loops can be orders of magnitude faster than their pure Python equivalents. However, most of the time these loops can be avoided by using vector operations in libraries such as NumPy, which runs the loops in pre-compiled code and can take advantage of the CPU's vector instructions to speed up the manipulation of arrays. If you really need the benefits of compiler optimisation, tools such as Numba or Cython can be used to remove bottlenecks. If you absolutely must use C++ for a particularly expensive piece of code, libraries such as Boost.Python are a cumbersome but efficient way of adding C++ functions to your Python code.
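To make the point about avoiding loops concrete, here is a small sketch that computes the same matrix-vector product twice: once with nested Python loops and once with a single NumPy operation. The helper `matvec_loops` and the array sizes are assumptions chosen purely for illustration.

```python
import numpy as np


def matvec_loops(A, x):
    """Matrix-vector product written as explicit nested Python loops."""
    n_rows, n_cols = A.shape
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for j in range(n_cols):
            y[i] += A[i, j] * x[j]
    return y


# Reproducible random test data (sizes are arbitrary).
rng = np.random.default_rng(0)
A = rng.random((200, 300))
x = rng.random(300)

y_loops = matvec_loops(A, x)   # every iteration goes through the interpreter
y_vectorised = A @ x           # the loops run inside NumPy's compiled code
```

Both versions give the same answer up to floating-point rounding. Timing them with the standard `timeit` module on your own machine should show how large the gap is for a given problem size.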

GPU Programming

Most serious GPU-based scientific computing is done using CUDA. Originally, the only supported languages for writing CUDA kernels were C, C++ and Fortran, due to their low-level memory manipulation functionality, which made knowledge of one of these essential for GPU programming. Since 2013, however, CUDA has supported Numba-augmented Python code (https://developer.nvidia.com/how-to-cuda-python), which means that developers can generate parallel code without leaving the comfort of Python.

Readability and Libraries

Python is succinct, dynamically typed and easy to read. C++ is verbose, statically typed and at times difficult to read. If you want your code to be pretty and accessible, stick with Python (although the inexplicable lack of support for block comments is at times irritating). Code readability is further enhanced by the fact that many scientists are at least familiar with Python. The same cannot be said for C++.

Libraries are where Python comes into its own. Some of them are so common that they can be thought of as part of the native language (NumPy, Matplotlib), and some are obscure and badly documented (most scientific libraries). If you have a problem, somebody has probably written a library that at least attempts to solve it, and using that library is in most cases as simple as installing it and importing it.

This is not the case with C++. Some libraries are standard, and can be used with a simple #include statement; for third-party libraries, we must tell the compiler and linker where to find them. Some of these libraries are pre-compiled, some are source code, some are static, some are dynamic and some are header-only. All of these come with advantages and disadvantages, and all have to be made available to the main program when it is built. Tools such as CMake can make this easier, albeit at the expense of learning a whole new ‘language’. I once wasted a whole day trying to link against Boost, only to find that my version of CMake did not support the latest version of the library.

In Conclusion

Development in the Python ecosystem over the last 10 years has eaten into the areas where C/C++ was the obvious choice. Scientists can certainly be productive without knowing a compiled language, and in almost all applications the time saved in running the program is outweighed by the extra time spent in development.

I still think knowledge of a compiled language has been valuable. It taught me a lot about computers in general, and informed my programming style. Sometimes other people write in C++, and being able to read and modify that code has already proved useful. Compiled languages will continue to be the workhorses of the scientific computing world in the years to come, even if they are sometimes overlooked by scientists in favour of their dynamic, fashionable, interpreted cousins.
