Improving your Python code quality using git pre-commit hooks

Intro

I recently completed an internship during which I spent a considerable amount of time doing software engineering. One of my main take-aways from this experience was that in industry, much more attention is paid to ensuring that code committed to a GitHub repo is clean and bug-free.

This is achieved through several means, such as code review (getting other people to read your code), test-driven development (making sure your code works as you add functionality) or pair programming (having two people work together on the same piece of code). Here, I will instead focus on a useful tool that is easy to integrate into your existing git workflow: pre-commit hooks.

Say you have a git repo for your latest bioinformatics tool, ACRONYM. You would like the code to adhere to PEP 8 and also to be formatted sensibly and uniformly. Now, you could install a linter in your IDE of choice (do that too, linters are useful!) and change your code every time the linter detects that you are not adhering to PEP 8. That sounds like a lot of tedious work though…

Or you can use scripts to do all of that formatting for you! A good time to run them is just before you commit changes to your git branch, i.e. as a pre-commit hook. There are two hooks that I am going to introduce here:

Black is a code formatter. Every time you make a commit to your repo, black checks whether your code needs any changes to make it adhere to PEP 8 and a couple of other sensible style choices. If it does, black rejects the commit and makes those changes for you. Then you just add and commit again and you are good to go.

It would also be quite nice if the code you commit were checked a bit more rigorously, for example by not letting you commit code that includes unused variables, unnecessary imports or functions that are too complex. That’s where flake8 comes in: a code linter that can be called as a pre-commit hook and does exactly that and more! (The complexity check is off by default and has to be switched on with flake8's max-complexity option.)
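
To give a feel for what an "unused import" check involves, here is a minimal sketch using only Python's standard ast module. It is a deliberately simplified illustration and not how flake8 implements the check; the function name unused_imports is hypothetical, and the sketch ignores scoping, __all__ and re-exports, which a real linter has to handle.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return names that are imported but never referenced in `source`.

    Simplified on purpose: every name in the file is treated as living in a
    single scope, which a real linter would not do.
    """
    tree = ast.parse(source)
    imported = {}  # bound name -> line number of the import statement
    used = set()

    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import x as y` binds `y`.
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.Name):
            # `np.mean(a)` contains the Name nodes `np` and `a`.
            used.add(node.id)

    return sorted(name for name in imported if name not in used)


example = """
import numpy as np
import pandas as pd

def fun(a):
    return np.mean(a)
"""

print(unused_imports(example))  # ['pd']
```

The example string is only parsed, never executed, so numpy and pandas do not need to be installed to try this out.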

Installation and setup

First, we need to install pre-commit for git:

pip install pre-commit
or 
conda install -c conda-forge pre-commit

Then we have to define the hooks we want to use in .pre-commit-config.yaml in the root directory of the git repo:

repos:
-   repo: https://github.com/psf/black
    rev: 20.8b1
    hooks:
    - id: black
      language_version: python3.7
-   repo: https://github.com/pycqa/flake8
    rev: 3.8.4
    hooks:
    - id: flake8

… and install them, which adds the hooks to .git/hooks/

pre-commit install
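
For intuition about what ends up in .git/hooks/: git runs the executable .git/hooks/pre-commit before it creates a commit, and aborts the commit if that script exits with a non-zero status. Below is a hand-written sketch of such a hook in Python. It is not the script that pre-commit generates; the helper names staged_python_files and find_syntax_errors are hypothetical, and the only check performed is whether each staged file parses as Python.

```python
import ast
import subprocess
from pathlib import Path


def staged_python_files() -> list[str]:
    """List added, copied or modified .py files in the git index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.endswith(".py")]


def find_syntax_errors(paths: list[str]) -> list[str]:
    """Return one message per file that fails to parse as Python.

    Simplification: this reads the working-tree copy of each file, which can
    differ from the staged content if the file is only partially staged.
    """
    problems = []
    for path in paths:
        try:
            ast.parse(Path(path).read_text(encoding="utf-8"), filename=path)
        except SyntaxError as err:
            problems.append(f"{path}:{err.lineno}: {err.msg}")
    return problems


def main() -> int:
    problems = find_syntax_errors(staged_python_files())
    for message in problems:
        print(message)
    # Used as the exit status: anything other than 0 makes git abort the commit.
    return 1 if problems else 0


# In an actual hook script (.git/hooks/pre-commit, marked executable),
# the last line would be:  raise SystemExit(main())
```

The pre-commit tool does considerably more than this, such as managing an isolated environment per hook, but the contract with git is the same: the exit status decides whether the commit goes through.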

There are many more pre-commit hooks that are easy to install, e.g.:

  • reorder_python_imports standardises the order and spacing of import statements
  • mypy is useful if you want to use static type checking in your Python project
  • check-yaml makes sure that .yaml files are parseable
  • for more, see https://pre-commit.com/hooks.html

Example

Let’s say you are committing the file below:

import numpy as np
import pandas as pd
def fun(a):
    useless_copy = a.copy()#this is pointless
    return np.mean(a)
if __name__ == '__main__':
    arr = np.arange(5)
    print(fun(arr))

This will run, but there are obvious problems with its formatting.

Let’s try to commit it!

black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted test_file.py
All done! ✨ 🍰 ✨
1 file reformatted.

The file now looks like this:

import numpy as np
import pandas as pd


def fun(a):
    useless_copy = a.copy()  # this is pointless
    return np.mean(a)


if __name__ == "__main__":
    arr = np.arange(5)
    print(fun(arr))

Much better! But flake8 (rightly) still complains:

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

test_file.py:2:1: F401 'pandas as pd' imported but unused
test_file.py:6:5: F841 local variable 'useless_copy' is assigned to but never used
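
Each line of that report follows flake8's default format, path:row:col: CODE message, and the F prefix marks checks that flake8 takes from pyflakes. If you ever want to post-process such a report, for example to count problems per code, a small sketch along these lines might do; parse_flake8_output and LINE_PATTERN are hypothetical names, and the pattern assumes the default output format.

```python
import re
from collections import Counter

# Assumes flake8's default output format: "path:row:col: CODE message".
LINE_PATTERN = re.compile(
    r"^(?P<path>.+?):(?P<row>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) (?P<text>.*)$"
)


def parse_flake8_output(output: str) -> list[dict]:
    """Turn flake8's text output into one dict per reported problem."""
    problems = []
    for line in output.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:  # skip blank lines and anything that is not a report line
            problem = match.groupdict()
            problem["row"] = int(problem["row"])
            problem["col"] = int(problem["col"])
            problems.append(problem)
    return problems


report = """\
test_file.py:2:1: F401 'pandas as pd' imported but unused
test_file.py:6:5: F841 local variable 'useless_copy' is assigned to but never used
"""

problems = parse_flake8_output(report)
for p in problems:
    print(f"line {p['row']}, column {p['col']}: {p['code']}")

# Number of reported problems per error code.
counts = Counter(p["code"] for p in problems)
print(counts)
```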

Fixing both flake8 complaints leads us to the nicely formatted and bug-free file below:

import numpy as np


def fun(a):
    return np.mean(a)


if __name__ == "__main__":
    arr = np.arange(5)
    print(fun(arr))
