Consistent plotting with ggplot

Unlike other OPIGlets (looking at you, Claire), I have neither the skill nor the patience to make good figures from scratch. And making good figures — as well as remaking, rescaling and adapting them — is incredibly important, because they play a huge role in the way we communicate our research. So how does an aesthetically impaired DPhil student do her plotting?

The tool I use in my everyday work is the ggplot2 R package. Mind you, this is not going to be a ggplot tutorial. First of all, the best possible tutorial is already out there. Secondly, I’m far from the most qualified person to do this (that would be Hadley Wickham). However, I did want to share some thoughts on the subject, in the hope of converting at least one of you to (the) R side.

What does the “gg” in “ggplot” stand for?

It stands for “grammar of graphics”. The basic idea behind it is that you can think of a plot as the “sum” of different elements, which can be treated independently of each other. This way you can move from one plot to another simply by changing some elements while keeping others fixed.

Now imagine you were writing a thesis.

Wouldn’t it be nice to have all of your figures in a consistent style? The plots I made three years ago look nothing like the ones I make now. Rather than digging out a mixed bag of images stored in dark, long-forgotten folders of my computer, I use ggplot to re-plot all of my data in the same style. Speaking of data, let’s generate some.

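# Toy data: 40 days of thesis writing
snacks <- rpois(40, 7)                 # snacks eaten per day (Poisson, mean 7)
words <- rnorm(40, snacks * 100, 250)  # words written: about 100 per snack, sd 250
isGood <- words >= 750                 # a good day means at least 750 words
Lyuba <- data.frame(snacks, words, isGood)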

The code above will make a (completely detached from reality) data frame of my thesis writing progress. Every row represents a day, and every day I eat some snacks and write some words. For the purposes of this exercise, I count a good day to be any day I’ve written at least 750 words. And please don’t ask me how much chocolate I go through in any given week.

Realistic or not, this is the type of data frame ggplot uses as its core input. We can now think about visualising the relationship between snacks eaten and thesis progress made.

Aesthetics, geometry, and theme

Perhaps the most counter-intuitive — and simultaneously the most useful — feature of ggplot is that it separates the concepts of an aesthetic, a geometry and a theme in a plot.

An aesthetic tells you what a part of the plot means. For example, the aesthetics of a plot might be “snacks go on the x-axis, words go on the y-axis, and shape shows what kind of day it was”. We can change the last of these to make a slightly different plot, by having colour rather than shape show whether the day was good or not.
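As a minimal sketch of those two sets of aesthetics (not the original figure code): the data frame is regenerated with an arbitrary seed so the example runs on its own, and a layer of points is added purely so that something gets drawn.

```r
library(ggplot2)

# Same toy data as above; the seed is an addition for reproducibility.
set.seed(1)
snacks <- rpois(40, 7)
words <- rnorm(40, snacks * 100, 250)
isGood <- words >= 750
Lyuba <- data.frame(snacks, words, isGood)

# Snacks on x, words on y, shape shows what kind of day it was.
by_shape <- ggplot(Lyuba, aes(x = snacks, y = words, shape = isGood)) +
  geom_point()

# Identical, except that colour (rather than shape) shows the kind of day.
by_colour <- ggplot(Lyuba, aes(x = snacks, y = words, colour = isGood)) +
  geom_point()
```

Printing `by_shape` or `by_colour` draws the plot; the only difference between the two is one argument inside `aes()`.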

What aesthetics don’t do is tell you what type of plot you are making. That’s what geometry is for. In the plots above, the geometric object is a point, which gives a scatter plot. We can make two different plots with the exact same aesthetics by using different geometric objects, e.g. by drawing a fitted curve instead of points. It’s also really common to combine multiple geometric objects within the same plot.
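A rough sketch of keeping the aesthetics fixed while swapping the geometry. The straight-line fit (`method = "lm"`) is an assumption made for illustration; the text doesn't say which smoother was used for the curve.

```r
library(ggplot2)

# Same toy data as above; the seed is an addition for reproducibility.
set.seed(1)
snacks <- rpois(40, 7)
words <- rnorm(40, snacks * 100, 250)
isGood <- words >= 750
Lyuba <- data.frame(snacks, words, isGood)

# The aesthetics are defined once and reused.
aesthetics <- ggplot(Lyuba, aes(x = snacks, y = words))

# Same aesthetics, different geometric objects.
scatter <- aesthetics + geom_point()
fitted_line <- aesthetics + geom_smooth(method = "lm", formula = y ~ x)

# Several geometric objects layered in one plot.
both <- aesthetics +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
```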

If the aesthetics tell you what each plot component means and the geometry tells you what kind of plot it is you’re making, then the theme controls (some of) the visuals: the background colour, fonts, and so on. Here’s an example of how the same aesthetics and geometry look in three different themes:
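One way to sketch this is with three of the themes that ship with ggplot2. Which themes the original comparison used isn't stated, so `theme_grey()`, `theme_bw()` and `theme_minimal()` are simply picked as examples.

```r
library(ggplot2)

# Same toy data as above; the seed is an addition for reproducibility.
set.seed(1)
snacks <- rpois(40, 7)
words <- rnorm(40, snacks * 100, 250)
isGood <- words >= 750
Lyuba <- data.frame(snacks, words, isGood)

# Aesthetics and geometry stay fixed...
base_plot <- ggplot(Lyuba, aes(x = snacks, y = words, colour = isGood)) +
  geom_point()

# ...and only the theme changes.
grey_version <- base_plot + theme_grey()       # ggplot2's default look
bw_version <- base_plot + theme_bw()           # white background, dark frame
minimal_version <- base_plot + theme_minimal() # no background or frame
```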

So why should I care?

You can use the same theme across all of your plots in order to make them consistent across a thesis, paper, or other long piece of work. You can also use different themes to adapt the same plot for different purposes. For example, you could have:

  • a thesis theme, which uses your favourite colours,
  • a journal theme, which is black & white printer-friendly, and
  • a poster theme with extra large fonts.

This is neat, especially when you’re like me, i.e. kind of lazy and not very good with Inkscape — once you’ve set up your themes, the difference between each plot is a single line of code.
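A minimal sketch of what such a set of themes could look like. The function names `theme_thesis()`, `theme_journal()` and `theme_poster()` and all of the settings inside them are hypothetical, chosen only to illustrate the idea; they are not taken from the author's code.

```r
library(ggplot2)

# Hypothetical thesis theme: a clean base with the legend underneath.
theme_thesis <- function(base_size = 11) {
  theme_minimal(base_size = base_size) +
    theme(legend.position = "bottom",
          panel.grid.minor = element_blank())
}

# Hypothetical journal theme: black-and-white background for printing.
theme_journal <- function(base_size = 10) {
  theme_bw(base_size = base_size) +
    theme(legend.position = "bottom")
}

# Hypothetical poster theme: the thesis theme with much larger fonts.
theme_poster <- function(base_size = 24) {
  theme_thesis(base_size = base_size)
}

# Same toy data as above; the seed is an addition for reproducibility.
set.seed(1)
snacks <- rpois(40, 7)
words <- rnorm(40, snacks * 100, 250)
Lyuba <- data.frame(snacks, words, isGood = words >= 750)

base_plot <- ggplot(Lyuba, aes(x = snacks, y = words, shape = isGood)) +
  geom_point()

# Switching purpose is one line per plot.
thesis_plot <- base_plot + theme_thesis()
journal_plot <- base_plot + theme_journal()
poster_plot <- base_plot + theme_poster()
```

Wrapping each theme in a function keeps the font size adjustable without redefining everything else.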

Anything else?

Tons! Some important features I haven’t talked about here are scales and faceting, but I’ll let you read through the tutorial for those. In terms of making your plots consistent, though, I have two tips.

In addition to setting up your own themes, you can also automate arranging multiple plots in a single figure. As you can see above, I like to label my panels alphabetically, so I have a function which does this and arranges them in a grid with a fixed number of columns.
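A helper along these lines could be sketched with the cowplot package. Both the choice of cowplot and the name `arrange_panels()` are assumptions made for illustration; the text doesn't say how the author's own function is implemented.

```r
library(ggplot2)

# Hypothetical helper: label panels A, B, C, ... and lay them out in a grid
# with a fixed number of columns. Requires the cowplot package.
arrange_panels <- function(plots, ncol = 2) {
  cowplot::plot_grid(plotlist = plots, labels = "AUTO", ncol = ncol)
}

# Same toy data as above; the seed is an addition for reproducibility.
set.seed(1)
snacks <- rpois(40, 7)
words <- rnorm(40, snacks * 100, 250)
Lyuba <- data.frame(snacks, words, isGood = words >= 750)

base <- ggplot(Lyuba, aes(x = snacks, y = words))
panels <- list(
  base + geom_point(),
  base + geom_point(aes(colour = isGood)),
  base + geom_smooth(method = "lm", formula = y ~ x)
)

# Three panels in two columns, labelled A to C.
figure <- arrange_panels(panels, ncol = 2)
```

`labels = "AUTO"` asks `plot_grid()` for upper-case letters; `"auto"` gives lower-case ones.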

Another thing I can thoroughly recommend investing time in is setting up your own colour palettes. A great thing about palettes in R is that they’re not just a collection of HEX codes you like. You can start with a bunch of colours, group them in different ways (I routinely use a “main” set of four, and an “alternative” colour pair), and automatically extend them to create gradients. In all plots above, I used the default colours. If I was making the same figure for my thesis, it would actually look something like this:
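The palette mechanics can be sketched as follows. The HEX codes are placeholders, not the author's actual colours, and the names `main_colours`, `alt_pair` and `make_gradient` are made up for this example.

```r
library(ggplot2)

# Placeholder colours, grouped into a "main" set of four and an "alternative" pair.
main_colours <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")
alt_pair <- c("#E69F00", "#56B4E9")

# colorRampPalette() returns a function that interpolates any number of colours.
make_gradient <- colorRampPalette(main_colours[c(1, 3)])
gradient_7 <- make_gradient(7)

# Same toy data as above; the seed is an addition for reproducibility.
set.seed(1)
snacks <- rpois(40, 7)
words <- rnorm(40, snacks * 100, 250)
Lyuba <- data.frame(snacks, words, isGood = words >= 750)

# Discrete use: one colour from the pair for each kind of day.
by_day <- ggplot(Lyuba, aes(x = snacks, y = words, colour = isGood)) +
  geom_point() +
  scale_colour_manual(values = c("TRUE" = alt_pair[1], "FALSE" = alt_pair[2]))

# Continuous use: the extended gradient mapped onto the word count.
by_words <- ggplot(Lyuba, aes(x = snacks, y = words, colour = words)) +
  geom_point() +
  scale_colour_gradientn(colours = gradient_7)
```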

I’m by no means an expert in this, but if you fancy having a look through my code and stealing and adapting it, please do. Content warning: I forgot to seed my random generation right at the start. I hope on this occasion you will forgive me; after all, I do rather desperately need to go eat some snacks and write some words.
