ggPlotting tips with OPIG data

Ever wondered whether opiglets keep their ketchup in the fridge or the cupboard? Perhaps you’ve wanted to know how to create a nice figure that displays lots of information simultaneously. Publication-quality figures are easy to make in R with the ggplot2 package, and along the way we may also learn some good visualisation practice.

We will start with the OPIG dataset called `opigdata` (available on polite request). It explores some basic features of opiglets, and the column names are highlighted below. These include their age, undergraduate subject, handedness (left or right), where they store their ketchup, preferred programming language, height and Myers-Briggs type. The data contain malicious missingness.

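# Data wrangling (%>%, group_by, summarise), plotting, axis breaks and colour palettes
library(dplyr)
library(ggplot2)
library(scales)
library(RColorBrewer)
library(ggupset)    # UpSet plots; not needed for the figures below
library(patchwork)  # arranges several ggplots into one multi-panel figure

# Read the survey data and give the columns readable names
opigdata <- read.csv(file = "opigdata.csv")
colnames(opigdata) <- c("Age", "Subject", "Handedness", "Ketchup", "Language", "Height", "Myers-Briggs")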
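Since the data contain missing values, it can pay to count them per column before plotting, because missing entries can quietly change what each plot shows. A minimal sketch using base R's `is.na()`, `colSums()` and `complete.cases()` on a small toy data frame (the values are invented, not real `opigdata` entries) so it runs without the CSV:

```r
# Toy stand-in for opigdata: values are invented purely for illustration
toy <- data.frame(
  Age        = c(24, 27, NA, 31),
  Handedness = c("Right", "Left", "Right", NA),
  Height     = c(170, NA, 182, 165),
  stringsAsFactors = FALSE
)

# Number of missing entries in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the rows with no missing values at all
complete_rows <- toy[complete.cases(toy), ]
print(nrow(complete_rows))
```

The same two calls on the real `opigdata` show at a glance which columns are worst affected.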

First, let’s produce a histogram of the ages of opiglets. In ggplot2 we start with the aesthetics (aes), which map variables in the data to plot coordinates such as x and y. We then specify the geom, which tells ggplot how we wish to draw the data. Next we make the plot pretty by specifying colours, text sizes and a theme, and we add a title as well. Note that ggplot2 follows a grammar of graphics: layers of complexity and customisation are added with `+`.

gg1 <- opigdata %>% as_tibble() %>% ggplot(aes(x = Age)) + 
  geom_histogram(binwidth = 1, fill = "steelblue", colour = "black") +  # one bar per year of age
  theme_bw() + theme(text = element_text(size = 20)) + 
  scale_x_continuous(breaks = scales::breaks_pretty(n = 10)) + 
  coord_cartesian(xlim = c(22, 35)) +  # zoom in without dropping the edge bars
  ggtitle("OPIG Age Distribution")
gg1
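To see the layering idea in isolation, here is a minimal sketch (invented ages, not OPIG data) that builds a plot one `+` at a time and checks how many layers the plot object holds. It inspects the plot object's `layers` element, which is an internal detail of ggplot2 that may change between versions:

```r
library(ggplot2)

# Invented ages, for illustration only
ages <- data.frame(Age = c(23, 24, 24, 26, 27, 27, 27, 30, 33))

# Start from the data and the aesthetic mapping: nothing is drawn yet
p <- ggplot(ages, aes(x = Age))
n_before <- length(p$layers)

# Each `+` returns a new plot object with one more component
p <- p + geom_histogram(binwidth = 1, fill = "steelblue", colour = "black")
p <- p + theme_bw() + ggtitle("Toy Age Distribution")

# Only the geom adds a layer; the theme and title are not layers
n_after <- length(p$layers)
print(c(before = n_before, after = n_after))
```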

How about a pie chart of the subjects we studied as undergraduates? A pie chart is really a stacked bar chart in polar coordinates, but first we need to tabulate the data by subject. We also need to be a bit careful with colours, as the number of different subjects gets rather large: we are a diverse group! We learn that some opiglets cannot spell.

gg2 <- opigdata %>% as_tibble() %>%
  group_by(Subject) %>% summarise(count = n()) %>%     # one row per subject with its count
  ggplot(aes(x = "", y = count, fill = Subject)) + 
  geom_col(width = 1, colour = "white") +              # a single stacked bar...
  coord_polar("y", start = 0) + xlab("") +              # ...wrapped around a circle
  theme_bw() + theme(text = element_text(size = 20),
                     axis.text = element_blank(),
                     axis.ticks = element_blank(),
                     panel.grid  = element_blank()) + 
  # Concatenate two 12-colour Brewer palettes to cover up to 24 subjects
  scale_fill_manual(values = c(
    RColorBrewer::brewer.pal(12, 'Set3'),
    RColorBrewer::brewer.pal(12, 'Paired')
  ))
gg2
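The tabulation step and the colour problem can each be sketched separately. Below, on a handful of invented subjects, `dplyr::count()` stands in as a shorter equivalent of `group_by()` plus `summarise(n())`, and base R's `colorRampPalette()` stretches a Brewer palette to exactly as many colours as there are subjects. This is an alternative to concatenating palettes, not what the figure above uses:

```r
library(dplyr)
library(RColorBrewer)

# Invented subjects, for illustration only
subjects <- data.frame(Subject = c("Chemistry", "Biology", "Chemistry",
                                   "Maths", "Physics", "Chemistry", "Maths"))

# count() is shorthand for group_by() + summarise(n())
tab1 <- subjects %>% count(Subject, name = "count")
tab2 <- subjects %>% group_by(Subject) %>% summarise(count = n())
stopifnot(identical(tab1$Subject, tab2$Subject),
          identical(tab1$count, tab2$count))

# Interpolate a 12-colour Brewer palette to one colour per subject
n_subjects <- nrow(tab1)
pal <- colorRampPalette(brewer.pal(12, "Set3"))(n_subjects)
print(pal)
```

Interpolating a qualitative palette such as Set3 blends neighbouring colours, so with many subjects adjacent slices can become hard to tell apart.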

Let’s make a more numeric plot. Does the relationship between height and age hold up amongst opiglets? In the general population there is a correlation between height and age (with many confounders). Here, we make the x-y plot easier to read by choosing sensible x and y tick breaks. We also plot the points carefully, so that they contrast strongly with the linear model fit in the background, and we annotate the plot with the Pearson correlation coefficient in mathematical notation. The standard error band of the fit is also shown, and everything is formatted nicely in a straightforward manner. It seems the relationship just about holds, but the linear model does not look like a great fit to the data.

# Pearson correlation between age and height, ignoring incomplete rows
prho <- round(cor(opigdata$Age, opigdata$Height, method = "pearson", use = "complete.obs"), 2)

gg3 <- opigdata %>% as_tibble() %>%
  ggplot(aes(x = Age, y = Height)) + 
  # Linear fit with its standard error band, drawn first so it sits behind the points
  # (linewidth needs ggplot2 >= 3.4.0; older versions use size)
  geom_smooth(method = "lm", formula = y ~ x, linewidth = 2, alpha = 0.5,
              fill = "steelblue", colour = "darkgreen") + 
  # Grey circles with a black outline stand out against the fit
  geom_point(size = 3, fill = "darkgrey", shape = 21, colour = "black") + 
  theme_bw() + theme(text = element_text(size = 20)) + 
  scale_x_continuous(breaks = scales::breaks_pretty(n = 10), limits = c(22, 35)) + 
  scale_y_continuous(breaks = scales::breaks_pretty(n = 10)) + 
  # annotate() draws the label once; parse = TRUE renders rho as a Greek letter
  annotate("text", x = 32, y = 165, label = paste0("rho == ", prho),
           parse = TRUE, size = 8) + 
  ggtitle("Height-Age Relationship")
gg3
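To put a number on "just about holds", one option is to check the correlation and the straight-line fit outside the plot. A minimal sketch on invented ages and heights (one height deliberately missing): `cor(..., use = "complete.obs")`, `cor.test()` and `lm()` all work on the complete pairs here, and for a simple linear regression with an intercept the model's R² equals the square of Pearson's r:

```r
# Invented ages (years) and heights (cm), one height missing, for illustration only
toy <- data.frame(
  Age    = c(23, 25, 26, 28, 29, 31, 33, 34),
  Height = c(168, 175, NA, 172, 180, 171, 177, 183)
)

# Pearson correlation on the complete pairs only
r <- cor(toy$Age, toy$Height, method = "pearson", use = "complete.obs")

# Test of zero correlation; incomplete pairs are dropped
ct <- cor.test(toy$Age, toy$Height, method = "pearson")
print(ct$p.value)

# The same straight-line model that geom_smooth(method = "lm") draws;
# under the default na.action the incomplete row is dropped
fit <- lm(Height ~ Age, data = toy)

# For simple linear regression with an intercept, R^2 equals r^2
r_squared <- summary(fit)$r.squared
print(c(r = round(r, 2), r_squared = round(r_squared, 2)))
stopifnot(isTRUE(all.equal(r_squared, r^2)))
```

A low R² here says the same thing as the plot: a straight line explains only part of the spread.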

OPIGlets provided some interesting categorical data, such as which hand they prefer, where they put their ketchup and their personality type. It is often difficult to understand the relationships between such variables. This is an excellent use case for faceting, whereby we split the data into a grid of panels, one for each combination of groups. We can see that no left-handed people store their ketchup in the cupboard: a very sensible bunch. We can also infer that more people are right-handed and that more people store their ketchup in the fridge. The most common personality type is INFP, and there are some curious correlations between these groups; read into them whatever you like! We use the Paired colour scheme to draw out further relationships that may or may not be there.

gg4 <- opigdata %>% as_tibble() %>%
  ggplot(aes(x = `Myers-Briggs`, fill = `Myers-Briggs`)) +  # backticks needed for the hyphenated name
  geom_bar() +                                              # count of each personality type
  theme_bw() + theme(text = element_text(size = 20)) + 
  scale_y_continuous(breaks = scales::breaks_pretty(n = 5)) + 
  # Paired has 12 colours, enough for up to 12 distinct types
  scale_fill_manual(values = RColorBrewer::brewer.pal(12, 'Paired')) +
  facet_grid(rows = vars(Handedness), cols = vars(Ketchup)) +  # one panel per handedness/ketchup pair
  coord_flip() +
  ggtitle("Visualising Multiple Categories")
gg4
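The counts behind each facet panel can also be checked directly with a cross-tabulation. A minimal sketch with invented handedness and ketchup answers; `table()` reports empty combinations (such as left-handed cupboard users) as explicit zeros:

```r
# Invented categorical answers, for illustration only
toy <- data.frame(
  Handedness = c("Right", "Right", "Left", "Right", "Left", "Right"),
  Ketchup    = c("Fridge", "Cupboard", "Fridge", "Fridge", "Fridge", "Cupboard")
)

# Cross-tabulate: each cell corresponds to one panel of facet_grid()
tab <- with(toy, table(Handedness, Ketchup))
print(tab)

# table() keeps the empty combination as a zero, whereas
# dplyr::count() on character columns would simply omit it
print(tab["Left", "Cupboard"])
```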

If we want to submit this work to the Journal of Spurious Relationships (sometimes called Cell), then we need to organise the plots into panels. This is super easy in R with patchwork, as follows:

# `+` places plots side by side, `/` stacks rows; tag_levels = 'A' labels the panels A to D
(gg1 + gg2) / (gg3 + gg4) + plot_annotation(tag_levels = 'A')
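Journals usually want figures at a fixed physical size, so a natural last step is saving the panelled figure with `ggsave()`. A minimal sketch using two throwaway plots of R's built-in `mtcars` data; `plot_layout(guides = "collect")` gathers the legends into one place, and the 180 by 90 mm size is an assumption, so check the target journal's guidelines:

```r
library(ggplot2)
library(patchwork)

# Two throwaway plots of a built-in dataset, for illustration only
p1 <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 2)
p2 <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) + geom_point()

# Side by side, legends collected, panels tagged A and B
fig <- (p1 + p2) +
  plot_layout(guides = "collect") +
  plot_annotation(tag_levels = "A")

# Save at a fixed physical size (assumed two-column width of 180 mm)
out_file <- file.path(tempdir(), "figure1.pdf")
ggsave(out_file, plot = fig, width = 180, height = 90, units = "mm")
print(file.exists(out_file))
```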

The plot is easy to read. The colours make it easier to draw out relationships. The titles are informative and there is a good balance of white space and numeric consistency.
