Quality Stats

Disclaimer – the title is a Quality Street pun only and bears no relation to the quality of the data or analysis presented below. This whole blog post is basically to discredit the personal chocolate preferences of a group member who shall remain nameless. Safe to say though, they Vostly overestimated people’s love for the Toffee Finger. Long live the Orange Creme.

In the run-up to Christmas, I was arguing with another member of the group about which are the best and worst Quality Street chocolates. This is clearly an important topic, with YouGov previously dedicating vast resources (I assume) to answering this very question.

However, as the YouGov poll did not perfectly align with my personal and very accurate preferences, I decided to run another, better experiment. For this experiment, I bought a tub of Quality Street, counted all the chocolates, and then left the tub out in the common area for hungry opiglets to consume. I then recounted the chocolates at various points over the next two days to find out which flavours disappeared first, and perhaps more importantly, which sad chocolates were taken only after all other options were exhausted.

As expected, crowd favourites The Purple One and The Green Triangle were quick to go, along with the Fudge and Milk Choc Block. The cremes, controversially my personal favourites, sadly performed only averagely. However, to my great delight, the Toffee Finger comprehensively beat all other competition (including the Coconut Eclair!) to take the wooden spoon and provide me with a moderate degree of smugness in the end.

In an attempt to make this blog post somewhat useful, I’ve included the code I used to make the results plot below. This code should allow you to sort a DataFrame using a custom list, pivot the data when you’re an idiot and type it up the wrong way round, and make a DIY colour palette for your plots.

And remember, even though you may be appalled by others’ chocolate preferences, this actually makes them the perfect person to sit down and share a box with during these festive times.

Happy holidays!

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Read in data

# data I manually recorded in a csv
quality_df = pd.read_csv("Quality_data.csv", names=["flavour", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"])

# YouGov ranking - https://twitter.com/yougov/status/940868550700527616?lang=en-GB
yougov_ranking = ["The Purple One", "The Green Triangle", "Caramel Swirl", "Strawberry Delight", "Orange Creme", "Milk Choc Block", "Fudge", "Toffee Finger", "Orange Chocolate Crunch", "Toffee Penny", "Coconut Eclair"]

# sort data (reverse so that plot is in sensible order later)
quality_df["flavour"] = quality_df["flavour"].astype("category")
quality_df["flavour"] = quality_df["flavour"].cat.set_categories(yougov_ranking)
quality_df = quality_df.sort_values(["flavour"], ascending=False).reset_index(drop=True)

quality_df
                    flavour  t1  t2  t3  t4  t5  t6  t7  t8  t9
0            Coconut Eclair   5   5   5   2   2   2   2   2   0
1              Toffee Penny   6   6   6   3   3   1   1   1   0
2   Orange Chocolate Crunch   5   5   5   5   4   2   1   0   0
3             Toffee Finger   7   6   6   5   5   5   5   4   2
4                     Fudge   8   6   4   0   0   0   0   0   0
5           Milk Choc Block   4   3   1   0   0   0   0   0   0
6              Orange Creme   6   6   6   3   3   2   2   0   0
7        Strawberry Delight   7   5   5   3   2   0   0   0   0
8             Caramel Swirl   7   7   5   2   1   1   1   1   1
9        The Green Triangle   4   4   3   1   0   0   0   0   0
10           The Purple One   5   4   4   0   0   0   0   0   0
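
If you want to try the custom sort on its own, here is a minimal sketch. The counts and the three-flavour ordering are made up for illustration and are not the real tub data. It uses `pd.Categorical` directly, which should be equivalent to the `astype("category")` plus `set_categories` route above.

```python
import pandas as pd

# hypothetical counts, purely for illustration
toy_df = pd.DataFrame({
    "flavour": ["Fudge", "The Purple One", "Toffee Finger"],
    "t1": [8, 5, 7],
})

# the custom order we want (not alphabetical)
custom_order = ["The Purple One", "Fudge", "Toffee Finger"]

# a categorical column sorts by position in custom_order rather than alphabetically
toy_df["flavour"] = pd.Categorical(toy_df["flavour"], categories=custom_order, ordered=True)
sorted_df = toy_df.sort_values("flavour").reset_index(drop=True)

print(sorted_df["flavour"].tolist())
```

One thing to watch: any flavour missing from the custom list becomes NaN in the categorical column, so a typo in the list will quietly blank out a row label.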

Pivot data for plotting

quality_df = quality_df.T

new_header = quality_df.iloc[0]   # grab the first row for the header
quality_df = quality_df[1:]       # take the data less the header row
quality_df.columns = new_header   # set the header row as the df header

quality_df = quality_df.reset_index(drop=True)
quality_df.index.names = ["time"]
quality_df
flavour  Coconut Eclair  Toffee Penny  Orange Chocolate Crunch  Toffee Finger  Fudge  Milk Choc Block  Orange Creme  Strawberry Delight  Caramel Swirl  The Green Triangle  The Purple One
time
0                     5             6                        5              7      8                4             6                   7              7                   4               5
1                     5             6                        5              6      6                3             6                   5              7                   4               4
2                     5             6                        5              6      4                1             6                   5              5                   3               4
3                     2             3                        5              5      0                0             3                   3              2                   1               0
4                     2             3                        4              5      0                0             3                   2              1                   0               0
5                     2             1                        2              5      0                0             2                   0              1                   0               0
6                     2             1                        1              5      0                0             2                   0              1                   0               0
7                     2             1                        0              4      0                0             0                   0              1                   0               0
8                     0             0                        0              2      0                0             0                   0              1                   0               0
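
For what it's worth, the same pivot can probably be done in fewer steps by making the flavour column the index before transposing. This is a sketch on made-up counts, not the code I actually ran, and `wide_df` and `pivoted_df` are names invented for the example.

```python
import pandas as pd

# hypothetical counts typed up "the wrong way round": one row per flavour
wide_df = pd.DataFrame({
    "flavour": ["Fudge", "Toffee Finger"],
    "t1": [8, 7],
    "t2": [6, 6],
    "t3": [4, 6],
})

# make flavour the index, then transpose so each flavour becomes a column
pivoted_df = wide_df.set_index("flavour").T

# swap the "t1", "t2", ... labels for a plain integer index called "time"
pivoted_df = pivoted_df.reset_index(drop=True).rename_axis(index="time")

print(pivoted_df)
```

Because the flavour names never end up in the same rows as the counts, the counts stay numeric rather than being converted to generic Python objects by the transpose.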

Plot data

flavours = quality_df.columns.tolist()
time = quality_df.index.tolist()

# counts as a (flavour x time) array of floats
data = np.array([quality_df[flavour].tolist() for flavour in flavours], dtype=float)

# normalise each time point so the fractions of the remaining chocolates sum to 1
normalised_data = data / data.sum(axis=0)

# DIY colour palette: one named matplotlib colour per flavour
flavour_to_colour = {"The Purple One":          "purple",
                     "The Green Triangle":      "limegreen",
                     "Caramel Swirl":           "gold",
                     "Strawberry Delight":      "red",
                     "Orange Creme":            "darkorange",
                     "Milk Choc Block":         "darkgreen",
                     "Fudge":                   "fuchsia",
                     "Toffee Finger":           "chocolate",
                     "Orange Chocolate Crunch": "orangered",
                     "Toffee Penny":            "goldenrod",
                     "Coconut Eclair":          "mediumblue"}

# look colours up by flavour so the palette always matches the column order
palette = [flavour_to_colour[flavour] for flavour in flavours]

# stacked area plot
plt.stackplot(time, normalised_data, labels=flavours, colors=palette)

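# reverse the legend so it reads top-to-bottom in the same order as the stacked areas
handles, labels = plt.gca().get_legend_handles_labels()
plt.legend(handles[::-1], labels[::-1], bbox_to_anchor=(1.04, 1), loc="upper left")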

plt.xlim([time[0], time[-1]])
plt.ylim([0,1])
plt.xlabel("Random times I checked the tub")
plt.title("Quality Street tub composition over time\n(chocolates ordered according to YouGov ranking)")
plt.show()
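
If you would rather have a number than squint at a plot, one rough way to rank the flavours is by the first check at which each one had run out. The sketch below uses made-up counts in the pivoted layout (one row per check, one column per flavour). The names `toy_df`, `ran_out` and `first_empty` are invented for the example, and this is not something I ran on the real data.

```python
import pandas as pd

# hypothetical counts: one row per check, one column per flavour
toy_df = pd.DataFrame({
    "Fudge":         [8, 4, 0, 0],
    "Orange Creme":  [6, 6, 2, 0],
    "Toffee Finger": [7, 6, 5, 2],
})

# True wherever a flavour had run out at that check
ran_out = toy_df.eq(0)

# index of the first empty check per flavour; NaN if it never ran out
first_empty = ran_out.idxmax().where(ran_out.any())

# flavours that went first at the top, survivors at the bottom
print(first_empty.sort_values(na_position="last"))
```

The `where` step matters because `idxmax` on a column with no zeros would otherwise report the very first check, which would make the least popular chocolate look like the most popular one.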
