Category Archives: Statistics

No labels, no problem! A quick introduction to Gaussian Mixture Models

Statistical Modelling Big Data AnalyticsTM is in vogue at the moment, and there’s nothing quite so fashionable as the neural network. Capable of capturing complex non-linear relationships and scalable for high-dimensional datasets, they’re here to stay.

For your garden-variety neural network, you need two things: a set of features, X, and a label, Y. But what do you do if labelling is prohibitively expensive or your expert labeller goes on holiday for 2 months and all you have in the meantime is a set of features? Happily, we can still learn something about the labels, even if we might not know what they are!

Continue reading

K-Means clustering made simple

The 21st century is often referred to as the age of “Big Data” due to the unprecedented increase in the volumes of data being generated. As most of this data comes without labels, making sense of it is a non-trivial task. To gain insight from unlabelled data, unsupervised machine learning algorithms have been developed and continue to be refined. These algorithms determine underlying relationships within the data by grouping data points into cluster families. The resulting clusters not only highlight associations within the data, but they are also critical for creating predictive models for new data.

Continue reading

How to be a Bayesian – ft. a completely ridiculous example

Most of the stats we are exposed to in our formative years as statisticians are viewed through a frequentist lens. Bayesian methods are often viewed with scepticism, perhaps due in part to a lack of understanding over how to specify our prior distribution and perhaps due to uncertainty as to what we should do with the posterior once we’ve got it.

Continue reading