Category Archives: Statistics

No labels, no problem! A quick introduction to Gaussian Mixture Models

~~Statistical Modelling~~ Big Data Analytics™ is in vogue at the moment, and there’s nothing quite so fashionable as the neural network. Capable of capturing complex non-linear relationships and scalable to high-dimensional datasets, neural networks are here to stay.

For your garden-variety neural network, you need two things: a set of features, X, and a label, Y. But what do you do if labelling is prohibitively expensive, or your expert labeller goes on holiday for two months and all you have in the meantime is a set of features? Happily, we can still learn something about the labels, even if we don’t know what they are!


K-Means clustering made simple

The 21st century is often referred to as the age of “Big Data” due to the unprecedented increase in the volume of data being generated. As most of this data comes without labels, making sense of it is a non-trivial task. To gain insight from unlabelled data, unsupervised machine learning algorithms have been developed and continue to be refined. These algorithms uncover underlying relationships within the data by grouping similar data points into clusters. The resulting clusters not only highlight associations within the data, but can also serve as the basis for predictive models for new data.
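To make “grouping data points into clusters” a little more concrete, here is a minimal sketch of the standard K-Means loop (Lloyd's algorithm) in plain Python. It is an illustration and not necessarily the code from the full post: the function name `k_means`, the toy points, and the naive choice of the first `k` points as starting centroids are all assumptions made for this example.

```python
import math


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm on a list of equal-length tuples.

    Returns (centroids, labels). Initialisation is deliberately naive
    (an assumption for this sketch): the first k points are used as
    the starting centroids.
    """
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Two well-separated toy blobs (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the result depends on the starting centroids, so real implementations usually try several random initialisations and keep the best one.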


Where do OPIGlets come from?

Now, you might think the answer to this question is OSOWs, but in fact they come from a wide variety of undergraduate degrees!


How to be a Bayesian – ft. a completely ridiculous example

Most of the stats we are exposed to in our formative years as statisticians is viewed through a frequentist lens. Bayesian methods are often met with scepticism, perhaps due in part to a lack of understanding of how to specify a prior distribution, and perhaps due to uncertainty about what to do with the posterior once we’ve got it.
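As a small illustration of what “specifying a prior” and “doing something with the posterior” can look like, here is a sketch of the textbook Beta-Binomial conjugate update. This is not the example from the full post: the helper names `beta_binomial_update` and `beta_mean`, the flat Beta(1, 1) prior, and the counts are all assumptions chosen for illustration.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(alpha, beta) prior on a success probability, the
    posterior is Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Flat prior Beta(1, 1), then 7 successes and 3 failures (made-up data).
prior = (1, 1)
posterior = beta_binomial_update(*prior, successes=7, failures=3)
print(posterior)  # (8, 4)

# One thing to do with the posterior: report its mean as a point estimate.
posterior_mean = beta_mean(*posterior)  # 8 / 12, pulled slightly towards 0.5
```

The posterior mean sits between the prior mean (0.5) and the observed proportion (0.7), which is the usual way a prior shows up in the final answer.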


Oxford Protein Informatics Group

or "OPIG" to friends
