Author Archives: Thomas Hadfield

Tidbits from YouGov Polls

Some recent verdicts from the British public in YouGov polls:

The Queen (97%) is less well known than her husband Prince Philip (98%)
Liz Truss’ UK popularity rating (21%) is lower than George W. Bush’s (22%)
The most popular British dish is ‘Chips’ (84%) followed by ‘Fish and Chips’ (83%)
Oxford (55%) is a less popular university than Cambridge (58%)

So much for Aristotle’s ‘wisdom of crowds’!

Tracking Changes in LaTeX

Tracking changes in Microsoft Word is easy – we just click the ‘Track Changes’ button. It requires a little more work in LaTeX, but here is a quick guide to doing it as painlessly as possible!

First, we want our original document to be stored in a *.tex file in an Overleaf project (as shown below)

Continue reading →

An A-Z of Oxford

The 2021/2 academic year is now well underway in Oxford, which means a fresh batch of new students getting to grips with some of the bewildering terminology employed here, as well as prospective applicants for next year trying to figure out what on earth a college is and which one they should apply to. As a wizened final-year DPhil student, I decided to compile an A-Z of Oxford-related terms in the hope that someone might find it useful.

A – Ashmolean Museum

Britain’s first public museum, which opened all the way back in 1683. Home to exhibits covering Ancient Egypt to Modern Art and everything in between.

Image: The front of the Ashmolean, right in the middle of Oxford City Centre

B – Battels

A termly bill students receive from their college which might cover things like charges for food and accommodation, or fines for not returning books to the library on time.

C – College

The 39 colleges are small educational institutions which together comprise the University of Oxford. Every student is a member of a college, each of which has its own set of facilities, including a dining hall, bar, library and student accommodation. Colleges also have their own student unions, called the Junior Common Room (for undergraduates) and Middle Common Room (for postgraduates), which are excellent places to socialise and meet people studying lots of different subjects.

Image: An aerial view of many of the university’s colleges

Continue reading →

Multiple Testing: What is it, why is it bad and how can we avoid it?

P-values play a central role in the analysis of many scientific experiments. But, in 2015, the editors of the journal Basic and Applied Social Psychology prohibited the use of p-values in their journal. The primary reason for the ban was the proliferation of results obtained by so-called ‘p-hacking’, where a researcher tests a range of different hypotheses and publishes the ones which attain statistical significance while discarding the others. In this blog post, we’ll show how this can lead to spurious results and discuss a few things you can do to avoid engaging in this nefarious practice.

The Basics: What IS a p-value?

Under a Hypothesis Testing framework, a p-value associated with a dataset is defined as the probability of observing a result that is at least as extreme as the observed one, assuming that the null hypothesis is true. If the probability of observing such an event is extremely small, we conclude that it is unlikely the null hypothesis is true and reject it.

But therein lies the problem. Just because the probability of something is small, that doesn’t make it impossible. Using the standard significance test threshold of 0.05, even if the null hypothesis is true, there is a 5% chance of obtaining a p-value below the significance threshold and therefore rejecting it. Such false positives are an inescapable part of research; there’s always a possibility that the subset you were working with isn’t representative of the global data and sometimes we take the wrong decision even though we analysed the data in a perfectly rigorous fashion.
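
To make the 5% figure concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not the analysis from any particular study: every experiment draws data for which the null hypothesis (mean zero, known unit variance) really is true, and we use a two-sided z-test. The function names are made up for this example. The last function shows why testing many hypotheses is risky: if the tests are independent, the chance of at least one false positive among m tests is 1 - (1 - alpha)^m.

```python
import math
import random


def two_sided_p_value(sample):
    """Two-sided z-test of H0: mean = 0, assuming the variance is known to be 1."""
    n = len(sample)
    z = sum(sample) / math.sqrt(n)  # sample mean * sqrt(n)
    # Standard normal CDF evaluated at |z|, written in terms of erf
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - cdf)


def false_positive_rate(n_experiments, sample_size, alpha, rng):
    """Fraction of experiments that reject H0 even though H0 is true."""
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if two_sided_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_experiments


def family_wise_error_rate(n_tests, alpha):
    """P(at least one false positive) across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


rng = random.Random(0)  # fixed seed so the run is repeatable
rate = false_positive_rate(20000, 30, 0.05, rng)
fwer_20 = family_wise_error_rate(20, 0.05)
print(f"False positive rate for a single test: {rate:.3f}")
print(f"Chance of at least one false positive in 20 tests: {fwer_20:.3f}")
```

A single test rejects a true null roughly 5% of the time, as promised. Run 20 independent tests and report only the "significant" ones, though, and the chance of at least one spurious finding rises to about 64%.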

Continue reading →

A Smattering of Olympic Trivia!

Tokyo 2020 is now firmly in our rearview mirror, and I for one will be sad to be deprived of the opportunity to wake up at 4AM to passionately cheer on someone I’ve never heard of in an event I know nothing about as they go for Gold. The heyday of amateurism in the Olympics may be long gone, but it’s never been better for the amateur fan, with 24/7 on-demand coverage, unprecedented access to the athletes via social media, and remote working offering the opportunity to watch the games on a second screen without worrying about one’s boss noticing (not that I would ever engage in such an irresponsible practice, in case my Supervisor is reading this…).

To indulge both my post-Olympics melancholy and my addiction to sports trivia, I’ve trawled the internet to find some interesting factoids related to the Summer Games and present them below for your mild enjoyment:

Continue reading →

CAML: Courses in Applied Machine Learning

*Shameless self-promotion klaxon!! Have a look at my new website!*

I’m excited to share a project I’ve been working on for the past few months! One of the biggest challenges of working on an interdisciplinary research project is getting to grips with the core principles of the disciplines which you don’t have much formal training in. For me, that means learning the basics of Medicinal Chemistry and Structural Biology so that when someone mentions pi-stacking I don’t think they’re talking about the logistics of managing a bakery; for people coming from Bio/Chem backgrounds it can mean understanding the Maths and Statistics necessary to make sense of the different algorithms which are central to their work.

Continue reading →

Drawing Wavy Lines That Match Your Data, or, An Introduction to Kernel Density Estimation

One of the fundamental questions of statistics is “How likely is it that event X will occur, given what we’ve observed already?”. It’s a question that pops up in all sorts of different fields, and in our daily lives as well, so it’s well worth being able to answer rationally. Under the statistician’s favourite assumption that the observed data are independent and identically distributed (i.i.d.), we can use the data to construct a probability distribution; that is, if we’re about to observe a new data point, x*, we can say how likely it is that x* will take a specific value.
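
As a rough sketch of the idea, the snippet below builds a density estimate by placing a small Gaussian bump on each observation and averaging them. The data values, the bandwidth of 0.5 and the function name `gaussian_kde` are all assumptions chosen for illustration; choosing the bandwidth well is a topic in its own right.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function estimating the density of `data` with Gaussian kernels."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observed data point
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


observations = [1.2, 1.9, 2.1, 2.4, 3.0, 5.5]  # made-up data
density = gaussian_kde(observations, bandwidth=0.5)

# Estimated density at a new point x* near the bulk of the data and far from it
print(density(2.0), density(8.0))
```

The estimate is high where observations cluster and low where there are none, which is the "wavy line" matching the data.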

Continue reading →

No labels, no problem! A quick introduction to Gaussian Mixture Models

~~Statistical Modelling~~ Big Data Analytics™ is in vogue at the moment, and there’s nothing quite so fashionable as the neural network. Capable of capturing complex non-linear relationships and scalable for high-dimensional datasets, they’re here to stay.

For your garden-variety neural network, you need two things: a set of features, X, and a label, Y. But what do you do if labelling is prohibitively expensive or your expert labeller goes on holiday for 2 months and all you have in the meantime is a set of features? Happily, we can still learn something about the labels, even if we might not know what they are!
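
To give a flavour of how this can work, here is a minimal sketch of fitting a two-component, one-dimensional Gaussian mixture with the expectation-maximisation (EM) algorithm. The synthetic data, the crude initialisation and the function names are assumptions made for this illustration; in practice you would more likely reach for a library implementation.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, n_iter=100):
    """Fit a two-component 1D Gaussian mixture to `data` using EM."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Crude initialisation: means at the extremes, shared variance, equal weights
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each point belongs to each component
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate the parameters from the soft assignments
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            # Small constant guards against a variance collapsing to zero
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk + 1e-6
    return weights, means, variances


# Unlabelled features drawn from two hidden groups (synthetic data)
rng = random.Random(1)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
weights, means, variances = fit_gmm_1d(data)
print(weights, means, variances)
```

No labels are ever shown to the model, yet the fitted means should land near the centres of the two hidden groups, and the E-step responsibilities act as soft labels for each point.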

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends
