Author Archives: Mark

Slippery slopes and slippery flats

In this episode of my decade-long quest to correct popular British misconceptions, I wish to turn to one of my most geeky obsessions: trains. In particular, I would like to address a particularly British obsession, which many take as a signal of the lapse of British know-how from its mid-Empire industrial-revolution heights.

This is, of course, ‘leaves on the line‘. Why – demand the British public – must timetables run five to ten minutes slower when there are more leaves on the ground? Why do no other modern countries suffer from these ills? And why does the railway take no action over this commuting scourge?

Continue reading →

The evolution of contact prediction – a new paper

I’m so pleased to be able to write about our work on The evolution of contact prediction: evidence that contact selection in statistical contact prediction is changing (Bioinformatics btz816). Contact prediction – the prediction of parts of the amino-acid chain that are close together – has been critical to improving the ability of scientists to predict protein structures over the last decade. Here we look at the properties of these predictions, and what that might mean for their use.

The paper begins with a question. If contact prediction methods are based on statistical properties of sequence alignments, and those alignments are generated in the presence of ecological and physical constraints, what effect do the physical constraints have on the statistical properties of real sequence alignments? More concisely: when we predict contacts, do we predict particularly important contacts?

Continue reading →

You’re getting on my biscuits

Jaffa cakes are God’s own snacks and I will brook no opposition. I don’t mind if they’re McVitie’s brand Jaffa cakes, or Pim’s, the suspicious European variety. Even Sainsbury’s Basics Jaffa cakes float my balloon. Take a soft sponge base, slap some jam and chocolate on that puppy, and you’re golden.

But if you describe your love of these glorious creations, the conversation takes a familiar turn. Are they cakes or are they biscuits? it goes. HMRC tried to classify them as cakes – or was it biscuits? Something like that. It had to do with VAT…

Continue reading →

A new way of eating too much

Fresh off the pages of Therapeutic Advances in Endocrinology and Metabolism comes a warning no self-respecting sweet tooth should ignore.

“Liquorice is not just a candy,” write a team of ten from Chicago. “Life-threatening complications can occur with excess use.” Hold on to your teabags. Liquorice – the Marmite of sweets – is about to become a lot more sinister.

Continue reading →

Oh lord, they comin’ – a diversity of units

For scientists, units are like money: a few people obsess about them, but the less you have to think about units, the better. And, like switching a bank account, changing your units is usually tiresome and complicated for little real advantage. But spare a thought for the many units that have been lost to the inexorable march of scientific advance, and for the few that are still in regular use.

Continue reading →

Quick Python tricks

It’s always fun when you stumble across something in your programming toolkit that you had never noticed. Here are three things I’ve recently enjoyed learning.

Ternary syntax

a = int(raw_input())
is_even = True if a % 0 == 0 else False

Enumerate

I’ve been looping over the length of my list, all these years, like a chump. It turns out you can do this:

for index, item in enumerate(some_list):
    # now the index of each item is available as well as the item

# Don't do do this
for index in range(len(some_list)):
    item = some_list[index]

for… else

Every so often, you really need to know that a for loop has run to completion. That’s what for…else is for!

for item in iterable:
    if item % 0 == 0:
       first_even_number = item
else:
    raise ValueError('No even numbers')

How to get seasick without leaving your desk

It’s always easy to get caught up in computational work, so let me describe a quick experiment you can do to relieve the boredom. Sit down in your spinny chair and cross your legs. (You do have a spinny chair, right? If not, get one – they prevent repetitive strain injuries, and more importantly, they are great fun. This experiment works best if the chair has no arms.) Start spinning that spinny chair until you feel like you’ve achieved a stable rotation. Then – smartly – stop the rotation and sit up straight. Continue reading →

OPIG at the Oxford Maths Festival

Men with glasses poring over long columns of numbers. Tabulation of averages and creation of data tables. Lots of counting. The public image of statistics hardly corresponds to what OPIG do – even where OPIG’s work is at its most formally statistical.

OPIG exhibited a street stall at the Oxford Maths Festival to try to change that perception. How do you interest passers-by in real statistics without condescending and without oversimplifying? Data is becoming more important in the lives of all kinds of people and we need to be clear that it isn’t magic, but neither is it trivial. We need to prove that the kind of thoughtful reasoning that people put into managing their lives is the same kind of thing we do in data analysis.

Let’s look at one activity that OPIG did on the street at the Oxford Maths Festival.

The idea

We started with a compelling story: rumors were flying during the Second World War about how many tanks the Germans were producing. Allied intelligence needed to figure out if these numbers were true. In its simple retelling, the Germans simply numbered their tanks, but in truth there were two sequences of gearbox numbers, complicated chassis and engine numbers, and a set of numbered wheel moulds which left numbers imprinted in each of the wheels of the tanks.

However you parse it, the relevant problem is finding out how many gearboxes, chassis, wheels, or engines were in use, which tells you how many tanks are being produced, since no tank can have, for example, more than one engine. It’s a little hard to get close to a German tank, so Allied soldiers collected these numbers when they were destroyed.

This problem – finding the largest number of a range 1, 2, 3, 4, … N – is known generally as the German Tank Problem. Most mathematics educators have some familiarity with it, but on the street we found it works really well. This observation presumably speaks to the paucity of mathematics educators on the street.

How it works

The demonstration, while straightforward and quick, has a few subtleties. We cut open a box and cut 4750 slips of paper, numbered from 1 to 4750.

The first step is to ask how many slips of paper are in the box. Answers varied from one hundred to more than 10,000. Shaking the box helped encourage unreliable guesses and prevented people from reading the numbers off the slips in the box.

Then we asked people to pick one piece of paper out of the box and update their guess. We found a few aspects of this to be interesting. It is trivial to convince people that they will not get the top number in their guess. We all, then, have an intuition that the number drawn at this stage should be ‘somewhere in the middle’, which is completely wrong. There’s no particular reason to think that the guess should be in any interval over the distribution. It is true, however, that the mean of the first number chosen after many runs of the demonstration is 2375 and reasoning about average behaviour in this way turns out to be very powerful.

We then would ask people to draw up to four more slips. The point that people should absorb at this stage is that those guesses should assort evenly over the possible distribution, and you only need to add ‘a little bit more’ to compensate for this effect. We then precomputed the best guess for various numbers because the calculation is too tedious for a streetcorner – from which it becomes obvious that having more than five slips is nearly unnecessary to guess the maximum well. (See the histogram of estimates from five-slip draws below.) And 60% of guesses were slightly high of the true number, but the guesses that were too low tended to be much too low.

Going from blind indifference to a really solid guess is a powerful experience, and we can take people through it with every step of the reasoning on display. It shows what data analysis looks like at the research level and can be a great experience for the public.

Paper review: “Inside the black box”

There are nearly 17,000 Oxford students on taught courses. They turn up reliably every October. We send them to an army of lecturers and tutors, drawn from every rank of the research hierarchy. As members of that hierarchy, we owe it to the students – all 17,000 of them – to teach them as best we can.

And where can we learn the most about how to teach? There are 438,000 professional teachers in the UK. Maybe people who spend all of their working time on the subject might have good strategies to help people learn.

The context of the paper

Teachers obsess over assessment. Assessment is the process by which teachers figure out what students have learned. It is probably true that assessment is the only reason we have classrooms at all.

Inside the Black Box is of the vanguard of recent changes in educational thinking. Modern teaching regards good pedagogy as a practical skill. Like other types of performance, it depends on a specific set of concrete actions which can be taught and learned. Not everyone is a natural teacher – but nearly everyone can become a competent teacher.

Formative assessment is the focus of Inside the Black Box. The article argues that this process, in which teachers figure what students know and tell them how it’s going wrong, is essential to good classroom practice.

What is the black box?

The black box is the classroom. After societal convulsions over class sizes, funding deficits, curriculum reforms, and examination structure, it’s time – says the article, in 2001 – that we focus on what actually goes on inside the classroom. These social changes, it says, adjust the inputs to the black box, and society expects better things out of the black box. But what if changing the inputs makes the work inside the black box harder? Don’t we have an obligation to figure out what needs to happen to get students to learn?

The article touches three questions:

Is there evidence that improving formative assessment raises standards?
Is there evidence that there is room for improvement?
Is there evidence about how to improve formative assessment?

The answers are yes, yes, and yes. In meta-analyses of educational experiments, formative assessment consistently raises standards. These experiments match the experience of teachers, who know that the least effective lessons are those which do not respond to students’ needs. Standard observations – such as those from Ofsted – ask teachers to answer what are they learning, and then how do you know, and then what are you doing about it?

The second question – is there room for improvement? – is one they address in great detail in the context of primary and secondary education. Some criticisms (the giving of grades for its own sake, unintentional encouragement of “rote or superficial learning”, relentless competition between students) seem applicable in different parts of our university context. A greater weakness is a lack of emphasis. People engaged in university teaching frequently center the delivery of knowledge instead of learning, an idea exacerbated by our obsession with lectures and masked by the long lag between those lectures and the exams in which we assess them.

Recommendations

Inside the Black Box makes specific recommendations for instructors about how to engage in formative assessment. Those recommendations – unusually, for an item in the educational literature – are specific and detailed. But rather than focus on them, it is worth examining three themes which run across the article.

The overriding focus is the importance of formative assessment. If we care about what students learn, then we’ve got to be checking what it is that they actually are learning. Opportunities for formative assessment should be “designed into any piece of teaching”. In extremis, this idea has interesting implications for the institution of lectures, which generally lack them entirely.

A subsidiary idea is the importance of setting clear objectives for learning. Too many students view learning as a series of exercises rather than a step in the formation of a coherent body of knowledge. The overarching direction should be made clear. And on a more detailed level, we need to be explicit about what outcomes we want our students to obtain so that they know whether they are making satisfactory progress. Formative assessment must make reference to expectations, and formative self- or peer assessment becomes impossible if those expectations are not well-understood.

And this discussion ties into a final point: when students truly apply themselves to the task of learning, their self-perception and self-esteem becomes bound up in it. Ineffective expectation-setting and insufficient clarity about the means for improvement result in students feeling demotivated, which causes them to revise their goals downward. They put in less effort and achieve outcomes that are worse. These effects are costly and can be avoided by effective formative assessment.

Inside the Black Box is a diversion from our diet of scientific articles, but I think it is worth our attention. Pedagogy is difficult to get right. In the university context, good practice is the subject of little attention and rarely assessed. Thinking about good asssessment means that our students benefit.

But all communication activities are a form of teaching. Really good teachers communicate really well. When good communication happens, everyone benefits, inside and outside the black box.

In MATLAB, it’s colormaps all the way down

My overriding emotion, working in R, has been incomprehension: incomprehension at the gallery of ugly gnomes that populate the namespace and worried puzzlement over the strange incantations required to get them to dance in a statistically harmonious way. But all that aside, I resolved, joining the group, to put aside my misgivings and give the gnomes another try.

Soon, I found myself engaged in a reassessment of my life choices. I realized that life’s too short to spend it tickling gnomes – especially when only one of them knows how to do linear regression, but he won’t tell you your p value unless you give him the right kinds of treats. I fired up MATLAB and I haven’t looked back.

However, there was issue of continued perplexity, and I’m not referring to why MATLAB insists on shouting itself at you. I need to make a lot of 2-D plots of protein distance matrices. The trouble is that I like to highlight parts of them, and that’s not straightforward in MATLAB. Let’s have a look at an example:

>> dists=dlmread('1hel.distances');
>> colormap gray;
>> imagesc(dists>8);
>> axis square;

Now, let’s load up a set of residues and try to overlay them on top of the first image:

>> resn=dlmread('1hel.resn');
>> mask = zeros(size(dists));
>> mask(resn,resn)=1;
>> hold on
>> imagesc(1-mask, 'AlphaData',mask*.5);

So far, so easy. To review the main points:

mask is a matrix which has a one at all the pixels that we want to highlight. But we use imagesc(1-mask) because the gray colormap displays black at 0 and white at 1. If we did imagesc(mask), we would end up with grey everywhere and white only where we hoped to highlight – the opposite effect from the one that we sought.

AlphaData is a property which sets the transparency of the image. We want the image to be fully transparent where mask is 0 – so as not to fog out the underlying image – and partially transparent where mask is 1. 0.5*mask is a matrix which is 0.5 everywhere that mask is 1 and 0 everywhere else. If we set 0.5*mask as the AlphaData property, then the colour we add will be at half transparency and the white areas will be fully transparent.

But this isn’t a very pleasant image. We want to be able to highlight the regions in some colour other than grey. Let’s try.

>> close all
>> imagesc(dists>8)
>> colormap gray
>> axis square
>> imagesc(1-mask, 'AlphaData',mask*.3,'ColorMap','jet');
Error using image
There is no ColorMap property on the Image class.

Error in imagesc (line 39)
hh = image(varargin{:},'CDataMapping','scaled');

No luck! What’s more, setting the colormap between calls to image() and imagesc() also doesn’t work. Here’s the problem: the colormap is a property of the figure, not the data. (More precisely, it is not a property of the MATLAB axes.) When you change the colormap, you change the colors of every datapoint in the image.

The fix

MATLAB’s colormap mechanism is just simple enough to be confusing. MATLAB stores colours as 1×3 vectors, where each element in the vector is the proportion of red, green, or blue, respectively. [1 1 1] is white, [0 0 0] is black, and [1 0 0] is a frightfully iridescent red. A colormap is just a list of colors – 64 will normally do – which change smoothly from from one colour to another. To have a look at the built-in MATLAB colormaps, see here.

image rounds every value in the matrix to the nearest whole number (call that number i) and plots that pixel with the color given by colormap(i,:). Zero or below gets the first entry in the colormap and any index higher than the maximum is displayed with the last color in the colormap. So: if we construct a new colormap by concatenating two colormaps – the first running from rows 1 to 64 and the second running from 65 to 128 – if we scale our data so that the minimum is 65 and the maximum is 128, the data will never use the first set of colors. And, likewise, if we scale so that the lowest value is 1 and the highest is 64, we will use the first colormap. This seems like the sort of thing that we could manage automatically – and should, in fact. So I set myself to replace image and imagesc so that they would accept a ColorMap parameter.

How would it work?

>> colormap bone
>> imagesc(dists>8)
>> hold on
>> imagesc(mask,'ColorMap',[0.5 0 0.5],'AlphaData',0.5*(mask>0))
>> axis square

Beautiful!

Implementation notes

image is implemented in the MATLAB Java source code, but imagesc is a wrapper to image, written directly in MATLAB code. Therefore, overloading image requires the new function to be placed in a special directory called @double, while imagesc can be placed anywhere (except it cannot be placed in @double). If you then want to call the original version of image(), you can use builtin(‘image’,arg1,arg2,…), whereas if you want to call the original imagesc, it is a right pain. Instead, I used type imagesc to extract the source of imagesc and I modified that source directly – obviating any need to call the original imagesc. For reference, though, the most efficient way works out to be to find the function with which('imagesc'), cd into the containing directory, create a function handle to imagesc, and then cd out. As I said, it’s a mess.
These edits break colorbars. I added a spacer entry in each colormap which stores the maximum and minimum ‘real’ values of the data – in case that is useful for when I get around to extending colorbar. colormap entries must be inside [0,1] so these data are stored in the first twelve decimal places of the colormap entries: a strange burlesque on floating points. It’s a hack, but for my purposes it works.
In addition to the standard colormaps, I often require a mask in a particular color. For this purpose it helps to have a colormap that smoothly varies from white to the color in question. It actually doesn’t matter if it varies from white or any other color – ultimately, I only use the full colour value, since I set the transparency of all other pixels to maximum – but either way, passing the colour on [0,1] scale or [0,255] scale sets a colormap which varies from white to that color.

The code is available on MATLAB File Exchange at this link and is installable by copying imagesc.m, bootleg_fp.m, and the directory @double into your working directory. The idea to concatenate colormaps is widely available online – for example, here.

Oxford Protein Informatics Group

or "OPIG" to friends

Author Archives: Mark

Slippery slopes and slippery flats

The evolution of contact prediction – a new paper

You’re getting on my biscuits

A new way of eating too much

Oh lord, they comin’ – a diversity of units

Quick Python tricks

How to get seasick without leaving your desk

OPIG at the Oxford Maths Festival

The idea

How it works

Paper review: “Inside the black box”

In MATLAB, it’s colormaps all the way down

The fix

Implementation notes