Author Archives: Jinwoo Leem

Computational Antibody Affinity Maturation

In this week’s journal club, we reviewed a paper by Lippow et al. in Nature Biotechnology, which features a computational pipeline that is capable of maturing antibodies (Abs) by up to 140-fold. The paper itself discusses 4 test case Abs (D44.1, cetuximab, 4-4-20, bevacizumab) and uses changes in electrostatic energy to identify favourable mutations. Up to the point when this paper was published back in 2007, computational antibody design was an (almost) unexplored field of research – except for a study by Clark et al. in 2006, no one else had done anything like the work presented in this paper.

The idea behind the paper is to identify certain positions within the Ab structure for mutation and hopefully find an Ab with a higher binding affinity.


Briefly speaking, the group generated a mutant Ab-antigen (Ag) complex using a series of algorithms (dead-end elimination and A*), which was then scored by the group’s energy function to identify favourable mutations. Lippow et al. used the electrostatics term of their binding affinity prediction to estimate the effects of mutations on an Ab’s binding affinity. In other words, instead of examining their entire scoring function, which includes terms such as van der Waals energy, the group only used changes in the electrostatic energy term as an indicator for proposing mutations. Overall, in 2 of the 4 mentioned test cases (D44.1 & cetuximab), the proposed mutations were experimentally tested to validate their computational design pipeline – a brief overview of these two case studies follows.


In the case of the D44.1 anti-lysozyme Ab, the group proposed 9 single mutations by their electrostatics-based calculation method; 6/9 single mutants were confirmed to be beneficial (i.e., the mutant had an increased binding affinity). The beneficial single mutants were combined, ultimately leading to a quadruple mutant structure with a 100-fold improvement in affinity. The quadruple mutant was then subjected to a second round of computer-guided affinity maturation, leading to a new variant with six mutations (effectively a 140-fold improvement over the wild-type Ab). This case study was a solid testimony to the validity of their method; since anti-lysozyme Abs are often used as model systems, these results demonstrated that their design pipeline had taken, in principle, a suitable approach to maturing Abs in silico.

The second case study with cetuximab was arguably the more interesting result. Like the D44.1 case above, mutations were proposed to increase the Ab’s binding affinity on the basis of the changes in electrostatics. Although the newly-designed triple mutant only showed a 10-fold improvement over its wild-type counterpart, the group showed that their protocols can work for therapeutically-relevant Abs. The cetuximab example was a perfect complement to the previous case study — it demonstrated the practical implications of the method, and how this pipeline could potentially be used to mature existing Abs within the clinic today.
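To put the fold improvements from both case studies on an energy scale, here is a minimal sketch that converts a fold change in affinity into a change in binding free energy via ΔΔG = −RT ln(fold). It assumes the fold improvement is a ratio of dissociation constants and a temperature of 298.15 K; both are my assumptions rather than values quoted from the paper, and the function name is purely illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_to_ddg(fold_improvement, temperature_k=298.15):
    """Change in binding free energy (kcal/mol) for a fold improvement in K_D.

    Assumes fold_improvement = K_D(wild type) / K_D(mutant), so a value
    above 1 gives a negative (more favourable) result.
    """
    if fold_improvement <= 0:
        raise ValueError("fold improvement must be positive")
    return -R_KCAL * temperature_k * math.log(fold_improvement)


# Fold improvements mentioned in the two case studies
for label, fold in [
    ("D44.1 quadruple mutant", 100),
    ("D44.1 six-mutation variant", 140),
    ("cetuximab triple mutant", 10),
]:
    print(f"{label}: {fold}-fold -> {fold_to_ddg(fold):.2f} kcal/mol")
```

Under these assumptions, even the 140-fold improvement corresponds to a gain of only around 3 kcal/mol, which gives a feel for how small the energy differences are that the pipeline has to rank correctly.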

In essence, the group suggested that mutations that introduce either hydrophobicity or a net charge at the binding interface tend to increase an Ab’s binding affinity. These conclusions shouldn’t come as a huge surprise, but it was remarkable that the group had reached them with just one term from their energy function.


Effectively, the paper set off a whole new series of possibilities and helped us to widen our horizons. The paper was by no means perfect, especially with respect to predicting the precise binding affinities of mutants – much of this error could be boiled down to the modelling stage of their pipeline. However, the paper showed that computational affinity maturation is not just a dream – in fact, the paper showed that it’s perfectly doable, and immediately applicable. Interestingly, Lippow et al.’s manipulation of an Ab’s electrostatics seemed to be a valid approach, with recent publications on Ab maturation showing that introducing charged residues can enhance binding affinity (e.g. Kiyoshi et al., 2014).

More importantly, the paper was a beautiful showcase of how computational analyses could inform the decision-making process in an in vitro framework, and I believe it exemplified how we should approach our problems in bioinformatics. We should not think of proteins as mere text files and numbers, but realise that they are part of living systems, and we’re not yet at a point where we fully understand how proteins behave. This shouldn’t discourage us from research; instead, it should give us the incentive to take things more slowly, and develop a method/product that could be used to solve greater, pragmatic problems.

Journal Club: Human Germline Antibody Gene Segments Encode Polyspecific Antibodies

This week’s paper by Willis et al. sought to investigate how our limited antibody-encoding gene repertoire is able to recognise a seemingly unlimited array of antigens. There is a finite number of V, D, and J genes that encode our antibodies, yet this repertoire still has the capacity to recognise an effectively unlimited number of antigens. Simply, the authors’ notion is that an antibody from the germline (via V(D)J recombination; see entry by James) is able to adopt multiple conformations, thus allowing the antibody to bind multiple antigens.

Three antibodies derived from the germline gene 5-51*01 bind to very different antigens.

To test this hypothesis, the authors performed a multiple sequence alignment of the amino acid sequences of the mature antibodies and the germline sequence from which they are derived. If a position in even ONE mature antibody differed from the germline sequence, it was identified as a ‘variable’ position, and allowed to be changed by Rosetta’s multi-state design (MSD) and single-state design (SSD) protocols.
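The ‘variable’ position rule is simple enough to sketch in a few lines. This is only an illustration of the rule as described here, not the authors’ code: the sequences are made-up aligned fragments, the function name is hypothetical, and I’m assuming the sequences are already aligned to equal length (so a gap character would simply count as a difference).

```python
def variable_positions(germline, mature_seqs):
    """Return 0-based alignment columns where at least one mature
    sequence differs from the germline sequence."""
    for seq in mature_seqs:
        if len(seq) != len(germline):
            raise ValueError("all sequences must come from the same alignment")
    return [
        i
        for i, germ_res in enumerate(germline)
        if any(seq[i] != germ_res for seq in mature_seqs)
    ]


# Toy, made-up aligned fragments (not the real 5-51 or mature sequences)
germline = "EVQLVQSGAE"
mature = {
    "antibody_A": "EVQLVQSGAE",
    "antibody_B": "EVQLLQSGAE",
    "antibody_C": "QVQLVQSGTE",
}

positions = variable_positions(germline, list(mature.values()))
print(positions)  # [0, 4, 8]
```

Only the columns returned here would be opened up for design; every other position stays fixed at its original residue.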

Figure 1 from Willis et al., showing the pipeline: align mature antibodies (2XWT, 2B1A, 3HMX) to the germline sequence (5-51), identify ‘variable’ positions from the alignment, then allow Rosetta to change those residues during design.

Surprisingly, without any prior information about the germline sequence, the MSD yielded a sequence that was closer to the germline sequence, whereas the SSD for each mature antibody retained the mature sequence. In short, this indicated that the germline sequence is a harmonising sequence that can accommodate the conformations of each of the mature antibodies (as shown by MSD), whereas the mature sequence was the lowest energy amino acid sequence for the particular antibody’s conformation (as shown by SSD).

To further demonstrate that the germline sequence is indeed the more ‘flexible’ sequence, the authors then aligned the mature antibodies and determined the deviation in ψ-ϕ angles at each of the variable positions that were used in the Rosetta study. They found that the ψ-ϕ angle deviation in the positions that recovered to the germline residue was much larger than the other variable positions along the antibody. In other words, for the positions that tend to return to the germline amino acid in MSD, the ψ-ϕ angles have a much larger degree of variation compared to the other variable positions, suggesting that the positions that returned to the germline amino acid are prone to lots of movement.
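To make the idea of ‘angle deviation’ concrete, here is a small sketch that measures the spread of a dihedral angle across several structures. I don’t know the exact deviation measure used in the paper, so this assumes the circular standard deviation, which handles the wrap-around at ±180° correctly; the angle values are invented toy numbers and the function name is hypothetical.

```python
import math


def circular_std_deg(angles_deg):
    """Circular standard deviation (in degrees) of dihedral angles in degrees."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    sin_mean = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    cos_mean = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    # Mean resultant length; clamp to 1.0 to guard against rounding error
    resultant = min(1.0, math.hypot(sin_mean, cos_mean))
    if resultant == 0.0:
        return math.inf  # angles cancel out completely
    return math.degrees(math.sqrt(-2.0 * math.log(resultant)))


# Toy phi angles at two variable positions across three antibodies
phi_by_position = {
    "position reverting to germline (toy)": [-60.0, -120.0, 65.0],
    "other variable position (toy)": [-62.0, -65.0, -58.0],
}

for name, angles in phi_by_position.items():
    print(f"{name}: spread = {circular_std_deg(angles):.1f} degrees")
```

A plain standard deviation would treat 179° and −179° as far apart even though they differ by only 2°, which is why a circular statistic is the safer choice for backbone angles.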

In addition to the many results that corroborate the findings mentioned in this entry, it’s neat that the authors took a ‘backwards’ spin to conventional antibody design. Most antibody design regimes aim to find amino acid(s) that give the antibody more ‘rigidity’, and hence, mature its affinity, but this paper went against the norm to find the most FLEXIBLE antibody (the most likely germline predecessor*). Effectively, they argue that this type of protocol can be exported to extract new antibodies that can bind to multiple antigens, thus increasing the versatility of antibodies as potential therapeutic agents.

Life in Colour – Vim

Among programmers, there are occasional debates on which editor is best: some love Eclipse, some are die-hard Emacs supporters, and some have no preference and use the default text editor(s) that come with their OS. Whatever your choice, never underestimate how useful Vim can be, e.g. if you SSH into another machine. And so, here is a vim config that I’ve been using (thanks to Ben Frot), which makes your vim environment very colourful and easy to read. Code available here.

Plus, you can do awesome things in vim:

Edit multiple files in Vim. Can get a little crazy but, hey, why not?

So, to do some of the crazier things (e.g. what I’ve shown in this blog post), try this:

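# Open a file of choice (the file names here are just placeholders)
vim first_file.py

# First split to two screens; change between screens with Ctrl+w followed by w
# (:vsplit splits vertically, :split splits horizontally)
:vsplit

# Now open a second file in the active window
:e second_file.py

# Repeat for more screens & lines.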

Happy vim-ing!

Making Protein-Protein Interfaces Look (decently) Good

This is a little PyMOL script that I’ve used to draw antibody-antigen interfaces. If you’d like a commented version explaining what each and every line does, contact me! This is a slight modification of what has been done on the PyMOL Wiki.

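# Rename the loaded object (replace FILENAME with your object's name)
set_name FILENAME, complex

# White background and a white base colour for everything
set bg_rgb, [1,1,1]

color white

# Show the structure as cartoon only
hide lines
show cartoon

# Define the two binding partners by chain; adjust the chain IDs to your
# structure (chain IDs can be case-sensitive in recent PyMOL versions)
select antibody, chain a
select antigen, chain b

# Atoms of each partner lying within 4.5 Angstroms of the other
select paratopeAtoms, antibody within 4.5 of antigen
select epitopeAtoms, antigen within 4.5 of antibody

# Expand the atom selections to whole residues
select paratopeRes, byres paratopeAtoms
select epitopeRes, byres epitopeAtoms

# Draw distances between interface atoms up to 4.5 Angstroms (mode 0: all atom pairs)
distance interactions, paratopeAtoms, epitopeAtoms, 4.5, 0

# Colour the distance dashes red and hide their labels
color red, interactions
hide labels, interactions

# Show interface residues as sticks
show sticks, paratopeRes
show sticks, epitopeRes

# Hide backbone atoms of the stick residues so side chains join the cartoon cleanly
set cartoon_side_chain_helper, on

# Small, smooth spheres on the contacting atoms, coloured by partner
set sphere_quality, 2
set sphere_scale, 0.3
show spheres, paratopeAtoms
show spheres, epitopeAtoms
color tv_blue, paratopeAtoms
color tv_yellow, epitopeAtoms

# Ray-tracing style: quantised colours with outlines, no depth cue, softer highlights
set ray_trace_mode, 3
unset depth_cue
set specular, 0.5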

Once you orient it to where you’d like it and ray it, you should get something like this.
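For anyone curious about what the `within 4.5 of` and `byres` selections in the script are doing conceptually, here is a plain-Python sketch of the same idea: find atoms of one partner that sit within a distance cutoff of the other partner, then report the residues they belong to. This is not how PyMOL implements it, and the atom records, residue labels and coordinates are all invented for illustration.

```python
import math


def interface_residues(atoms_a, atoms_b, cutoff=4.5):
    """Residues of group A with at least one atom within `cutoff`
    Angstroms of any atom in group B.

    Each atom is a (residue_id, (x, y, z)) tuple.
    """
    close = set()
    for res_id, xyz_a in atoms_a:
        if any(math.dist(xyz_a, xyz_b) <= cutoff for _, xyz_b in atoms_b):
            close.add(res_id)
    return sorted(close)


# Toy atoms (made-up residue labels and coordinates)
antibody = [
    ("H:TYR33", (0.0, 0.0, 0.0)),
    ("H:TYR33", (1.5, 0.0, 0.0)),
    ("H:SER52", (10.0, 0.0, 0.0)),
]
antigen = [
    ("A:ASP101", (4.0, 0.0, 0.0)),
    ("A:LYS116", (20.0, 0.0, 0.0)),
]

print(interface_residues(antibody, antigen))  # ['H:TYR33']
print(interface_residues(antigen, antibody))  # ['A:ASP101']
```

The brute-force double loop is fine for a toy case like this; for a real complex you would let PyMOL (or a spatial index) do the neighbour search.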

Building an Antibody Benchmark Set

In this so-called ‘big data’ age, the quest to find the signal amidst the noise is becoming more difficult than ever. Though we have sophisticated systems that can extract and parse data incredibly efficiently, the amount of noise has expanded just as much, if not more, thus masking the signals that we crave. Oddly enough, it sometimes seems that we are churning out and gathering a vast amount of data just for the sake of it, rather than looking for highly-relevant, high-quality data.

One such example is antibody (Ab) binding data. Even though there are several Ab-specific databases (e.g. abYsis, IMGT), none of these, to our knowledge, has any information on an Ab’s binding affinity to its antigen (Ag), despite the fact that an Ab’s affinity is one of the few quantitative metrics of its performance. Therefore, gathering Ab binding data would not only help us to create more accurate models of Ab binding, it would, in the long term, facilitate the in silico maturation and design/re-design of Abs. If this seems like a dream, have a read of this paper – they made an incredibly effective Ab from computationally-inspired methods.

Given the tools at our disposal, and the fact that several protein-protein binding databases are available in the public domain, this task may seem somewhat trivial. However, there’s the ever-present issue of gathering only the highest quality data points in order to perform some of the applications mentioned earlier.

Over the past few weeks, we have gathered the binding data for 228 Ab-Ag complexes across two major protein-protein binding databases: PDB-Bind and the structure-based benchmark from Kastritis et al. Ultimately, 36 entries were removed from further analyses as they had irrelevant data (e.g. IC50 instead of KD; IC50 relates to inhibition, which is not the same as the Ab’s affinity for its Ag). Using this dataset, we performed some initial tests on existing energy functions and docking programs to see if there is any correlation between the programs’ scores and protein binding affinities.

Blue = Abs binding to proteins, Red = Abs binding to peptides

As the graphs show, there is no clear correlation between a program/function’s score and the affinity of an Ab. Having said this, these programs were trained on general protein-protein interfaces (though that does occasionally include Abs!), so we trained DCOMPLEX and RAPDF specifically on Ab structures (~130 structures). The end results were poor nonetheless (top-centre and top-right graphs, above), but the interatomic heatmaps show clear differences in the interaction patterns between Ab-Ag interfaces and general protein-protein interfaces.
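As a rough illustration of the kind of check behind these plots, here is a minimal sketch that computes a Pearson correlation coefficient between a program’s scores and experimental affinities. The numbers are invented toy values, not our benchmark data, and putting the affinities on a −log10(KD) scale is simply my choice for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up values: one score per complex and its K_D in molar units
scores = [-12.1, -8.4, -15.3, -9.9, -11.0]
kd_molar = [2e-9, 5e-8, 1e-7, 3e-10, 8e-9]

# Put affinities on a log scale so that tighter binders get larger values
p_kd = [-math.log10(kd) for kd in kd_molar]

r = pearson_r(scores, p_kd)
print(f"Pearson r between score and -log10(KD): {r:.2f}")
```

A value of r close to zero, as in the graphs above, means the score tells us very little about affinity, at least in a linear sense.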

Interatomic contact map between Ab-Ag or two general proteins. Warmer colours represent higher counts.

Now, with this new information, the search for signals continues. It is evident that Ab binding differs distinctly from general protein-protein binding. Therefore, the next step is to gather more high-quality data and see if there is any correlation between an Ab’s distinct binding mode and its affinity. However, we are not interested in just getting whatever affinity data is available. As we have done for the past few weeks, the rigorous standards we have used for building the current benchmark set must be maintained – otherwise we risk masking the signal with unnecessary noise.

Currently, the results are disappointing, but if the past few weeks in OPIG have taught me anything, it is that this is only the beginning of a long and difficult search for a good model. BUT – this is what makes research so exciting! We learn from the low Pearson correlation coefficients, the (almost) random distribution of data, and the not-so-pretty plots of our data in order to form useful models for practical applications like Ab design. I think a quote from The Great Gatsby accurately ‘models’ my optimism for making sense of the incoming stream of data:

Gatsby believed in the green light, the orgastic future that year by year recedes before us. It eluded us then, but that’s no matter — to-morrow we will run faster, stretch out our arms farther. . . . And one fine morning ——

So we beat on, boats against the current, borne back ceaselessly into the past.