Antibody Modelling: CDR-H3 Structure Prediction

As regular readers of this blog will know (I know you’re out there somewhere!), one of the main focuses of OPIG at the moment is antibody structure. For the last ten weeks (as one of my short projects for the Systems Approaches to Biomedical Science program of the DTC) I have been working on predicting the structure of the CDR-H3 loop.

[Image: CDRs, rainbow-coloured and labelled]

So, a quick reminder on antibody structure: antibodies, which have a characteristic shape reminiscent of the letter ‘Y’, consist of two identical halves, each containing a heavy and a light chain. Heavy chains are made up of four domains (three constant domains, CH1, CH2 and CH3; and one variable domain, VH), while light chains have two (one constant domain, CL; and one variable domain, VL). The variable domains of the heavy and light chains together are known as the Fv region; most naturally occurring antibodies have two. At the ends of these Fv regions are six loops, known as the complementarity determining regions, or CDRs. There are three CDRs on each of the VH and VL domains; those located on the VL domain are labelled L1, L2 and L3, while those found on the VH domain are labelled H1, H2 and H3. These loops form the most variable parts of the whole antibody structure, and so it is the CDRs that govern the binding properties of the antibody.

Of the six CDRs, by far the most variable is the H3 loop, found in the centre of the antigen binding site. A huge range of H3 lengths has been observed, commonly between 3 and 25 residues but occasionally much longer. This creates a much larger structural diversity than in the other CDRs, each of which has at most 8 different lengths. It is the H3 loop that is thought to contribute the most to antigen binding properties. Being able to model this loop is therefore an important part of creating an accurate model, suitable for use in therapeutic antibody design.

Predicting the structure of the loop requires three steps: sampling, filtering and ranking. There are two types of loop modelling method, which differ in the way they perform the sampling step: knowledge-based methods and ab initio methods.

Knowledge-based (or database) methods rely upon databases of known loop structures, which are searched for fragments that would form feasible structures when placed in the gap. Predictions are made relatively quickly in this way, but the database may not contain a suitable fragment, in which case no prediction is made.

Ab initio (or conformational searching) methods, on the other hand, do not rely upon a set of previously known loop structures. Instead, loop conformations are generated computationally, normally by sampling dihedral angles from distributions specific to each amino acid. The loops generated in this way, however, are not ‘closed’, i.e. the loop does not attach to both anchor regions, and therefore some sort of loop closure method must be implemented. The generated conformations are then assessed on the assumption that the native loop structure represents the global minimum of the protein’s free energy. Ab initio methods are generally much slower than knowledge-based ones, and their accuracy depends on loop length (long loops are harder to predict using this method); unlike the database methods, however, an answer will always be produced.
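To make the sampling, filtering and ranking steps a little more concrete, here is a toy Python sketch. It is not how FREAD or MECHANO work: the fragment database, the "anchor gap" distance used for filtering and ranking, the dihedral centres and the 15-degree spread are all made-up assumptions for illustration, and the loop closure step that a real ab initio method needs is left out entirely.

```python
import random

# Hypothetical toy database: loop length -> list of (fragment_id, anchor_gap in angstroms).
FRAGMENT_DB = {
    5: [("frag_a", 9.8), ("frag_b", 12.5)],
    8: [("frag_c", 11.0)],
}

# Hypothetical (phi, psi) centres in degrees for a few residue types. Real methods
# sample from distributions derived from known structures.
DIHEDRAL_CENTRES = {"G": (80.0, 10.0), "P": (-65.0, 145.0)}
DEFAULT_CENTRE = (-90.0, 120.0)


def sample_knowledge_based(sequence):
    """Sampling, database style: look up stored fragments of the same length."""
    return list(FRAGMENT_DB.get(len(sequence), []))


def sample_ab_initio(sequence, n_decoys, rng):
    """Sampling, ab initio style: draw one (phi, psi) pair per residue.

    The decoys are NOT closed onto the anchors; loop closure is omitted here.
    """
    decoys = []
    for _ in range(n_decoys):
        angles = []
        for residue in sequence:
            phi0, psi0 = DIHEDRAL_CENTRES.get(residue, DEFAULT_CENTRE)
            angles.append((rng.gauss(phi0, 15.0), rng.gauss(psi0, 15.0)))
        decoys.append(angles)
    return decoys


def filter_by_anchor_gap(candidates, target_gap, tolerance=1.0):
    """Filtering: keep fragments whose end-to-end distance fits the gap."""
    return [c for c in candidates if abs(c[1] - target_gap) <= tolerance]


def rank_by_fit(candidates, target_gap):
    """Ranking: best anchor fit first (a stand-in for a real scoring function)."""
    return sorted(candidates, key=lambda c: abs(c[1] - target_gap))


def predict_knowledge_based(sequence, target_gap):
    """Run sample -> filter -> rank; return None if nothing suitable is found."""
    sampled = sample_knowledge_based(sequence)
    kept = filter_by_anchor_gap(sampled, target_gap)
    ranked = rank_by_fit(kept, target_gap)
    return ranked[0][0] if ranked else None


if __name__ == "__main__":
    print(predict_knowledge_based("ARDYW", 10.0))  # a length-5 fragment fits
    print(predict_knowledge_based("ARD", 10.0))    # no length-3 fragments stored
    print(len(sample_ab_initio("GPA", 4, random.Random(0))))
```

The sketch mirrors the trade-off described above: the database route gives up (returns `None`) when nothing suitable is stored, whereas the ab initio sampler always produces as many decoys as requested, which would still need closing and scoring.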

[Image: 3juyE]

For my project, I have examined the performance of FREAD (a knowledge-based method) and MECHANO (an ab initio method) when predicting the structure of the H3 loop. At the moment, FREAD produces better results than MECHANO; however, we hope to improve the predictions made by both. By optimising the performance of both methods, we hope to create a ‘hybrid’ loop modelling method that exploits the advantages of both approaches. Since I’ve decided that this is the project I want to continue with, this will be the aim of my DPhil!
