Author Archives: Aleksandr Kovaltsuk

Journal Club post: Interface between Ig-seq and antibody modelling

Hi everyone! In this blog post, I would like to review a couple of relatively recent papers on antibody modelling and next-generation sequencing (NGS) of immunoglobulin gene repertoires, also known as Ig-seq. I previously worked as a phage display scientist, and when I joined Charlotte’s group last January I initially struggled with all the new terminology around computational modelling. Hence, the main aim of this post is to translate jargon commonly used in the computational world into plainer language.

The three-dimensional structure of an antibody dictates its function. However, antibody sequences obtained from Ig-seq cannot be directly translated into information about folding, aggregation and function. Several ways exist to interrogate antibody structure, including X-ray crystallography, NMR spectroscopy, expression, and computational modelling. These methods vary in throughput as well as precision. Here, I will concentrate on computational modelling.

First of all, the most commonly confused term is a decoy. In antibody structure prediction, a decoy is a candidate modelled structure; a tool generates one or more decoys, ranks them, and selects the one it predicts to be closest to the native antibody structure. A number of antibody modelling tools exist, each employing a different methodology and generating a different number of decoys. Good reviews of antibody structure prediction can be found in (1,2).

I will try to give a very rough summary of how these modelling tools work. To do so, I assume that readers are familiar with the antibody sequence/structure relationship; if not, please see (3). Antibody framework regions are highly conserved in sequence, hence their structure can be deduced from sequence identity with high confidence. The PDB (4) acts as the source of template structures for antibody modelling. Canonical CDRs (all CDRs except CDR-H3) adopt a limited number of conformations. Several definitions of canonical classes exist (5,6), but, in essence, a canonical CDR must contain the residues that define a particular class. Next, the relative orientation of the VH and VL domains is calculated or copied from a PDB structure. CDR-H3 modelling is very challenging, and different approaches have been devised (7–9). The structural space of CDR-H3 is vast (10), and hence this loop cannot be assigned to a canonical class. Once CDR-H3 is modelled, the resulting decoy is checked for clashes (such as physically impossible side-chain orientations).
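The framework step described above (choosing a structural template by sequence identity) can be sketched in a few lines of Python. This is only a toy illustration and not how any particular modelling tool works: the function names, the template names `tmplA` and `tmplB`, and the sequences are all hypothetical, and the sequences are assumed to be already aligned to the same length.

```python
from __future__ import annotations


def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical residues over positions where neither sequence has a gap.

    Assumes both sequences are already aligned to the same length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return matches / len(aligned)


def pick_template(query: str, templates: dict[str, str]) -> tuple[str, float]:
    """Return the name and identity of the template most similar to the query."""
    best = max(templates, key=lambda name: sequence_identity(query, templates[name]))
    return best, sequence_identity(query, templates[best])


# Hypothetical framework fragments, for illustration only.
query = "EVQLVESGGG"
templates = {"tmplA": "EVQLLESGGG", "tmplB": "QVQLQESGPG"}
name, identity = pick_template(query, templates)
print(name, identity)
```

In a real pipeline the templates would come from antibody structures in the PDB, and the comparison would be made per region (framework, each CDR) rather than over a single short string.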

Here, I would like to mention several examples of how antibody modelling can help to accelerate drug discovery. DeKosky et al. (11) mapped two Ig-seq datasets to antibody structures to interrogate how the antibody paratope changes in response to antigenic stimulation. Knowledge of the paired full-length VH-VL sequence is crucial for the best antibody structure prediction, and in this study they employed paired-chain Ig-seq (12). However, this technique cannot sequence full-length VH/VL, hence the V gene sequence had to be approximated. Computational paratope identification was employed to examine paratope convergence. This paper had several drawbacks: only 2,000 models (~1% of the Ig-seq data) were built, at a cost of 570,000 CPU hours, and antibody sequences with CDR-H3 loops longer than 16 aa were excluded from the analysis. Generating a reliable conformation for a long CDR-H3 is still considered a hard task. More recently, Laffy et al. (13) investigated antibody promiscuity by mapping sequence to structure and validating the results with ELISA. A cohort of 10 antibodies, all with long CDR-H3 loops (≥15 aa), was interrogated. They used a homology modelling tool to build the CDR-H3 structures. However, the availability of appropriate structural templates can be questioned, since CDR-H3 loops deposited in the PDB are predominantly shorter due to crystallographic constraints. As mentioned before, paired VH/VL data are crucial for structure determination. Here, they used the DeKosky et al. (11) data to infer the pairing. The approach can be streamlined once more paired data become available.

In conclusion, antibody modelling enables researchers to circumvent the cost and time associated with experimental approaches to antibody characterization. The field still needs faster and more accurate structure prediction to achieve tasks such as modelling the entirety of an Ig-seq dataset or long CDR-H3 loops. Currently, the fastest antibody modelling tool is ABodyBuilder (8). It generates a model in 30 seconds, and a version of it is available online. The availability of more structural information, as well as algorithm improvements, will facilitate more confident antibody modelling.
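To put the throughput figures quoted in this post side by side, here is a back-of-the-envelope calculation in Python. It rests on several assumptions that are mine rather than the papers': that the 570,000 figure is in CPU hours, that "~1%" means exactly 1%, and that the 30 seconds per ABodyBuilder model is single-core time with models built one after another.

```python
# Figures quoted in the text (see the assumptions in the paragraph above).
MODELS_BUILT = 2_000            # models built by DeKosky et al. (11)
CPU_HOURS_SPENT = 570_000       # assumed to be CPU hours
FRACTION_OF_DATASET = 0.01      # "~1% of the Ig-seq data", taken as exactly 1%
SECONDS_PER_FAST_MODEL = 30     # ABodyBuilder (8), assumed serial and single-core

# Cost per model in the DeKosky et al. study.
cpu_hours_per_model = CPU_HOURS_SPENT / MODELS_BUILT

# Implied size of the full dataset and the cost of modelling all of it the same way.
dataset_size = round(MODELS_BUILT / FRACTION_OF_DATASET)
cpu_hours_full_dataset = cpu_hours_per_model * dataset_size

# Time to model the same dataset at 30 seconds per model.
fast_hours_full_dataset = dataset_size * SECONDS_PER_FAST_MODEL / 3600

print(f"CPU hours per model: {cpu_hours_per_model:.0f}")
print(f"Implied dataset size: {dataset_size}")
print(f"CPU hours for the full dataset: {cpu_hours_full_dataset:.0f}")
print(f"Hours at 30 s per model: {fast_hours_full_dataset:.0f}")
```

Under these assumptions the per-model cost differs by several orders of magnitude, which is why modelling a whole Ig-seq dataset only becomes realistic with the faster tools.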


  1. Kuroda D, Shirai H, Jacobson MP, Nakamura H. Computer-aided antibody design. Protein Eng Des Sel (2012) 25:507–521. doi:10.1093/protein/gzs024
  2. Krawczyk K, Dunbar J, Deane CM. “Computational Tools for Aiding Rational Antibody Design,” in Methods in molecular biology (Clifton, N.J.), 399–416. doi:10.1007/978-1-4939-6637-0_21
  3. Georgiou G, Ippolito GC, Beausang J, Busse CE, Wardemann H, Quake SR. The promise and challenge of high-throughput sequencing of the antibody repertoire. Nat Biotech (2014) 32:158–168. doi:10.1038/nbt.2782
  4. Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): Ensuring a single, uniform archive of PDB data. Nucleic Acids Res (2007) 35: doi:10.1093/nar/gkl971
  5. Nowak J, Baker T, Georges G, Kelm S, Klostermann S, Shi J, Sridharan S, Deane CM. Length-independent structural similarities enrich the antibody CDR canonical class model. MAbs (2016) 8:751–760. doi:10.1080/19420862.2016.1158370
  6. North B, Lehmann A, Dunbrack RL. A new clustering of antibody CDR loop conformations. J Mol Biol (2011) 406:228–256. doi:10.1016/j.jmb.2010.10.030
  7. Weitzner BD, Jeliazkov JR, Lyskov S, Marze N, Kuroda D, Frick R, Adolf-Bryfogle J, Biswas N, Dunbrack RL, Gray JJ. Modeling and docking of antibody structures with Rosetta. Nat Protoc (2017) 12:401–416. doi:10.1038/nprot.2016.180
  8. Leem J, Dunbar J, Georges G, Shi J, Deane CM. ABodyBuilder: Automated antibody structure prediction with data–driven accuracy estimation. MAbs (2016) 8:1259–1268. doi:10.1080/19420862.2016.1205773
  9. Marks C, Nowak J, Klostermann S, Georges G, Dunbar J, Shi J, Kelm S, Deane CM. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction. Bioinformatics (2017) 33:1346–1353. doi:10.1093/bioinformatics/btw823
  10. Regep C, Georges G, Shi J, Popovic B, Deane CM. The H3 loop of antibodies shows unique structural characteristics. Proteins Struct Funct Bioinforma (2017) 85:1311–1318. doi:10.1002/prot.25291
  11. DeKosky BJ, Lungu OI, Park D, Johnson EL, Charab W, Chrysostomou C, Kuroda D, Ellington AD, Ippolito GC, Gray JJ, et al. Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires. Proc Natl Acad Sci U S A (2016) doi:10.1073/pnas.1525510113
  12. DeKosky BJ, Kojima T, Rodin A, Charab W, Ippolito GC, Ellington AD, Georgiou G. In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat Med (2014) 21:1–8. doi:10.1038/nm.3743
  13. Laffy JMJ, Dodev T, Macpherson JA, Townsend C, Lu HC, Dunn-Walters D, Fraternali F. Promiscuous antibodies characterised by their physico-chemical properties: From sequence to structure and back. Prog Biophys Mol Biol (2016) doi:10.1016/j.pbiomolbio.2016.09.002

Using Antibody Next Generation Sequencing data to aid antibody engineering

I consider myself a wet lab scientist, and I had not programmed in any dynamic language such as Python before starting my DPhil. My main interests lie in developing improved antibody humanization campaigns, rational construction of antibody phage display libraries, and antibody evolution. Having completed an industrial placement at MedImmune, I saw the biotechnology industry from the inside and realized that scientists who can bridge computer science and the wet lab are in high demand.

The title of my DPhil is very broad, and the research itself is data-driven rather than hypothesis-driven. Our research group collaborates with UCB Pharma, which has sequenced whole antibody repertoires across a number of species. A dataset may contain more than 10 million heavy and light chain variable region sequences. Even so, these datasets cover no more than 1% of the theoretical repertoire, hence looking at the entropies of sequences rather than at the sequences alone could provide insights into differences within and between species.
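As an illustration of what "entropy of sequences" can mean in practice, the sketch below computes the Shannon entropy at each position of a set of aligned sequences. This is one common definition, not necessarily the measure used in my project; the function name and the toy sequences are hypothetical, and the sequences are assumed to be numbered or aligned to the same length beforehand.

```python
import math
from collections import Counter


def positional_entropy(aligned_seqs):
    """Shannon entropy (in bits) of the residue distribution at each aligned position."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")

    entropies = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        total = len(column)
        # H = sum over residues of -p * log2(p)
        h = sum(-(n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


# Toy example: position 1 is fully conserved, position 2 is variable.
toy = ["AC", "AD", "AE", "AC"]
print(positional_entropy(toy))
```

A conserved position gives an entropy of zero, while a position where many residues occur with similar frequency gives a high value, so comparing entropy profiles between datasets highlights where repertoires differ in diversity.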

NGS of antibody repertoires provides snapshots of repertoire diversity and entropy as well as the sequences themselves. Reddy et al. (2010) showed that this information can be used successfully to pull out target-specific variable chains. However, most research groups believe that the main application of NGS is immunodiagnostics (Greiff et al., 2015).

My project involves applying software developed by our research group, namely ANARCI (Dunbar and Deane, 2016) and ABodyBuilder (Leem et al., 2016). Combining the two tools allows NGS datasets to be analysed at an unprecedented rate (1 million sequences per 7 hours). A number of manipulations can be performed on datasets to standardize them and make the data reproducible, which is a big issue in science. It is possible to re-assign germlines, numbering schemes and complementarity-determining region (CDR) definitions for a dataset of 10 million sequences in less than a day. For instance, the data provided by UCB required the variable chains to be re-numbered according to the IMGT numbering scheme and CDR definition (Lefranc, 2011). The IMGT numbering scheme was selected because it numbers CDR residues symmetrically, which improves the assignment of positions to amino acids that occupy the same structural space in CDRs of different lengths (Figure 1).

Figure 1. IMGT numbering and CDR definition of CDR3. Symmetrical assignment of positions to amino acids in HCDR3 allows for better localization of the V, D and J genes: the V gene encodes the amino terminus of CDR3, the J gene encodes the carboxyl terminus, and the D gene encodes the middle portion.
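The symmetrical numbering shown in Figure 1 can be sketched in Python as follows. This is my own simplified reading of the IMGT rule for CDR3 (positions 105 to 117), written for illustration rather than taken from ANARCI: I assume that loops shorter than 13 residues lose positions from the top of the loop in the order 111, 112, 110, 113, and so on, and that longer loops gain positions in the order 112.1, 111.1, 112.2, 111.2, and so on. The function name is hypothetical, and the output should be checked against IMGT or ANARCI before being relied on.

```python
def imgt_cdr3_positions(length):
    """Return IMGT position labels for a CDR3 of the given length (assumed rule, see text)."""
    if length < 0:
        raise ValueError("length must be non-negative")

    if length <= 13:
        # Gaps open at the top of the loop and spread outwards symmetrically.
        gap_order = [111, 112, 110, 113, 109, 114, 108, 115, 107, 116, 106, 117, 105]
        removed = set(gap_order[: 13 - length])
        return [str(p) for p in range(105, 118) if p not in removed]

    # Extra residues are inserted between 111 and 112, alternating sides,
    # starting on the 112 side.
    extra = length - 13
    n_112 = (extra + 1) // 2
    n_111 = extra // 2
    left = [str(p) for p in range(105, 112)]                 # 105 .. 111
    ins_111 = [f"111.{i}" for i in range(1, n_111 + 1)]      # 111.1, 111.2, ...
    ins_112 = [f"112.{i}" for i in range(n_112, 0, -1)]      # ..., 112.2, 112.1
    right = [str(p) for p in range(112, 118)]                # 112 .. 117
    return left + ins_111 + ins_112 + right


for n in (11, 13, 15):
    print(n, imgt_cdr3_positions(n))
```

Because both ends of the loop keep the same labels whatever its length, residues contributed by the V gene (amino terminus) and the J gene (carboxyl terminus) stay at comparable positions across CDR3s of different lengths, while length variation is absorbed in the middle.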

To sum up, analysing CDR lengths and the amino acid composition of CDRs and frameworks, and finding novel patterns in antibody repertoires, will open up new rational steps in antibody humanization and affinity maturation. The key step will be to determine the amino acid scaffolds that define the humanness of an antibody or, in other words, scaffolds that are not immunogenic in humans.


  1. Dunbar J., and Deane CM., ANARCI: Antigen receptor numbering and receptor classification. Bioinformatics (2016)
  2. Greiff V., et al. A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status. Genome Medicine (2015)
  3. Leem J., et al. ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation. mAbs. (2016)
  4. Lefranc M., IMGT, the International ImMunoGeneTics Information System. Cold Spring Harb Protoc. (2011)
  5. Reddy ST., et al. Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells. Nat Biotech. (2010)