Author Archives: Aleksandr Kovaltsuk

Adding paired BCR data to OAS

Hello,

Today is the day for my final blog post before I enter thesis-writing mode. I would like to use this opportunity to present our recent update to the Observed Antibody Space (OAS) resource, which now includes paired antibody data (http://opig.stats.ox.ac.uk/webapps/oas).

Continue reading →

Observed Antibody Space + miAIRR

Today is the day for another (potentially penultimate) blog post from me. Using this opportunity, I would like to introduce to you our recent update to the Observed Antibody Space (OAS) resource.

Continue reading →

Parallelising antigen-specific B-cell isolation with LIBRA-seq

Today is the day when I write a blog post about an exciting research paper in the field of B-cell receptor (BCR) repertoire analysis. At OPIG, we (the antibody people) are working hard to model and characterise the 3D configuration of an antibody from its sequence. Significant progress has been made in modelling software development, so that we can predict antibody structures with high confidence. This task becomes considerably harder when we model the entirety of BCR repertoire sequences. Current methods of BCR repertoire sequencing operate primarily on the heavy chain only. This limits our capacity to generate refined 3D antibody models to mere approximations of the shapes of the complementarity-determining regions (CDRs).

Continue reading →

Exciting new studies in OAS

Hi everyone!

Today is the day for another blog post from me. Here, I would like to give you an update on new studies that were deposited in the Observed Antibody Space (OAS) resource, and take a closer look at one of them. To date, we have curated 57 studies in OAS, for which we provide raw nucleotide and numbered amino acid sequences for download. These amino acid sequences have been filtered using ANARCI parsing, which ensures that the sequences align to the HMM profiles of their respective species and do not have unusual indels or frameshifts. More than 660 million numbered amino acid sequences are deposited in OAS, and every sequence keeps a link to its corresponding nucleotide sequence. Recently we added two more studies to OAS: Sheng et al. (2017) and Setliff et al. (2018). We numbered roughly 2.8 and 46 million sequences in the Sheng et al. and Setliff et al. studies, respectively. In this blog post, I would like to talk more about the uniqueness of the Setliff et al. data.

Continue reading →

OPunting 2018

Hi everyone!

Today is the day to present to you my belated blogpost on OPIG punting (or OPunting for short). I promise I was not procrastinating on writing it. I am currently not in Oxford, as I am visiting beautiful Zurich as proved by the photo below. Continue reading →

New avenues in antibody engineering

Hi everyone,

In this blog post I would like to review an unusual antibody scaffold that could potentially open a new avenue in antibody engineering. Here, I will discuss a couple of papers that complement each other's research.

My DPhil is centred on antibody NGS (Ig-seq) data analysis. I always map an antibody sequence to its structure, as the three-dimensional antibody configuration dictates its function, a piece of information that cannot be obtained from the nucleotide or amino acid sequence alone. When I work with human Ig-seq data, I bear in mind that antibodies are composed of two pairs of light and heavy chains that tune the antibody towards its cognate antigen. Recently, Tan et al. found that the antibody repertoires of people who live in malaria-endemic regions have adopted an unusual property to defend the body from the pathogen (1). Several studies followed up on this discovery to further dissect this previously uncharacterised property of antibodies.

Malaria parasites in the erythrocytic stage produce RIFIN proteins that are displayed on the surface of the erythrocytes. The main function of RIFINs is to bind to the LAIR1 receptors that are found on the surface of immune cells. LAIR1 is an inhibitory receptor, so its activation leads to inhibition of the immune system. The endogenous ligand of the LAIR1 receptor is collagen, which is found on the surface of body cells. This ensures that immune cells are not activated against the body's own tissues. Activating the LAIR1 receptors is one of the escape mechanisms that the malaria parasite has evolved.

Tan et al. (1) showed that, in an evolutionary arms race between humans and malaria, our immune system has turned the ability of RIFINs to bind LAIR1 against the parasite itself. Single B cell isolation and sequencing revealed that antibodies, which are the effector molecules of our immune system, can incorporate the LAIR1 protein into their structure. Given what we know about antibody engineering, the idea of incorporating a 100 amino acid long protein into an antibody structure is very hard to comprehend. Sequences of these antibodies showed that the LAIR1 insertion was introduced into CDR-H3. Recently, the crystal structure of this construct has become available (2). The crystal structure revealed that the LAIR1 insertion is indeed structurally functional. All 5 of the antibody's canonical CDRs interact with the LAIR1 protein and its linkers to accommodate the insertion. The CDR-L3 forms two disulfide bonds with the linker to orientate the LAIR1 protein in such a way that it can interact with RIFINs. It is worth stressing that the LAIR1 sequence differs from the wild type, but the structure is very similar (<0.5 Å RMSD). The change in sequence and structure is crucial: it prevents the LAIR1-containing antibody from interacting with collagen, so that it binds only to RIFINs.
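To make the RMSD figure above more concrete, here is a minimal sketch of how the root-mean-square deviation between two structures can be computed. It assumes the two coordinate sets have already been superimposed and contain the same atoms in the same order; the coordinates are toy values invented for illustration and are not taken from the crystal structure in (2).

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two pre-superimposed lists of (x, y, z) coordinates."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between matching atoms.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates (hypothetical, in angstroms): three matching atoms.
wild_type = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
inserted = [(0.0, 0.3, 0.0), (1.5, 0.0, 0.4), (3.0, 0.0, 0.0)]

print(f"RMSD = {rmsd(wild_type, inserted):.3f}")
```

In practice the superposition step matters as much as the formula itself, and it is usually done over backbone atoms before the deviation is measured.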

Pieper et al. (3) interrogated the modality of LAIR1 insertions into antibody structures. This was done by single-cell sequencing as well as NGS of the antibody switch region. It turns out that human antibodies can accommodate two types of insertion modalities and can form camelid-like antibodies. The insertion of LAIR1 can happen in CDR-H3, leading to the loss of antibody binding to its cognate antigen. The other modality is the incorporation of the LAIR1 protein into the switch region of the antibody. This kind of insertion does not interfere with the binding properties of the Fv domain, which leads to the creation of bi-specific antibodies. The last finding was the insertion of LAIR1 into an antibody structure where the D and J genes, most of the V gene, and the light chain were deleted. The resultant scaffold is structurally viable and only possesses the heavy chain. Hence, it is evidence that human antibodies can also form camelid-like antibodies. Interestingly, these insertions into the switch region are not exclusive to people who live in malaria-endemic regions. NGS of the switch region from European donors showed that around 1 in 1000 antibody sequences had an insertion of varying length. These insertions originate from both genic and intergenic regions of different chromosomes.

To sum up, it is very intriguing that our immune system has evolved to create camelid-like and bi-specific antibodies. It will be very informative to try to crystallize these structures to see how these antibodies accommodate the insertion of LAIR1. Current antibody NGS data analysis primarily concentrates on the heavy chain due to sequencing technology limitations. It would be invaluable if we could sequence the entire heavy chain as well as the adjacent switch region to see how our immune system matures and activates against pathogens.

1. Tan J, Pieper K, Piccoli L, Abdi A, Foglierini M, Geiger R, Maria Tully C, Jarrossay D, Maina Ndungu F, Wambua J, et al. A LAIR1 insertion generates broadly reactive antibodies against malaria variant antigens. Nature (2016) 529:105–109. doi:10.1038/nature16450
2. Hsieh FL, Higgins MK. The structure of a LAIR1-containing human antibody reveals a novel mechanism of antigen recognition. Elife (2017) 6: doi:10.7554/eLife.27311
3. Pieper K, Tan J, Piccoli L, Foglierini M, Barbieri S, Chen Y, Silacci-Fregni C, Wolf T, Jarrossay D, Anderle M, et al. Public antibodies to malaria antigens generated by two LAIR1 insertion modalities. Nature (2017) 548:597–601. doi:10.1038/nature23670

Journal Club post: Interface between Ig-seq and antibody modelling

Hi everyone! In this blog post, I would like to review a couple of relatively recent papers about antibody modelling and immunoglobulin gene repertoire NGS, also known as Ig-seq. I previously worked as a phage display scientist, and I initially struggled to understand all the new terminology of computational modelling when I joined Charlotte’s group last January. Hence, the main aim of my blog post is to translate commonly used jargon from the computational world into less complicated text.

The three-dimensional structure of an antibody dictates its function. Antibody sequences obtained from Ig-seq cannot be directly translated into antibody folding, aggregation and function. Several ways exist to interrogate antibody structure, including X-ray crystallography, NMR spectroscopy, expression, and computational modelling. These methods vary in throughput as well as precision. Here, I will concentrate on computational modelling.

First of all, the most commonly confused term is a decoy. In antibody structure prediction, a decoy is a modelled antibody structure that a tool can rank and select as the closest to the native antibody structure. A number of antibody modelling tools exist, each employing a different methodology and generating a different number of decoys. Good reviews of antibody structure prediction are available (1,2).

I will try to give a very rough summary of how these modelling tools work. To do so, I assume that readers are familiar with the antibody sequence/structure relationship; if not, please check (3). Antibody framework regions are largely sequence invariant, hence their structure can be deduced from sequence identity with high confidence. The PDB (4) acts as the source of structures for antibody modelling. Canonical CDRs (all CDRs except for CDR-H3) can be grouped into a limited number of structures. Several definitions of canonical classes exist (5,6), but, in essence, a canonical CDR must contain the residues that define a particular class. Next, the antibody orientation is calculated or copied from the PDB. CDR-H3 modelling is very challenging and different approaches have been devised (7–9). The structural space of CDR-H3 is vast (10), and hence this loop cannot be put into a canonical class. Once CDR-H3 is modelled, the resultant decoy is checked for clashes (such as impossible orientations of side chains).
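The final clash check can be sketched in a few lines. This is a deliberately simplified illustration and not how any of the cited tools implement it: the 2.0 Å cutoff, the atom names and the coordinates are all assumptions made for the example, and pairs within the same or sequence-adjacent residues are skipped so that ordinary covalent bonds are not reported.

```python
import math
from itertools import combinations


def find_clashes(atoms, min_distance=2.0):
    """Return pairs of atom names that sit closer than min_distance.

    atoms: list of (name, (chain, residue_number), (x, y, z)) tuples.
    The cutoff is a hypothetical value chosen for illustration only.
    """
    clashes = []
    for (name_a, res_a, xyz_a), (name_b, res_b, xyz_b) in combinations(atoms, 2):
        same_chain = res_a[0] == res_b[0]
        # Atoms in the same or neighbouring residues can be bonded, so skip them.
        if same_chain and abs(res_a[1] - res_b[1]) <= 1:
            continue
        if math.dist(xyz_a, xyz_b) < min_distance:
            clashes.append((name_a, name_b))
    return clashes


# Toy decoy (hypothetical coordinates in angstroms).
decoy = [
    ("H100:CA", ("H", 100), (0.0, 0.0, 0.0)),
    ("H100:CB", ("H", 100), (1.5, 0.0, 0.0)),
    ("H105:CB", ("H", 105), (1.0, 0.5, 0.0)),
    ("L50:CA", ("L", 50), (8.0, 0.0, 0.0)),
]

for pair in find_clashes(decoy):
    print("clash:", pair)
```

A decoy that returns any clashes here would be rejected or sent back for refinement; real tools use element-specific radii rather than a single cutoff.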

Here, I would like to mention several examples of how antibody modelling can help to accelerate drug discovery. DeKosky et al. (11) mapped two Ig-seq datasets to antibody structures to interrogate how an antibody paratope changes in response to antigenic stimulation. Knowledge of the paired full-length VH-VL is crucial for the best antibody structure prediction. In this study they employed paired-chain Ig-seq (12). However, this technique cannot sequence the full-length VH/VL, hence the V gene sequence had to be approximated. Computational paratope identification was employed to examine paratope convergence. The paper had several drawbacks: only 2,000 models (~1% of the Ig-seq data) were built, at a cost of 570,000 CPU hours, and antibody sequences with a CDR-H3 longer than 16 aa were excluded from the analysis. Generating a reliable configuration for a long CDR-H3 is still considered a hard task. Recently, Laffy et al. (13) investigated antibody promiscuity by mapping sequence to structure and validating the results with ELISA. A cohort of 10 antibodies, all with a long CDR-H3 (≥ 15 aa), was interrogated. They used a homology modelling tool to build the CDR-H3 structures. However, the availability of appropriate structural templates can be questioned, since the CDR-H3 loops deposited in the PDB are predominantly shorter due to crystallographic constraints. As mentioned before, paired VH/VL data are crucial for structure determination. Here, they used the DeKosky et al. (11) data to devise the pairing. The approach can be streamlined once more paired data become available.

In conclusion, antibody modelling enables researchers to circumvent the cost and time associated with experimental approaches to antibody characterization. The field of antibody modelling still needs improvements for faster and better structure prediction to achieve tasks such as modelling the entirety of Ig-seq data or long CDR-H3 loops. Currently, the fastest antibody modelling tool is ABodyBuilder (8). It generates a model in 30 seconds and a web version is available online (http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/Modelling.php). The availability of more structural information, as well as algorithm improvements, will facilitate more confident antibody modelling.
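To put the 30 seconds per model into perspective, here is a back-of-the-envelope estimate of what modelling a whole Ig-seq dataset would cost. The dataset size of one million sequences is an assumption chosen for illustration, and the calculation ignores any overhead beyond the quoted per-model time.

```python
SECONDS_PER_MODEL = 30  # per-model time quoted above for ABodyBuilder


def modelling_cost_cpu_hours(n_sequences, seconds_per_model=SECONDS_PER_MODEL):
    """CPU hours needed to model n_sequences, one model per sequence."""
    return n_sequences * seconds_per_model / 3600


def models_per_core_per_day(seconds_per_model=SECONDS_PER_MODEL):
    """How many models a single core can finish in 24 hours."""
    return 24 * 3600 // seconds_per_model


n = 1_000_000  # hypothetical dataset size
print(f"{models_per_core_per_day()} models per core per day")
print(f"{modelling_cost_cpu_hours(n):,.0f} CPU hours for {n:,} sequences")
```

Even at this speed, a full repertoire is only tractable when the work is spread over many cores, which is why faster modelling remains an open need.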

1. Kuroda D, Shirai H, Jacobson MP, Nakamura H. Computer-aided antibody design. Protein Eng Des Sel (2012) 25:507–521. doi:10.1093/protein/gzs024
2. Krawczyk K, Dunbar J, Deane CM. “Computational Tools for Aiding Rational Antibody Design,” in Methods in molecular biology (Clifton, N.J.), 399–416. doi:10.1007/978-1-4939-6637-0_21
3. Georgiou G, Ippolito GC, Beausang J, Busse CE, Wardemann H, Quake SR. The promise and challenge of high-throughput sequencing of the antibody repertoire. Nat Biotech (2014) 32:158–168. doi:10.1038/nbt.2782
4. Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): Ensuring a single, uniform archive of PDB data. Nucleic Acids Res (2007) 35: doi:10.1093/nar/gkl971
5. Nowak J, Baker T, Georges G, Kelm S, Klostermann S, Shi J, Sridharan S, Deane CM. Length-independent structural similarities enrich the antibody CDR canonical class model. MAbs (2016) 8:751–760. doi:10.1080/19420862.2016.1158370
6. North B, Lehmann A, Dunbrack RL. A new clustering of antibody CDR loop conformations. J Mol Biol (2011) 406:228–256. doi:10.1016/j.jmb.2010.10.030
7. Weitzner BD, Jeliazkov JR, Lyskov S, Marze N, Kuroda D, Frick R, Adolf-Bryfogle J, Biswas N, Dunbrack RL, Gray JJ. Modeling and docking of antibody structures with Rosetta. Nat Protoc (2017) 12:401–416. doi:10.1038/nprot.2016.180
8. Leem J, Dunbar J, Georges G, Shi J, Deane CM. ABodyBuilder: Automated antibody structure prediction with data-driven accuracy estimation. MAbs (2016) 8:1259–1268. doi:10.1080/19420862.2016.1205773
9. Marks C, Nowak J, Klostermann S, Georges G, Dunbar J, Shi J, Kelm S, Deane CM. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction. Bioinformatics (2017) 33:1346–1353. doi:10.1093/bioinformatics/btw823
10. Regep C, Georges G, Shi J, Popovic B, Deane CM. The H3 loop of antibodies shows unique structural characteristics. Proteins Struct Funct Bioinforma (2017) 85:1311–1318. doi:10.1002/prot.25291
11. DeKosky BJ, Lungu OI, Park D, Johnson EL, Charab W, Chrysostomou C, Kuroda D, Ellington AD, Ippolito GC, Gray JJ, et al. Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires. Proc Natl Acad Sci U S A (2016) doi:10.1073/pnas.1525510113
12. DeKosky BJ, Kojima T, Rodin A, Charab W, Ippolito GC, Ellington AD, Georgiou G. In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat Med (2014) 21:1–8. doi:10.1038/nm.3743
13. Laffy JMJ, Dodev T, Macpherson JA, Townsend C, Lu HC, Dunn-Walters D, Fraternali F. Promiscuous antibodies characterised by their physico-chemical properties: From sequence to structure and back. Prog Biophys Mol Biol (2016) doi:10.1016/j.pbiomolbio.2016.09.002

Using Antibody Next Generation Sequencing data to aid antibody engineering

I consider myself a wet lab scientist, and I had not used a dynamic programming language such as Python before starting my DPhil. My main interests lie in the development of improved antibody humanization campaigns, the rational construction of antibody phage display libraries, and antibody evolution. Having completed an industrial placement at MedImmune, I saw the biotechnology industry from the inside and realized that scientists who can bridge computer science and the wet lab are in high demand.

The title of my DPhil is very broad, and the research itself is data driven rather than hypothesis driven. Our research group collaborates with UCB Pharma, which has sequenced whole antibody repertoires across a number of species. Datasets might contain more than 10 million sequences of heavy and light variable chains. But even these datasets do not cover more than 1% of the theoretical repertoire, hence looking at the entropies of sequences rather than the sequences alone could provide insights into differences between intra- and inter-species datasets.
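One common way to quantify such entropy is the Shannon entropy of the amino acid distribution at each position. The sketch below assumes the sequences are already aligned and of equal length (for example after numbering); the four toy sequences are invented for illustration and this is not necessarily the measure used in the project.

```python
import math
from collections import Counter


def positional_entropy(aligned_sequences):
    """Shannon entropy (in bits) for every column of aligned sequences."""
    if not aligned_sequences:
        raise ValueError("need at least one sequence")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    entropies = []
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        total = len(column)
        # H = sum over residues of -p * log2(p); 0 means a fully conserved position.
        entropies.append(
            sum(-(c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies


# Toy aligned fragments (hypothetical).
toy = ["CARD", "CAKD", "CTRD", "CARE"]
for position, h in enumerate(positional_entropy(toy), start=1):
    print(f"position {position}: {h:.3f} bits")
```

Comparing such per-position profiles between datasets is cheaper than comparing the sequences themselves, and it stays meaningful even when two repertoires share few identical sequences.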

NGS of antibody repertoires provides snapshots of repertoire diversity and entropy, as well as the sequences themselves. Reddy et al. (2010) showed that this information could be successfully used to pull out target-specific variable chains. But most research groups believe that the main application of NGS is immunodiagnostics (Greiff et al., 2015).

My project involves applying software developed by our research group, namely ANARCI (Dunbar J and Deane CM., 2016) and ABodyBuilder (Leem J. et al 2016). Combining the two tools allows analysis of NGS datasets at an unprecedented rate (1 million sequences in 7 hours). A number of manipulations can be performed on datasets to standardize them and make the data reproducible, which is a big issue in science. It is possible to re-assign the germlines, numbering schemes and complementarity-determining region (CDR) definitions of a 10 million sequence dataset in less than a day. For instance, the data provided by UCB required our variable chains to be re-numbered according to the IMGT numbering and CDR definition (Lefranc M., 2011). The reason for selecting the IMGT numbering scheme is that it supports symmetrical amino acid numbering of CDRs, which allows for improved assignment of positions to amino acids that are located in the same structural space in CDRs of different lengths (Figure 1).

Figure 1. IMGT numbering and CDR definition of CDR3. Symmetrical assignment of positions to amino acids in HCDR3 allows for better localization of the V, D and J genes: the V gene encodes the amino terminus of CDR3, the J gene encodes the carboxyl terminus, and the D gene encodes the middle portion.
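The symmetrical numbering can be illustrated with a short sketch. It follows the IMGT convention as I understand it: CDR3 spans positions 105 to 117, shorter loops leave a gap in the middle of that range, and loops longer than 13 residues receive insertion positions between 111 and 112 (112.1, 111.1, 112.2, and so on). The function name is hypothetical, and this is not the ANARCI implementation.

```python
def imgt_cdr3_positions(length):
    """Return IMGT position labels for a CDR3 of the given length."""
    if length < 1:
        raise ValueError("CDR3 length must be at least 1")
    # Residues are split between the two ends; an odd residue goes to the N-terminal side.
    n_left = (length + 1) // 2
    n_right = length // 2
    # N-terminal half: 105, 106, ..., 111, then insertions 111.1, 111.2, ...
    left = [str(105 + i) if i < 7 else f"111.{i - 6}" for i in range(n_left)]
    # C-terminal half counted backwards: 117, 116, ..., 112, then 112.1, 112.2, ...
    right_reversed = [str(117 - i) if i < 6 else f"112.{i - 5}" for i in range(n_right)]
    return left + right_reversed[::-1]


for cdr3_length in (5, 13, 16):
    print(cdr3_length, imgt_cdr3_positions(cdr3_length))
```

Because both ends are anchored, residue 105 always sits next to the V gene end and residue 117 next to the J gene end, whatever the loop length; that is what makes positions comparable between CDR3s of different lengths.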

To sum up, analysing CDR lengths and CDR and framework amino acid compositions, and finding novel patterns in antibody repertoires, will open up new rational steps in antibody humanization and affinity maturation. The key step will be to determine the amino acid scaffolds that define the humanness of an antibody or, in other words, the scaffolds that are not immunogenic in humans.

References:

Dunbar J., and Deane CM., ANARCI: Antigen receptor numbering and receptor classification. Bioinformatics (2016)
Greiff V., et al. A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status. Genome Medicine (2015)
Leem J., et al. ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation. mAbs. (2016)
Lefranc M., IMGT, the International ImMunoGeneTics Information System. Cold Spring Harb Protoc. (2011)
Reddy ST., et al. Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells. Nat Biotech. (2010)

Oxford Protein Informatics Group

or "OPIG" to friends
