Identifying shared antibodies using deep learning

Antibody convergence is the presence of similar antibodies in different individuals, which suggests that those individuals have been exposed to a common antigen that stimulated the production of similar, antigen-specific antibodies. We want to be able to identify these shared antibodies, sometimes referred to as ‘public clones’, because doing so could lead to the development of immunodiagnostic tests that detect them, and could potentially assist in the design of vaccines and therapeutic antibodies. A recent paper on bioRxiv by Sai Reddy’s group[i] has applied machine learning techniques to the problem of identifying shared antibodies: a variational autoencoder (VAE), which is a deep learning model, combined with support vector machines (SVM).

Typical methods for identifying these antibodies cluster together highly similar antibody sequences found in multiple individuals, often using a high percentage sequence identity as the similarity threshold for clustering. However, this means such methods may fail to identify antibodies that bind to the same antigen but are more variable in sequence. By using a VAE, Friedensohn et al. were able to cluster antibody sequences based on features learned by the VAE, for example sequence motifs or lengths. Because the clusters are learned from the data rather than defined by a fixed cut-off, they can have varying degrees of within-cluster similarity (unlike standard sequence-similarity methods, which apply a single fixed threshold). The researchers demonstrated that they could cluster antibody sequences from mice immunised with different antigens into antigen-specific clusters. The VAE was then used to transform each antibody repertoire into a cluster-based vector, which an SVM could use to classify antigen exposure. Here, the VAE-based approach performed better than a traditional public clone identification method at predicting antigenic exposure, with 80% accuracy vs 42%.
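To make the two ideas in that paragraph a bit more concrete, here is a minimal Python sketch. It is not the authors' code: the function names, the toy sequences and the choice of cluster frequencies as the vector encoding are all my own assumptions for illustration, and the paper's actual encoding may differ. The first function is the kind of identity score a fixed-threshold method relies on; the second turns per-sequence cluster assignments (which in the paper would come from the VAE) into a fixed-length vector that a classifier such as an SVM could take as input.

```python
from collections import Counter


def sequence_identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def repertoire_to_vector(cluster_ids, n_clusters):
    """Encode a repertoire as the fraction of its sequences in each cluster.

    cluster_ids: one cluster index per sequence (assumed to come from some
    upstream clustering step, e.g. a trained VAE; not implemented here).
    """
    total = len(cluster_ids)
    if total == 0:
        return [0.0] * n_clusters
    counts = Counter(cluster_ids)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


# Two made-up 13-residue sequences differing at a single position.
identity = sequence_identity("CARDYYGSSYFDY", "CARDYYGSSYWDY")
same_clone = identity >= 0.9  # hypothetical fixed clustering threshold

# A made-up repertoire of four sequences assigned to three clusters.
vector = repertoire_to_vector([0, 0, 2, 1], n_clusters=3)
print(vector)  # [0.5, 0.25, 0.25]
```

Vectors like this, one per individual, could then be passed with their antigen labels to any standard classifier; the point of the fixed-length encoding is that repertoires of very different sizes become directly comparable.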

Another advantage of using a VAE is that the sequence space within clusters can be sampled, meaning new antibody sequences can be generated that would fit into a chosen cluster. The authors generated 5005 novel variants, not present in the original biological training set, that would be grouped into a respiratory syncytial virus fusion antigen-specific cluster. Of the 5005, 99 were assessed, 74% of which bound to the antigen: a pretty good prediction rate.
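As a rough illustration of what sampling within a cluster could look like, here is a small Python sketch. It assumes each cluster is described by a mean and a per-dimension standard deviation in the VAE's latent space, and that a trained decoder is available to turn a latent point back into a sequence. Both `sample_cluster` and `generate_variants` are hypothetical names of mine, the `decode` argument stands in for the trained decoder (which is not implemented here), and none of this is taken from the authors' implementation.

```python
import random


def sample_cluster(mean, std, n_samples, seed=0):
    """Draw latent points from a diagonal Gaussian describing one cluster."""
    rng = random.Random(seed)
    return [
        [rng.gauss(m, s) for m, s in zip(mean, std)]
        for _ in range(n_samples)
    ]


def generate_variants(mean, std, decode, n_samples, seed=0):
    """Decode sampled latent points and keep each distinct sequence once.

    decode: a callable mapping a latent point (list of floats) to a
    sequence string; a placeholder for a trained VAE decoder.
    """
    points = sample_cluster(mean, std, n_samples, seed)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(decode(z) for z in points))
```

In practice the generated sequences would still need filtering, for example removing anything already present in the training data, before picking candidates to test experimentally.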

In summary, this is an interesting application of deep learning as a sequence-based method for clustering antibodies, and it led me to finally learn how VAEs work. The success in generating some novel antigen-specific antibody sequences is particularly exciting, and I’ll be following along to see if they release any more information on their predicted antibody sequences.


[i] Friedensohn, S., Neumeier, D., Khan, T. A., Csepregi, L., Parola, C., de Vries, A. R. G., … & Reddy, S. T. (2020). Convergent selection in antibody repertoires is revealed by deep learning. bioRxiv.
