AIRR community meeting | Oxford Protein Informatics Group

Hi everyone,

Today is the day for another blog post from me. Last month I attended an AIRR conference in Genoa, Italy (https://www.antibodysociety.org/airrc/meetings/communityiv/). It was the fourth AIRR conference, and it was nice to see lots of field-leading people participating. Compared to the last AIRR meeting almost two years ago, the agenda was dominated by machine learning and big data topics. In this short blog post, I will discuss two talks that covered these two exciting topics.

Recent advances in leukapheresis have enabled researchers to collect all of the B-cells found in the peripheral blood of the human body. However, the majority of an individual’s B-cells reside in lymphoid organs, which are beyond the reach of leukapheresis. Briney et al. (1) employed next-generation sequencing (NGS) to interrogate B-cell receptor (BCR) sequences across ten individuals of approximately the same age, but of different genders and ethnicities. Using leukopaks as a source of peripheral B-cells, they generated 3 billion sequences, of which 300 million were high quality. They showed that, despite the vastness of the allowed antibody space, non-trivial sequence convergence was observed. They did not find any statistically significant differences in BCR repertoire overlap between different ethnicities and genders. The authors mentioned that they were performing another round of leukopak analysis on the same cohort of donors after one year to interrogate the dynamics of the human immune system. It will be interesting to see the longitudinal sequence overlap analysis within the same B-cell donor. The authors also discussed ways to analyse AIRR data and to present annotated sequences upon publication.

Another exciting talk was given by Lindsay G Cowell from UT Southwestern (2). In their work, they developed a classifier to identify tumor samples based on collections of T-cell receptors (TCRs). Instead of simply employing TCR sequences as features in their classifier, they mapped TCR sequences to amino acid biophysical properties. Every TCR sequence in a TCR repertoire was split into 4-mer snippets. Five Atchley Factors were applied to every amino acid in the snippet and an overall score was derived. The Atchley Factors capture amino acid hydrophobicity, secondary structure associations, size, codon diversity and electric charge. These scores were used in their machine learning model to distinguish tumor from healthy samples. Using this model, they successfully identified all colorectal tumor samples. With the ever-increasing amount of available BCR data from various disease states, it will be exciting to see whether their model also works with BCR sequences to distinguish disease states. Since somatic hypermutation (SHM) changes BCR sequences, it will be exciting to see structural paratope convergence in disease states using Atchley Factors.
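
To make the featurisation step a bit more concrete, here is a minimal Python sketch of how a TCR sequence could be cut into 4-mers and each residue mapped to five numbers. It is only my illustration of the idea, not the authors' code. I assume an overlapping sliding window (the talk summary does not say whether the snippets overlap), and the factor table below is a hypothetical placeholder: the numbers are NOT the published Atchley values and should be swapped for the real table. All function names are made up for this post, and the scoring and classifier steps are not reproduced.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_FACTORS = 5  # one value per Atchley Factor


def placeholder_factor_table() -> Dict[str, List[float]]:
    """Hypothetical stand-in table: NOT the published Atchley values.

    Each amino acid gets its alphabet index repeated five times, purely so
    the example runs. Replace with the real factor table before any analysis.
    """
    return {aa: [float(i)] * N_FACTORS for i, aa in enumerate(AMINO_ACIDS)}


def split_kmers(sequence: str, k: int = 4) -> List[str]:
    """Return all overlapping k-mers of a sequence (assumed sliding window)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def encode_kmer(kmer: str, table: Dict[str, List[float]]) -> List[float]:
    """Concatenate the per-residue factor values into one feature vector."""
    features: List[float] = []
    for residue in kmer:
        features.extend(table[residue])  # KeyError on non-standard residues
    return features


def encode_sequence(sequence: str, table: Dict[str, List[float]],
                    k: int = 4) -> List[List[float]]:
    """Encode every k-mer of one receptor sequence."""
    return [encode_kmer(kmer, table) for kmer in split_kmers(sequence, k)]


if __name__ == "__main__":
    table = placeholder_factor_table()
    vectors = encode_sequence("CASSLG", table)
    # Each 4-mer becomes a vector of 4 residues x 5 factors = 20 values.
    print(len(vectors), len(vectors[0]))
```

Each 4-mer ends up as a 20-dimensional vector (4 residues times 5 factors), which is the kind of input a downstream model could score.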

Thanks!

Best,

Alex

Having been a member of OPIG for an entire 5 weeks, I thought it was about time I attended a conference abroad. Becoming the self-nominated chief Focaccia Taster at the AIRR meeting was a tough role, but one I believe I fulfilled with tenacity and boundless enthusiasm. A special shout out goes to Mario and Alessandro for the never-ending cups of coffee, and for beers being provided every lunchtime. Leaving aside the fantastic culinary experiences for a moment, the introductory workshops held on Day 1 were a really great overview for someone relatively new to the field. I attended Victor Greiff’s “AIRR-seq data analysis and processing”, a 3-hour crash course in AIRR-seq data acquisition, error correction, and useful types of analysis. Following this, the talks and poster sessions held over subsequent days were of a very high quality. My favourite poster was presented by Michael Poeschla from the Max Planck Institute; the work focussed on characterising antibody repertoire aging in the killifish, a short-lived vertebrate model that, interestingly, lives longer than its usual lifespan after eating bacteria taken from the gut of younger killifish.

– Sarah

We had a great time at the AIRR community meeting in Genoa in early May, where we joined some 150 researchers from across North America and Europe (which is appreciably larger than the 50-strong turn-out seen at previous meetings). We had a range of short talks in both basic and translational science, and concerning both sides of the great lymphocyte divide. The theme of the meeting was “Bridging the Gaps” and there were a number of on-theme talks on the challenges related to storing and analysing data sets in excess of hundreds of millions of sequences.

We enjoyed the keynote sessions, the first of which was given by Prof. Sai Reddy of ETH Zurich. As Alex mentioned, machine learning techniques are becoming increasingly commonplace in repertoire analysis. While the other ML talks tackled the problem of repertoire-wide classification of immune status, Sai Reddy presented work on identifying antigen specificity at the level of individual sequences, which made for a really exciting talk.

The second keynote speaker was the distinguished immunologist Antonio Lanzavecchia MD from USI. His talk was a return to more classical immunology, in keeping with the theme of the day, which was Clinical and Translational Science. The audience was particularly wowed by a result published in Nature in 2016: the natural occurrence of an insertion of an entire 98-amino-acid collagen-binding domain either right at the tip of the H3 or in the VH-CH1 elbow of antibodies. This sort of insertion is apparently more common than we might anticipate, albeit largely invisible to those in Ig-seq because of sequencing read length limitations. The PDB code of the solved structure of a LAIR1 fusion antibody is 5NST if you want to have a look.

All best wishes

Eve

Author

Eve Richardson
