Author Archives: Wing Ki (Catherine) Wong

Epitope mapping with structural data for SARS-CoV-2 RBD and 10 known binders

In the past few months we have seen many papers reporting antibodies that bind to SARS-CoV-2 (a database can be found here: http://opig.stats.ox.ac.uk/webapps/covabdab/). Some of these antibodies came from the analysis of a patient’s immune system. Some come with crystal structures that show where they bind. Others have no structure, but come with sequences and some competition assay data that show approximately where on the spike protein they bind. The main focus is an area called the Receptor Binding Domain (RBD), which is where the spike protein engages the human ACE2 receptor and causes the downstream problems. In this paper, the authors ran a complete mutagenesis of the RBD of the SARS-CoV-2 spike protein.


Non-specialist intro: Convalescent sera and some thoughts on its relevance to structural biology

A couple of weeks ago, I gave a group meeting talk on my current research. Interestingly, most of the questions I received were not about my research methods but about the broader application of antibody-related therapies, because I had used convalescent sera as an example of a potential ‘quick fix’ in the current COVID-19 pandemic to motivate why antibody research is important. So in this blog post I will give a quick introduction to convalescent sera. (Disclaimer: This does not contain any clinical information.)


TCRBuilder: Multi-state T-cell receptor structure prediction

Hello friends of OPIG,

In my last Blopig blog post [link: https://www.blopig.com/blog/2019/10/comparative-analysis-of-the-cdr-loops-of-antigen-receptors/], I summarised our finding that TCR CDRs are more flexible than their antibody counterparts. Because of this observation, we believe that it is more appropriate to represent TCR binding sites using an ensemble of conformations.


Automated testing with doctest

One of the ways to make your code more robust to unexpected input is to develop with boundary cases in mind. Test-driven development begins with writing a set of unit tests for each class. These tests often include both normal and extreme use cases. Thanks to packages like doctest for Python, and Mocha and Jasmine for JavaScript, we can write and run tests in a simple format. In this blog post, I will present a short example of how to get started with doctest in Python. N.B. doctest is best suited to small tests across a few scripts. If you would like to run system tests, look for other packages!
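As a minimal sketch of the idea, the function below carries its tests inside its own docstring: two normal cases and two boundary cases (lower-case input and an empty string). The function `gc_content` and its error message are made up for illustration; only `doctest.testmod()` is part of the standard library.

```python
import doctest


def gc_content(sequence):
    """Return the fraction of G and C characters in a DNA sequence.

    Normal use cases:
    >>> gc_content("GGCC")
    1.0
    >>> gc_content("ATGC")
    0.5

    Boundary cases: lower-case input and an empty string.
    >>> gc_content("atgc")
    0.5
    >>> gc_content("")
    Traceback (most recent call last):
        ...
    ValueError: sequence must not be empty
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


if __name__ == "__main__":
    # Runs every example found in this module's docstrings and
    # reports any whose actual output differs from the expected output.
    print(doctest.testmod())
```

Running the script directly executes the examples; `python -m doctest -v your_script.py` does the same from the command line and prints each example as it is checked.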


Maps are useful. But first, you need to build, store and read a map.

Recently we embarked on a project that required the storage of a relatively big dictionary with 10M+ key-value pairs. Unsurprisingly, Python took over two hours to build such a dictionary, taking into account all the time for extending, accessing and writing to the dictionary, AND it eventually crashed. So I turned to C++ for help.

In C++, a map is one of the ways you can store a string key with an integer value. Since we are concerned about data storage and access, I compared map and unordered_map.

An unordered_map stores its keys and mapped values in a hash table, while a map keeps its keys in sorted order. The important considerations here are:
  • Memory: map does not have the hash table and is therefore smaller than an unordered_map.
  • Access: looking up a key in an unordered_map takes O(1) on average, while looking up a key in a map takes O(log n).

I eventually chose map, because it is more memory efficient given the small amount of RAM that I have access to. However, it still takes up about 8GB of RAM per object during its runtime (and I have 1800 objects to run through, each building a different dictionary). Saving these seems to open another can of worms.

In Python, we could easily use pickle or JSON to serialise the dictionary. In C++, it’s common to use the Boost library. Boost offers two kinds of archive: text and binary. Text archives are human-readable, but since I am not really going to open and read 10M+ lines of key-value pairs, I opted for binary archives, which are machine-readable and smaller. (Read more: https://stackoverflow.com/questions/1058051/boost-serialization-performance-text-vs-binary-format .)

To further reduce the file size when saving the maps, I used zlib compression. Fortunately, ready-to-use code for this had been posted about half a year earlier, which saved me some debugging.

Ultimately this came down to 96GB in total across the 1800 files, all done within 6 hours.
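For comparison, the same two steps (binary serialisation, then zlib compression) look roughly like this on the Python side, using only the standard-library `pickle` and `zlib` modules. This is a sketch, not the C++/Boost code used for the project; the helper names, the toy dictionary and the file name are all made up for illustration.

```python
import os
import pickle
import tempfile
import zlib


def save_compressed(mapping, path, level=6):
    """Pickle a dict to bytes, compress the bytes with zlib, write them to disk.

    Returns the size of the uncompressed pickle in bytes.
    """
    raw = pickle.dumps(mapping, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "wb") as handle:
        handle.write(zlib.compress(raw, level))
    return len(raw)


def load_compressed(path):
    """Read a file written by save_compressed and rebuild the dict."""
    with open(path, "rb") as handle:
        return pickle.loads(zlib.decompress(handle.read()))


if __name__ == "__main__":
    # Toy stand-in for the real string-key / integer-value map.
    toy_map = {f"key_{i:07d}": i for i in range(100_000)}
    path = os.path.join(tempfile.mkdtemp(), "toy_map.pkl.z")
    raw_size = save_compressed(toy_map, path)
    print(f"pickled: {raw_size} bytes, on disk: {os.path.getsize(path)} bytes")
    assert load_compressed(path) == toy_map
```

Note that this version holds the whole pickle in memory before compressing it, so it does nothing for the RAM problem described above; it only shrinks the file on disk. As with any pickle, only load files you wrote yourself.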

Comparing naive and immunised antibody repertoire

Hi! This is my first post on Blopig as I joined OPIG in July 2017 for my second rotation project and DPhil.

During immune reactions to foreign molecules known as antigens, surface receptors of activated B-cells undergo somatic hypermutation to attain their high binding affinity and specificity to the target antigen. To discover how somatic hypermutation adapts the antibody from its germline conformation, we can compare the naive and antigen-experienced antibody repertoires. In this paper, the authors developed a protocol to carry out such a comparison, and then detected, synthesised, expressed and validated the observed antibody genes against their target antigen.

What they have done:

  1. Mouse immunisation: Naive (no antigens), CGG (a large protein), NP-CGG (hapten attached to a large protein).
  2. Sequencing: Total RNA was extracted from each spleen, and cDNA was synthesised according to standard procedures and amplified with the universal 5’-RACE primer (as opposed to the degenerate 5’-Vh primers) and the 3’-CH1 primer to distinguish between immunoglobulin classes (IgG1, IgG2c and IgM). High-throughput pyrosequencing was then used to recover the heavy chain sequences only.
  3. VDJ recombination analysis: V, D and J segments were assigned and the frequencies of the VDJ combinations were plotted in a 3D graph.
  4. Commonality of the VDJ combination: For each VDJ combination, the “commonality” is the average occurrence of the combination, counted only when n mice have the combination: if n=1, it is the average occurrence when any 1 mouse has the combination; if n=5, the combination must be observed in all mice to score a degree of commonality, otherwise it is 0.
    • The effect of increasing n on commonality scores in the IgG1 class: As we tighten the requirement for the commonality calculation, it becomes clear that IGHV9-3 is likely to target the CGG carrier, while IGHV1-72 is against the NP hapten.
    • IGHV9-3 can accommodate a wider range of D genes when targeting CGG alone. IGHV1-72 only uses IGHD1-1.
  5. Clustering V gene usage: Sequences were aligned to the longest sequence in the set (of each VDJ combination), and the pairwise distances between sequences in the set were used to cluster the sequences using the UPGMA method.
    • A number of sequences were commonly found in different individuals. Among these sequences, one was randomly selected to proceed to the next step.
  6. Synthesis and validation of the detected antibody against the NP hapten: By comparing the antibody repertoires against CGG and NP-CGG, the gene of the antibody against NP can be recovered. The authors chose to pair 3 different light chains with the chosen heavy chain, and assessed the binding of the 3 resulting antibodies.
    • NP-CGG binds well to both IGHV1-72 and IGHV9-3 antibodies; NP-BSA to IGHV1-72 only; and CGG to IGHV9-3 only.
    • The binding capabilities are affected by the light chain in the pair.
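
The clustering in step 5 can be sketched in a few lines of pure Python. UPGMA repeatedly merges the two clusters whose members have the smallest mean pairwise distance. The sketch assumes the sequences are already aligned to equal length and uses the fraction of mismatched positions as the distance; the summary above does not say which distance the paper used, so that choice, the cutoff and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(sequences, cutoff):
    """Cluster sequences by UPGMA, stopping once the closest pair of
    clusters is further apart than `cutoff`.

    Returns a sorted list of clusters, each a sorted list of sequence indices.
    """
    n = len(sequences)
    # Pairwise distances between the original sequences, computed once.
    dist = {
        frozenset((i, j)): hamming_fraction(sequences[i], sequences[j])
        for i, j in combinations(range(n), 2)
    }
    clusters = [[i] for i in range(n)]

    def average_distance(c1, c2):
        # UPGMA linkage: unweighted mean over all cross-cluster member pairs.
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_distance(clusters[p[0]], clusters[p[1]]),
        )
        if average_distance(clusters[a], clusters[b]) > cutoff:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Made-up aligned toy sequences, not data from the paper.
    toy = ["ACGTACGT", "ACGTACGA", "TTGTCCGT", "TTGTCCGA"]
    print(upgma(toy, cutoff=0.2))  # [[0, 1], [2, 3]]
```

This recomputes cluster distances on every pass, so it is only suitable for small sets. For real repertoires, `scipy.cluster.hierarchy.linkage` with `method="average"` implements the same linkage.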

Key takeaway:

This work presented a metric for defining the “commonality” between individuals’ antibody repertoires, and validated the identified antibody against its target antigen by pairing it with different light chains.
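
To make the commonality metric from step 4 concrete, here is one possible reading of it in Python. It assumes that "n mice have the combination" means the combination is observed in at least n mice, and that the average is taken over all mice in the group; neither detail is confirmed by the summary above, and the paper's exact definition may differ. The function name and the toy counts are made up.

```python
def commonality(occurrences, n):
    """Commonality of one VDJ combination under the assumptions stated above.

    occurrences: per-mouse occurrence (e.g. read count) of the combination.
    n: minimum number of mice in which the combination must be observed.
    """
    if not 1 <= n <= len(occurrences):
        raise ValueError("n must be between 1 and the number of mice")
    observed_in = sum(1 for value in occurrences if value > 0)
    if observed_in < n:
        # Seen in too few mice: no degree of commonality.
        return 0.0
    # Assumption: average over every mouse in the group.
    return sum(occurrences) / len(occurrences)


if __name__ == "__main__":
    # Made-up counts for five mice; not data from the paper.
    shared = [4, 6, 5, 3, 2]    # observed in all five mice
    private = [10, 0, 0, 0, 0]  # observed in one mouse only
    for n in (1, 5):
        print(n, commonality(shared, n), commonality(private, n))
    # prints "1 4.0 2.0" then "5 4.0 0.0"
```

Raising n filters out combinations that are abundant in a single animal, which is the effect described in step 4: only combinations shared across individuals keep a non-zero score.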