
Interesting Antibody Papers

Hints at how broadly neutralizing antibodies arise (paper here; Haynes lab here). Antibodies can be developed to bind virtually any antigen. There is, however, a stark difference between ‘binding’ antibodies and ‘neutralizing’ antibodies. Binding antibodies are those that make contact with the antigen and perhaps flag it for elimination. This is in contrast to neutralizing antibodies, whose binding eliminates the biological activity of the antigen. A special class of these is the ‘broadly neutralizing antibodies’: molecules which are capable of neutralizing multiple strains of the antigen. Such broadly neutralizing antibodies are very important in the fight against highly malleable pathogens such as influenza virus or HIV.

The process by which such antibodies arise is still poorly understood. In the manuscript, Williams et al. make a link between the memory and plasma B cells producing broadly neutralizing antibodies and find their common ancestor. The common ancestor turned out to be auto-reactive, which might suggest that some degree of tolerance is necessary to allow for broadly neutralizing abs (‘hit a lot of targets fatally’). From a more engineering perspective, they create chimeras of the plasma and memory B cell antibodies and demonstrate that these are much more powerful in neutralizing HIV.

Interesting data: their crystal structures are of different broadly neutralizing abs co-crystallized with the same antigen (although a small set…). A good set for ab-specific docking or epitope prediction, beyond the other case like that in the PDB (lysozyme)! At the time of writing the structures were still on hold in the PDB, so watch this space…

Interesting Antibody Papers

Below are two somewhat recent papers that are quite relevant to those doing ab-engineering. The first one looks at antibodies as a collection: software which better estimates the diversity of an antibody repertoire. The second one looks at each residue in more detail: it maps the mutational landscape of an entire antibody, showing a possible modulating switch for the VL-CL interface.

Estimating the diversity of an antibody repertoire. (Arnaout Lab) paper here. High-throughput sequencing (or next-generation sequencing…) of antibody repertoires allows us to get snapshots of the overall antibody population. Since the ‘diversity’ of the antibody population is key to its ability to find a binder to virtually any antigen, it is desirable to quantify how diverse a sample is, as a way to see how wide the net is being cast. Firstly, however, we need to know what we mean by ‘diversity’. One way of looking at it is akin to considering ‘species diversity’, studied extensively in ecology. For example, you estimate the ‘richness’ of species in a sample of 100 rabbits, 10 wolves and 20 sheep. Diversity measures such as Simpson’s index or entropy can be used to calculate how biased the sample is towards one species. Here the sample is quite biased towards rabbits; if instead we had 10 rabbits, 10 wolves and 10 sheep, the ‘diversity’ would be quite uniform. Back to antibodies: it is desirable to know if a given species of antibody is more represented than others, or if one is very underrepresented. This might distinguish a healthy from an unhealthy immune system, or indicate antibodies carrying out an immune response (when there is more of the type of antibody that is directing the immune response). The problem: given an arbitrary sample of antibody sequences/reads, tell me how diverse it is. We should be able to do this by estimating the number of cell clones that gave rise to the antibodies (referred to as clonality). People have been doing this by grouping sequences by CDR3 similarity. For example, sequences with identical CDR3s, or with more than 95% CDR3 identity, are treated as coming from the same cell, which is tantamount to being the same ‘species’. However, since the number of diverse B cells in a human organism is huge, HTS only provides a sample of these. Therefore some rarer clones might be underrepresented or missing altogether.
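The species example above can be made concrete. Here is a minimal sketch in plain Python (the counts are the illustrative ones from the text) computing Simpson’s index and Shannon entropy for the two samples:

```python
import math

def simpson_index(counts):
    """Probability that two random individuals belong to the same species."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def shannon_entropy(counts):
    """Shannon entropy (natural log); higher means a more even sample."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

biased = [100, 10, 20]   # rabbits, wolves, sheep
uniform = [10, 10, 10]

print(round(simpson_index(biased), 3))     # 0.621: dominated by rabbits
print(round(simpson_index(uniform), 3))    # 0.333: maximally even for 3 species
print(round(shannon_entropy(uniform), 3))  # 1.099 = ln(3)
```

The same formulas apply unchanged when ‘species’ are antibody clones and the counts are reads per clone.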
To address this issue, Arnaout and Kaplinsky developed a methodology called Recon which estimates the diversity of an antibody sample. It is based on the expectation-maximization algorithm: given a list of species and their numbers, keep adding parameters until there is good agreement between the fitted distribution and the given data. They validated this methodology first on simulated data and then on the DeKosky dataset. The code is available from here, subject to their license agreement.
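To see why such a correction matters, here is a toy illustration (not Recon itself; the power-law clone abundances are invented) of how a finite read sample misses rare clones, so that naive clone counting underestimates the true clonality:

```python
import random

random.seed(42)

# Toy repertoire: 1000 clones with power-law abundances (invented numbers),
# so a few clones dominate and most are rare.
clones = list(range(1000))
weights = [1 / (rank + 1) for rank in clones]

# HTS yields only a finite sample of reads from this repertoire...
reads = random.choices(clones, weights=weights, k=2000)

# ...so simply counting distinct clones in the sample misses the rare ones.
observed = len(set(reads))
print(f"true clones: 1000, observed in sample: {observed}")
```

Recon’s job, roughly, is to infer from the shape of the observed clone-size distribution how many such unseen clones there are.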

Thorough analysis of the mutational landscape of an entire antibody. [here]. (Germaine Fuh from Affinta/Genentech/Roche). The authors aimed to see how malleable the variable antibody domains are to mutations, by introducing all possible modifications at each site in an example antibody. As the subject molecule they used the high-affinity, very stable anti-VEGF antibody G6.31. They argue that this antibody is a good representative of human antibodies (it uses the commonly used genes Vh3 and Vk1) and that, since its CDRs are already optimized, any beneficial distal mutations should be easy to spot. They confirm that the positions most resistant to mutation are the core ones responsible for maintaining the structure of the molecule. Most notably, they identified that the Kabat L83 position correlates with VL-CL packing. This position is most frequently a phenylalanine and less frequently valine or alanine. This residue is usually spatially close to the isoleucine at position LC-106. They defined two conformations of L83F, in and out:

  1. Out: -100 < X1 < -50
  2. In: 50 < X1 < 180

Being in either of these conformations correlates with the orientation of LC-106 in the elbow region. This in turn affects the size of the VL-CL interface (large elbow angle = small, tight interface; small elbow angle = large interface). The L83 position often undergoes somatic hypermutation, as does LC-106, with the most common mutation being valine.
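As a schematic of how such a scan is read out, the sketch below ranks positions by their mean mutational effect. The positions and scores are invented for illustration and are not the paper’s data:

```python
# Hypothetical deep-mutational-scan readout: for each position, the measured
# effects (e.g. log enrichment) of the substitutions tried there. All scores
# are invented; in the paper every position of G6.31 was scanned exhaustively.
scan = {
    "H35 (core)":   [-2.1, -1.9, -2.5, -2.2],  # every mutation deleterious
    "L83":          [-0.1,  0.2, -0.3,  0.1],  # tolerant, modulating position
    "H99 (CDR-H3)": [-0.5,  0.8, -0.2,  0.4],  # some mutations even improve
}

def mean_effect(effects):
    """Average effect of mutating a position; very negative = resistant."""
    return sum(effects) / len(effects)

ranked = sorted(scan, key=lambda pos: mean_effect(scan[pos]))
print("most mutation-resistant position:", ranked[0])
```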

CCP4 Study Weekend 2017: From Data to Structure

This year’s CCP4 study weekend focused on providing an overview of the process and pipelines available to take crystallographic diffraction data from spot intensities right through to structure. Sessions therefore covered processing diffraction data; phasing through molecular replacement and experimental techniques; and automated model building and refinement, as well as updates to CCP4 and where crystallography will take us in the future.

Surrounding the meeting there was also a session for Macromolecular Crystallography (MX) users of Diamond Light Source (DLS), which gave an update on the beamlines and scientific software, as well as examples of how fragment screening at DLS has been used. The VMXi (Versatile Macromolecular X-tallography in situ) beamline is being developed to image crystals as they form in in-situ crystallisation plates. This should allow crystallography to be optimised, as crystallisation conditions can be screened and data collected on experiments as they crystallise, which is especially helpful in cases where crystallisation has routinely led to non-diffracting crystals. VMXm is a micro/nanofocus MX beamline in development, with a focus on obtaining crystallographic data from very small crystals (~300 nm to 10 micron diameters, with a bias towards the smaller sizes), thereby allowing crystallography of targets for which it has previously been hard to grow sufficiently large crystals. Other updates included how technology developed for fast solid-state data collection at X-ray free-electron lasers (XFELs) can be used on synchrotron beamlines.

Below is a slightly more in-depth discussion of two tools presented that were developed for use alongside and within CCP4, and which might be of broader interest:

ConKit: A python interface for contact prediction tools

Contact prediction for proteins, at its simplest, involves estimating which residues are within a certain spatial proximity of each other, given the sequence of the protein or proteins (for complexes and interfaces). Two major types of contact prediction exist:

  • Evolutionary coupling
  • Supervised machine learning
    • Ab initio structure prediction tools can also be used, without sequence homologues, to predict which contacts exist, but with much lower accuracy than evolutionary coupling.


ConKit is a python interface (API) for contact prediction tools, consisting of three major modules:

  • Core: A module for constructing hierarchies, thereby storing necessary data such as sequences in a parsable format.
    • It also provides common functionality through functions that, for example, declare a contact as a false positive.
  • Application: Python wrappers for common contact prediction and sequence alignment applications.
  • I/O: An interface for file reading, writing and conversions.
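To make the Core module’s role concrete, here is a minimal sketch of contact-map bookkeeping in plain Python. This is not ConKit’s actual API; the class and method names are invented for illustration:

```python
# Minimal contact-map bookkeeping (NOT ConKit's real classes): store predicted
# contacts, flag false positives against a known structure, report precision.

class Contact:
    def __init__(self, res1, res2, confidence):
        self.res1, self.res2, self.confidence = res1, res2, confidence
        self.false_positive = False

class ContactMap:
    def __init__(self, contacts):
        # keep highest-confidence predictions first
        self.contacts = sorted(contacts, key=lambda c: -c.confidence)

    def match(self, true_pairs):
        """Flag predictions absent from the set of true residue pairs."""
        for c in self.contacts:
            if (c.res1, c.res2) not in true_pairs and \
               (c.res2, c.res1) not in true_pairs:
                c.false_positive = True

    def precision(self):
        hits = sum(1 for c in self.contacts if not c.false_positive)
        return hits / len(self.contacts)

predicted = ContactMap([Contact(5, 40, 0.9), Contact(10, 70, 0.8),
                        Contact(3, 12, 0.4)])
predicted.match({(5, 40), (10, 70)})
print(predicted.precision())  # 2 of the 3 predictions are true contacts
```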

Contact prediction can be used in crystallographic structure determination during unconventional molecular replacement, using a tool such as AMPLE. Molecular replacement is a computational strategy to solve the phase problem: in the typical case, homologous structures are used to obtain an estimated model of the protein that best fits the experimental diffraction intensities, and thus to estimate the phases. AMPLE utilises ab initio modelling (using Rosetta) to generate a model for the protein; contact prediction can provide input to this ab initio modelling, making it more feasible to generate an appropriate structure from which to solve the phase problem. Contact prediction can also be used to analyse known and unknown structures, to identify potential functional sites.

For more information: Talk given at CCP4 study weekend (Felix Simkovic), ConKit documentation

ACEDRG: Generating Crystallographic Restraints for Ligands

Small molecule ligands are present in many crystallographic structures, especially in drug development campaigns. Proteins are formed (almost exclusively) from a sequence drawn from a selection of 20 amino acids, which means there are well-known restraints (for example bond lengths, bond angles, torsion angles and rotamer positions) for model building or refinement of amino acids. As ligands can be built from a much wider selection of chemical moieties, they have historically not been as well restrained during MX refinement. Ligands found in PDB depositions can be used as models for the model building/refinement of ligands in new structures; however, there are a limited number of ligands available (~23,000), and the resolution of these ligands is limited to the resolution of the macromolecular structure from which they are extracted.

ACEDRG utilises the Crystallography Open Database (COD), a library of (>300,000) small molecules usually with atomic resolution data (often 0.84 Å or better), to generate a dictionary of restraints to be used in refining the ligand. To create these restraints ACEDRG utilises the RDKit chemoinformatics package, generating a detailed descriptor of each atom of the ligands in the COD. The descriptor utilises properties of each atom, including the element name, the number of bonds, the environment of the nearest neighbours, and third-degree neighbours that are part of aromatic ring systems. The descriptor is stored alongside the electron density values from the COD. When an ACEDRG query is made, the atom type of each atom in the ligand is compared to those for which a COD structure is available; the nearest match is then used to generate a series of restraints for the atom.
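The descriptor-matching idea can be sketched as follows; the descriptor format and restraint values here are invented for illustration and are not ACEDRG’s real atom types:

```python
# Toy atom typing in the spirit of ACEDRG (descriptor format and restraint
# values are invented; these are not ACEDRG's real atom types).

def atom_descriptor(element, neighbour_elements):
    """Describe an atom by its element and its sorted neighbour elements."""
    return element + "(" + ",".join(sorted(neighbour_elements)) + ")"

# Hypothetical restraint table distilled from small-molecule structures:
# descriptor -> ideal bond length in angstroms (illustrative values)
restraint_table = {
    "C(C,O,O)": 1.25,    # carboxylate-like carbon
    "C(C,C,H,H)": 1.53,  # aliphatic carbon
}

query = atom_descriptor("C", ["O", "O", "C"])
print(query, "->", restraint_table.get(query, "no exact match"))
```

The real program builds much richer descriptors (out to third-degree neighbours and aromatic systems) and falls back to the nearest matching type when no exact match exists.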

ACEDRG can take a molecular description (SMILES, SDF MOL, SYBYL MOL2) of your ligand and generate appropriate restraints for refinement (atom types, bond lengths and angles, torsion angles, planes and chirality centres) as an mmCIF file. These restraints can be generated for a number of different probable conformations of the ligand, such that it can be refined in these alternate conformations; the refinement program can then use local scoring criteria to select the conformation that best fits the observed electron density. ACEDRG can be accessed through the CCP4i2 interface and as a command-line interface.

Hopefully this was a useful insight into some of the tools presented at the CCP4 Study weekend. For anyone looking for further information on the CCP4 Study weekend: Agenda, Recording of Sessions, Proceedings from previous years.

Using PML Scripts to generate PyMOL images

We can all agree that typing commands into PyMOL can make pretty and publishable pictures. But your love for PyMOL lasts only until you realise there is a mistake and you need to redo it, or you have to iterate over several proteins, and it takes many fiddly commands to get yourself back there (relatable rant over). Recently I was introduced to the useful tool of PML scripting, and for those who have not already discovered this gem, please do read on.

These scripts can be called when you launch PyMOL (or from File>Run) and iterate through the commands in the script to adapt the image. This means all your commands can be adjusted to make the figure optimal and allow for later editing.

I have constructed and commented an example script (Joe_Example.pml) below to give a basic depiction of a T4 lysozyme protein. Here I load the structure and set the view (the co-ordinates can be copied from PyMOL easily by clicking the ‘get view’ command). You then essentially call the commands that you would normally use to enhance your image. To try this for yourself, download the T4 lysozyme structure from the PDB (1LYD) and run the script (command line: pymol Joe_Example.pml) in the same directory to give the image below.

The image generated by the attached PML script of the T4 Lysozyme (PDB: 1LYD)


### Load your protein ###

load ./1lyd.pdb, 1lyd

### Set your viewpoint ###

set_view (\
    -0.682980239,    0.305771887,   -0.663358808,\
    -0.392205656,    0.612626553,    0.686194837,\
     0.616211832,    0.728826880,   -0.298486710,\
     0.000000000,    0.000000000, -155.216171265,\
     4.803394318,   63.977561951,  106.548652649,\
   123.988197327,  186.444198608,   20.000000000 )

### Set Style ###

hide everything
set cartoon_fancy_helices = 1
set cartoon_highlight_color = grey70
bg_colour white
set antialias = 1
set ortho = 1
set sphere_mode, 5

### Make your selections ###

select sampleA, 1lyd and resi 1-20

colour blue, 1lyd
colour red, sampleA
show cartoon, 1lyd

### Save a copy ###

ray 1000,1500
png Lysozyme_Example_Output.png


Transgenic Mosquitoes

At the meeting on November 15 I covered a paper by Gantz et al. describing a method for creating transgenic mosquitoes expressing antibodies that hinder the development of malaria parasites.

The immune system is commonly divided into two categories: innate and adaptive. The innate immune system consists of non-specific defence mechanisms such as epithelial barriers, macrophages etc., and is present in virtually every living organism. The adaptive immune system is responsible for the invader-specific defence response. It consists of B and T lymphocytes and encompasses antibody production. As only vertebrates possess an adaptive immune system, mosquitoes do not naturally produce antibodies, which hinders their ability to defend themselves against pathogens such as the malaria parasite.

In the study by Gantz et al. the authors inserted transgenes expressing three single-chain Fvs (scFvs: m4B7, m2A10 and m1C3) into previously characterised chromosomal docking sites.

Figure 1: The RT-PCR experiments showing the scFv expression in different mosquito strains

RT-PCR was used to detect scFv transcripts in RNA isolated from the transgenic mosquitoes (see Figure 1). The experiments showed that the attP 44-C recipient line allowed expression of the transgenes coding for the scFvs.

The authors evaluated the impact of the modifications on the fitness of the mosquitoes. It was shown that the transgene expression does not reduce the lifespan of the mosquitoes, or their ability to procreate.

Expression of the scFvs targeted the parasite at both the early and late development stages. The transgenic mosquitoes displayed a significant reduction in the number of malaria sporozoites per infected female, in most cases completely inhibiting the sporozoite development.

Overall the study showed that it is possible to develop transgenic mosquitoes that are resistant to malaria. If this method were combined with a mechanism for gene spread, malaria-resistant mosquitoes could be released into the environment, helping to fight the spread of this disease.

End of an era?

The Era of Crystallography ends…

For over 100 years, crystallography has been used to determine the atom arrangements of molecules; specifically, it has become the workhorse of routine macromolecular structure solution, being responsible for over 90% of the atomic structures in the PDB. Whilst this achievement is impressive, in some ways it has come around despite the crystallographic method, rather than because of it…

The problem, generally, is this: to perform crystallography, you need crystals. Crystals require the spontaneous assembly of billions of molecules into a regular repeated arrangement. For proteins — large, complex, irregularly shaped molecules — this is not generally a natural state for them to exist in, and getting a protein to crystallise can be a difficult process (the notable exception is Lysozyme, which it is difficult NOT to crystallise, and there are consequently ~1700 crystal structures of it in the PDB). Determining the conditions under which proteins will crystallise requires extensive screening: placing the protein into a variety of different solutions, in the hope that in one of these, the protein will spontaneously self-assemble into (robust, homogeneous) crystals. As for membrane proteins, which… exist in membranes, crystallisation solutions are sort of ridiculous (clever, but ridiculous).

But even once a crystal is obtained (and assuming it is a “good” well-diffracting crystal), diffraction experiments alone are generally not enough to determine the atomic structure of the crystal. In a crystallographic experiment, only half of the data required to solve the structure of the crystal is measured — the amplitudes. The other half of the data — the phases — are not measured. This constitutes the “phase problem” of crystallography, and “causes some problems”: developing methods to solve the phase problem is essentially a field of its own.

…and the Era of Cryo-Electron Microscopy begins

Cryo-electron microscopy (cryo-EM; primers here and here) circumvents both of the problems with crystallography described above (although of course it has some of its own). Single particles of the protein (or protein complex) are deposited onto grids and immobilised, removing the need for crystals altogether. Furthermore, phase information can be recovered directly from the images, removing the need to overcome the phase problem.

Cryo-EM is also really good for determining the structures of large complexes, which are normally out of the reach of crystallography, and although cryo-EM structures used to only be determined at low resolution, this is changing quickly with improved experimental hardware.

Cryo-electron microscopy is getting better and better every day. For structural biologists, it seems like it’s going to be difficult to avoid. Crystallographers, however, need not worry: there is hope.

Start2Fold: A database of protein folding and stability data

Hydrogen/deuterium exchange (HDX) experiments are used to probe the tertiary structures and folding pathways of proteins. The rate of proton exchange between a given residue’s backbone amide proton and the surrounding solvent depends on the solvent exposure of the residue. By refolding a protein under exchange conditions, these experiments can identify which regions quickly become solvent-inaccessible, and which regions undergo exchange for longer, providing information about the refolding pathway.

Although there are many examples of individual HDX experiments in the literature, the heterogeneous nature of the data has deterred comprehensive analyses. Start2Fold [1] is a curated database that aims to present protein folding and stability data derived from solvent-exchange experiments in a comparable and accessible form. For each protein entry, residues are classified as early/intermediate/late based on folding data, or strong/medium/weak based on stability data. Each entry includes the PDB code, length and sequence of the protein, as well as details of the experimental method. The database currently includes 57 entries, most of which have both folding and stability data. Hopefully this database will grow as scientists add their own experimental data, and will reveal useful information about how proteins refold.
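A Start2Fold-style entry might be represented as below; the fields mirror those described above, but the PDB code and residue classifications are placeholders, not real database content:

```python
# A hypothetical Start2Fold-style entry (placeholder PDB code and residue
# classes, chosen only to illustrate the early/intermediate/late scheme).
entry = {
    "pdb": "XXXX",
    "length": 6,
    "method": "HDX",
    "folding": {1: "early", 2: "early", 3: "intermediate",
                4: "late", 5: "late", 6: "intermediate"},
}

# Residues that become solvent-inaccessible first hint at the folding core.
early_core = sorted(res for res, cls in entry["folding"].items()
                    if cls == "early")
print(early_core)  # [1, 2]
```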

The folding data available in Start2Fold is visualised in the figure below, with early, intermediate and late folding residues coloured light, medium and dark blue, respectively.


[1] Pancsa, R., Varadi, M., Tompa, P., Vranken, W.F., 2016. Start2Fold: a database of hydrogen/deuterium exchange data on protein folding and stability. Nucleic Acids Res. 44, D429–D434.

What happens to the Human Immune Repertoire over time?

Last week during the group meeting we talked about a pre-print publication from the Ippolito group in Austin, TX (here). The authors monitored the antibody repertoire from bone marrow plasma cells (80% of circulating abs) in one individual over a period of 6.5 years and, for comparison, in another individual over a period of 2.3 years. In a nutshell, the paper is like Picture 1, just with antibodies.

This is what the paper talks about in a nutshell: how the antibody repertoire looks when sampled at different timepoints in an individual’s lifetime.


The main question that they aimed to answer was: ‘Is the human antibody repertoire stable over time?’. It is plausible that there should be some ‘ground distribution’ of antibodies present over time, acting as a default safety net. However, we know that the antibody makeup can change radically, especially when challenged by antigen. Therefore it is interesting to ask: does the immune repertoire maintain a fairly stable distribution or not?

Firstly, it is necessary to define what we consider a stable distribution of the human antibody repertoire. Antibodies undergo VDJ recombination as well as somatic hypermutation, meaning that the >10^10 antibodies that a human is estimated to be capable of producing have a very wide possible variation. In this publication the authors mostly addressed this question by looking at how the usage of the possible V, D and J genes and their combinations changes over time.

Seven snapshots of the immune repertoire were taken from the individual monitored over 6.5 years and two from the individual monitored over 2.3 years. Looking at the usage of the V, D and J genes over time, the proportions at each of the seven time points appear quite stable (Pic 2). The authors report a similar result for the gene combinations. This would suggest that our antibody repertoire is biased to sample ‘similar’ antibodies over time. These frequencies were also compared to those of the individual sampled over 2.3 years, and the differences between the two appear small.
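Comparing gene-usage frequencies between timepoints is straightforward to sketch. The V-gene calls below are made up, and total variation distance stands in for whichever comparison the authors actually used:

```python
from collections import Counter

def gene_usage(annotations):
    """Fraction of reads assigned to each V gene."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Two invented snapshots of V-gene calls at different timepoints
t1 = ["IGHV3-23"] * 50 + ["IGHV1-69"] * 30 + ["IGHV4-34"] * 20
t2 = ["IGHV3-23"] * 48 + ["IGHV1-69"] * 33 + ["IGHV4-34"] * 19

u1, u2 = gene_usage(t1), gene_usage(t2)

# Total variation distance between the two usage distributions:
# 0 means identical usage, 1 means completely disjoint usage.
tvd = 0.5 * sum(abs(u1[g] - u2[g]) for g in u1)
print(round(tvd, 3))  # small value -> stable usage between timepoints
```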

How the frequencies of V, D and J genes change (not) over 6.5 years in a single individual

It is a very interesting study which hints that we (humans) might be sampling our antibodies from a biased distribution, meaning that our bodies might have developed a well-defined safety net which is capable of raising an antibody towards an arbitrary antigen. It is an interesting starting point, and to further test this hypothesis it would be necessary to carry out such a study on multiple individuals (at a minimum to see if there really are no differences between us, which would at the same time hint that the repertoire does not change over time).


Rational Design of Antibody Protease Inhibitors

On the theme of my research area and my last presentation, I talked at group meeting about another success story in structurally designing an antibody by replicating a general protein binding site using grafted fragments involved in the original complex. The paper by Liu et al. is important to me for two major reasons. Firstly, they used an unconventional antibody for protein design, namely a bovine antibody which is known to have an extended CDR H3. Secondly, the fragment was not grafted at the anchor points of the CDR loop.


SFTI-1 is a cyclic peptide and a known trypsin inhibitor. Its structure is stabilised by a disulphide bridge. The bovine antibody is known to have an extended H3 loop, which is essentially a long beta strand stalk with a knob domain at the end. Liu et al. removed the knob domain and a portion of the beta strand and grafted the acyclic version of SFTI-1 onto it. As I said above, this result is very important because it shows we can graft a fragment/loop at places other than the anchor points of the CDR. This opens up the possibility for more diverse fragments to be grafted, both because of the new anchor points and because the fragment sits further away from the other CDRs and the framework, allowing more conformational space. To see how the designed antibody compares to the original peptide they measured the Kd and found a 4 fold increase (9.57 vs 13.3). They hypothesise that this is probably because the extended beta strand on the antibody keeps the acyclic SFTI-1 peptide in a more stable conformation.

The problem with the bovine antibody is that if inserted into a human subject it would probably elicit an immune response from the native immune system. To humanise this antibody they found the human framework which shares the greatest sequence identity with the bovine antibody and then grafted the fragment onto it. The human antibody does not have an extended CDR H3, and to decide on the best place for grafting they tried various combinations, showing again that the fragments do not need to be grafted exactly at the anchor points. Some of the resulting human antibodies showed even more impressive Kds.


The best designed human antibody had a 0.79 nM Kd, another 10-fold gain. Liu et al. hypothesised that this might be because the cognate protein forms contacts with residues on the other CDRs, even though there is no crystal structure to show this. To test this hypothesis they mutated surface residues on the H2 and L1 loops to alanine, which resulted in a 6.7-fold decrease in affinity. My only comment would be that these mutations might have destabilised the other CDRs, which could itself be the reason for the decrease in affinity.

Quantifying dispersion under varying instrument precision

Experimental errors are common when generating new data. Often this type of error is simply due to the inability of the instrument to make precise measurements. In addition, different instruments can have different levels of precision, even though they are used to perform the same measurement. Take for example two balances and an object with a mass of 1 kg. The first balance, measuring this object several times, might record values of 1.0083 and 1.0091, and the second balance might give values of 1.1074 and 0.9828. In this case the first balance has the higher precision, as the difference between its measurements is smaller than the difference between the measurements of balance two.

In order to have some control over the error introduced by the level of precision of the different instruments, they are labelled with a measure of their precision, $latex 1/\sigma_i^2$, or equivalently with their dispersion, $latex \sigma_i^2$.

Let’s assume that the type of information these instruments record is of the form $latex X_i=C + \sigma_i Z$, where $latex Z \sim N(0,1)$ is an error term, $latex X_i$ is the value recorded by instrument $latex i$, and $latex C$ is the fixed true quantity of interest the instrument is trying to measure. But what if $latex C$ is not a fixed quantity? What if the underlying phenomenon being measured is itself stochastic, like the measurement $latex X_i$? For example, if we are measuring the weight of cattle at different times, the length of a bacterial cell, or the concentration of a given drug in an organism, then in addition to the error that arises from the instruments there is also some noise introduced by dynamical changes of the object being measured. In this scenario, the phenomenon of interest can be given by a random variable $latex Y \sim N(\mu,S^2)$. Therefore the instruments record quantities of the form $latex X_i=Y + \sigma_i Z$.

Under this setup, estimating the value of $latex \mu$, the expected state of the phenomenon of interest, is not a big challenge. Assume that there are $latex x_1,x_2,…,x_n$ values observed from realisations of the variables $latex X_i \sim N(\mu, \sigma_i^2 + S^2)$, which came from $latex n$ different instruments. Here $latex \sum x_i /n$ is still a good estimator of $latex \mu$, as $latex E(\sum X_i /n)=\mu$. A more challenging problem is to infer the underlying variability of the phenomenon of interest $latex Y$. Under our previous setup, this reduces to estimating $latex S^2$, as we are assuming $latex Y \sim N(\mu,S^2)$ and that the instruments record values of the form $latex X_i=Y + \sigma_i Z$.

To estimate $latex S^2$ a standard maximum likelihood approach could be used, by considering the likelihood function:

$latex f(x_1,x_2,\ldots,x_n)= \prod_i \frac{1}{\sqrt{2 \pi (\sigma_i^2+S^2)}} \, e^{-\frac{(x_i-\mu)^2}{2(\sigma_i^2+S^2)}}$,

from which the maximum likelihood estimator of $latex S^2$ is given by the solution to

$latex \sum_i \frac{(x_i- \mu)^2 - (\sigma_i^2 + S^2)}{(\sigma_i^2 + S^2)^2} = 0$.

Another, more naive, approach could use the following result:

$latex E[\sum (X_i-\sum X_i/n)^2] = (1-1/n) \sum \sigma_i^2 + (n-1) S^2$

from which $latex \hat{S^2}= (\sum (X_i-\sum X_i/n)^2 – ( (1-1/n )  \sum(\sigma_i^2) ) ) / (n-1)$.

Here are three simulation scenarios where 200 $latex X_i$ values are taken from instruments of varying precision or variance $latex \sigma_i^2, i=1,2,…,200$, and where the variance of the phenomenon of interest is $latex S^2=1500$. In the first scenario $latex \sigma_i^2$ are drawn from $latex [10,1500^2]$, in the second from $latex [10,1500^2 \times 3]$ and in the third from $latex [10,1500^2 \times 5]$. In each scenario the value of $latex S^2$ is estimated 1000 times, taking each time another 200 realisations of $latex X_i$. The values estimated via the maximum likelihood approach are plotted in blue, and the values obtained by the alternative method are plotted in red. The true value of $latex S^2$ is given by the red dashed line across all plots.


First simulation scenario, where $latex \sigma_i^2, i=1,2,…,200$ in $latex [10,1500^2]$. The values of $latex \sigma_i^2$ are plotted in the histogram to the right. The 1000 estimations of $latex S^2$ are shown by the blue (maximum likelihood) and red (alternative) histograms.


Second simulation scenario, where $latex \sigma_i^2, i=1,2,…,200$ in $latex [10,1500^2 \times 3]$. The values of $latex \sigma_i^2$ are plotted in the histogram to the right. The 1000 estimations of $latex S^2$ are shown by the blue (maximum likelihood) and red (alternative) histograms.

Third simulation scenario, where $latex \sigma_i^2, i=1,2,…,200$ in $latex [10,1500^2 \times 5]$. The values of $latex \sigma_i^2$ are plotted in the histogram to the right. The 1000 estimations of $latex S^2$ are shown by the blue (maximum likelihood) and red (alternative) histograms.
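The two estimators can be compared in miniature as below. This is a smaller, milder setup than the scenarios above (30 replicates instead of 1000, a narrower spread of instrument variances, and the sample mean plugged in for $latex \mu$), so it runs quickly:

```python
import math
import random

random.seed(0)
S2_TRUE = 1500.0   # variance of the phenomenon of interest, S^2
MU = 50.0          # its mean
N = 200            # number of instruments / observations

# Instrument variances sigma_i^2 (a milder spread than the scenarios above)
sigma2 = [random.uniform(10, 3000) for _ in range(N)]

def naive_estimate(x):
    """Moment-based estimator from the identity for E[sum (X_i - mean)^2]."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    return (ss - (1 - 1 / n) * sum(sigma2)) / (n - 1)

def mle_estimate(x):
    """Grid-maximise the log-likelihood in S^2, plugging in the sample mean."""
    xbar = sum(x) / len(x)
    def loglik(s2):
        return sum(-0.5 * (xi - xbar) ** 2 / (v + s2)
                   - 0.5 * math.log(2 * math.pi * (v + s2))
                   for xi, v in zip(x, sigma2))
    return max(range(0, 8001, 20), key=loglik)

naive_runs, mle_runs = [], []
for _ in range(30):
    # Each observation carries both the phenomenon's variance and its own
    # instrument's variance: X_i ~ N(mu, sigma_i^2 + S^2)
    x = [random.gauss(MU, math.sqrt(v + S2_TRUE)) for v in sigma2]
    naive_runs.append(naive_estimate(x))
    mle_runs.append(mle_estimate(x))

print("naive mean estimate:", round(sum(naive_runs) / 30))
print("MLE   mean estimate:", round(sum(mle_runs) / 30))
```

Both averages should land near the true value of 1500; as in the plots above, the spread of the individual estimates grows as the instrument variances grow relative to $latex S^2$.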

For recent advances in methods that deal with this kind of problem, you can look at:

Delaigle, A. and Hall, P. (2016), Methodology for non-parametric deconvolution when the error distribution is unknown. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78: 231–252. doi: 10.1111/rssb.12109