Journal club (Bernhard Knapp): NetMHCIIpan-3.0
This week’s topic of the Journalclub was about the prediction of potential T cell epitopes (=prediction of the binding affinity between peptides and MHC). In this context I presented the latest paper in the NetMHCpan series:
Karosiene E, Rasmussen M, Blicher T, Lund O, Buus S, Nielsen M. NetMHCIIpan-3.0, a common pan-specific MHC class II prediction method including all three human MHC class II isotypes, HLA-DR, HLA-DP and HLA-DQ. Immunogenetics. 2013 Oct;65(10):711-24
The reliable prediction of the peptide/MHC binding affinity is already a scientific aim for several years: Early approaches were based on the concept of anchor residues i.e. those peptide side-chain which are pointing in the direction of the MHC. The next “generation” of methods was matrix based i.e. it was simply counted how likely it is that a specific residue is at peptide position 1, 2, 3, … for binding and non-binding peptides of a specific MHC allele. In the next step methods started to incorporate higher order approaches i.e. positions within the peptide influence each other (e.g. SVM, ANN, …). While such methods are already quite reliable their major limitation is that they can only be trained on the basis of sufficient experimental binding data per allele. Therefore a prediction for alleles with only few experimental peptide binding affinities is not possible or rather poor in quality. This is especially import because there are several thousand HLA alleles: The latest version of the IPD – IMGT/HLA lists for example 1740 HLA Class I A and 2329 HLA Class I B proteins (http://www.ebi.ac.uk/ipd/imgt/hla/stats.html). Some of them are very frequent and others are not but this the aim would be a 100% coverage.
Pan specific approaches go one step further and allow the binding prediction between peptide and an arbitrary MHC allele for which the sequence is known. The NetMHCpan approach is one of them and is based on the extraction of a pseudo sequence (those MHC residues which are in close proximity to the peptide) per MHC allele. Subsequently this MHC pseudo sequence as well as the peptide sequences are encoded using some algorithm e.g. a BLOSUM matrix. Finally the encoded sequences and the experimental binding affinities are fed into an Artificial Neural Network (ANN). For an illustration of the workflow see:
This approach turned out to be quite successful and (not too much) depending on the allele the area under ROC reaches in average a healthy 0.8x number which is already quite competitive with experimental approaches.
The netMHCpan series has progressed over the last years starting to cover MHC class I in humans (2007) and beyond (2009). Then MHC class II DR (2010) and in the latest version also DP and DQ (2013). With the next paper expected to be about non human MHC class II alleles the prediction of the binding affinity between peptides and pretty much every MHC should be possible.