The unidirectional process of translation converts one dimensional mRNA sequences into three dimensional protein structures. The reactant, the mRNA, is built from 64 codon variants, while the product, the protein, is constructed from the 20 biologically relevant amino acids. This reduction of 64 variants to a mere 20 is indicative of the degeneracy contained within the genetic code. Multiple codons encode for the same amino acid, and hence any given protein can be encoded by a multitude of synonymous mRNA sequences. Current structural biology research often assumes that these synonymous mRNA sequences are equivalent; the ability of certain proteins to unfold and refold without detrimental effects leading to the assumption that the information pertaining to the folded structure is contained exclusively within the protein sequence. In actuality, experimental evidence is mounting that shows synonymous sequences can in fact produce proteins with different properties; synonymous codon switches having been observed to cause a wide range of detrimental effects, such as decreases in enzyme activity, reductions in solubility, and even causing misfolding.
For my 10 week DTC project within the Deane group, I was tasked with resurrecting the Coding Sequence and Structure database (CSandS; read: Sea Sands) and using it to test for the evolutionary conservation of codon choice over groups of proteins with similar structures. With experimental differences noted in protein product structure between synonymous sequences, we wished to investigate if codon choice is an evolutionary constraint on protein structure, and if so, to what degree, and in what manner. To test for evolutionary conservation we combined the theories of codon optimality and cotranslational folding, our hypothesis being that the choice of codon directly affects the translation efficiency of the ribosome; consequently different codons give the nascent polypeptide emerging from the ribosome varying amounts of time to explore the fold-space. By assigning optimality scores to each codon, we can search for regions of fast translation and slow translation, then by looking for correlations within aligned sets of structural similar proteins we can identify sites where codon choice is of importance. While the 10 weeks project focussed mainly on methodology and implementation, my preliminary results indicate that a large percentage of proteins are under selective pressures with regards to codon choice. I found that in most sets of proteins, there is an greater amount of correlation than would be expected by chance, this crucially suggests that there is additional structural information contained within the mRNA that is lost once translation occurs.
For additional information on the background, methodology and implementation of this research, please feel free to view the presentation slides at: http://www.slideshare.net/AlistairMartin/evolut-26981404