Linear programming (LP) is known as a fast and powerful computational technique. It has been applied to a large range of problems in finances and economics, but it is not very popular among us bioinformaticians, computational biologists, and the likes.
Linear Programming is all about find feasible solutions that satisfy a series of constraints (usually represented by inequalities). Does it sound like a familiar problem to bioinformaticians and computational biologists out there?
This leaves room for some questioning: can biological phenomena be modelled or simplified under the assumption of linearity? Furthermore, can LP be used to tackle the many difficult problems posed in our field? Perhaps an even bigger question: why would any of us use Linear Programming instead of another type of linear modelling? What are the advantages of it?
I will not incur in explaining the particulars of LP here. There is a plethora of materials available online (Wikipedia and Wolfram are accessible starting points) that detail Linear Programming. For those eager for something more substantial, V. Chvatal’s Linear Programming and Dantzig’s Linear Programming and Extensions are two good texts on the subject.
During this week’s journal club, I discussed an article that attempted to use Linear Programming to devise knowledge-based Docking Potentials (DP) tailored for transient protein-protein complexes. Transient complexes tend to be under-represented on the PDB, mostly due to the inherent difficulties of crystallizing such complexes. Hence, the development of knowledge-based potentials for these special cases of protein interaction is drastically hindered by a sample size limitation.
A cartoon representation of a transient complex between Azurin (cyan) and its partner Cytochrome C551 (dark blue) from Pseudomonas aeruginosa. Transient protein complexes are hard to crystallize, hence, under-represented on the PDB.
Source: Bizzarri AR, Brunori E, Bonanni B, Cannistraro S. Docking and molecular dynamics simulation of the Azurin–Cytochrome c551 electron transfer complex. J. Mol. Recognit. 2007; 20: 122–131
To offset such limitation, it would be ideal if one could extrapolate information from decoys (non-native conformations obtained from computational docking tools) in order to improve the Docking potentials. Furthermore, in an ideal world, one would also address the bias introduced by homology/sequence similarity between the existing proteins in the available structures of transient complexes.
The author of the article “Designing coarse grained-and atom based-potentials for protein-protein docking – Tobi D. – BMC Structural Biology 2010, 10:40 doi:10.1186/1472-6807-10-40 ” claims that LP can address such issues by incorporating information from the decoys as linear constraints to the model. The article describes a linear problem, in which the aim is to minimize the variance of how much the non-native energy potentials differ from the native ones. Also, they impose the constraints that native structures must have a lower energy than all of the non-native structures for a given complex (lower in this case is good).
The energy is defined as a weighted sum of the counts of specific interaction types on the complex interface. In their work, the author employed two models: an atom-based model and a side chain-based model. These models are used to classify atoms into groups and to simplify calculations. Initially, they define boolean (one-step) interactions: two atoms interact if they are within a cutoff distance of each other. This cutoff varies according to the type of atoms involved. The initial model led to a state of infeasibility, and it was then replaced by a two-step model, where you have strong and weak interactions and two sets of cutoff (this leads to twice as many unknowns in the LP model).
Well, does it work? How does it fair against other existing knowledge-based DPs?
Source: Designing coarse grained-and atom based-potentials for protein-protein docking. – Tobi D. – BMC Structural Biology 2010, 10:40 doi:10.1186/1472-6807-10-40
Despite the lack of brilliant results or any apparent improvement compared to the state-of-art, the potentials described in the article seem to slightly outperform ZDOCK2.3’s scoring functions.
This may actually speak in favour of the applicability of LP to problems in our area. In the case presented during the journal club, an LP approach produced comparable results to more conventional techniques.
Perhaps the best answer to “why should I use LP?” is that it is an unconventional, creative solution. It is significantly fast and, therefore, easy to try out depending on your problem. Science is all about experimentation, after all. Why would you not try a different technique if you have the means to?
The moral of the story: it is good to think outside the box, as long as you keep your feet on the ground.
Image Source: http://www.designthenewbusiness.com/blog/documenting/thinking-inside-the-box.html
Check the article discussed in the post here.