
Recognizing pitfalls in Virtual Screening: A critical review

So, my turn to present at Group Meeting Wednesdays – and I decided to go for the work by Scior et al. on pitfalls in virtual screening. As a general comment, I think this paper is well written and tackles most of the practical problems encountered when running a virtual screening (VS) exercise. Anyone who intends to develop a method in this field, or who is planning to run a virtual screening exercise, should read it. I’ve often heard the phrase “virtual screening doesn’t work”, and that comes almost exclusively from people who run computational experiments as a black box, without understanding what is going on and while accepting all the defaults of a specific protocol. This paper highlights what to watch out for. Of the author list, I have only met Andreas Bender, once, at an MGMS meeting a few years back – his main PhD work was on molecular similarity.

The article describes pitfalls associated with four areas: expectations and assumptions; data design and content; choice of software; and conformational sampling as well as ligand and target flexibility. The authors start off by arguing that expectations are too high; people just run a VS experiment and expect to find a potent drug, but this is a rare occurrence indeed. Below is a set of notes on their main points.

Erroneous assumptions and expectations

  1. High expectations: the main goal is to identify novel bioactive chemical matter for the particular target of interest. Highly potent compounds are desirable but not required, yet expectations are often set too high (lead: single-digit µM; hit: < 25 µM).
  2. Stringency of queries: strict vs. loose search criteria. Strict: little diversity, few good results returned. Loose: many false positives returned. Removing one feature at a time – a kind of pharmacophoric feature bootstrapping – highlights which features are important.
  3. Difficulty in binding pose prediction. Taken from their reference [46]: “For big (> 350 Daltons) ligands, however, there is currently very little evidence. We hope investigators will come forward with crystallographic confirmation of docking predictions for higher molecular weight compounds to shed more light on this important problem.”  This point is really interesting, and tools such as GOLD even have an FAQ entry which addresses this question.
  4. Water: hydrogen bonds are often mediated by water molecules, which are frequently visible in the crystal structure. It is hard to predict their exact number, position and orientation. Including them adds realism to the model at the cost of computational resources.
  5. Single vs. multiple/allosteric binding pockets: sometimes the binding site is not known, yet we always assume that the ligand binds to one specific place.
  6. Subjectivity of post-VS compound selection: the result of a VS experiment is a ranking of the whole screening database. Taking the top N often yields very similar compounds, so some sort of post-processing is usually carried out (e.g. clustering, or even subjective manual filtering). This is difficult to reproduce across studies.
  7. Prospective validation: the benchmarking of VS algorithms is done retrospectively, by testing on an active/decoy set for a particular target, where the decoys are only putative inactives. Methods are rarely validated in an external, prospective context.
  8. Drug-likeness: most VS experiments are based on Lipinski’s Ro5 – not more than 5 hydrogen bond donors (nitrogen or oxygen atoms with one or more hydrogen atoms), not more than 10 hydrogen bond acceptors (nitrogen or oxygen atoms), a molecular mass below 500 daltons, and an octanol–water partition coefficient log P not greater than 5 (a quick filter along these lines is sketched just after this list). But these rules apply to oral bioavailability, and lots of drugs fall outside this scope: intravenous drugs, antibiotics, peptidic drugs. VS methods end up being validated only on Lipinski-space molecules.
  9. Diversity of the benchmark library vs. diversity of future, prospective VS runs: the library must fit the purpose of the experiment. Most VS validation experiments use commercially available libraries – a small fraction of chemical space. The type of screening library must be closely related to the objective of the VS campaign, and results have to be transferable between runs. Should validation be done on a specific target family? If the goal is lead optimization, combinatorial libraries are attractive. Natural versus synthesized compounds occupy different chemical space.
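
As a side note, a rough Ro5 filter is easy to put together; below is a minimal sketch using RDKit (assuming RDKit is available – its Lipinski descriptors only approximate the definitions above, and the example molecule is just for illustration).

from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def lipinski_violations(smiles):
    # count how many of the four Ro5 rules a molecule breaks
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %s" % smiles)
    rules = [
        Lipinski.NumHDonors(mol) > 5,       # more than 5 H-bond donors
        Lipinski.NumHAcceptors(mol) > 10,   # more than 10 H-bond acceptors
        Descriptors.MolWt(mol) > 500,       # molecular mass above 500 Da
        Descriptors.MolLogP(mol) > 5,       # logP above 5
    ]
    return sum(rules)

print(lipinski_violations("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin: 0 violations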

Data design and content

  1. Incomparability of benchmark sets: some datasets are designed for docking studies, others for ligand-based VS, which makes the methods incomparable. In general, 2D methods do better than 3D ones (surprising!), with 2D fingerprints outperforming 3D methods. The same datasets should be used for the validation of different methods – it is hard to reproduce a study otherwise.
  2. Limited comparability of performance metrics: this tags along with the previous point – the performance metrics used to compare methods should also be the same. The mean EF is risky because it depends on the ratio of actives to inactive molecules. ROC curves are a problem because they ignore the difference between early and late performance – hence BEDROC, which gives different importance to the early and late stages of the retrieved list of compounds. EF = (number of actives found / number expected by chance) for a given % of the database (a small worked example follows this list).
  3. Hit rate in benchmark data sets: small libraries are not good enough; typical VS hit rates are ~0.01% – 0.14%. Analogue bias: the actives all look very similar to each other. Artificial enrichment: it is too easy to tell actives from decoys – a recent study found that, for ligand-based VS, using just the number of atoms gives half the VS performance.
  4. Assay comparability and technology: properly designed datasets, such as MUV, use similarity distributions to remove anything very similar to everything else, and remove problematic molecules such as autofluorescent ones. MUV uses data from PubChem: different bioassays from different groups, and hence different quality. Choices of targets, cutoffs, parameters, etc. mean that the “ideal VS benchmark deck will never happen.”
  5. Bad molecules as actives: molecules with no real activity, but which are reactive or aggregate in the assay, give false positives – PAINS (pan-assay interference compounds) or frequent hitters. Since the number of actives is small compared to the inactives, false positives hurt more than false negatives.
  6. Putative inactive compounds as decoys: some of the decoys may actually be actives.
  7. Feature weights: LBVS based on a single query fails to identify the important parts of the molecule, e.g. the benzamidine warhead in factor Xa inhibitors.
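
Since the EF definition trips people up, here is a minimal sketch of how it might be computed from a ranked screening list (the function name and the toy data are made up purely for illustration):

def enrichment_factor(ranked_labels, fraction=0.01):
    # ranked_labels: 1 for active, 0 for decoy, best-scoring compound first
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    n_top = max(1, int(round(n_total * fraction)))
    actives_in_top = sum(ranked_labels[:n_top])
    # actives found in the top fraction, relative to what random picking would give
    return (actives_in_top / n_top) / (n_actives / n_total)

# toy ranking of 1000 compounds with 5 actives, 2 of them in the top 1%
ranking = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0] + [0] * 985 + [1, 1, 1, 0, 0]
print(enrichment_factor(ranking, fraction=0.01))  # 40.0, i.e. 40-fold enrichment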

Choice of Software

  1. Interconverting chemical formats: errors or format incompatibilities creep in, and information gets lost or altered – even when using the same format across different software (e.g. chirality, hybridization, and protonation states).
  2. Molecule preparation: query molecules must be preprocessed in exactly the same way as the structures in the database being screened, to ensure consistency (e.g. partial charge calculation).
  3. Feature definition: specific rules are sometimes left out of the pharmacophore definition, e.g. the O and N in oxazole do not both behave as hydrogen bond acceptors. Watch out for tautomers, protonation states, and chirality.
  4. Fingerprint selection and algorithmic implementation: different implementations of the same fingerprint (e.g. MACCS keys) result in different fingerprints (see the short sketch after this list). Choice of descriptors: which ones to pick? Neighbourhood-based? Substructure-based?
  5. Partial charges: mesomeric effects matter, e.g. the formal +1 charge is spread over a guanidinium group.
  6. Single predictors versus ensembles: no single method works best in all cases. Consensus approaches apply multiple methods and combine the results.
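
To make the fingerprint point concrete, here is a minimal sketch of a MACCS keys comparison using RDKit (the two molecules are arbitrary, chosen only for illustration); running the nominally “same” MACCS fingerprint through a different toolkit will generally not give bit-identical results.

from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys

# two arbitrary molecules, purely for illustration
benzamidine = Chem.MolFromSmiles("NC(=N)c1ccccc1")
benzamide = Chem.MolFromSmiles("NC(=O)c1ccccc1")

# RDKit's own implementation of the MACCS keys (167-bit vector, bit 0 unused)
fp1 = MACCSkeys.GenMACCSKeys(benzamidine)
fp2 = MACCSkeys.GenMACCSKeys(benzamide)

print(DataStructs.TanimotoSimilarity(fp1, fp2))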

Conformational sampling as well as ligand and target flexibility

  1. Conformational coverage: four main parameters: (i) sampling algorithms and their specific parameters; (ii) strain energy cutoffs; (iii) maximum number of conformations per molecule; and (iv) clustering to remove duplicates (a short sketch touching on (ii)–(iv) follows this list).
  2. Defining bioactive conformations: most ligands have never been co-crystallized with their primary targets, and even fewer have been co-crystallized with counter targets. The same ligand might bind to different proteins in vastly different conformations. How easy is it to reproduce the cognate conformation? The ligand also changes shape upon binding. Minimum energy conformations are a common surrogate.
  3. Comparing conformations: definitions of identity thresholds. 0.1 < RMSD < 0.5 Å excellent; 0.5 < RMSD < 1.0 Å good fit; 1.0 < RMSD < 1.5 Å acceptable; 1.5 < RMSD < 2.0 Å less acceptable; > 2.0 Å not a fit in biological terms. All-atom vs. fragment-based RMSD makes direct comparison hard.
  4. Size of the conformational ensemble: a trade-off between computational cost and sampling breadth. The conformer generator may never generate the bioactive conformation. How many conformations are required to have the bioactive one among them? Many bioactive conformations might exist.
  5. Ligand flexibility: a hard upper limit on the number of conformations is usually imposed. Ensemble sizes depend mostly on the number of rotatable bonds. Conformer generation tools do not work well on some classes of molecules, e.g. macrocycles.
  6. High energy conformations: high-energy conformers (or physically unrealistic geometries, e.g. a cis secondary amide) are detrimental to VS experiments. 3D pharmacophore searches sometimes end up matching strained structures, yet 70% of ligands bind at strain energies below 3 kcal/mol, so a stringent cutoff is defensible.
  7. Target flexibility: simple things like side-chain rotation can be handled, but nothing major like backbone flexibility. Sometimes docking is done against multiple structure snapshots resulting from molecular dynamics.
  8. Assumption of ligand overlap: lots of 3D shape-based VS methods attempt to maximize the overlap between ligands, but X-ray structures show this is not always the case (different ligands may occupy slightly different regions of the binding pocket).
  9. Missing positive controls: a cutoff that is too strict stops you from retrieving the positive controls in your virtual screening experiment. The selectivity (fewer false positives) vs. sensitivity (larger percentage of true positives retrieved) cutoff needs to be determined appropriately.
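
For flavour, here is a minimal sketch of conformer ensemble generation with RDKit, assuming a reasonably recent RDKit build; the molecule, ensemble size, RMSD pruning threshold and energy window are all illustrative, not recommendations.

from rdkit import Chem
from rdkit.Chem import AllChem

# an arbitrary flexible molecule, for illustration only
mol = Chem.AddHs(Chem.MolFromSmiles("CCOC(=O)c1ccc(NC(=O)CCCN2CCOCC2)cc1"))

# cap the ensemble size and prune near-duplicate conformers by heavy-atom RMSD
params = AllChem.ETKDGv3()
params.pruneRmsThresh = 0.5   # Angstrom
conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=50, params=params)

# crude strain-energy window: keep conformers within 10 kcal/mol of the minimum
results = AllChem.MMFFOptimizeMoleculeConfs(mol)   # list of (not_converged, energy)
energies = [energy for _, energy in results]
lowest = min(energies)
kept = [cid for cid, energy in zip(conf_ids, energies) if energy - lowest < 10.0]
print("kept %d of %d conformers" % (len(kept), len(conf_ids)))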

In conclusion, VS can be run by a monkey – but in that case expect bad results. Careful database preparation, judicious parameter choices, use of positive controls, and sensible compromises between the competing goals one is trying to achieve are all required. VS is a probabilistic game – careful planning and attention to detail increase the probability of success.

How to make environment variables persist in your web app (Apache2)

Last Sunday, with the Memoir gang, we were talking about using this blog as a technical notebook. This post is in that spirit.

We have moved most of Memoir’s validation to before the job is submitted to the queue. This has the advantage that the user knows immediately if the submission has failed (instead of waiting for the job to be processed in the queue). The only issue is that the Apache user (www-data on most systems) is going to run the validation pipeline, and Memoir requires a ton of additions to the PATH variable.

In most cases you cannot add the variables in .bashrc, as the Apache user does not have a home directory. The easiest way to add environment variables for the Apache user is to edit Apache’s environment file:

sudo vim /etc/apache2/envvars

And add the following line at the bottom (the paths obviously have to be customized for your server):

export PATH=$PATH:/opt/tmalign:/opt/muscle:/opt/joy-5.10:/opt/joy-5.10/psa:/opt/joy_related-1.06:/opt/joy_related-1.06/sstruc:/opt/joy_related-1.06/hbond:/opt/medeller-dev/bin:/opt/usearch6.0.307:/opt/ncbi-blast-2.2.27+:/opt/MPT/bin

After this, restart Apache – and you should be laughing. Or, as is often the case, Apache will be laughing at you.

sudo service apache2 restart

The Apache user should now “see” the amended PATH variable.

So here it is, this technical note – in a few weeks’ time, when I have forgotten all of the above, I can just copy and paste it …

(This post is short, also because the Super Bowl is on!)

Anatomy of a blog post

Now, shall I use the first person singular or plural to write this?  Active or passive voice? …

It doesn’t really matter. This isn’t a formal article, and you can even use abbreviations. This group blog, like anything else during our time in Oxford, is an experiment. We will give it a few months and see what happens. If it pans out, we will have a more or less detailed research journal for the group – not to mention a link with the outside world (prospective students? employers?) and proof that we can “communicate” with others. And since this is an exploratory exercise, we should have the freedom to explore what we want to write about.

We should have plenty of fodder. Let us face it: if we do not do some mildly interesting science every week, then we are probably not having enough fun. But even if you are working on a hushed-up, undercover project (e.g. the next blockbuster drug against malaria), there are still so many interesting bits of our D.Phil. which would otherwise never see the light of day.

For inspiration, have a look at other popular scientific blogs – the ChEMBL one is educational and humorous in equal measure (Post Idea #1: a list of bio/cheminformatics blogs which every grad student should read). Blogs are a great way to survey the literature without actually doing any reading (Post Idea #2: tricks to increase grad student productivity… what do you mean you don’t use Google Alerts to surprise your supervisor with a link to a paper published the day before?); and for a TL;DR version there is Twitter (Post Idea #3: Idea #1, but for Twitter instead). I only found out about the four-stranded DNA in human cells by following @biomol_info.

And of course, we are mostly a computational group – software is what we churn out on a daily basis. How much of the software we write ends up resting forever on our disks, never to be used again? The masses want splitchain! (Idea #4: post software you wrote.) And there is benefit not only in giving out software, but also in explaining its internals with snippets (Idea #5: a clever algorithm explained line-by-line).

And then there is the poster you hung up once (Idea #6), or the talk you prepared for hours and gave on your disposable, use-once-only slides (Idea #7). There is the announcement of a published paper – that solemn moment in academia when someone else thinks what you have done is worthy (Idea #8 – by the way, well done to our own Jamie Hill for his recent MP-T work).

And if you’re an athlete, like Anna (Dr. Lewis), who crossed the Atlantic in a rowing boat, or Eleanor, who used to row for the Blues – what can I say, this is how we roll, or row [feeble attempt at humour] – that’s a non-scientific but unique and interesting experience too (Idea #9).

If you’ve read a paper and you think it’s interesting, comment on it – people will follow your posts just because they act as a literature filter (Idea #10). You can probably even have a rant (Idea #11), as long as it’s more positive and less bitter than Fred Ross’s Farewell to Bioinformatics.

Finally, this post is long and tedious for the reader.  But that is ok too – like everything else here, it is a learning experience and the more I write the more I will improve.  So hey, I’m also doing this to write a better thesis (i.e. to make the writing less painful).

An addendum: my initial intention was to discuss the bits which make a good blog post. You can find lots of articles about this – so it is less interesting – but here are the main points.

[Image: cover_real_conformers]

If a picture is really worth a thousand words, 30 of these is all I need for my thesis.