Monthly Archives: August 2022

Words of Wisdom from final year PhD students

NB: These are entirely subjective so please ignore them all if you want.

1.     Write everything down in a searchable place 

Maybe you are gifted with a brilliant memory but, for the rest of us, write everything down (either in a notebook, or better yet, some kind of searchable typed document). This includes notes from supervisor meetings, industry meetings, clever suggestions over coffee, group meetings, etc… 

In our experience, writing things on paper is risky unless you have a decent filing system (see our desks for examples of how not to file notes). It also requires writing legibly. Typed notes are also particularly useful for saving common error messages/bug fixes/useful installation instructions/functions etc in one place so that you can easily search for them again! This can be just a word document, o rGemma showed me “Notion” which has so far been really useful (and you get to put emojis next to your notes).

This also leads to the second tip…

2.     Type up notes on papers you’ve read or use a reference manager 

Continue reading

Tidbits from YouGov Polls

Some recent verdicts from the British public on YouGov polls

  • The Queen (97%) is less well known than her husband Prince Philip (98%)
  • Liz Truss’ UK popularity rating (21%) is lower than George W. Bush’s (22%)
  • The most popular British dish is ‘Chips’ (84%) followed by ‘Fish and Chips’ (83%)
  • Oxford (55%) is a less popular university than Cambridge (58%)

So much for Aristotle’s ‘wisdom of crowds’!

Retrieving AlphaFold models from AlphaFoldDB

There are now nearly a million AlphaFold [1] protein structure predictions openly available via AlphaFoldDB [2]. This represents a huge set of new data that can be used for the development of new methods. The options for downloading structures are either in bulk (sorted by genome), or individually from the webpage for a prediction.

If you want just a few hundred or a few thousand specific structures, across different genomes, neither of these options are particularly practical. For example, if you have several thousand experimental structures for which you have their PDB [3] code, and you want to obtain the equivalent AlphaFold predictions, there is another way!

If we take the example of the PDB’s current molecule of the month, pyruvate kinase (PDB code 4FXF), this is how you can go about downloading the equivalent AlphaFold prediction programmatically.

  1. Query UniProt [4] for the corresponding accession number – an example python script is shown below:
Continue reading

Ten quick tips for proofreading your work

For my blog post, I thought I’d revisit my dark past on the other side of academic publishing when I worked as a copy editor and proofreader for two years between my undergrad and Masters. During this time, I worked primarily on review papers and news content. While I don’t claim to be a great writer or editor, I thought I’d share some easy tips to help refine your writing and make it more consistent. This is by no means an exhaustive list and probably most of them will already be familiar to you!

1. Consistency is key

I think two of the most important aspects of proofreading are ensuring consistency and using your common sense. For example, instead of agonizing over how you style a word, choose what you think is most appropriate and check that you’ve applied it consistently throughout the text. Check that style matches between the main text, headings, figure legends and footnotes. Some specific things to look for include the following:

  • Capitalization
    • If you’ve capitalized headings, has this been done throughout?
  • Italicization
    • E.g., have you italicized all your mentions of ‘in silico’? 
  • Superscript and subscript
    • E.g., is ‘half-maximal inhibitory concentration (IC50)‘ the same throughout? 
  • Numbers
    • Have you mixed up numerical and spelled-out numbers?  
    • E.g., I drank five cups of coffee and 4 cups of tea. 
Continue reading

Filtering molecules with long linkers

Recently I was tasked with filtering out ‘stringy’ molecules that were being produced with the fragment merging method I’m working on (that is, molecules with lots of consecutive non-ring bonds that weren’t necessarily caught with my rotatable bond filter). While this is quite a niche/specific task, through this I discovered a couple of RDKit functions that I wasn’t previously aware of but might be helpful for other people regularly looking at small molecules. The demo adapts code from this helpful blogpost on cutting a molecule into rings and linkers from ‘Is life worth living?’ (which is a useful source of cheminformatics wisdom; https://iwatobipen.wordpress.com/2020/01/23/cut-molecule-to-ring-and-linker-with-rdkit-rdkit-chemoinformatics-memo/). Obviously in practice you may be applying lots of different filters to enumerated molecules, but this is just a small example of something I found useful. 

The Jupyter Notebook can be found at: 

https://github.com/stephwills/Demo-removing-stringy-molecules/blob/main/Molecule%20filter.ipynb

Happy coding, 

Steph 

Meeko: Docking straight from SMILES string

When docking, using software like AutoDock Vina, you must prepare your ligand by protonating the molecule, generating 3D coordinates, and converting it to a specific file format (in the case of Vina, PDBQT). Docking software typically needs the protein and ligand file inputs to be written on disk. This is limiting as generating 10,000s of files for a large virtual screen can be annoying and hinder the speed at which you dock.

Fortunately, the Forli group in Scripps Research have developed a Python package, Meeko, to prepare ligands directly from SMILES or other molecule formats for docking to AutoDock 4 or Vina, without writing any files to disk. This means you can dock directly from a single file containing all the SMILES of the ligands you are investigating!

Continue reading

ISMB 2022 – July 10-14 Madison, Wisconsin

Madison, Wisconsin, a place known for its superb selection of craft beverages, for having Wisconsin’s Best Cheese Curds, and, most importantly, for hosting the 2022 annual international conference on Intelligent Systems for Molecular Biology (ISMB). Fortunately, we (Lewis and Tobias) got to attend this year’s ISMB and get a taste of Madison. The 2022 conference is the 30th ISMB conference and has grown to become the world’s largest bioinformatics/computational biology conference with nearly 600 presented talks. We therefore got to hear a wide range of different and interesting talks.

Continue reading

5th Artificial Intelligence in Chemistry Symposium

The lineup for the Royal Society of Chemistry’s 5th “Artificial Intelligence in Chemistry” Symposium (Thursday-Friday, 1st-2nd September 2022) is now complete for both oral and poster presentations. It really is a fantastic selection of topics and speakers and it is clear this event is now a highlight of the scientific calendar. Our very own Prof. Charlotte M. Deane, MBE will be giving a keynote.

5th RSC-BMCS/RSC-CICAG Airtificial Intelligence in Chemistry Symposium, 1st-2nd September, Churchill College, Cambridge + Zoom broadcast.

It marks a return to in-person meetings: it will be held at Churchill College, Cambridge, with a conference dinner at Trinity Hall.

More details are here: https://www.rscbmcs.org/events/aichem22/.

Registration for in person attendance is open until Monday 29th August 17:00 (BST).

It is also possible to register for virtual attendance; the meeting will be broadcast on Zoom.

The evolution, evolvability and engineering of gene regulatory DNA

Catching up on the literature is one of the highlights of my job as a scientist. True, sometimes you can be overwhelmed by the amount of information you don’t have; or wonder if we really need another paper showing that protein-ligand scoring functions don’t work. And yet, sometimes you find excellent research that you can’t but regard with a mixture of awe and envy. At a recent group meeting, I discussed one such paper from the research group of Aviv Regev at MIT, where the authors perform an impressive combination of computation and experiment to consider some basic questions in gene regulation and evolution. Here is why I think it’s excellent.

The authors are interested in promoters, small sequences of DNA that precede genes, which are known to regulate how frequently their partners will be expressed. In short, these promoters are binding sites for transcription factors, a family of proteins that in turn recruit RNA polymerase to transcribe DNA to RNA. In turn, albeit not directly, the rate of gene transcription determines the rate at which a protein is produced. If this sounds simple, however, that is where our understanding stops. The human genome encodes some 1.6k different transcription factors (~6-7% of protein-coding genes) and their underworkings are still not well-understood.

Continue reading