Monthly Archives: June 2023

KAUST Computational Advances in Structural Biology

Last month, I had the privilege of being invited to the KAUST Research Conference on Computational Advances in Structural Biology, held from May 1-3, 2023. This gave me the opportunity to present some of the latest OPIG work on small molecules while visiting an exceptional campus with state-of-the-art facilities in one of those corners of the world that are not widely known. The experience went beyond the impressive surroundings: the conference itself was highly engaging, and I met many scientists from different backgrounds.

KAUST Library (left) and Dining Hall (right)

The conference brought together experts in the field to explore cutting-edge developments in computational structural biology. It had a primary focus on advancements in protein structure prediction, multi-scale simulations, and integrative structural biology. Cryo-electron microscopy (cryo-EM) was the most popular experimental technique, with more than a third of the talks dedicated to its applications. These talks showcased impressive examples where structure prediction, simulations, and mid-resolution cryo-EM maps were combined to construct atomic models of large macromolecular complexes.

Notable examples of integrative work were presented by Jan Kosinski and Thomas Miller, among others. Jan Kosinski shared insights into the model of the human nuclear pore complex, highlighting the integration of cryo-electron tomography (cryo-ET), prior experimental knowledge, and AlphaFold predictions. Thomas Miller, on the other hand, presented his work on EM-based visual biochemistry, which combines single-particle cryo-EM with time-resolved experiments as a tool to study the molecular mechanisms of eukaryotic DNA replication.

There were also several talks about novel algorithms. Nazim Bouatta presented some lesser-known details about OpenFold and introduced some of their approaches to tackling the problem of multimer modelling. He also announced the future release of folding methods for predicting protein-ligand complexes. Jianlin Cheng presented MULTICOM, their new protein structure predictor based on consensus predictions from AlphaFold. Sergei Grudinin showed deep-learning tools able to predict protein dynamics as well as some integrative modelling tools driven by low-resolution experimental observations, such as small-angle scattering.

On the cryo-EM methods side, Mikhail Kudryashev presented TomoBEAR and SUSAN, cryo-EM tools developed to automate the analysis of tomographic data. Johannes Schwab presented DynaMight, a deep learning-based approach for heterogeneity analysis in single-particle cryo-EM. On the computational chemistry side, Haribabu Arthanari showed their ultra-large virtual screening platform, and Jean-Louis Reymond talked about tools to enumerate, visualize and search the vast chemical space of drug-like molecules.

Overall, the conference provided quite a diverse set of talks that encouraged multidisciplinary views and discussions. From protein structure prediction to integrative approaches combining experimental and computational methods, the talks showed the transformative potential of computational analysis in unravelling the complexities of biological macromolecules.

9th Joint Sheffield Conference on Cheminformatics

Over the next few days, researchers from around the world will be gathering in Sheffield for the 9th Joint Sheffield Conference on Cheminformatics. As one of the organizers (wearing my Molecular Graphics and Modeling Society ‘hat’), I can say we have an exciting array of speakers and sessions:

  • De Novo Design
  • Open Science
  • Chemical Space
  • Physics-based Modelling
  • Machine Learning
  • Property Prediction
  • Virtual Screening
  • Case Studies
  • Molecular Representations

It has traditionally taken place every three years and, despite the global pandemic, it is returning this year, once again in person, in the excellent conference facilities at The Edge. You can download the full programme in iCal format, and here is the conference calendar:

Continue reading

Customising MCS mapping in RDKit

Finding the parts in common between two molecules appears to be straightforward, but is actually a maze of layers. The task, maximum common substructure (MCS) searching, is done in RDKit by Chem.rdFMCS.FindMCS, which is highly customisable with lots of presets. What if one wanted to control in minute detail whether a given atom X is a match for atom Y? There is a way and this is how.
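As a quick illustration of what the presets do before any customisation, here is a minimal sketch that runs FindMCS on two molecules with two of the built-in atom comparison modes. The molecules (toluene and phenol) are my own illustrative choices, not taken from the full post, and this shows only the preset route, not the fine-grained per-atom control that the post goes on to describe.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Two illustrative molecules (an assumption for this sketch).
toluene = Chem.MolFromSmiles("Cc1ccccc1")
phenol = Chem.MolFromSmiles("Oc1ccccc1")
mols = [toluene, phenol]

# Preset 1: two atoms match only if they are the same element.
strict = rdFMCS.FindMCS(mols, atomCompare=rdFMCS.AtomCompare.CompareElements)

# Preset 2: any atom is allowed to match any other atom.
loose = rdFMCS.FindMCS(mols, atomCompare=rdFMCS.AtomCompare.CompareAny)

# Each result carries the size of the common substructure and a SMARTS for it.
print(strict.numAtoms, strict.numBonds, strict.smartsString)
print(loose.numAtoms, loose.numBonds, loose.smartsString)
```

With element matching, only the six ring atoms are shared; once any atom may match any other, the methyl carbon and the hydroxyl oxygen are paired up too, giving seven atoms. The presets therefore already change the answer, but they apply one rule to every atom, which is exactly the limitation that per-atom control is meant to lift.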

Continue reading

Machine learning strategies to overcome limited data availability

Machine learning (ML) for biological/biomedical applications is very challenging, in large part due to limitations in publicly available data (something we recently published about [1]). Generating the types of data required to train ML models (e.g. protein structures, protein-protein binding affinity, microscopy images, gene expression values) can take substantial amounts of time and resources.

In cases where there is sufficient data available to provide signal, but not enough for the desired performance, ML strategies can be employed:

Continue reading

Exploring the Observed Antibody Space (OAS)

The Observed Antibody Space (OAS) [1,2] is an amazing resource for investigating observed antibodies or for training antibody-specific models; however, its size (over 2.4 billion unpaired and 1.5 million paired antibody sequences as of June 2023) can make it painful to work with. Additionally, OAS is extremely information rich, having nearly 100 columns for each antibody heavy or light chain, which further complicates handling the data.

Having spent a lot of time working with OAS, I wanted to share a few tricks and insights, which I hope will reduce the pain and increase the joy of working with OAS!

Continue reading

Academic Reading? There’s an AI for that.

AI tools are literally everywhere. Recently, I stumbled across an AI aggregator website (theresanaiforthat.com) that, given a task, will find an AI solution. At the time of writing this article, there are 4871 AIs across 1369 tasks, with solutions ranging from scribes to polygraph examiners. I also came across SciSpace (formerly typeset, https://typeset.io), an “AI assistant to understand scientific literature.” So, of course, I tested it out. In this blog post, we will explore the capabilities of SciSpace and discuss how it can potentially enhance your literature review process.

The user experience of a tool can make or break its adoption. Thankfully, SciSpace isn’t bad. Its main website offers basic search functionality, enabling you to find specific papers, topics, or authors within their database. I did notice that it is missing many new papers in its database; however, users have the option to upload a PDF for analysis. Additionally, each search result includes a TL;DR summary, providing a concise overview of the paper’s contents at a glance. As expected, this summary serves as a helpful reminder for familiar papers, but I often found it inadequate in providing enough information to grasp the main arguments or story of a paper. One interesting feature of SciSpace is the ability to “trace” papers in their database. By following the citations of a paper, users can navigate through related works, authors, and topics. I think this feature would be helpful during exploration and makes finding connections between related topics a little easier.

The best thing about SciSpace is the Copilot Chrome extension. Available whenever you open a paper’s PDF or journal link, it offers text analysis, summarization, and mathematical or table comprehension. It provides a set of common template prompts, for example, “What were the key contributions of that paper?”, “What data and methods have been used in this paper?”, or “What are the limitations of this paper?” I found these prompts helpful for getting an overview of the work faster than by reading the abstract, figures, and conclusion.

To put SciSpace Copilot to the test, I used it on my recent publication. The extension provided an accurate summary of the abstract and introduction. It effectively extracted the key results and arguments and highlighted the main contributions of the work. To be honest, it also offered a fair and accurate summary of the limitations of the study. It was helpful; however, it does not replace the need to read the full paper.

Tools like SciSpace are clearly becoming more popular and could potentially play a larger role in how we write, read, and understand research output. In the meantime, I’ve found that it significantly improves the efficiency and effectiveness of my academic reading. Its clean, user-friendly interface, TL;DR summaries, and the impressive Copilot Chrome extension save me time. Plus, it’s completely free! I do expect that at some point it will become a paid tool. Until then, it’s a great way to stay on top of published work and build an understanding of related, but unfamiliar, fields.