Last Wednesday, I was fortunate enough to be invited as a guest lecturer to the 3rd BioDataScience101 workshop, an initiative spearheaded by Paolo Marcatili, Professor of Bioinformatics at the Technical University of Denmark (DTU). This session, on amino acid sequence analysis applied to both proteomics and antibody drug discovery, was designed and organised by OPIG’s very own Tobias Olsen.
Category Archives: Conferences
NeurIPS 2020: Chemistry / Biology papers
Another blog post, another look at accepted papers for a major ML conference. NeurIPS joins the other major machine learning conferences (and others) in moving virtual this year, running from 6th – 12th December 2020. In a continuation of past posts (ICML 2020, NeurIPS 2019), I will highlight several papers of potential interest to the chem-/bio-informatics communities.
The list of accepted papers can be found here, with 1,903 papers accepted out of 9,467 submissions (20% acceptance rate).
In addition to the main conference, there are several workshops highly related to the type of research undertaken in OPIG: Machine Learning in Structural Biology and Machine Learning for Molecules.
The usual caveat: given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”). If you find any I have missed, please reach out and I will update accordingly.
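The "basic search" mentioned above can be sketched in a few lines of Python. This is only an illustration of a case-insensitive keyword filter over a list of titles; the function name `find_papers` and the example titles are hypothetical placeholders, not real accepted papers or the actual procedure used for this post.

```python
def find_papers(titles, keywords):
    """Return the titles that contain any of the keywords (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [title for title in titles
            if any(keyword in title.lower() for keyword in lowered)]


# Hypothetical titles for illustration only; these are not real papers.
example_titles = [
    "A Placeholder Title About Molecule Generation",
    "A Placeholder Title About Image Classification",
    "Another Placeholder on Protein Structure",
]

matches = find_papers(example_titles, ["molecule", "protein"])
for title in matches:
    print(title)
```

A substring match like this behaves much like Ctrl+f: it will also catch "molecules" or "biomolecule", but it will miss papers that describe the same topic using different vocabulary.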
Learning from Biased Datasets
Both the beauty and the downfall of learning-based methods is that the data used for training will largely determine the quality of any model or system.
While there have been numerous algorithmic advances in recent years, the most successful applications of machine learning have been in areas where either (i) you can generate your own data in a fully understood environment (e.g. AlphaGo/AlphaZero), or (ii) data is so abundant that you’re essentially training on “everything” (e.g. GPT2/3, CNNs trained on ImageNet).
This covers only a narrow range of applications, with most data not falling into one of these two categories. Unfortunately, when this is true (and even sometimes when you are in one of those rare cases) your data is almost certainly biased – you just may or may not know it.
Prerecording Conference Talks and Posters using OBS Studio
Seemingly every conference due to take place this year has either been cancelled or will be run virtually due to the COVID-19 pandemic. Many organisers have decided that running entirely live virtual programmes causes more trouble than it’s worth (e.g. due to unforeseeable IT and internet issues disrupting the schedule), and so are asking their presenters to prerecord their talks, which are then broadcast “live” on the day.
I recently “presented” two virtual prerecorded talks at the ISMB conference using Open Broadcaster Software Studio (OBS Studio), a free open-source software package most commonly used by live-streamers on Twitch and YouTube. It is super simple to use and achieves a professional output, with video overlaying a presentation slide deck/poster PDF. This blog post is a “how-to” on getting started with OBS for conference talks/poster presentations.
Climate Change @ ISMB
Another special session I was listening to at ISMB 2020 was the Green stream. Several talks dealt with climate change and its relation to bioinformatics and computational biology. Two of them I found particularly interesting, one calculating the carbon footprint of ISMB itself and the other calculating the footprint of specific bioinformatics tools.
I believe most people have realised how important the issue of human-made climate change is and I assume that everyone has heard that some aspects of our lives cause particularly high emissions compared to certain alternatives. For example, short-haul flights vs. train rides, mass production of meat vs. eating the food’s food (veggies), or coal plants vs. renewable energies, just to name some that are rather easy to change. Admittedly, I have also underestimated the urgency of the issue and I found this plot quite convincing:

What can we as computational researchers do about it?
Citizen Science in Video Games
What I really liked about visiting ISMB last year was their diversity of talks and subgroup meetings in all areas related to biology and computers. Last year I joined two talks about improving bioinformatics education which were really interesting because I hadn’t thought about that before. This year I joined a special session on citizen science.
Citizen science is public participation in scientific research and can be done by almost everyone. I had heard about Foldit and Rosetta@Home but (unfortunately) never participated. Those two projects deal with protein folding (how does a protein reach its final functional 3D structure?), which is an important scientific problem but is computationally very expensive to study. While one of the projects is a screensaver which uses free resources of personal computers, the other is a game where players can get high scores for folding protein fragments manually. Helping science in a playful way is cool by itself, but the project that was presented in one of the talks took this to the next level. A citizen science minigame was integrated into an action game for PCs and consoles.

ICML 2020: Chemistry / Biology papers
ICML is one of the largest machine learning conferences and, like many other conferences this year, is running virtually from 12th – 18th July.
The list of accepted papers can be found here, with 1,088 papers accepted out of 4,990 submissions (22% acceptance rate). Similar to my post on NeurIPS 2019 papers, I will highlight several of potential interest to the chem-/bio-informatics communities. As before, given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”).
Bayesian Optimization and Correlated Torsion Angles—in Small Molecules
Our collaborator, Prof. Geoff Hutchison from the University of Pittsburgh, recently took part in the Royal Society of Chemistry’s 2020 Twitter Poster Conference, to highlight the great work carried out by one of my DPhil students, Lucian Leung Chan, on the application of Bayesian optimization to conformer generation:
Conference feedback – MABRA workshop “Adaptive immune repertoires and beyond”
Slightly belated, these are our thoughts on the MABRA workshop at the University of Surrey, which five OPIGlets attended in January 2020.
Cooking Up a (Deep)STORM with a Little Cup of Super Resolution Microscopy
Recently, I attended the Quantitative BioImaging (QBI) Conference 2020, served right here in Oxford. On the menu were new recipes for spicing up your Cryo-EM images with a bit of CiNNamon and a peppering of Poisson point processes in the inhomogeneous spatial case, amongst many other methods. However, like many of today’s top-tier restaurants, most of the courses on offer were on the smaller side, nano-scale in fact, serving up the new field of Super Resolution Microscopy!