Monthly Archives: July 2020

Prerecording Conference Talks and Posters using OBS Studio

Seemingly every conference due to take place this year has either been cancelled or will be run virtually because of the COVID-19 pandemic. Many organisers have decided that running entirely live virtual programmes causes more trouble than it’s worth (e.g. unforeseeable IT and internet issues disrupting the schedule), and so are asking their presenters to prerecord their talks, which are then broadcast “live” on the day.

I recently “presented” two virtual prerecorded talks at the ISMB conference using Open Broadcaster Software Studio (OBS Studio), a free open-source software package most commonly used by live-streamers on Twitch and YouTube. It is super simple to use and achieves a professional output, with video overlaying a presentation slide deck/poster PDF. This blog post is a “how-to” on getting started with OBS for conference talks/poster presentations.

Continue reading

Pigs in the Parks: OPIG Social 28JUL2020

Tuesday afternoon normally heralds Group Meeting, the precious hour of the week where we gather on Zoom to hear about recently published papers, dissect each other’s research and, most importantly, bicker about appropriate usage of the servers. Knowing that Fergus B was on holiday this week and that a Group Meeting devoid of SLURM-inspired ranting would have felt strangely empty, it was instead decided that now was the time for the first in-person group social since the lockdown began in March.

Struggling to adapt to not being able to turn off Mic and Webcam – how on earth did we manage like this all the time before?!
Continue reading

Climate Change @ ISMB

Another special session I attended at ISMB 2020 was the Green stream. Several talks dealt with climate change and its relation to bioinformatics and computational biology. I found two of them particularly interesting: one calculated the carbon footprint of ISMB itself and the other calculated the footprint of specific bioinformatics tools.

I believe most people have realised how important the issue of human-made climate change is, and I assume that everyone has heard about some aspects of our lives that cause particularly high emissions compared with their alternatives. For example, short-haul flights vs. train rides, mass production of meat vs. eating the food’s food (veggies), or coal plants vs. renewable energies, just to name some that are rather easy to change. Admittedly, I had also underestimated the urgency of the issue, and I found this plot quite convincing:

(Screenshot from Alex Bateman’s talk)

What can we as computational researchers do about it?

Continue reading

Citizen Science in Video Games

What I really liked about visiting ISMB last year was its diversity of talks and subgroup meetings in all areas related to biology and computers. Last year I attended two talks about improving bioinformatics education, which were really interesting because I hadn’t thought about that before. This year I joined a special session on citizen science.

Citizen science is public participation in scientific research and can be done by almost everyone. I had heard about Foldit and Rosetta@Home but (unfortunately) never participated. Those two projects deal with protein folding (how does a protein reach its final functional 3D structure?), which is an important scientific problem but is computationally very expensive to study. While one of the projects is a screensaver which uses free resources of personal computers, the other is a game where players can get high scores for folding protein fragments manually. Helping science in a playful way is cool by itself, but the project that was presented in one of the talks took this to the next level: a citizen science minigame was integrated into an action game for PCs and consoles.

Continue reading

Drug Promiscuity vs Selectivity

In drug discovery, compound promiscuity and selectivity refer to the ability of drug compounds to bind to several different targets (promiscuous) or only one main target (selective). An important distinction here is that promiscuity is defined as specific interactions with multiple biological targets (polypharmacology) rather than non-specific interactions with a number of targets. At first glance, you might expect drugs to be designed to be as selective as possible, hitting only the one biological target necessary to treat the disease and thereby reducing the chance of any side effects. This paradigm of single-target specificity has been challenged over the past two decades. Even among scientists in the drug discovery field, compound promiscuity is still a controversial topic. The field has increasingly paid attention to polypharmacology, and studies have shown that many pharmaceutically relevant compounds, including approved drugs, derive their biological activity from polypharmacology [1-3].

Continue reading

No labels, no problem! A quick introduction to Gaussian Mixture Models

Statistical Modelling Big Data Analytics™ is in vogue at the moment, and there’s nothing quite so fashionable as the neural network. Capable of capturing complex non-linear relationships and scalable to high-dimensional datasets, neural networks are here to stay.

For your garden-variety neural network, you need two things: a set of features, X, and a label, Y. But what do you do if labelling is prohibitively expensive or your expert labeller goes on holiday for 2 months and all you have in the meantime is a set of features? Happily, we can still learn something about the labels, even if we might not know what they are!
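To make the idea concrete, here is a minimal sketch of fitting a Gaussian mixture to unlabelled one-dimensional data with expectation-maximisation (EM). It is only an illustration, not necessarily the approach taken in the full post: the function names, the caller-supplied starting means, the unit starting variances and the fixed iteration count are all assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1D Gaussian mixture by EM (illustrative sketch).

    xs         -- list of unlabelled observations
    init_means -- starting guess for each component mean (assumed to be
                  supplied by the caller; one entry per component)
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k      # assumed starting variances
    weights = [1.0 / k] * k    # start with equal mixing weights

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / n_j
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / n_j
            variances[j] = max(var_j, min_var)  # floor to avoid collapse

    return weights, means, variances
```

For example, `fit_gmm_1d([-0.1, 0.0, 0.1, 0.2, 4.9, 5.0, 5.1, 5.2], init_means=[-0.1, 5.2])` recovers two components without ever seeing a label. A production implementation would work in log space and test for convergence rather than running a fixed number of iterations.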

Continue reading

K-Means clustering made simple

The 21st century is often referred to as the age of “Big Data” due to the unprecedented increase in the volumes of data being generated. As most of this data comes without labels, making sense of it is a non-trivial task. To gain insight from unlabelled data, unsupervised machine learning algorithms have been developed and continue to be refined. These algorithms determine underlying relationships within the data by grouping data points into cluster families. The resulting clusters not only highlight associations within the data, but they are also critical for creating predictive models for new data.
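As a rough illustration of how such grouping works, below is a minimal sketch of K-Means using Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its members, and repeat until the assignments stop changing. The function names, the optional `init` argument and the random choice of starting centroids are assumptions made for this example rather than details taken from the full post.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, init=None, n_iter=100, seed=0):
    """Cluster points into k groups with Lloyd's algorithm (sketch).

    points -- list of equal-length tuples
    init   -- optional list of k starting centroids (assumption: if
              omitted, k data points are drawn at random)
    """
    rng = random.Random(seed)
    centroids = [tuple(c) for c in init] if init else rng.sample(points, k)
    labels = None

    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return centroids, labels
```

Calling `kmeans(points, k=2)` on two well-separated blobs returns one centroid per blob and a cluster index for every point. The result depends on the starting centroids, so in practice the algorithm is usually run from several initialisations.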

Continue reading

Real Space Correlation Coefficient

Introduction

In crystallography we are often faced with the question of how well a part of our model fits the data. Now crystallography has well-developed probability models for the reflection amplitudes given the entire fitted model, but these do not provide a metric for “how much of the ligand is inside the blob”. This is because the reflection-based models are inherently global.
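For orientation, the real space correlation coefficient is commonly defined as the Pearson correlation between observed and model-calculated electron density, sampled at the same grid points around the part of the model in question. The sketch below assumes those density values have already been extracted into two equal-length lists; the function name `rscc` and this input format are assumptions for illustration, and choosing which grid points to include is not covered.

```python
import math


def rscc(rho_obs, rho_calc):
    """Pearson correlation between observed and calculated density values.

    rho_obs, rho_calc -- density sampled at the same grid points
                         (assumed to be pre-extracted around the ligand)
    """
    if len(rho_obs) != len(rho_calc) or len(rho_obs) < 2:
        raise ValueError("need two equal-length samples of at least 2 points")

    mean_obs = sum(rho_obs) / len(rho_obs)
    mean_calc = sum(rho_calc) / len(rho_calc)

    # Sums of products of deviations from the local means.
    cov = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(rho_obs, rho_calc))
    var_obs = sum((o - mean_obs) ** 2 for o in rho_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in rho_calc)

    if var_obs == 0 or var_calc == 0:
        raise ValueError("density is constant; correlation is undefined")

    return cov / math.sqrt(var_obs * var_calc)
```

A value close to 1 means the local shape of the calculated density follows the observed map, whereas a value near 0 suggests the modelled atoms are not supported by the density in that region. Because it is a correlation, the score is insensitive to the overall scale and offset of the two maps.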

Continue reading

ICML 2020: Chemistry / Biology papers

ICML is one of the largest machine learning conferences and, like many other conferences this year, is running virtually from 12th to 18th July.

The list of accepted papers can be found here, with 1,088 papers accepted out of 4,990 submissions (22% acceptance rate). As in my post on NeurIPS 2019 papers, I will highlight several of potential interest to the chem-/bio-informatics communities. As before, given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+F “molecule”).

Continue reading