One could easily find dozens of reasons why UNIX (mainly Ubuntu) is simply the best operating system. Although I remember people around me saying this for ages, only in the last few months have I realized what its true advantages are. Helpful in this were all the people teaching and demonstrating in various modules during my first year in SABS/DTC: quite often we would be asked to do something in the console rather than by clicking the mouse. Meanwhile, I would wonder why using the console could be better than a nice, user-friendly GUI (e.g. Windows…). Tools like sed, grep and tar, and of course aliasing, provide a quick answer. I will not argue further about these, but will instead demonstrate two more tools/tricks.
Category Archives: Technical
Python Handout
Many OPIGlets extensively use Jupyter (in either its Notebook or Lab flavour) to prototype and present their work. However, as projects progress, notebooks are frequently converted into regular Python files for a number of reasons, losing the notebook functionality.
Wouldn’t it be nice if we could combine some of the benefits of Jupyter notebooks (not least the ability to present both code & results naturally) with regular python files?
Enter Python Handout.

Python Handout was recently (5th August 2019) released by Danijar Hafner and allows Python scripts to be converted into handouts with Markdown comments and inline figures (see above picture).
Installation is via pip (pip3 install -U handout), and Python Handout supports Python 3 scripts.
While I’ve not used Handout much (yet), I will definitely be experimenting more in the coming weeks.
How to Iterate in PyMOL
Sometimes pointing-and-clicking just doesn’t cut it. With PyMOL’s built-in Python interpreter, repetitive actions are made simple.
Constrained Embedding with RDKit
This blog post explores the RDKit function ConstrainedEmbed.
A gentle primer on quantum annealing
If you have done any computational work, you must have spent some time waiting for your program to run. As an undergraduate, I expected computational biology to be all fun and games: idyllic hours passing the time while the computer worked hard to deliver results… well, reality is very different, and far more typically you find yourself frantically staring at the screen, wishing the program would run faster. But there is more: some problems are so intrinsically expensive that, even if you had access to all the computers on Earth, solving a slightly non-trivial instance of them would take more than your lifetime. Some examples are full configuration interaction calculations in quantum chemistry, factorisation of large integers into primes, optimal planning, and a long, long list of others.
Some useful tools
For my blog post this week, I thought I would share, as the title suggests, a small collection of tools and packages (mainly Python-based) that have made my work a bit easier over the last few months. I might add to this list as I find new tools that I think deserve a shout-out.
Biopandas
Reading in .pdb files for processing by writing your own parser (while a good exercise to familiarize yourself with the format) is a pain and clutters your code with boilerplate.
Luckily for us, Sebastian Raschka has written a neat package called biopandas [1] which enables quick I/O of .pdb files via the pandas DataFrame class.
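To give a flavour of the boilerplate in question, here is a minimal hand-rolled reader for ATOM/HETATM records based on the fixed column positions of the PDB format. It is a simplified sketch (no alternate locations, no multi-model files, no error handling) with hypothetical names such as `Atom` and `parse_pdb`, and it is not how biopandas works internally.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line):
    """Parse one ATOM/HETATM record using fixed PDB column positions
    (1-based columns in the format description, 0-based slices here)."""
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21:22].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),
    )


def parse_pdb(lines):
    """Return all ATOM/HETATM records from an iterable of PDB lines,
    silently skipping every other record type (HEADER, REMARK, END, ...)."""
    return [parse_atom_line(l) for l in lines if l.startswith(("ATOM  ", "HETATM"))]
```

Used as `with open(path) as fh: atoms = parse_pdb(fh)`. With biopandas, the equivalent is essentially `PandasPdb().read_pdb(path)` (imported via `from biopandas.pdb import PandasPdb`), after which the atoms are available as a DataFrame in `ppdb.df['ATOM']`, with none of the column bookkeeping above.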
Making the most of your CPUs when using python
Over the last decade, single-threaded CPU performance has begun to plateau, whilst the number of logical cores has been increasing exponentially.
Like it or loathe it, for the last few years Python has featured as one of the top ten most popular languages [TIOBE / PYPL]. That said, Python has an issue which makes life harder for users wanting to take advantage of this parallelism windfall: the GIL (Global Interpreter Lock) of CPython, the reference implementation. The GIL can be thought of as the conch shell from Lord of the Flies: you have to hold the conch (the GIL) for your thread to run. With only one conch, no matter how beautifully written and multithreaded your code, only one thread will be executing Python bytecode at any point in time.
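One common way around the GIL for CPU-bound work is to use processes instead of threads: each worker process runs its own interpreter with its own GIL (its own conch), so chunks of work can genuinely run on separate cores. The sketch below uses the standard library's `concurrent.futures.ProcessPoolExecutor`; the toy sum-of-squares workload and the helper names are hypothetical, and the full post may well recommend a different approach.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(bounds):
    """CPU-bound toy workload over the half-open range [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, n_chunks):
    """Split range(n) into n_chunks contiguous (start, stop) pairs of near-equal size."""
    step, remainder = divmod(n, n_chunks)
    bounds, start = [], 0
    for k in range(n_chunks):
        stop = start + step + (1 if k < remainder else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum_of_squares(n, n_workers=None):
    """Farm the chunks out to worker processes, each with its own GIL."""
    n_workers = n_workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(sum_of_squares, split_range(n, n_workers)))


if __name__ == "__main__":
    n = 2_000_000
    # The parallel and serial results should agree.
    print(parallel_sum_of_squares(n) == sum_of_squares((0, n)))
```

Two details matter here: the worker function must be defined at module level so it can be pickled and sent to the workers, and the `if __name__ == "__main__":` guard stops worker processes started with the spawn method (the default on Windows and macOS) from trying to launch the pool again when they re-import the module.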
Graph-based Methods for Cheminformatics
In cheminformatics, there are many possible ways to encode chemical data represented by small molecules and proteins, such as SMILES, fingerprints and chemical descriptors. Recently, graph-based methods for machine learning have become more prominent. In this post, we will explore why representing molecules as graphs is a natural and suitable encoding.
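As an illustration of the encoding itself (not of any particular library or model from the full post), here is ethanol written out by hand as a graph: atoms become nodes labelled by element, bonds become edges weighted by bond order, and hydrogens are left implicit. A single round of neighbourhood aggregation is included to show the basic operation that graph neural networks build on; the function names are hypothetical, and in practice the atom and bond lists would usually be derived from SMILES with a toolkit such as RDKit.

```python
# Ethanol (CH3-CH2-OH) as a graph, hydrogens left implicit.
ATOMS = ["C", "C", "O"]          # node labels (element symbols)
BONDS = [(0, 1, 1), (1, 2, 1)]   # edges: (atom_i, atom_j, bond_order)


def adjacency_matrix(n_atoms, bonds):
    """Symmetric matrix whose (i, j) entry is the bond order (0 = no bond)."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        matrix[i][j] = order
        matrix[j][i] = order
    return matrix


def neighbours(matrix):
    """Adjacency list derived from the adjacency matrix."""
    return [[j for j, order in enumerate(row) if order] for row in matrix]


def aggregate_once(features, adj):
    """One round of neighbourhood aggregation: each node's new feature is its
    own feature plus the sum of its neighbours' features."""
    return [features[i] + sum(features[j] for j in adj[i]) for i in range(len(features))]


if __name__ == "__main__":
    A = adjacency_matrix(len(ATOMS), BONDS)
    adj = neighbours(A)
    degrees = [len(n) for n in adj]        # heavy-atom degree of each node
    print(A)                               # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(degrees)                         # [1, 2, 1]
    print(aggregate_once(degrees, adj))    # [3, 4, 3]
```

After one aggregation step, each atom's feature already reflects its local chemical environment, which is the intuition behind stacking several such rounds in a graph neural network.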
Automated testing with doctest
One of the ways to make your code more robust to unexpected input is to develop with boundary cases in mind. Test-driven development begins with writing a set of unit tests for each class; these tests often include both normal and extreme use cases. Thanks to packages like doctest for Python, and Mocha and Jasmine for JavaScript, we can write and run tests in an easy format. In this blog post, I will present a short example of how to get started with doctest in Python. N.B. doctest is best suited to small tests across a few scripts. If you would like to run system tests, look for other packages!
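A minimal sketch of the idea, using a hypothetical `clamp` function that is not taken from the full post: the expected behaviour, including boundary cases and an invalid input, is written as interactive-session examples inside the docstring, and `doctest.testmod()` checks them when the file is run directly. The same checks can also be run from the shell with `python -m doctest -v your_script.py`.

```python
import doctest


def clamp(value, lower, upper):
    """Restrict value to the closed interval [lower, upper].

    A normal case:

    >>> clamp(5, 0, 10)
    5

    Boundary and out-of-range cases:

    >>> clamp(0, 0, 10)
    0
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10

    Invalid input should fail loudly:

    >>> clamp(1, 10, 0)
    Traceback (most recent call last):
        ...
    ValueError: lower bound 10 is greater than upper bound 0
    """
    if lower > upper:
        raise ValueError(f"lower bound {lower} is greater than upper bound {upper}")
    return max(lower, min(value, upper))


if __name__ == "__main__":
    # Finds every >>> example in this module's docstrings and compares the
    # actual output with the expected output written below it.
    doctest.testmod(verbose=True)
```

Without `verbose=True`, `testmod()` prints nothing when every example passes, which makes it convenient to leave in place while developing.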
So, you are interested in compound selectivity and machine learning papers?
At the last OPIG meeting, I gave a talk about compound selectivity and machine learning approaches to predicting whether a compound might be selective. As promised, here is a list of publications I would hand to a beginner in the field of compound selectivity and machine learning.