Monthly Archives: February 2017


The most recent paper presented to the OPIG journal club from PLOS Pathogens, The Structural Architecture of an Infectious Mammalian Prion Using Electron Cryomicroscopy. But prior to that, I presented a bit of a background to prions in general.

In the 1960s, work was being undertaken by Tikvah Alper and John Stanley Griffith on the nature of a transmissible infection which caused scrapie in sheep. They were interested in how studies of the infection showed it was somehow resistant to ionizing radiation. Infectious elements such as bacteria or viruses were normally destroyed by radiation with the amount of radiation required having a relationship with the size of the infectious particle. However, the infection caused by the scrapie agent appeared to be too small to be caused by even a virus.

In 1982, Stanley Prusiner had successfully purified the infectious agent, discovering that it consisted of a protein. “Because the novel properties of the scrapie agent distinguish it from viruses, plasmids, and viroids, a new term “prion” was proposed to denote a small proteinaceous infectious particle which is resistant to inactivation by most procedures that modify nucleic acids.”
Prusiner’s discovery led to him being awarded the Nobel Prize in 1997.

Whilst there are many different forms of infection, such as parasites, bacteria, fungi and viruses, all of these have a genome. Prions on the other hand are just proteins. Coming in two forms, the naturally occurring cellular (PrPC) and the infectious form PrPSC (Sc referring to scrapie), through an as yet unknown mechanism, PrPSC prions are able to reproduce by forcing beneign PrPC forms into the wrong conformation.  It’s believed that through this conformational change, the following diseases are caused.

  • Bovine Spongiform encephalopathy (mad cow disease)
  • Scrapie in:
    • Sheep
    • Goats
  • Chronic wasting disease in:
    • Deer
    • Elk
    • Moose
    • Reindeer
  • Ostrich spongiform encephalopathy
  • Transmissible mink encephalopathy
  • Feline spongiform  encephalopathy
  • Exotic ungulate encephalopathy
    • Nyala
    • Oryx
    • Greater Kudu
  • Creutzfeldt-Jakob disease in humans









Whilst it’s commonly accepted that prions are the cause of the above diseases there’s still debate whether the fibrils which are formed when prions misfold are the cause of the disease or caused by it. Due to the nature of prions, attempting to cure these diseases proves extremely difficult. PrPSC is extremely stable and resistant to denaturation by most chemical and physical agents. “Prions have been shown to retain infectivity even following incineration or after being subjected to high autoclave temperatures“. It is thought that chronic wasting disease is normally transmitted through the saliva and faeces of infected animals, however it has been proposed that grass plants bind, retain, uptake, and transport infectious prions, persisting in the environment and causing animals consuming the plants to become infected.

It’s not all doom and gloom however, lichens may long have had a way to degrade prion fibrils. Not just a way, but because it’s apparently no big thing to them, have done so twice. Tests on three different lichens species: Lobaria pulmonaria, Cladonia rangiferina and Parmelia sulcata, indicated at least two logs of reduction, including reduction “following exposure to freshly-collected P. sulcata or an aqueous extract of the lichen”. This has the potential to inactivate the infectious particles persisting in the landscape or be a source for agents to degrade prions.

Parallel Computing: GNU Parallel

Recently I started using the OPIG servers to run the algorithm I have developed (CRANkS) on datasets from DUDE (Database of Useful Decoys Enhanced).

This required learning how to run jobs in parallel. Previously I had been using computer clusters with their own queuing system (Torque/PBS) which allowed me to submit each molecule to be scored by the algorithm as a separate job. The queuing system would then automatically allocate nodes to jobs and execute jobs accordingly. On a side note I learnt how to submit these jobs an array, which was preferable to submitting ~ 150,000 separate jobs:

qsub -t 1:X

where the contents of would be:


which would submit jobs to, where X is the total number of jobs.

However the OPIG servers do not have a global queuing system to use. I needed a way of being able to run the code I already had in parallel with minimal changes to the workflow or code itself. There are many ways to run jobs in parallel, but to minimise work for myself, I decided to use GNU parallel [1].

This is an easy-to-use shell tool, which I found quick and easy to install onto my home server, allowing me to access it on each of the OPIG servers.

To use it I simply run the command:

cat | parallel -j Y

where Y is the number of cores to run the jobs on, and contains:


This executes each job making use of Y number of cores when available to run the jobs in parallel.

Quick, easy, simple and minimal modifications needed! Thanks to Jin for introducing me to GNU Parallel!

[1] O. Tange (2011): GNU Parallel – The Command-Line Power Tool, The USENIX Magazine, February 2011:42-47.

Interesting Jupyter and IPython Notebooks

Here’s a treasure trove of interesting Jupyter and iPython notebooks, with lots of diverse examples relevant to OPIG, including an RDKit notebook, but also:

Entire books or other large collections of notebooks on a topic (covering Introductory Tutorials; Programming and Computer Science; Statistics, Machine Learning and Data Science; Mathematics, Physics, Chemistry, Biology; Linguistics and Text Mining; Signal Processing; Scientific computing and data analysis with the SciPy Stack; General topics in scientific computing; Machine Learning, Statistics and Probability; Physics, Chemistry and Biology; Data visualization and plotting; Mathematics; Signal, Sound and Image Processing; Natural Language Processing; Pandas for data analysis); General Python Programming; Notebooks in languages other than Python (Julia; Haskell; Ruby; Perl; F#; C#); Miscellaneous topics about doing various things with the Notebook itself; Reproducible academic publications; and lots more!  


Interesting Antibody Papers

Hints how broadly neutralizing antibodies arise (paper here). (Haynes lab here) Antibodies can be developed to bind virtually any antigen. There is a stark difference however between the ‘binding’ antibodies and ‘neutralizing’ antibodies. Binding antibodies are those that make contact with the antigen and perhaps flag it for elimination. This is in contrast to neutralizing antibodies, whose binding eliminates the biological activity of the antigen. A special class of such neutralizing antibodies are ‘broad neutralizing antibodies’. These are molecules which are capable of neutralizing multiple strains of the antigen. Such broadly neutralizing antibodies are very important in the fight against highly malleable diseases such as Influenza or HIV.

The process how such antibodies arise is still poorly understood. In the manuscript of Williams et al., they make a link between the memory and plasma B cells of broadly neutralizing antibodies and find their common ancestor. The common ancestor turned out to be auto-reactive, which might suggest that some degree of tolerance is necessary to allow for broadly neutralizing abs (‘hit a lot of targets fatally’). From a more engineering perspective, they create chimeras of the plasma and memory b cells and demonstrate that they are much more powerful in neutralizing HIV.

Ineresting data: their crystal structures are different broadly neutralizing abs co-crystallized with the same antigen (altought small…). Good set for ab-specific docking or epitope prediction — beyond the other case like that in the PDB (lysozyme)! At the time of writing the structures were still on hold in the PDB so watch this space…

Using RDKit to load ligand SDFs into Pandas DataFrames

If you have downloaded lots of ligand SDF files from the PDB, then a good way of viewing/comparing all their properties would be to load it into a Pandas DataFrame.

RDKit has a very handy function just for this – it’s found under the PandasTool module.

I show an example below within Jupypter-notebook, in which I load in the SDF file, view the table of molecules and perform other RDKit functions to the molecules.

First import the PandasTools module:

from rdkit.Chem import PandasTools

Read in the SDF file:

SDFFile = "./Ligands_noHydrogens_noMissing_59_Instances.sdf"
BRDLigs = PandasTools.LoadSDF(SDFFile)

You can see the whole table by calling the dataframe:


The ligand properties in the SDF file are stored as columns. You can view what these properties are, and in my case I have loaded 59 ligands each having up to 26 properties:

It is also very easy to perform other RDKit functions on the dataframe. For instance, I noticed there is no heavy atom column, so I added my own called ‘NumHeavyAtoms’:

BRDLigs['NumHeavyAtoms']=BRDLigs.apply(lambda x: x['ROMol'].GetNumHeavyAtoms(), axis=1)

Here is the column added to the table, alongside columns containing the molecules’ SMILES and RDKit molecule: