Author Archives: Eoin Malins

HERO proteins are here to save you (assuming you’re another protein or a fruit fly)

For one of OPIG’s short talks, I recently introduced the work done by Kotaro Tsuboyama et al. found in the paper A widespread family of heat-resistant obscure (Hero) proteins protect against protein instability and aggregation. As the name implies, HERO proteins have been found to retain function even after being boiled at 95C and have been found both in Drosophila and human HEK293T cell lines. Whilst it’s not impossible to find proteins which can “survive” 90+ Celsius, these are expected to be the reserve of extremophiles, not found in humans or fruit flies.

Continue reading

Considering Containers? – Go for Singularity

Docker is an excellent containerisation system ideally suited to production servers.  It allows you to do one small thing but do it well.  For example, breaking a large blog up into individually maintained containers for a web-server, a database and (say) a wordpress instance. However due to inherent security woes, Docker doesn’t play nicely with multi-tenanted machines, the kind which are the bread and butter for researchers and HPC users.  That’s where Singularity steps in.   

Continue reading

Making the most of your CPUs when using python

Over the last decade, single-threaded CPU performance has begun to plateau, whilst the number of logical cores has been increasing exponentially.

Like it or loathe it, for the last few years, python has featured as one of the top ten most popular languages [tiobe / PYPL].   That being said however, python has an issue which makes life harder for the user wanting to take advantage of this parallelism windfall.  That issue is called the GIL (Global Interpreter Lock).  The GIL can be thought of as the conch shell from Lord of the Flies.  You have to hold the conch (GIL) for your thread to be computed.  With only one conch, no matter how beautifully written and multithreaded your code, there will still only be one thread will be executed at any point in time.

Continue reading

Le Tour de Farce v6.0

Tuesday the 12th of June brought sun, cycling and beer to the land of OPIG. It was once again time for the annual Tour de Farce.

Le tour, now in its highly refined 6.0 version, covered a route which took us from the statistics department at Oxford (home to many OPIGlets) to our first port of call, The Head of the River pub. We then followed the river Isis (or Thames if you prefer) from the head of the river towards Osney mead.

Passing though Osney we soon arrived at our second waypoint. The Punter. One of the OPIGlets lived locally and so we were met by their trusty companion, who was better behaved than many of the others on Le Tour.

Departing the punter on two wheels (or in one case, on one) we followed the river upstream to The Perch.

 

Our arrival at The Perch was slightly hampered by a certain OPIGlet taking out anything in her path in her excitement.  Mr Sulu.. Ramming speed.

 

Those that survived soon left the perch, as we were once again headed upstream, this time to The Trout.

Having braved about half the journey it was now time for another restorative beverage and to take on supplies.  Sustenance was provided by Jacob’s Inn.   Jacob’s Inn has the advantage of goats, chickens and pigs in the back garden.  Having spent most of the afternoon in each other’s company, the company of pigs was preferable for some.

As we finished dinner, the sun was beginning to set and so we abandoned the original plan of finishing off at The Fishes.  Instead we returned southwards where we closed off the evening with a drink at The Royal Oak, mere yards from where we started the day.

The route of the 2018 v6.0 Tour de Farce.

 

 

Storing your stuff with clever filesystems: ZFS and tmpfs

The filesystem is a a critical component of just about any operating system, however it’s often overlooked. When setting up a new server, the default filesystem options are often ticked and never thought about again. However, there exist a couple of filesystems which can provide some extraordinary features and speed. I’m talking about ZFS and tmpfs.

ZFS was originally developed by Oracle for their Solaris operating system, but has now been open-sourced and is freely available on linux. Tmpfs is a temporary file system which uses system memory to provide fast temporary storage for files. Together, they can provide outstanding reliability and speed for not very much effort.

Hard disk capacity has increased exponentially over the last 50 years. In the 1960s, you could rent a 5MB hard disk from IBM for the equivalent of $130,000 per month. Today you can buy for less than $600 a 12TB disk – a 2,400,000 times increase.

As storage technology has moved on, the filesystems which sit on top of them ideally need to be able to access the full capacity of those ever increasing disks. Many relatively new, or at least in-use, filesystems have serious limitations. Akin to “640K ought to be enough for anybody”, the likes of the FAT32 filesystem supports files which are at most 4GB on a chunk of disk (a partition) which can be at most 16TB. Bear in mind that arrays of disks can provide a working capacity of many times that of a single disk. You can buy the likes of a supermicro sc946ed shelf which will add 90 disks to your server. In an ideal world, as you buy bigger disks you should be able to pile them into your computer and tell your existing filesystem to make use of them, your filesystem should grow and you won’t have to remember a different drive letter or path depending on the hardware you’re using.

ZFS is a 128-bit file system, which means a single installation maxes out at 256 quadrillion zettabytes. All metadata is allocated dynamically so there isn’t the need to pre-allocate inodes and directories can have up to 2^48 (256 trillion) entries. ZFS provides the concept of “vdevs” (virtual devices) which can be a single disk or redundant/striped collections of multiple disks. These can be dynamically added to a pool of vdevs of the same type and your storage will grow onto the fresh hardware.

A further consideration is that both disks of the “spinning rust” variety and SSDs are subject to silent data corruption, i.e. “bit rot”. This can be caused by a number of factors even including cosmic rays, but the consequence is read errors when it comes time to retrieve your data. Manufacturers are aware of this and buried in the small print for your hard disk will be values for “unrecoverable read errors” i.e. data loss. ZFS works around this by providing several mechanisms:

  • Checksums for each block of data written.
  • Checksums for each pointer to data.
  • Scrub – Automatically validates checksums when the system is idle.
  • Multiple copies – Even if you have a single disk, it’s possible to provide redundancy by setting a copies=n variable during filesystem creation.
  • Self-healing – When a bad data block is detected, ZFS fetches the correct data from a redundant copy and replaces it with the correct data.

An additional bonus to ZFS is its ability to de-duplicate data. Should you be working with a number of very similar files, on a normal filesystem, each file will take up space proportional to the amount of data that’s contained. As ZFS keeps checksums of each block of data, it’s able to determine if two blocks contain identical data. ZFS therefore provides the ability to have multiple pointers to the same file and only store the differences between them.

 

ZFS also provides the ability to take a point in time snapshot of the entire filesystem and roll it back to a previous time. If you’re a software developer, got a package that has 101 dependencies and you need to upgrade it? Afraid to upgrade it in case it breaks things horribly? Working on code and you want to roll back to a previous version? ZFS snapshots can be run with cron or manually and provide a version of the filesystem which can be used to extract previous versions of overwritten or deleted files or used to roll everything back to a point in time when it worked.

Similar to deduplication, a snapshot won’t take up any disk extra space until the data starts to change.

The other filesystem worth mentioning is tmpfs. Tmpfs takes part of the system memory and turns it into a usable filesystem. This is incredibly useful for systems which create huge numbers of temporary files and attempt to re-read them. Tmpfs is also just about as fast as a filesystem can be. Compared to a single SSD or a RAID array of six disks, tmpfs blows them out of the water speed wise.

Creating a tmpfs filesystem is simple:
First create your mountpoint for the disk:

mkdir /mnt/ramdisk

Then mount it. The options are saying make it 1GB in size, it’s of type tmpfs and to mount it at the previously created mount point:

mount –t tmpfs -o size=1024m tmpfs /mnt/ramdisk

At this point, you can use it like any other filesystem:

df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1  218G 128G   80G  62% /
/dev/sdb1  6.3T 2.4T  3.6T  40% /spinnyrust
tank       946G 3.5G  942G   1% /tank
tmpfs      1.0G 391M  634M  39% /mnt/ramdisk

Slowing the progress of prion diseases

At present, the jury is still out on how prion diseases affect the body let alone how to cure them. We don’t know if amyloid plaques cause neurodegeneration or if they’re the result of it. Due to highly variable glycophosphatidylinositol (GPI) anchors, we don’t know the structure of prions. Due to their incredible resistance to proteolysis, we don’t know a simple way to destroy prions even using in an autoclave. The current recommendation[0] by the World Health Organisation includes the not so subtle: “Immerse in a pan containing 1N sodium hydroxide and heat in a gravity displacement autoclave at 121°C”.

There are several species including Water Buffalo, Horses and Dogs which are immune to prion diseases. Until relatively recently it was thought that rabbits were immune too. “Despite rabbits no longer being able to be classified as resistant to TSEs, an outbreak of ‘mad rabbit disease’ is unlikely”.[1] That being said, other than the addition of some salt bridges and additional H-bonds, we don’t know if that’s why some animals are immune.

We do know at least two species of lichen (P. sulcata and L. plumonaria) have not only discovered a way to naturally break down prions, but they’ve evolved two completely independent pathways to do so. How they accomplish this? We’re still not sure in fact, it was only last year that it was discovered that lichens may be composed of three symbiotic partnerships and not two as previously thought.[3]

With all this uncertainty, one thing is known: PrPSc, the pathogenic form of the Prion converts PrPC, the cellular form. Just preventing the production of PrPC may not be a good idea, mainly because we don’t know what it’s there for in the first place. Previous studies using PrP-knockout have shown hints that:

  • Hematopoietic stem cells express PrP on their cell membrane. PrP-null stem cells exhibit increased sensitivity to cell depletion. [4]
  • In mice, cleavage of PrP proteins in peripheral nerves causes the activation of myelin repair in Schwann Cells. Lack of PrP proteins caused demyelination in those cells. [5]
  • Mice lacking genes for PrP show altered long-term potentiation in the hippocampus. [6]
  • Prions have been indicated to play an important role in cell-cell adhesion and intracellular signalling.[7]

However, an alternative approach which bypasses most of the unknowns above is if it were possible to make off with the substrate which PrPSc uses, the progress of the disease might be slowed. A study by R Diaz-Espinoza et al. was able to show that by infecting animals with a self-replicating non-pathogenic prion disease it was possible to slow the fatal 263K scrapie agent. From their paper [8], “results show that a prophylactic inoculation of prion-infected animals with an anti-prion delays the onset of the disease and in some animals completely prevents the development of clinical symptoms and brain damage.”

[0] https://www.cdc.gov/prions/cjd/infection-control.html
[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3323982/
[2] https://blogs.scientificamerican.com/artful-amoeba/httpblogsscientificamericancomartful-amoeba20110725lichens-vs-the-almighty-prion/
[3] http://science.sciencemag.org/content/353/6298/488
[4] “Prion protein is expressed on long-term repopulating hematopoietic stem cells and is important for their self-renewal”. PNAS. 103 (7): 2184–9. doi:10.1073/pnas.0510577103
[5] Abbott A (2010-01-24). “Healthy prions protect nerves”. Nature. doi:10.1038/news.2010.29
[6] Maglio LE, Perez MF, Martins VR, Brentani RR, Ramirez OA (Nov 2004). “Hippocampal synaptic plasticity in mice devoid of cellular prion protein”. Brain Research. Molecular Brain Research. 131 (1-2): 58–64. doi:10.1016/j.molbrainres.2004.08.004
[7] Málaga-Trillo E, Solis GP, et al. (Mar 2009). Weissmann C, ed. “Regulation of embryonic cell adhesion by the prion protein”. PLoS Biology. 7 (3): e55. doi:10.1371/journal.pbio.1000055
[8] http://www.nature.com/mp/journal/vaop/ncurrent/full/mp201784a.html

Le Tour de Farce v5.0

Every summer the OPIGlets go on a cycle ride across the scorched earth of Oxford in search of life-giving beer. Now in its fifth iteration, the annual Tour de Farce took place on us on Tuesday the 13th of June.

Establishments frequented included The Victoria, The Plough, Jacobs Inn (where we had dinner and didn’t get licked by their goats, certainly not), The Perch and finally The Punter. Whilst there were plans to go to The One for their inimitable “lucky 13s” by 11PM we were alas too late, so doubled down in The Punter.

Highlights of this years trip included certain members of the group almost immediately giving up when trying to ride a fixie and subsequently being shown up by our unicycling brethren.

Prions

The most recent paper presented to the OPIG journal club from PLOS Pathogens, The Structural Architecture of an Infectious Mammalian Prion Using Electron Cryomicroscopy. But prior to that, I presented a bit of a background to prions in general.

In the 1960s, work was being undertaken by Tikvah Alper and John Stanley Griffith on the nature of a transmissible infection which caused scrapie in sheep. They were interested in how studies of the infection showed it was somehow resistant to ionizing radiation. Infectious elements such as bacteria or viruses were normally destroyed by radiation with the amount of radiation required having a relationship with the size of the infectious particle. However, the infection caused by the scrapie agent appeared to be too small to be caused by even a virus.

In 1982, Stanley Prusiner had successfully purified the infectious agent, discovering that it consisted of a protein. “Because the novel properties of the scrapie agent distinguish it from viruses, plasmids, and viroids, a new term “prion” was proposed to denote a small proteinaceous infectious particle which is resistant to inactivation by most procedures that modify nucleic acids.”
Prusiner’s discovery led to him being awarded the Nobel Prize in 1997.

Whilst there are many different forms of infection, such as parasites, bacteria, fungi and viruses, all of these have a genome. Prions on the other hand are just proteins. Coming in two forms, the naturally occurring cellular (PrPC) and the infectious form PrPSC (Sc referring to scrapie), through an as yet unknown mechanism, PrPSC prions are able to reproduce by forcing beneign PrPC forms into the wrong conformation.  It’s believed that through this conformational change, the following diseases are caused.

  • Bovine Spongiform encephalopathy (mad cow disease)
  • Scrapie in:
    • Sheep
    • Goats
  • Chronic wasting disease in:
    • Deer
    • Elk
    • Moose
    • Reindeer
  • Ostrich spongiform encephalopathy
  • Transmissible mink encephalopathy
  • Feline spongiform  encephalopathy
  • Exotic ungulate encephalopathy
    • Nyala
    • Oryx
    • Greater Kudu
  • Creutzfeldt-Jakob disease in humans

 

 

 

 

 

 

 

 

Whilst it’s commonly accepted that prions are the cause of the above diseases there’s still debate whether the fibrils which are formed when prions misfold are the cause of the disease or caused by it. Due to the nature of prions, attempting to cure these diseases proves extremely difficult. PrPSC is extremely stable and resistant to denaturation by most chemical and physical agents. “Prions have been shown to retain infectivity even following incineration or after being subjected to high autoclave temperatures“. It is thought that chronic wasting disease is normally transmitted through the saliva and faeces of infected animals, however it has been proposed that grass plants bind, retain, uptake, and transport infectious prions, persisting in the environment and causing animals consuming the plants to become infected.

It’s not all doom and gloom however, lichens may long have had a way to degrade prion fibrils. Not just a way, but because it’s apparently no big thing to them, have done so twice. Tests on three different lichens species: Lobaria pulmonaria, Cladonia rangiferina and Parmelia sulcata, indicated at least two logs of reduction, including reduction “following exposure to freshly-collected P. sulcata or an aqueous extract of the lichen”. This has the potential to inactivate the infectious particles persisting in the landscape or be a source for agents to degrade prions.

Structure prediction through sequence physical properties

Today, protein structure is mainly predicted by aligning the unknown amino acid sequence against all sequences for which we already know the physical structure. Whilst sequences differing in length can be readily catered for by inserting or deleting (with or without affine gap penalties) the odd amino acid, there will frequently be cases where there are mutations. To compensate for this, the likes of the BLOSUM matrix is used to score the likelihood of one amino acid having mutated into another. For example, a hydrophobic residue is more likely to be swapped for something similar, than it is to be replaced with something strongly hydrophilic.

Whilst this is an entirely reasonable basis, there are many other physical property factors which which can be considered. In fact, the Amino Acid Index currently lists 544 physicochemical and biochemical properties of amino acids. A paper by Yi He et al. PNAS 2015;112:5029-5032 recently made use of a subset of these properties to predict structure. Their work shown below shows target T0797 (B) from CASP 11 compared with a purely physical structure predicted using their method (A) and the PSI BLAST candidate for the same sequence (C).

em structure phys properties

Even though their structure only had three residues in common with the target sequence, it is plainly more similar than the PSI BLAST attempt. The RMSD between structures A and B is also reported as being 0.73 Å, whilst PSI BLAST returns an RMSD of 2.09 Å.