Anatomy of a blog post

Now, shall I use the first person singular or plural to write this?  Active or passive voice? …

It doesn’t really matter.  This isn’t a formal article, and you can even use abbreviations.  This group blog, like anything else during our time in Oxford, is an experiment.  We will give it a few months and see what happens.  If it pans out, we will have a, more or less, detailed research journal for the group.  Not to mention a link with the outside world (prospective students? employers?) and proof that we can “communicate” with others. And since this is an exploratory exercise we should have freedom to explore what we want to write about.

We should have plenty of fodder.  Let us face it, if we do not do some mildly interesting science every week then we are probably not having enough fun.  But even if you are working on a hushed up, undercover project (e.g. the next blockbuster drug against Malaria) – there are still so many interesting bits of our D.Phil. which would otherwise never see the light of day.

For inspiration, have a look at other popular scientific blogs – the chembl one is both educational and humourous in equal measures (Post Idea #1: list of bio/cheminformatics blogs which every grad student should read).  Blogs are a great way to survey literature without actually doing any reading (Post Idea #2: tricks to increase grad student productivity… what do you mean you don’t use Google Alerts to surprise your supervisor with a link to a paper published the day before?); and for a TL;DR version there is twitter (Post Idea #3: Idea #1 but for twitter instead).  I only found out about the four stranded DNA in human cells by following @biomol_info.

And of course, we are mostly a computational group – so software is what we churn out on a daily basis.  How much of the software we write ends up resting forever on our disks, never to be used again.  The masses want splitchain!  (Idea #4: post software you wrote).  And there is benefit in not only giving out software, but also explaining the internals with snippets  (Idea #5: a clever algorithm explained line-by-line).

And then there is the poster you hung up once (Idea: #6) or the talk you gave and prepared for hours on your disposable, use-once-only slides (Idea: #7).  There is the announcement of publishing a paper – that solemn moment in academia when someone else thinks what you have done is worthy (Idea: #8 – btw well done to our own Jamie Hill for his recent MP-T work).

And if your an athelete, like Anna (Dr. Lewis) who crossed the atlantic in a rowing boat or Eleanor who used to row for the blues – what can I say, this is how we roll, or row [feeble attempt at humour] – thats a non-scientific but unique and interesting experience too (Idea #8).  .

If you’ve read a paper and you think it’s interesting comment on it – people will follow your posts just because it acts like a literature filter (Idea #9).  You can probably even have a rant (Idea #10); as long as its more positive and less bitter than Fred Ross’ Farewell to Bioinformatics.

Finally, this post is long and tedious for the reader.  But that is ok too – like everything else here, it is a learning experience and the more I write the more I will improve.  So hey, I’m also doing this to write a better thesis (i.e. to make the writing less painful).

An addendum; my initial intention was to discuss the bits which make a good blog post.  You can find lots of articles about this – so it is less interesting; but here are the main points

cover_real_conformers

If a picture is really worth a thousand words, 30 of these is all I need for my thesis.

 

Journal club: Principles for designing ideal protein structures

The goal of protein design is to generate a sequence that assumes  a certain structure and/or performs a specific function. A recent paper in Nature has attempted to design sequences for each of five naturally occurring protein folds. The success rate ranges from 10-40%.

This recent work comes from the Baker group, who are best known for Rosetta and have made several previous steps in this direction. In a 2003 paper this group stripped several naturally occurring proteins down to the backbone, and then generated sequences whose side-chains were consistent with these backbone structures. The sequences were expressed and found to fold into proteins, but the structure of these proteins remained undetermined. Later that same year the group designed a protein, Top7, with a novel fold and confirmed that its structure closely matched that of the design (RMSD of 1.2A).

The proteins designed in these three pieces of work (the current paper and the two papers from 2003) all tend to be more stable than naturally occurring proteins. This increased stability may explain why, as with the earlier Top7, the final structures in the current work closely match the design (RMSD 1 or 2A), despite ab initio structure prediction rarely being this accurate. These structures are designed to sit in a deep potential well in the Rosetta energy function, whereas natural proteins presumably have more complicated energy landscapes that allow for conformational changes and easy degradation. Designing a protein with two or more conformations is a challenge for the future.

In the current work, several sequences were designed for each of the fold types. These sequences have substantial sequence similarity to each other, but do not match existing protein families. The five folds all belong to the alpha + beta or alpha/beta SCOP classes. This is a pragmatic choice: all-alpha proteins often fold into undesired alternative topologies, and all-beta proteins are prone to aggregation. By contrast, rules such as the right-handedness of beta-alpha-beta turns have been known since the 1970s, and can be used to help design a fold.

The authors describe several other rules that influence the packing of beta-alpha-beta, beta-beta-alpha and alpha-beta-beta structural elements. These relate the lengths of the elements and their connective loops with the handedness of the resulting subunit. The rules and their derivations are impressive, but it is not clear to what extent they are applied in the design of the 5 folds. The designed folds contain 13 beta-alpha-beta subunits, but only 2 alpha-beta-beta subunits, and 1 beta-beta-alpha subunit.

An impressive feature of the current work is the use of the Rosetta@home project to select sequences with funnelled energy landscapes, which are less likely to misfold. Each candidate sequence was folded >200000 times from an extended chain. Only ~10% of sequences had a funnelled landscape. It would have been interesting to validate whether the rejected sequences really were less likely to adopt the desired fold — especially given that this selection procedure requires vast computational resources.

The design of these five novel proteins is a great achievement, but even greater challenges remain. The present designs are facilitated by the use of short loops in regions connecting secondary structure elements. Functional proteins will probably require longer loops, more marginal stabilities, and a greater variety of secondary structure subunits.

Talk: Membrane Protein 3D Structure Prediction & Loop Modelling in X-ray Crystallography

Seb gave a talk at the Oxford Structural Genomics Consortium on Wednesday 9 Jan 2013. The talk mentioned the work of several other OPIG members. Below is the gist of it.

Membrane protein modelling pipeline

Homology modelling pipeline with several membrane-protein-specific steps. Input is the target protein’s sequence, output is the finished 3D model.

Fragment-based loop modelling pipeline for X-ray crystallography

Given an incomplete model of a protein, as well as the current electron density map, we apply our loop modelling method FREAD to fill in a gap with many decoy structures. These decoys are then scored using electron density quality measures computed by EDSTATS. This process can be iterated to arrive at a complete model.

Over the past five years the Oxford Protein Informatics Group has produced several pieces of software to model various aspects of membrane protein structure. iMembrane predicts how a given protein structure sits in the lipid bilayer. MP-T aligns a target protein’s sequence to an iMembrane-annotated template structure. MEDELLER produces an accurate core model of the target, based on this target-template alignment. FREAD then fills in the remaining gaps through fragment-based loop modelling. We have assembled all these pieces of software into a single pipeline, which will be released to the public shortly. In the future, further refinements will be added to account for errors in the core model, such as helix kinks and twists.

X-ray crystallography is the most prevalent way to obtain a protein’s 3D structure. In difficult cases, such as membrane proteins, often only low resolution data can be obtained from such experiments, making the subsequent computational steps to arrive at a complete 3D model that much harder. This usually involves tedious manual building of individual residues and much trial and error. In addition, some regions of the protein (such as disordered loops) simply are not represented by the electron density at all and it is difficult to distinguish these from areas that simply require a lot of work to build. To alleviate some of these problems, we are developing a scoring scheme to attach an absolute quality measure to each residue being built by our loop modelling method FREAD, with a view towards automating protein structure solution at low resolution. This work is being carried out in collaboration with Frank von Delft’s Protein Crystallography group at the Oxford Structural Genomics Consortium.