Two Tools for Systematically Compiling Ensembles of Protein Structures

To understand how a protein works, we generally want to know its three-dimensional structure. We can either try to solve it ourselves (which requires considerable time, skill, and resources) or look for it in the Protein Data Bank (PDB), in case it has already been solved. The vast majority of structures in the PDB are solved by protein crystallography, and each one represents a single “snapshot” of the conformational space available to our protein of interest.

As proteins have a degree of inherent flexibility, we may be interested in looking at more than one structure of the same protein. This information can help answer important questions, such as how the protein behaves when bound to small-molecule ligands or whether it undergoes any major conformational changes. With the wealth of structural data in the PDB, multiple structures are available for a vast and increasing number of proteins, making this type of analysis possible.

If we know the protein sequence, or some other identifier (a UniProt ID, for example), we can download all of its available structures, or only those that pass a set of filters, such as resolution, year of deposition, or organism, among others. This can be done either through the PDB’s web interface (link), or programmatically through the PDB’s RESTful services (link).
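As a rough illustration of the programmatic route, the sketch below builds a search query for all entries mapped to a UniProt accession, with an optional resolution cutoff, and keeps the network call in a separate function. The endpoint URL, the attribute names, and the response fields reflect my understanding of the RCSB PDB Search API and should be checked against its current documentation; the helper names `build_query` and `search_entries` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint of the RCSB PDB Search API (verify against the current docs).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def _terminal(attribute, operator, value):
    """Build one attribute-based search condition."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def build_query(uniprot_id, max_resolution=None):
    """Return a query dict for entries mapped to a UniProt accession."""
    # Attribute names below are assumptions taken from the RCSB search schema.
    prefix = "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers"
    nodes = [
        _terminal(prefix + ".database_accession", "exact_match", uniprot_id),
        _terminal(prefix + ".database_name", "exact_match", "UniProt"),
    ]
    if max_resolution is not None:
        nodes.append(
            _terminal("rcsb_entry_info.resolution_combined", "less_or_equal", max_resolution)
        )
    return {
        "query": {"type": "group", "logical_operator": "and", "nodes": nodes},
        "return_type": "entry",
        "request_options": {"return_all_hits": True},
    }


def search_entries(uniprot_id, max_resolution=None):
    """Send the query and return matching PDB IDs (requires network access)."""
    payload = json.dumps(build_query(uniprot_id, max_resolution)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = response.read()
    if not body:  # an empty body is assumed to mean "no hits"
        return []
    return [hit["identifier"] for hit in json.loads(body).get("result_set", [])]
```

A call such as `search_entries("P00533", max_resolution=2.5)` would then return a list of PDB IDs, which can be downloaded one by one from the PDB's file server. Keeping `build_query` free of network access makes it easy to inspect or adjust the filters before sending anything.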

Once we have downloaded the structures, we usually want to compare them to each other. This can be complicated by the fact that different structures might have different residue numbering, unresolved residues in key locations, different orientations in space, multiple chains, and other mismatched annotations.
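To make two of these complications concrete, here is a minimal sketch that reads the C-alpha atoms of one chain from PDB-format text and reports which residues are resolved in only one of two structures, and where the residue names disagree at the same number. The function names `ca_residues` and `compare_coverage` are hypothetical, and the parser assumes the standard fixed-column PDB layout; it ignores HETATM records and alternate locations, so treat it as a quick check rather than a full validator.

```python
def ca_residues(pdb_text, chain_id):
    """Map (residue number, insertion code) -> residue name for CA atoms of one chain."""
    residues = {}
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only read the first model
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        # Fixed-column PDB format: atom name in columns 13-16, residue name in
        # columns 18-20, chain in column 22, residue number in columns 23-26,
        # insertion code in column 27.
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        key = (int(line[22:26]), line[26].strip())
        residues.setdefault(key, line[17:20].strip())
    return residues


def compare_coverage(residues_a, residues_b):
    """Report residues present in only one structure and residue-name mismatches."""
    keys_a, keys_b = set(residues_a), set(residues_b)
    shared = keys_a & keys_b
    return {
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
        "name_mismatch": sorted(k for k in shared if residues_a[k] != residues_b[k]),
    }
```

Residues listed under `only_in_a` or `only_in_b` may be unresolved in the other structure or simply outside its construct, while many entries under `name_mismatch` usually hint at a numbering offset rather than at mutations.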

In this blog post, I want to share two tools that I have found quite useful: they automate much of the process of compiling ensembles of protein structures and make the downstream analysis more straightforward. This is not meant to be a review of all published tools for this task.

SIENA [1] is an automated pipeline that compiles protein structure ensembles from the PDB, given a query structure. It performs structure validation and alignment, filters according to user-defined structure criteria, and can also perform ensemble reduction, which can be useful in cases where a protein has a great number of highly redundant structures.

SIENA is available as a web service here: https://proteins.plus/2ozr#siena

It can also be accessed programmatically through RESTful services: https://proteins.plus/help/siena_rest

KLIFS [2] is another tool I have used for compiling structural ensembles, but it is focussed on a single, very therapeutically relevant protein family: protein kinases. In addition to alignment and structure validation, KLIFS allows the user to query for structures in specific conformations (DFG-in or DFG-out, angle of the glycine-rich loop, position of the alpha-C-helix), by whether and what types of ligands are bound (orthosteric, allosteric, molecular weight range, etc.), and even by which conserved waters are present.

KLIFS can be found here: http://klifs.vu-compmedchem.nl/

Happy ensemble-hunting!

Papers:
[1] Bietz, S., Rarey, M., SIENA: Efficient Compilation of Selective Protein Binding Site Ensembles, Journal of Chemical Information and Modeling, Volume 56, Issue 1, 2016, Pages 248–259.

[2] Kooistra, A. et al., KLIFS: a structural kinase-ligand interaction database, Nucleic Acids Research, Volume 44, Issue D1, 4 January 2016, Pages D365–D371, https://doi.org/10.1093/nar/gkv1082
