Real Space Correlation Coefficient

Introduction

In crystallography we are often faced with the question of how well a part of our model fits the data. Crystallography has well-developed probability models for the reflection amplitudes given the entire fitted model, but these do not provide a metric for “how much of the ligand is inside the blob”. This is because the reflection-based models are inherently global: every reflection depends on every atom in the unit cell.

Instead we might treat the electron density map as the “truth” and measure the distance of the model from it locally. The map is the distribution of electron density throughout the unit cell, approximated as a three-dimensional real array calculated from the reflections.

One such measure is the Real Space Correlation Coefficient, or RSCC. It is commonly used for assessing the goodness of fit of partial models, particularly backbone residues and bound ligands.

Theory

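The RSCC is defined by the following formula, where the sums run over the map grid points in the region of interest (for example, those around a residue or ligand):

$$
\mathrm{RSCC} = \frac{\sum_i \left(\rho_{\mathrm{obs},i} - \bar{\rho}_{\mathrm{obs}}\right)\left(\rho_{\mathrm{calc},i} - \bar{\rho}_{\mathrm{calc}}\right)}{\sqrt{\sum_i \left(\rho_{\mathrm{obs},i} - \bar{\rho}_{\mathrm{obs}}\right)^2 \, \sum_i \left(\rho_{\mathrm{calc},i} - \bar{\rho}_{\mathrm{calc}}\right)^2}}
$$

Where:
– rho obs: the observed electron density at a point, as calculated by FFT from the fitted reflections
– rho calc: the electron density predicted at a point by a model (in particular a molecular model together with an electron density model)
– the barred quantities are the means of rho obs and rho calc over the same grid points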

This formulation has a number of motivations:
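– A value of 1 indicates perfect correspondence between the observed and predicted electron density
– It does not require the two densities to be on the same scale, since the correlation is unchanged when either density is multiplied by a positive constant or shifted by an offset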
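
Since the RSCC is an ordinary Pearson correlation between two sets of density samples, it can be sketched in a few lines of Python. The following is a minimal illustration, not how any particular package implements it: the function name rscc is made up for this example, the densities are assumed to have already been sampled at the same grid points (choosing those points, e.g. within some radius of the ligand atoms, is left out), and the numbers are toy values.

```python
import math
from typing import Sequence


def rscc(rho_obs: Sequence[float], rho_calc: Sequence[float]) -> float:
    """Pearson correlation between observed and calculated density samples.

    Both sequences hold density values at the same grid points. How those
    points are selected is an assumption left to the caller.
    """
    if len(rho_obs) != len(rho_calc):
        raise ValueError("density samples must cover the same grid points")
    if len(rho_obs) < 2:
        raise ValueError("need at least two grid points")

    n = len(rho_obs)
    mean_obs = sum(rho_obs) / n
    mean_calc = sum(rho_calc) / n

    # Numerator: sum of products of deviations from the means.
    cov = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(rho_obs, rho_calc))
    # Denominator: square root of the product of summed squared deviations.
    var_obs = sum((o - mean_obs) ** 2 for o in rho_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in rho_calc)

    denom = math.sqrt(var_obs * var_calc)
    if denom == 0.0:
        raise ValueError("RSCC is undefined when either density is constant")
    return cov / denom


# Toy values (made up for illustration) at five grid points.
obs = [0.1, 0.4, 0.9, 0.5, 0.2]
calc = [0.0, 0.5, 1.0, 0.4, 0.1]

score = rscc(obs, calc)
# Rescaling and offsetting one density leaves the coefficient unchanged.
rescaled = rscc([3.0 * o + 2.0 for o in obs], calc)
```

Here score and rescaled agree up to floating-point rounding, which is the scale independence noted in the list above. In practice the samples would more likely be taken from three-dimensional map arrays with a mask around the atoms of interest.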

One concern is that RSCC does not propagate uncertainty from the statistical model, and so is only a point estimate of goodness of fit. In particular, the values it gives are only meaningful if the overall quality of the model is good. However, in most use cases, where the backbone is already mostly fit, the phases tend to be well enough determined that experimental uncertainty and map artifacts are the more significant sources of error.

Practical

To calculate an RSCC you will need both:
– A model
– A reflection file (e.g. .mtz) or an electron density map (e.g. .ccp4)

After this there are several ways to calculate the RSCC in practice. Unfortunately, most tools do not report their results in a structured format, so their output needs some reformatting before further use.
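
As a rough illustration of that reformatting step, the sketch below turns a whitespace-delimited table with a header row into a list of records. It is an assumption that a tool's output can be reduced to such a table; the column names (RES, CHAIN, NUM, RSCC), the values, and the 0.8 cutoff are hypothetical placeholders and do not reflect the actual output of any tool mentioned here.

```python
from typing import Dict, List


def parse_table(text: str) -> List[Dict[str, str]]:
    """Turn a whitespace-delimited table with a header row into records."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not match the header
        records.append(dict(zip(header, fields)))
    return records


# Hypothetical output: column names and values are placeholders only.
example = """
RES CHAIN NUM RSCC
ALA A 1 0.95
LIG A 301 0.71
"""

rows = parse_table(example)
# Flag residues below an arbitrary, illustrative cutoff.
poor_fit = [r for r in rows if float(r["RSCC"]) < 0.8]
```

Real tool output usually carries banners and extra columns, so the header detection and column names would need adapting to whatever the tool actually prints.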

Phenix

With the Phenix package, this can be done as:

phenix.model_vs_data model.pdb data.mtz

Note that the reflection data should be in .mtz format.

CCP4

With the CCP4 package, on the other hand, this can be done with EDSTATS as:

echo resl=50,resh=2.1 | edstats XYZIN in.pdb MAPIN1 fo.map

Here resl and resh are the low- and high-resolution limits, which should be set to match your data. Note that the map must be in .ccp4 format.

Author