Author Archives: Javier Pardo Díaz

Because not all interesting biology is health-related!

Nowadays, biological research revolves around health: Cancer. Neuroscience. Immunology. Pharmacology. And many more health-related areas which are being studied in depth. It seems that everyone is keen to spend their lives looking for the cure for cancer or Alzheimer's. What a drag! For this reason (and also to show that research in less popular and less funded sectors can also significantly improve human lives), I have decided to write about something completely different: the plant microbiome!

Indeed, I am going to write about bacteria. And no, they are not related to health at all. These bacteria live in the soil and infect plants. However, they are not “bad”. Actually, they favour the plant’s growth and development. This is possible thanks to a fascinating process which finishes (SPOILER ALERT!!) with the bacteria transforming atmospheric nitrogen into ammonia that can be used by the plant (nitrogen fixation).

The process starts with some kind of small talk between Rhizobium (the bacteria) and the legume (the plant): Legumes secrete compounds through their roots that the bacteria living close by can detect. In response to this stimulus, the bacteria approach the root hairs of the plant, attach to them and secrete lipo-chitooligosaccharides known as Nod factors.

It continues with some action: The plants sense the Nod factors, which induce the root hairs to curl and trap the bacteria. The bacteria continue to grow and eventually form an infection thread whose growth allows the bacteria to reach other plant cells.

And it finishes with a happily ever after ending: A structure called a nodule is formed. The bacteria in the nodule form an organelle called the symbiosome, within which the bacteria differentiate to a state called bacteroid. In this stage, the bacteroid fixes nitrogen for the plant.

I know… Everything has happened too fast (the process can take 1 – 2 weeks). And I have not been bothered to explain it in detail so you can enjoy reading this amazing review: https://www.ncbi.nlm.nih.gov/pubmed/23493145

But wait! I almost forgot to say why this is worth studying… The point is that plants need nitrogen to grow and they cannot use atmospheric nitrogen. Therefore, the more nitrogen they receive from the bacteria, the more they will grow. Consequently, we may increase the quantity of food available by improving this process.

Measuring correlation

Correlation is defined as how close two variables are to having a dependence relationship with each other. At first sight, it looks kind of simple, but there are two main problems:

  1. Apart from the obvious situations (e.g. correlation = 1), it is difficult to say whether two variables are correlated or not (e.g. correlation = 0.7). For instance, would you be able to say if the variables X and Y from the following two plots are correlated?
  2. There are different measures of correlation, and they may not agree when comparing different distributions. As an example, which plot shows a higher correlation? The answer will depend on how you measure the correlation: if you use Pearson correlation, you will pick A, whereas if you choose Spearman correlation, you will pick B.

Here, I will explain some of the different correlation measures you can use:

Pearson product-moment correlation coefficient

  • What does it measure? Only linear dependencies between the variables.
  • How is it obtained? By dividing the covariance of the two variables by the product of their standard deviations. (It is defined only if both standard deviations are finite and nonzero.)

\rho _{X,Y}={\frac {\operatorname {cov} (X,Y)}{\sigma _{X}\sigma _{Y}}}

  • Properties:
  1. ρ (X,Y) = +1 : perfect direct (increasing) linear relationship (correlation).
  2. ρ (X,Y) = -1 : perfect decreasing (inverse) linear relationship (anticorrelation).
  3. In all other cases, ρ (X,Y) indicates the degree of linear dependence between the variables. As it approaches zero there is less of a relationship (closer to uncorrelated).
  4. Only gives a perfect value when X and Y are related by a linear function.
  • When is it useful? When you are looking for a linear relationship. For the case of a linear model with a single independent variable, the coefficient of determination (R squared) is the square of Pearson’s product-moment coefficient.
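
As a minimal sketch of the definition above (covariance divided by the product of the standard deviations), here is a plain Python version. The function name `pearson` and the toy data are my own choices for illustration, and it uses population (divide by n) estimates, which cancel out in the ratio anyway.

```python
import math


def pearson(x, y):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations (divide by n).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("undefined when a standard deviation is zero")
    return cov / (sd_x * sd_y)


# Hypothetical toy data, just to illustrate the properties listed above.
x = [1, 2, 3, 4, 5]
linear = [2 * v + 1 for v in x]   # increasing linear function of x
inverse = [-3 * v for v in x]     # decreasing linear function of x
cubic = [v ** 3 for v in x]       # increasing, but not linear

print(pearson(x, linear))   # close to +1
print(pearson(x, inverse))  # close to -1
print(pearson(x, cubic))    # high, but below 1
```

Note how the cubic relationship, although perfectly predictable, does not reach 1: Pearson only gives a perfect value for linear functions.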

 

Spearman’s rank correlation coefficient

  • What does it measure? How well the relationship between two variables can be described using a monotonic function (a function that only goes up or only goes down).
  • How is it obtained? As the Pearson correlation between the rank values of the two variables.

{\displaystyle r_{s}=\rho _{\operatorname {rg} _{X},\operatorname {rg} _{Y}}={\frac {\operatorname {cov} (\operatorname {rg} _{X},\operatorname {rg} _{Y})}{\sigma _{\operatorname {rg} _{X}}\sigma _{\operatorname {rg} _{Y}}}}}

If (and only if) all n ranks are distinct integers, that is, there are no ties, it can also be computed using the popular formula:

{\displaystyle r_{s}={1-{\frac {6\sum d_{i}^{2}}{n(n^{2}-1)}}}.}

where d_i is the difference between the two ranks of each observation.

  • Properties:
  1. rs (X,Y) = +1:  X and Y are related by an increasing monotonic function (not necessarily a linear one).
  2. rs (X,Y) = -1:  X and Y are related by a decreasing monotonic function (not necessarily a linear one).
  3. The Spearman correlation increases in magnitude as X and Y become closer to being perfect monotone functions of each other.
  • When is it useful? It is appropriate for both continuous and discrete ordinal variables. It can be used to look for non-linear (but monotonic) dependence relationships.
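
The two routes described above (Pearson on the ranks, and the shortcut formula with the rank differences d_i) can be sketched in plain Python as follows. The helper names (`average_ranks`, `spearman`, `spearman_no_ties`) and the toy data are hypothetical, and giving tied values the average of their positions is an assumption on my side (it is the usual convention, but the post does not specify one).

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation, written compactly from sums of squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """General definition: Pearson correlation between the rank values."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_no_ties(x, y):
    """Shortcut 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); only valid without ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2
             for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
cubic = [v ** 3 for v in x]      # monotonic but not linear
shuffled = [3, 1, 4, 2, 5]       # no ties, partly disordered

print(spearman(x, cubic))              # close to 1: monotonic is enough
print(spearman(x, shuffled))           # general definition
print(spearman_no_ties(x, shuffled))   # shortcut agrees when there are no ties
```

The cubic example is the kind of case where Spearman and Pearson disagree: the ranks match perfectly even though the relationship is not a straight line.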

Kendall’s tau coefficient

  • What does it measure? The ordinal association between two measured quantities.
  • How is it obtained?

{\displaystyle \tau ={\frac {({\text{number of concordant pairs}})-({\text{number of discordant pairs}})}{n(n-1)/2}}.}

Any pair of observations (xi, yi) and (xj, yj) is said to be concordant if the ranks for both elements agree. That happens if xi-xj and yi-yj have the same sign. If their signs are different, the pair is considered discordant.

  • Properties:
  1. τ (X,Y) = +1: The agreement between the two rankings is perfect (i.e., the two rankings are the same)
  2. τ (X,Y) = -1: The disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other)
  3. If X and Y are independent, then we would expect the coefficient to be approximately zero.
  • When is it useful? It is appropriate for both continuous and discrete ordinal variables. It can be used to look for non-linear (but monotonic) dependence relationships.
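
Counting concordant and discordant pairs is easy to write down directly. The sketch below follows the formula above literally; the function name `kendall_tau` and the toy data are hypothetical, and pairs tied in x or in y are simply counted as neither concordant nor discordant (an assumption, since the post does not discuss ties).

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / (n (n - 1) / 2), as in the formula above."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of (xi - xj) and (yi - yj) means the pair is concordant.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0: a tie, counted as neither (assumption).
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
print(kendall_tau(x, [10, 20, 30, 40, 50]))  # 1.0: identical rankings
print(kendall_tau(x, [5, 4, 3, 2, 1]))       # -1.0: reversed ranking
print(kendall_tau(x, [3, 1, 4, 2, 5]))       # 0.4: 7 concordant, 3 discordant
```

This loops over all n(n-1)/2 pairs, which is fine for small samples; faster algorithms exist but are beyond this sketch.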

Distance correlation

  • What does it measure? Both linear and nonlinear association between two random variables or random vectors.
  • How is it obtained? By dividing the variables’ distance covariance by the product of their distance standard deviations:

\operatorname {dCor}(X,Y)={\frac {\operatorname {dCov}(X,Y)}{{\sqrt {\operatorname {dVar}(X)\,\operatorname {dVar}(Y)}}}},

The squared sample distance covariance is defined as:

{\displaystyle \operatorname {dCov} _{n}^{2}(X,Y):={\frac {1}{n^{2}}}\sum _{j=1}^{n}\sum _{k=1}^{n}A_{j,k}\,B_{j,k}.}

Where:

{\displaystyle A_{j,k}:=a_{j,k}-{\overline {a}}_{j\cdot }-{\overline {a}}_{\cdot k}+{\overline {a}}_{\cdot \cdot },\qquad B_{j,k}:=b_{j,k}-{\overline {b}}_{j\cdot }-{\overline {b}}_{\cdot k}+{\overline {b}}_{\cdot \cdot },}

{\begin{aligned}a_{{j,k}}&=\|X_{j}-X_{k}\|,\qquad j,k=1,2,\ldots ,n,\\b_{{j,k}}&=\|Y_{j}-Y_{k}\|,\qquad j,k=1,2,\ldots ,n,\end{aligned}}

where || ⋅ || denotes the Euclidean norm. The squared distance variance is the special case dVar²(X) = dCov²(X, X).

  • Properties:
  1. dCor (X,Y) = 0 if and only if the random vectors are independent.
  2. dCor (X,Y) = 1: Perfect dependence between the two distributions.
  3. dCor (X,Y) is defined for X and Y in arbitrary dimension.
  • When is it useful? It is appropriate for finding any kind of dependence relationship between the two variables. It also works if X and Y have different dimensions.
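
To make the formulas above less abstract, here is a minimal sketch of the sample distance correlation in plain Python: build the pairwise distance matrices, double-centre them, and combine them as in the definition. All names (`distance_correlation` and the helpers) are hypothetical, observations may be numbers or tuples of coordinates, and returning 0 when a distance variance is zero is a convention I am assuming.

```python
import math


def _as_point(v):
    """Treat a bare number as a 1-dimensional point."""
    return tuple(v) if isinstance(v, (list, tuple)) else (v,)


def _centered_distances(sample):
    """Double-centred distance matrix: A_jk = a_jk - a_j. - a_.k + a_.."""
    pts = [_as_point(p) for p in sample]
    n = len(pts)
    d = [[math.dist(p, q) for q in pts] for p in pts]  # Euclidean norm
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # d is symmetric, so the column means equal the row means.
    return [[d[j][k] - row_means[j] - row_means[k] + grand_mean
             for k in range(n)] for j in range(n)]


def _dcov_sq(a, b):
    """Squared sample distance covariance: (1 / n^2) * sum_jk A_jk B_jk."""
    n = len(a)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n ** 2


def distance_correlation(x, y):
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    a = _centered_distances(x)
    b = _centered_distances(y)
    dvar_x_sq = _dcov_sq(a, a)
    dvar_y_sq = _dcov_sq(b, b)
    if dvar_x_sq == 0 or dvar_y_sq == 0:
        return 0.0  # assumed convention for constant samples
    dcor_sq = _dcov_sq(a, b) / math.sqrt(dvar_x_sq * dvar_y_sq)
    return math.sqrt(max(dcor_sq, 0.0))  # guard against rounding below zero


# Hypothetical toy data.
x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]
parabola = [v ** 2 for v in x]          # Pearson correlation would be 0 here
points_2d = [(v, v) for v in x]         # X in 2 dimensions, Y in 1

print(distance_correlation(x, linear))         # close to 1
print(distance_correlation(x, parabola))       # clearly above 0
print(distance_correlation(points_2d, linear)) # works across dimensions
```

The parabola is the interesting case: the relationship is neither linear nor monotonic, yet the distance correlation still picks it up.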