{"id":4015,"date":"2018-04-24T10:38:36","date_gmt":"2018-04-24T09:38:36","guid":{"rendered":"http:\/\/www.blopig.com\/blog\/?p=4015"},"modified":"2018-04-24T10:46:57","modified_gmt":"2018-04-24T09:46:57","slug":"measuring-correlation","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2018\/04\/measuring-correlation\/","title":{"rendered":"Measuring correlation"},"content":{"rendered":"<p><strong>Correlation<\/strong> is a measure of how close two variables are to having a dependence relationship with each other. At first sight it looks simple, but there are two main problems:<\/p>\n<ol>\n<li>Apart from the obvious cases (e.g. correlation = 1), it is difficult to say whether two variables are correlated or not (e.g. correlation = 0.7). For instance, would you be able to say whether the variables <em>X<\/em> and <em>Y<\/em> in the following two plots are correlated?<\/li>\n<li>There are different measures of correlation, and they may not agree when comparing different distributions. As an example, which plot shows a higher correlation? 
The answer depends on <em>how<\/em> you measure the correlation: using Pearson correlation you would pick A, whereas using Spearman correlation you would pick B.<\/li>\n<\/ol>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-4026 aligncenter\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2018\/04\/Examples-final.jpeg?resize=625%2C257&#038;ssl=1\" alt=\"\" width=\"625\" height=\"257\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2018\/04\/Examples-final.jpeg?resize=1024%2C421&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2018\/04\/Examples-final.jpeg?resize=300%2C123&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2018\/04\/Examples-final.jpeg?resize=768%2C316&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2018\/04\/Examples-final.jpeg?resize=624%2C257&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2018\/04\/Examples-final.jpeg?w=1410&amp;ssl=1 1410w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2018\/04\/Examples-final.jpeg?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/p>\n<p>Here, I will explain some of the different correlation measures you can use:<\/p>\n<p><strong>Pearson product-moment correlation coefficient<\/strong><\/p>\n<ul>\n<li><em>What does it measure?<\/em> Only linear dependencies between the variables.<\/li>\n<li><em>How is it obtained?<\/em> By dividing the covariance of the two variables by the product of their standard deviations. (It is defined only if both standard deviations are finite and nonzero.) 
<img decoding=\"async\" loading=\"lazy\" class=\"mwe-math-fallback-image-inline aligncenter\" src=\"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/f76ccfa7c2ed7f5b085115086107bbe25d329cec\" alt=\"\\rho _{X,Y}={\\frac {\\operatorname {cov} (X,Y)}{\\sigma _{X}\\sigma _{Y}}}\" \/><\/li>\n<\/ul>\n<ul>\n<li><em>Properties:<\/em><\/li>\n<\/ul>\n<ol>\n<li>\u03c1 (X,Y) = +1: perfect direct (increasing) linear relationship (correlation).<\/li>\n<li>\u03c1 (X,Y) = -1: perfect decreasing (inverse) linear relationship (anticorrelation).<\/li>\n<li>In all other cases,\u00a0\u03c1 (X,Y) indicates the degree of linear dependence between the variables. As it approaches zero, the variables are closer to being uncorrelated.<\/li>\n<li>It only takes the values \u00b11 when X and Y are related by a linear function.<\/li>\n<\/ol>\n<ul>\n<li><em>When is it useful?<\/em> For a linear model with a single independent variable, the coefficient of determination (R squared) is the square of r, Pearson&#8217;s product-moment coefficient.<\/li>\n<\/ul>\n<p><strong>Spearman&#8217;s rank correlation coefficient<\/strong><\/p>\n<ul>\n<li><em>What does it measure?<\/em> How well the relationship between two variables can be described using a monotonic function (a function that only goes up or only goes down).<\/li>\n<li><em>How is it obtained?<\/em> 
It is the Pearson correlation between the rank values of the two variables.<\/li>\n<\/ul>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"mwe-math-fallback-image-inline aligncenter\" src=\"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a8dda555d22080d721679401fa13181cad3863f6\" alt=\"{\\displaystyle r_{s}=\\rho _{\\operatorname {rg} _{X},\\operatorname {rg} _{Y}}={\\frac {\\operatorname {cov} (\\operatorname {rg} _{X},\\operatorname {rg} _{Y})}{\\sigma _{\\operatorname {rg} _{X}}\\sigma _{\\operatorname {rg} _{Y}}}}}\" \/><\/p>\n<p>Only if all <i>n<\/i> ranks are <i>distinct integers<\/i> can it be computed using the popular formula:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"mwe-math-fallback-image-inline aligncenter\" src=\"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/b69578f3203ecf1b85b1a0929772b376ae07a3ce\" alt=\"{\\displaystyle r_{s}={1-{\\frac {6\\sum d_{i}^{2}}{n(n^{2}-1)}}}.}\" \/><\/p>\n<p>Where <em>di<\/em> is the difference between the two ranks of each observation.<\/p>\n<ul>\n<li><em>Properties:<\/em><\/li>\n<\/ul>\n<ol>\n<li>rs (X,Y) = +1:\u00a0X and Y are related by an increasing monotonic function.<\/li>\n<li>rs (X,Y) = -1:\u00a0X and Y are related by a decreasing monotonic function.<\/li>\n<li>The Spearman correlation increases in magnitude as X and Y become closer to being perfect monotone functions of each other.<\/li>\n<\/ol>\n<ul>\n<li><em>When is it useful?<\/em> It is appropriate for both continuous and discrete ordinal variables. It can be used to look for non-linear (monotonic) dependence relationships.<\/li>\n<\/ul>\n<p><strong>Kendall&#8217;s tau coefficient<\/strong><\/p>\n<ul>\n<li><em>What does it measure? 
<\/em>The ordinal association between two measured quantities.<\/li>\n<li><em>How is it obtained?<\/em><\/li>\n<\/ul>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"mwe-math-fallback-image-inline aligncenter\" src=\"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/5860036b99dbbeba99829ec3adfd009b792c0a92\" alt=\"{\\displaystyle \\tau ={\\frac {({\\text{number of concordant pairs}})-({\\text{number of discordant pairs}})}{n(n-1)\/2}}.}\" \/><\/p>\n<p>A pair of observations (xi, yi) and (xj, yj) is said to be concordant if the ranks of both elements agree, that is, if xi-xj and yi-yj have the same sign. If the signs differ, the pair is discordant.<\/p>\n<ul>\n<li><em>Properties:<\/em><\/li>\n<\/ul>\n<ol>\n<li>\u03c4 (X,Y) = +1: The agreement between the two rankings is perfect (i.e., the two rankings are the same).<\/li>\n<li>\u03c4 (X,Y) = -1: The disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other).<\/li>\n<li>If X and Y are independent, we would expect the coefficient to be approximately zero.<\/li>\n<\/ol>\n<ul>\n<li><em>When is it useful?<\/em> It is appropriate for both continuous and discrete ordinal variables. 
It can be used to look for non-linear dependence relationships.<\/li>\n<\/ul>\n<p><strong>Distance correlation<\/strong><\/p>\n<ul>\n<li><em>What does it measure?<\/em> Both linear and nonlinear association between two random variables or random vectors.<\/li>\n<li><em>How is it obtained?<\/em> By dividing the variables&#8217; distance covariance by the product of their distance standard deviations:<\/li>\n<\/ul>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"mwe-math-fallback-image-inline aligncenter\" src=\"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/4705de443ddff6faf7822a4a0b02eaf42b0d7d33\" alt=\"\\operatorname {dCor}(X,Y)={\\frac {\\operatorname {dCov}(X,Y)}{{\\sqrt {\\operatorname {dVar}(X)\\,\\operatorname {dVar}(Y)}}}},\" \/><\/p>\n<p>The distance covariance is defined as:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"mwe-math-fallback-image-inline aligncenter\" src=\"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/690d25098aa7a1a954e388785dd32cf64f54c6d9\" alt=\"{\\displaystyle \\operatorname {dCov} _{n}^{2}(X,Y):={\\frac {1}{n^{2}}}\\sum _{j=1}^{n}\\sum _{k=1}^{n}A_{j,k}\\,B_{j,k}.}\" \/><\/p>\n<p>Where:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"mwe-math-fallback-image-inline aligncenter\" src=\"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/2401660e88d209f5a9d732ab8b9bc6e3e4f87616\" alt=\"{\\displaystyle A_{j,k}:=a_{j,k}-{\\overline {a}}_{j\\cdot }-{\\overline {a}}_{\\cdot k}+{\\overline {a}}_{\\cdot \\cdot },\\qquad B_{j,k}:=b_{j,k}-{\\overline {b}}_{j\\cdot }-{\\overline {b}}_{\\cdot k}+{\\overline {b}}_{\\cdot \\cdot },}\" \/><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"mwe-math-fallback-image-inline aligncenter\" src=\"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/902f03b7eaae4d7c422dda6600d8c6f03e0a8f3c\" alt=\"{\\begin{aligned}a_{{j,k}}&amp;=\\|X_{j}-X_{k}\\|,\\qquad j,k=1,2,\\ldots 
,n,\\\\b_{{j,k}}&amp;=\\|Y_{j}-Y_{k}\\|,\\qquad j,k=1,2,\\ldots ,n,\\end{aligned}}\" \/><\/p>\n<p>where || \u22c5 || denotes Euclidean norm.<\/p>\n<ul>\n<li><em>Properties:<\/em><\/li>\n<\/ul>\n<ol>\n<li>dCor (X,Y) = 0 if and only if the random vectors are independent.<\/li>\n<li>dCor (X,Y) = 1: Perfect dependence between the two distributions.<\/li>\n<li><span id=\"MathJax-Element-2-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span id=\"MJXc-Node-6\" class=\"mjx-math\"><span id=\"MJXc-Node-7\" class=\"mjx-mrow\"><span id=\"MJXc-Node-8\" class=\"mjx-texatom\"><span id=\"MJXc-Node-9\" class=\"mjx-mrow\"><span id=\"MJXc-Node-10\" class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-cal-R\">dCor <\/span><\/span><\/span><\/span><span id=\"MJXc-Node-11\" class=\"mjx-mo\"><span class=\"mjx-char MJXc-TeX-main-R\">(<\/span><\/span><span id=\"MJXc-Node-12\" class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">X<\/span><\/span><span id=\"MJXc-Node-13\" class=\"mjx-mo\"><span class=\"mjx-char MJXc-TeX-main-R\">,<\/span><\/span><span id=\"MJXc-Node-14\" class=\"mjx-mi MJXc-space1\"><span class=\"mjx-char MJXc-TeX-math-I\">Y<\/span><\/span><span id=\"MJXc-Node-15\" class=\"mjx-mo\"><span class=\"mjx-char MJXc-TeX-main-R\">)<\/span><\/span><\/span><\/span><\/span> is defined for <span id=\"MathJax-Element-3-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span id=\"MJXc-Node-16\" class=\"mjx-math\"><span id=\"MJXc-Node-17\" class=\"mjx-mrow\"><span id=\"MJXc-Node-18\" class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">X<\/span><\/span><\/span><\/span><\/span> and <span id=\"MathJax-Element-4-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span id=\"MJXc-Node-19\" class=\"mjx-math\"><span id=\"MJXc-Node-20\" class=\"mjx-mrow\"><span id=\"MJXc-Node-21\" class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">Y<\/span><\/span><\/span><\/span><\/span> in arbitrary dimension.<\/li>\n<\/ol>\n<ul>\n<li><em>When is it useful?<\/em> It is appropriate to find any kind\u00a0 dependence 
relationships between the two variables, even when X and Y have different dimensions.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Correlation is a measure of how close two variables are to having a dependence relationship with each other. At first sight it looks simple, but there are two main problems: Apart from the obvious cases (e.g. correlation = 1), it is difficult to say whether two variables are correlated or not (e.g. correlation = 0.7). [&hellip;]<\/p>\n","protected":false},"author":52,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[10,15],"tags":[],"ppma_author":[534],"class_list":["post-4015","post","type-post","status-publish","format-standard","hentry","category-groupmeetings","category-technical"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":534,"user_id":52,"is_guest":0,"slug":"javier","display_name":"Javier Pardo 
D\u00edaz","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/fc152623b8f763cda0c87f20eab1dfc3a69a2067beac85560657a717ed373605?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/4015","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/52"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=4015"}],"version-history":[{"count":11,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/4015\/revisions"}],"predecessor-version":[{"id":4034,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/4015\/revisions\/4034"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=4015"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=4015"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=4015"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=4015"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
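The post's content walks through the formulas for all four coefficients, so here is a short, self-contained Python sketch that implements each one directly from those formulas. This is an illustration, not code from the original post: the helper names are my own, the implementations ignore tied ranks (so they suit continuous data), and in practice scipy.stats (`pearsonr`, `spearmanr`, `kendalltau`) or the `dcor` package would be the usual tools.

```python
# Illustrative implementations of the four correlation measures from the
# post, written from its formulas. Assumption: only numpy is available;
# ties are not handled (fine for continuous data).
import numpy as np


def pearson(x, y):
    # cov(X, Y) / (sigma_X * sigma_Y)
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y):
    # Pearson correlation between the rank values of the two variables.
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return pearson(rx, ry)


def kendall(x, y):
    # (concordant pairs - discordant pairs) / (n(n-1)/2), where a pair
    # (i, j) is concordant when x_i - x_j and y_i - y_j share a sign.
    n = len(x)
    s = sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            for i in range(n) for j in range(i + 1, n))
    return float(s / (n * (n - 1) / 2))


def distance_correlation(x, y):
    # dCor_n = dCov_n / sqrt(dVar_n(X) * dVar_n(Y)), built from the
    # double-centred Euclidean distance matrices A and B.
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    A = a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    B = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()
    dcov2 = max((A * B).mean(), 0.0)                  # dCov_n^2(X, Y)
    denom = np.sqrt((A * A).mean() * (B * B).mean())  # dVar_n(X) * dVar_n(Y)
    return 0.0 if denom == 0 else float(np.sqrt(dcov2 / denom))


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = x ** 2 + rng.normal(0.0, 0.05, 200)  # non-linear, non-monotonic dependence

r_p, r_s, r_k = pearson(x, y), spearman(x, y), kendall(x, y)
r_d = distance_correlation(x, y)
# On this quadratic example the three classical coefficients stay near
# zero, while distance correlation picks up the dependence; on a
# perfectly linear pair all four are 1.
```

This mirrors problem 2 from the post: the measure you choose determines which dependencies you can see, and only distance correlation is sensitive to arbitrary (non-monotonic) relationships.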