Pearson Correlation

In statistics, Pearson’s r correlation coefficient quantifies the linear correlation between two variables X and Y.

The value is always between -1 and 1.

  • A value of 1 indicates a perfect positive linear correlation.
  • A value of 0 indicates no linear correlation (a non-linear relationship may still exist).
  • A value of -1 indicates a perfect negative linear correlation.

Unlike covariance, correlation is scale-invariant, i.e. multiplying either variable by a positive constant (for example, changing its units) does not affect the correlation value.

How much confidence we can place in a correlation value depends on how much data we have, and is quantified by the p-value. The p-value is the probability that data with no underlying correlation would produce a relationship at least as strong as the one observed. The lower the p-value, the less likely it is that the correlation is due to chance alone, and therefore the more confident we can be in it. This is a form of hypothesis testing, with "no correlation" as the null hypothesis.

Mathematically, the Pearson correlation between two variables X and Y is defined as the covariance between the two variables divided by the product of their standard deviations:

\[ \rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} \]
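The definition translates almost term by term into code. The following is a minimal sketch in plain Python, assuming population (divide-by-n) estimates for both the covariance and the standard deviations; the function name `pearson_r` and the sample inputs are illustrative choices.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed directly from cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance: average product of the deviations from each mean.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Standard deviations: square roots of the variances.
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov_xy / (sigma_x * sigma_y)

# Points lying exactly on a rising line, then on a falling line.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

Using the sample (divide-by-n-1) estimates instead gives the same r, as long as the same convention is used for the covariance and both standard deviations, because the normalising factors cancel.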


IMPORTANT: CORRELATION DOES NOT IMPLY CAUSATION!!!
