# Pearson Correlation

In statistics, Pearson’s r correlation coefficient quantifies the linear correlation between two variables X and Y.

The value always lies between -1 and 1, inclusive.

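- A value of 1 indicates a perfect positive linear correlation: all points lie exactly on a line with positive slope.
- A value of 0 indicates no linear correlation. The variables may still be related in a nonlinear way.
- A value of -1 indicates a perfect negative linear correlation: all points lie exactly on a line with negative slope.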

Unlike covariance, correlation is scale-invariant: multiplying either variable by a positive constant (for example, changing its units) or adding a constant to it does not change the correlation value.

Our *confidence* in a correlation value depends on how much data we have and is quantified by the *p-value*. The *p-value* is the probability that data with no underlying correlation would, by chance, produce a relationship at least as strong as the one observed. The lower the *p-value*, the less likely it is that the observed correlation is due to randomness alone, and therefore the more confident we can be in the correlation value. This is an instance of hypothesis testing, where the null hypothesis is that the true correlation is zero.

Mathematically, the Pearson correlation between two variables X and Y is defined as the covariance between the two variables divided by the product of their standard deviations:

\[ \rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_{X}\,\sigma_{Y}} \]
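The definition translates almost directly into code. The following is a minimal sketch that estimates the coefficient from a sample, assuming the usual sample covariance and sample standard deviations with `n - 1` in the denominator (the factor cancels, so dividing by `n` gives the same result). The helper name `pearson_r` and the data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation computed directly from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov_xy / (std_x * std_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```

The check for a zero standard deviation matters in practice: if either variable is constant, the denominator is zero and the coefficient is undefined.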

IMPORTANT: CORRELATION DOES NOT IMPLY CAUSATION!!!