Pearson Correlation
In statistics, Pearson’s r correlation coefficient quantifies the linear correlation between two variables X and Y.
The value is always between -1 and 1.
- A value of 1 indicates a perfect positive linear correlation.
- A value of 0 indicates no linear correlation (the variables may still be related in a nonlinear way).
- A value of -1 indicates a perfect negative linear correlation.
Unlike covariance, correlation is scale-invariant: rescaling either variable by a positive constant (for example, changing its units) does not affect the correlation value.
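The scale-invariance claim can be checked numerically. The sketch below uses made-up height and weight values purely for illustration, and the helper names `covariance` and `correlation` are hypothetical, written here from the population (divide-by-n) definitions rather than taken from any library.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: mean product of deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Illustrative (made-up) data: the same heights in metres and in centimetres.
heights_m = [1.60, 1.72, 1.81, 1.68, 1.75]
weights_kg = [55.0, 70.0, 82.0, 64.0, 74.0]
heights_cm = [h * 100 for h in heights_m]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
r_m = correlation(heights_m, weights_kg)
r_cm = correlation(heights_cm, weights_kg)

print("covariance (m): ", cov_m)
print("covariance (cm):", cov_cm)   # 100 times larger, up to rounding
print("correlation (m): ", r_m)
print("correlation (cm):", r_cm)    # unchanged, up to rounding
```

Changing the units multiplies the covariance by 100 but leaves the correlation the same, because the standard deviation in the denominator grows by the same factor.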
Our confidence in a correlation value depends on how much data we have and is quantified by the p-value. The p-value is the probability that data with no underlying correlation would, by chance alone, produce a relationship at least as strong as the one we observed. The lower the p-value, the less likely it is that our correlation is due to randomness, and therefore the more confident we are in it. This is a form of hypothesis testing, in which the null hypothesis is that the two variables are uncorrelated.
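One way to make the idea of a p-value concrete is a permutation test: shuffle one variable many times to destroy any real relationship, and count how often the shuffled data looks at least as correlated as the original. This is a minimal sketch of that technique, not necessarily how any particular statistics library computes its p-value. The data, the function names `pearson_r` and `permutation_p_value`, and the parameters `n_permutations` and `seed` are all assumptions made for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value by shuffling ys and recomputing r."""
    rng = random.Random(seed)          # fixed seed so the estimate is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)          # breaks any pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Illustrative (made-up) data with a strong linear trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]

r = pearson_r(xs, ys)
p = permutation_p_value(xs, ys)
print("r =", r)
print("estimated p-value =", p)
```

With only a handful of points, even a fairly large correlation can show up often among the shuffles and so receive a high p-value, which is why the amount of data matters.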
Mathematically, the Pearson correlation between two variables X and Y is defined as the covariance between the two variables divided by the product of their standard deviations:
\[ \rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} \]
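As a small worked example of this formula (the data are made up for illustration), take X = (1, 2, 3) and Y = (1, 3, 2), using population (divide-by-n) statistics, with \sigma_X and \sigma_Y denoting standard deviations. Both means equal 2, so the deviations are (-1, 0, 1) for X and (-1, 1, 0) for Y:

\[ \operatorname{cov}(X,Y) = \frac{(-1)(-1) + (0)(1) + (1)(0)}{3} = \frac{1}{3}, \qquad \sigma_X = \sigma_Y = \sqrt{\frac{2}{3}} \]

\[ \rho_{X,Y} = \frac{1/3}{\sqrt{2/3}\,\sqrt{2/3}} = \frac{1/3}{2/3} = \frac{1}{2} \]

Dividing by n - 1 instead of n would give the same result, since the factor cancels between the numerator and the denominator.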
IMPORTANT: CORRELATION DOES NOT IMPLY CAUSATION!!!