Covariance

In statistics, covariance measures how two variables X and Y vary together. Its sign distinguishes three types of relationship:

  • Positive: X and Y tend to move in the same direction.
  • Negative: X and Y tend to move in opposite directions.
  • None: there is no linear relationship between X and Y.

More rigorously, covariance is defined as the expected value of the product of the deviations of X and Y from their respective expected values:

\[ \displaystyle \operatorname {cov} (X,Y)=\operatorname {E} { {\big [}(X-\operatorname {E} [X])(Y-\operatorname {E} [Y]){\big ]} }. \]

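When calculating sample covariance from N observations, the formula can be written as follows, where \( X_{ij} \) is the i-th observation of variable j and \( {\bar {X} }_{j} \) is the sample mean of variable j: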

\[ \displaystyle q_{jk}={\frac {1}{N-1}}\sum _{i=1}^{N}\left(X_{ij}-{\bar {X} }_{j}\right)\left(X_{ik}-{\bar {X} }_{k}\right). \]
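
As a rough illustration of the sample formula, here is a minimal Python sketch. The function name `sample_covariance` and the toy data (`hours`, `score_up`, `score_down`) are made up for this note, not taken from any library.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (N - 1 denominator)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations from each mean, divided by N - 1.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical toy data: one series rises with hours, the other falls.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score_up = [2.0, 4.0, 6.0, 8.0, 10.0]
score_down = [10.0, 8.0, 6.0, 4.0, 2.0]

cov_up = sample_covariance(hours, score_up)      # positive: they rise together
cov_down = sample_covariance(hours, score_down)  # negative: one rises, one falls
print(cov_up, cov_down)
```

The two results have the same magnitude and opposite signs, which is all the sign convention promises.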

A positive covariance indicates a positive relationship, and a negative covariance indicates a negative one. A covariance of 0 indicates no linear relationship.
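
One caveat is worth a quick sketch: a covariance of 0 only rules out a linear relationship. In the hypothetical data below, Y is completely determined by X (Y = X²), yet the sample covariance is zero because the positive and negative deviation products cancel out. The helper `sample_covariance` is an illustrative name, not a library function.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the N - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical symmetric data: y depends on x, but not linearly.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

cov_parabola = sample_covariance(xs, ys)
print(cov_parabola)
```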


A covariance value is hard to interpret on its own. Its sign tells us only the direction of the relationship; it does not tell us the steepness of any slope, nor the strength of the relationship.

This is because covariance depends on the units of the data: rescaling X by a factor a and Y by a factor b multiplies the covariance by ab. For that reason, we usually care only whether the covariance is positive, negative, or 0, and pay no attention to its magnitude.
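
The scale sensitivity can be seen in a small sketch. The data are invented for illustration: the same heights expressed in metres and in centimetres, paired with the same weights. Only the unit changes, yet the covariance grows by the conversion factor.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the N - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical measurements; the relationship itself is identical in both cases.
heights_m = [1.5, 1.6, 1.7, 1.8]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [50.0, 60.0, 65.0, 80.0]

cov_m = sample_covariance(heights_m, weights_kg)
cov_cm = sample_covariance(heights_cm, weights_kg)

# Same sign, but the magnitude is about 100 times larger in centimetres.
print(cov_m, cov_cm, cov_cm / cov_m)
```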

While covariance is not very informative on its own, it is often used as a “stepping stone” for other, more useful calculations, such as correlation.
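
As a sketch of that stepping-stone role, the Pearson correlation coefficient divides the covariance by the product of the two sample standard deviations, which removes the units. The function names and data below are illustrative assumptions, not part of this note's original content.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with the N - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    std_x = math.sqrt(sample_covariance(xs, xs))  # cov(X, X) is the variance of X
    std_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (std_x * std_y)


# Hypothetical data: the same heights in two units, paired with the same weights.
heights_m = [1.5, 1.6, 1.7, 1.8]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [50.0, 60.0, 65.0, 80.0]

r_m = pearson_correlation(heights_m, weights_kg)
r_cm = pearson_correlation(heights_cm, weights_kg)

# Unlike covariance, the correlation is (up to rounding) the same in both units.
print(r_m, r_cm)
```

Because the result always lies between -1 and 1, its magnitude can be read as the strength of the linear relationship, which the raw covariance cannot offer.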
