Automatic Differentiation

Automatic differentiation (AD) is a set of techniques for computing derivatives efficiently and automatically. It is an (arguably better) alternative to symbolic differentiation and numerical differentiation.

While AD can match the accuracy of symbolic differentiation (its gradients are exact up to floating-point rounding), it does not suffer from expression swell and doesn’t require closed-form expressions.

This is achieved by acting directly on the intermediate variables defined in the implementation of the original function. AD relies on the fact that any differentiable function we implement is ultimately composed of primitives: basic operations whose derivatives we know how to compute.
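
To make "intermediate variables" and "primitives" concrete, here is a minimal sketch in Python. The function \(f(x) = \sin(x^2) \cdot x\) and the names `f_traced`, `v1`, `v2`, `v3` are hypothetical, chosen only for illustration. Each line applies exactly one primitive, and the local derivative of that primitive is recorded next to it.

```python
import math

def f_traced(x):
    """Evaluate f(x) = sin(x**2) * x one primitive at a time.

    Returns the final value and a trace of (step, value, local derivatives),
    so the known derivative of every primitive is visible.
    """
    v1 = x * x          # primitive: square,   d(v1)/dx  = 2x
    v2 = math.sin(v1)   # primitive: sine,     d(v2)/dv1 = cos(v1)
    v3 = v2 * x         # primitive: multiply, d(v3)/dv2 = x, d(v3)/dx = v2
    trace = [
        ("v1 = x * x", v1, {"x": 2 * x}),
        ("v2 = sin(v1)", v2, {"v1": math.cos(v1)}),
        ("v3 = v2 * x", v3, {"v2": x, "x": v2}),
    ]
    return v3, trace

value, trace = f_traced(2.0)
for step, val, local_derivs in trace:
    print(step, val, local_derivs)
```

Nothing here differentiates the function as a whole yet; the point is only that every step is simple enough for its derivative to be known in advance.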

Through the use of the chain rule[1], we can compute the value of our function’s derivative by chaining the derivatives of the underlying primitives.
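
As a rough illustration of this chaining, the sketch below differentiates the hypothetical function \(f(x) = \sin(x^2) \cdot x\) by hand, propagating a derivative alongside each intermediate variable. The function and all names (`f_and_derivative`, `v1`, `dv1`, ...) are assumptions made for the example, not part of any library.

```python
import math

def f(x):
    # Plain implementation of the example function.
    return math.sin(x * x) * x

def f_and_derivative(x):
    """Return (f(x), f'(x)) by chaining the derivatives of each primitive."""
    dx = 1.0                      # dx/dx

    v1 = x * x                    # square
    dv1 = 2 * x * dx              # chain rule: d(v1)/dx

    v2 = math.sin(v1)             # sine
    dv2 = math.cos(v1) * dv1      # chain rule: d(v2)/dx

    v3 = v2 * x                   # multiply
    dv3 = dv2 * x + v2 * dx       # product rule combined with chain rule

    return v3, dv3

x0 = 1.5
value, deriv = f_and_derivative(x0)

# Sanity check against a central finite difference (numerical differentiation).
h = 1e-6
numeric = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(value, deriv, numeric)
```

The chained result agrees with the closed form \(2x^2\cos(x^2) + \sin(x^2)\) up to rounding, while the finite-difference estimate only approximates it.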

Autodiff has two main modes: forward mode and reverse mode.
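
A minimal sketch of how forward mode and reverse mode (the two usual variants of AD) differ, assuming a toy setup that supports only multiplication and sine. The classes `Dual` and `Var` and the helper functions are hypothetical names written for this note, not the API of any real AD library. Forward mode pushes a derivative along with each value; reverse mode records the computation first and then propagates gradients backwards from the output.

```python
import math

# --- Forward mode: carry (value, derivative) through every primitive ---
class Dual:
    def __init__(self, val, dot=0.0):
        self.val = val   # primal value
        self.dot = dot   # derivative with respect to the chosen input

    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def dual_sin(a):
    return Dual(math.sin(a.val), math.cos(a.val) * a.dot)

def forward_derivative(x):
    xd = Dual(x, 1.0)                 # seed: dx/dx = 1
    out = dual_sin(xd * xd) * xd      # f(x) = sin(x^2) * x
    return out.dot

# --- Reverse mode: record the graph, then sweep from output to inputs ---
class Var:
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents        # tuples of (parent Var, local derivative)
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.val * other.val,
                   ((self, other.val), (other, self.val)))

def var_sin(a):
    return Var(math.sin(a.val), ((a, math.cos(a.val)),))

def backward(out):
    # Build a topological order so each node is processed after its consumers.
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(out)
    out.grad = 1.0                    # seed: d(out)/d(out) = 1
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

def reverse_derivative(x):
    xv = Var(x)
    out = var_sin(xv * xv) * xv       # same f(x) = sin(x^2) * x
    backward(out)
    return xv.grad

print(forward_derivative(1.5), reverse_derivative(1.5))
```

Both functions return the same derivative for this single-input, single-output example. The practical difference shows up in cost: one forward pass yields derivatives with respect to one input, while one reverse sweep yields derivatives of one output with respect to all inputs.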

  1. Just a quick reminder of the chain rule in its most basic form:

    For a function \(f(x)\) differentiable at \(x = c\) which can be written as the composition of two functions \(f(x) = (g \circ h)(x)\), the derivative of \(f\) with respect to \(x\) at \(x=c\) can be calculated with

    \[ f'(c) = (g \circ h)'(c) = g'(h(c)) \cdot h'(c) \]

    or, for a function \(w(u_1, \ldots, u_n)\) where each \(u_i\) depends on \(t\),

    \[ \begin{aligned} \frac{\partial w}{\partial t} &= \sum_{i=1}^{n} \left(\frac{\partial w}{\partial u_i} \cdot \frac{\partial u_i}{\partial t}\right)\\ &= \frac{\partial w}{\partial u_1} \cdot \frac{\partial u_1}{\partial t} + \frac{\partial w}{\partial u_2} \cdot \frac{\partial u_2}{\partial t} + \cdots + \frac{\partial w}{\partial u_n} \cdot \frac{\partial u_n}{\partial t} \end{aligned} \]
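
    As a quick worked example (the functions here are chosen only for illustration), take \(w = u_1 \cdot u_2\) with \(u_1 = t^2\) and \(u_2 = \sin t\). Then \(\partial w / \partial u_1 = u_2\) and \(\partial w / \partial u_2 = u_1\), so

    \[ \frac{\partial w}{\partial t} = u_2 \cdot 2t + u_1 \cdot \cos t = 2t \sin t + t^2 \cos t \]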
