Automatic differentiation¶
Automatic Differentiation (see Automatic Differentiation in Machine Learning: a Survey) is a method of applying algorithmic differentiation to compute derivatives on the fly by repeated application of the chain rule.
For any given operation, we can draw a computational graph of operations and thus calculate the total chain-rule derivative using the above formula. However, auto-diff can compute this for us using the concept of Dual Numbers, together with corollaries of the chain rule (sum rule, product rule, quotient rule, derivatives of sin and log, etc.).
A Dual Number is an element of the one-dimensional Grassmann algebra over $\mathbb{R}$: a number of the form $$ a + b \varepsilon, $$ for some $a, b \in \mathbb{R}$, where $\varepsilon^2 = 0$.
Heuristically, one can consider this as a value $a$, along with its differential component $b$.
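For example, multiplying two dual numbers $u + u'\varepsilon$ and $v + v'\varepsilon$ reproduces the product rule, since the $\varepsilon^2$ term vanishes: $$ (u + u'\varepsilon)(v + v'\varepsilon) = uv + (u v' + u' v)\varepsilon + u' v' \varepsilon^2 = uv + (u v' + u' v)\varepsilon. $$ The value part carries the ordinary product, while the differential part carries its derivative.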
A simple Julia implementation¶
We define our dual number as a struct; here, let $x = a$ be the value of our number, and $\epsilon = b$ be its differential component (the coefficient of $\varepsilon$):
struct D <: Number
x::Float64
ϵ::Float64
end
We then define corollaries of the chain rule so that our Dual Number can be used:
import Base: +, -, *, /, sin, log, convert, promote_rule
# summation rule
a::D + b::D = D(a.x + b.x, a.ϵ + b.ϵ)
a::D - b::D = D(a.x - b.x, a.ϵ - b.ϵ)
# product rule
a::D * b::D = D(a.x * b.x, (a.x * b.ϵ) + (a.ϵ * b.x) )
# quotient rule
a::D / b::D = D(a.x / b.x, ( (b.x * a.ϵ) - (a.x * b.ϵ) ) / (b.x^2))
# sin & log
sin(a::D) = D(sin(a.x), cos(a.x) * a.ϵ)
log(a::D) = D(log(a.x), 1/a.x * a.ϵ)
# conversion
Base.convert(::Type{D}, x::Real) = D(x, zero(x))
# always promote to Dual
Base.promote_rule(::Type{D}, ::Type{<:Number}) = D
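The conversion and promotion rules let a D mix with ordinary numbers; as a quick illustrative check (with the definitions above, the integer is promoted to a Dual with zero derivative):
D(3.0, 1.0) + 2   # 2 is promoted to D(2.0, 0.0), giving D(5.0, 1.0)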
We can try it out on any function of choice: for some $$ f(a, b) = \log \left( ab + \sin(a) \right), $$ we want to compute $$ \frac{\text{d}f}{\text{d}a}. $$
In Julia with our auto-diff:
f(a, b) = log(a * b + sin(a))
a = 3.1
b = 2.4
f(D(a, 1), b)
D(2.0124440881688996, 0.18724182935843758)
Note, we seed $\epsilon = 1$ for the argument $a$, as it represents $\text{d}a / \text{d}a = 1$ in the chain rule expansion.
Comparing to a symbolic result:
f_derivative(a,b) = 1/(a*b + sin(a)) * (b + cos(a)) # symbolic derivative
f_derivative(a, b)
0.18724182935843758
f(D(a,1), b).ϵ ≈ f_derivative(a, b)
true
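The same seeding trick gives the other partial derivative: placing the unit $\epsilon$ on $b$ instead yields $\partial f / \partial b$, which symbolically is $a / (ab + \sin a)$. A quick check along the same lines:
f(a, D(b, 1)).ϵ ≈ a / (a*b + sin(a))   # ∂f/∂b, seeded on b -> true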
We can generalize the above for some impressive results by adding a simple convenience function:
derivative(f::Function, x::Number) = f(D(x, one(x))).ϵ
derivative (generic function with 1 method)
derivative(x -> 3*x^2, 5) # -> 6 * x eval at x = 5
30.0
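The convenience function also recovers the earlier result for $f$ by closing over $b$ (reusing the names defined above):
derivative(a -> f(a, b), a) ≈ f_derivative(a, b)   # -> true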
Forward and reverse mode¶
The concept of forward mode and reverse mode is often discussed in auto-diff literature. A nice overview of the differences is given by Mike Innes.
We’ll use packages for this later. In brief:
Forward mode: use for $\mathbb{R} \rightarrow \mathbb{R}^n$
Reverse mode: use for $\mathbb{R}^n \rightarrow \mathbb{R}$
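A rough way to see why: for $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ with Jacobian $J \in \mathbb{R}^{m \times n}$, forward mode propagates Jacobian-vector products $$ J v, $$ one pass per input direction ($n$ passes for the full Jacobian), while reverse mode propagates vector-Jacobian products $$ v^\top J, $$ one pass per output. Hence forward mode is cheap for few inputs and many outputs, and reverse mode for many inputs and a single scalar output, as in most machine-learning loss functions.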
Example: Babylonian square root¶
The Babylonian square root is an iterative algorithm for calculating the square root of a number to a desired precision.
In short, it repeatedly applies $$ t \mapsto \frac{1}{2} \left( t + \frac{x}{t} \right) $$ until $t$ converges to $\sqrt{x}$. In Julia:
@inline function Babylonian(x; N=10)
t = (1 + x) / 2
for i = 2:N
t = (t + x/t) / 2
end
t
end
Babylonian (generic function with 1 method)
Comparing the accuracy of the algorithm:
Babylonian(2), √2
(1.414213562373095, 1.4142135623730951)
With this implementation, we obtain the derivative for free:
Babylonian(D(5, 1))
D(2.23606797749979, 0.22360679774997896)
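As a quick sanity check, the analytic derivative of $\sqrt{x}$ is $\frac{1}{2\sqrt{x}}$:
Babylonian(D(5, 1)).ϵ ≈ 1 / (2 * √5)   # -> true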
Note that the LLVM optimizer uses parallelized instructions in the above (check with @code_native).
Forward mode with ForwardDiff.jl¶
Our implementation above has been developed much more extensively and completely in the ForwardDiff.jl package (and the associated DiffRules.jl).
ForwardDiff provides the concept of a dual number for us, as well as all the requisite chain rule applications. It is extremely simple to use:
using ForwardDiff
ForwardDiff.derivative(Babylonian, 5)
0.223606797749979
?ForwardDiff.derivative
ForwardDiff.derivative(f, x::Real)

Return df/dx evaluated at x, assuming f is called as f(x).

This method assumes that isa(f(x), Union{Real,AbstractArray}).

ForwardDiff.derivative(f!, y::AbstractArray, x::Real, cfg::DerivativeConfig = DerivativeConfig(f!, y, x), check=Val{true}())

Return df!/dx evaluated at x, assuming f! is called as f!(y, x) where the result is stored in y.

Set check to Val{false}() to disable tag checking. This can lead to perturbation confusion, so should be used with care.
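As a small sketch of the mutating form described above (the function f! and the buffer y here are made up for illustration):
# a mutating function: writes sin(x) and cos(x) into y
f!(y, x) = (y[1] = sin(x); y[2] = cos(x); nothing)
y = zeros(2)
ForwardDiff.derivative(f!, y, 5.0)   # derivative of each entry with respect to x: [cos(5.0), -sin(5.0)]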
Reverse mode with Zygote.jl¶
An implementation of reverse mode auto-diff is in Zygote.jl, which is the basis of most of the Julia ML ecosystem.
Similarly to ForwardDiff, Zygote is very straightforward to use:
using Zygote
gradient(Babylonian, 5)
(0.22360679774997896,)
?gradient
search: gradient

gradient(f, args...)

Returns a tuple containing ∂f/∂x for each argument x, the derivative (for scalar x) or the gradient. f(args...) must be a real number, see jacobian for array output.
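Since reverse mode is the natural fit for several inputs and one scalar output, we can also ask Zygote for both partial derivatives of the earlier $f$ at once (reusing f, a, b, and f_derivative from above):
∂a, ∂b = gradient(f, a, b)   # tuple of partial derivatives (∂f/∂a, ∂f/∂b)
∂a ≈ f_derivative(a, b)      # -> true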