# Specifying the IPNewton method in Optim.jl

Tianhao Zhao
## Introduction

Optim.jl is a powerful Julia package that offers a wide range of optimization algorithms. Among these, the interior-point Newton method (`IPNewton`) is the only gradient-based method capable of solving non-linearly constrained problems:
$$
\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad l_x \le x \le u_x, \qquad l_c \le c(x) \le u_c,
$$

where $f: \mathbb{R}^n \to \mathbb{R}$ is the objective function, $c: \mathbb{R}^n \to \mathbb{R}^m$ collects the non-linear constraints, and the $l$'s and $u$'s are (possibly infinite) lower and upper bounds.
Despite its flexibility, `IPNewton` asks users to provide the Jacobians and Hessians of both the objective function and the constraint functions; supplying them can dramatically accelerate the algorithm.

Note: automatic differentiation is available via the option `autodiff = true` and is implemented with ForwardDiff.jl.
In this post, I outline the ideas and mathematical formulas behind these data structures (primarily matrices) to help readers better understand the official documentation and tailor the method to their specific problems.
For readability, we define the following aliases:

```julia
F64 = Float64
Vec = AbstractVector
Mat = AbstractMatrix
V64 = Vector{Float64}
M64 = Matrix{Float64}
Num = Union{Real, Int, ...}  # schematically: any scalar number type
```
## What to provide

To solve the problem above with the `IPNewton` API, one needs to manually specify the following functions:
| Definition | Math | Function declaration | Parameters | In-place? |
|---|---|---|---|---|
| Objective function | $f(x)$ | `fun(x::Vec)::Num` | control variable vector `x` of length $n$ | False |
| Jacobian of objective function | $\nabla f(x)$ | `fun_grad!(g::Vec, x::Vec)::Nothing` | gradient vector `g` of length $n$ and control vector `x` | True |
| Hessian of objective function | $\nabla^2 f(x)$ | `fun_hess!(h::Mat, x::Vec)::Nothing` | Hessian matrix `h` ($n \times n$) and control vector `x` | True |
| Non-linear constraints | $c(x)$ | `con_c!(c::Vec, x::Vec)::Vec` | constraint residuals `c` and control `x` | Partially |
| Jacobian of the non-linear constraints | $J_c(x)$ | `con_jacobian!(J::Mat, x::Vec)::Mat` | Jacobian matrix `J` ($m \times n$) and control `x` | Partially |
| Hessian of the non-linear constraints | $\sum_i \lambda_i \nabla^2 c_i(x)$ | `con_h!(h::Mat, x::Vec, lam::Vec)::Nothing` | Hessian matrix `h` of the objective, control `x`, and multiplier vector `lam` of length $m$ | True |
## Illustrative example

Let's illustrate how to specify these functions using a non-trivial example of economic modeling:
Note: to avoid confusion, I use different symbols for consumption and for the inequality constraints.

Note: box constraints on the control variables are handled separately by the solver API, so there is no need to rewrite them as non-linear inequality constraints.

Note: the `IPNewton` API allows custom but constant lower and upper bounds on the constraint values. This may help to simplify the definition of some constraints.
This is a typical "expert" who produces consumption goods using housing wealth and labor. The expert also borrows in the form of bonds.

When solving this problem, the expected value function is only available as a numerical approximation. The callable interpolant of the expected value function is denoted `ev` below.
Reformulating the problem as a minimization problem and cancelling out the redundant variable:
where
and the objective function is:
## Explain by part

The official documentation of Optim.jl does not explicitly explain the pipeline of `IPNewton`, so the expected behavior of the required functions can be unintuitive. In short, the pipeline is:
1. The user provides a pure objective function and pure constraint functions.
2. The algorithm internally allocates the Jacobian and Hessian arrays for the objective and the constraints.
3. The user provides "pipe" functions that receive and modify these internally created Jacobian and Hessian objects in place, without manually managing their construction, use, and destruction.
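Putting the pieces together, here is a minimal, self-contained sketch of the assembly on a toy problem (a single linear constraint, not the economic example); all names except the Optim.jl API are illustrative:

```julia
using Optim

# Toy problem: minimize x1^2 + x2^2 subject to 0 <= x <= 5 and 0.5 <= x1 + x2 <= 2.
fun(x)              = x[1]^2 + x[2]^2
fun_grad!(g, x)     = (g .= 2 .* x; nothing)
fun_hess!(h, x)     = (h[1, 1] = 2.0; h[1, 2] = 0.0; h[2, 1] = 0.0; h[2, 2] = 2.0; nothing)
con_c!(c, x)        = (c[1] = x[1] + x[2]; c)
con_jacobian!(J, x) = (J[1, 1] = 1.0; J[1, 2] = 1.0; J)
con_h!(h, x, lam)   = nothing   # the single constraint is linear, so its Hessian is zero

x0     = [0.8, 0.8]               # must lie strictly inside all bounds
lx, ux = [0.0, 0.0], [5.0, 5.0]   # box bounds on x
lc, uc = [0.5], [2.0]             # bounds on c(x)

df  = TwiceDifferentiable(fun, fun_grad!, fun_hess!, x0)
dfc = TwiceDifferentiableConstraints(con_c!, con_jacobian!, con_h!, lx, ux, lc, uc)
res = optimize(df, dfc, x0, IPNewton())
```

The rest of this section explains each of the required functions in turn.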
### Objective function

The objective function `fun(x::Vec)::Num` receives a control vector `x` of length $n$ and returns a scalar.

Hint: if your objective function requires extra parameters, consider wrapping it together with the information set in a closure.
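For instance, a minimal sketch of the closure pattern, where `ModelParams` and `raw_objective` are hypothetical names:

```julia
# Hypothetical parameter container and raw objective; purely illustrative.
struct ModelParams
    beta::Float64    # e.g. a discount factor
    gamma::Float64   # e.g. a risk-aversion coefficient
end

# The raw objective needs parameters in addition to the control vector.
raw_objective(x::AbstractVector, p::ModelParams) = p.beta * sum(abs2, x) - p.gamma * x[1]

# Wrap the parameters in a closure so that the solver only sees a function of `x`.
p         = ModelParams(0.96, 2.0)
objective = x -> raw_objective(x, p)   # pass this wherever `fun(x)` is expected

objective([1.0, 2.0])   # returns a scalar
```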
In the minimization API, the objective function is the negative of the original maximization objective above.
### Jacobian of objective function

The Jacobian of the objective function is its gradient $\nabla f(x)$, a vector of length $n$. In the illustrative example:
The function `fun_grad!(g, x)` should modify each element of the vector `g` in place, using the value at exactly the same position in the equation above. This function only updates `g` without returning it.
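For concreteness, a minimal sketch with a toy objective $f(x) = x_1^2 + x_1 x_2 + x_2^2$ (not the objective of the economic example), filling `g` element by element:

```julia
# Toy objective (n = 2), purely illustrative.
fun(x) = x[1]^2 + x[1] * x[2] + x[2]^2

function fun_grad!(g::AbstractVector, x::AbstractVector)
    g[1] = 2x[1] + x[2]   # ∂f/∂x1
    g[2] = x[1] + 2x[2]   # ∂f/∂x2
    return nothing        # g is modified in place; nothing is returned
end
```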
Note: one can quickly notice that evaluating the gradient requires computing the partial derivatives of the expected value function. This can be done explicitly (e.g. if you are using Chebyshev polynomials), by finite-difference approximation (most cases), or by automatic differentiation of the callable interpolant `ev`. Readers may have to write the finite-difference routine themselves and properly handle points on the boundary of the feasible domain. The same idea applies to all the following matrices.
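For instance, a sketch of a central finite-difference gradient for a callable interpolant `ev` (the step size is fixed and boundary handling is omitted for brevity):

```julia
# Central-difference gradient of a scalar-valued callable `ev` at the point `x`.
function fd_gradient(ev, x::AbstractVector; h::Float64 = 1e-6)
    g = similar(x, Float64)
    for i in eachindex(x)
        xp = copy(x); xp[i] += h
        xm = copy(x); xm[i] -= h
        g[i] = (ev(xp) - ev(xm)) / (2h)
    end
    return g
end
```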
### Hessian of objective function

The Hessian of the objective function is $\nabla^2 f(x)$, an $n \times n$ symmetric matrix. In the illustrative example, this matrix looks like:
The function `fun_hess!(h::Mat, x::Vec)` should modify each element of the matrix `h` in place, using the value at exactly the same position in the equation above. This function updates `h` without returning it.
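Continuing the toy objective $f(x) = x_1^2 + x_1 x_2 + x_2^2$ from the earlier sketch, `fun_hess!` fills the (here constant) second derivatives element by element:

```julia
function fun_hess!(h::AbstractMatrix, x::AbstractVector)
    h[1, 1] = 2.0   # ∂²f/∂x1²
    h[1, 2] = 1.0   # ∂²f/∂x1∂x2
    h[2, 1] = 1.0   # ∂²f/∂x2∂x1
    h[2, 2] = 2.0   # ∂²f/∂x2²
    return nothing
end
```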
### Non-linear constraints

The non-linear constraint function `con_c!(c::Vec, x::Vec)::Vec` should modify the passed vector `c` in place and also return it (a slightly weird convention, admittedly). In the illustrative example, there are two non-linear constraints.
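A minimal sketch with two hypothetical constraints, $c_1(x) = x_1^2 + x_2^2$ and $c_2(x) = x_1 x_2$ (not the constraints of the economic example):

```julia
# Two hypothetical non-linear constraints on a 2-dimensional control.
cons_1(x) = x[1]^2 + x[2]^2
cons_2(x) = x[1] * x[2]

function con_c!(c::AbstractVector, x::AbstractVector)
    c[1] = cons_1(x)   # residual of the 1st constraint
    c[2] = cons_2(x)   # residual of the 2nd constraint
    return c           # note: `c` is both mutated and returned
end
```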
### Jacobian of the non-linear constraints

The function `con_jacobian!(J::Mat, x::Vec)::Mat` likewise not only modifies the passed matrix `J` in place but also returns it. Returning the mutated array is the biggest difference between the constraint-related API and the objective-related API.
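Using the same hypothetical constraints as in the previous sketch, row $i$ of `J` holds the gradient of the $i$-th constraint:

```julia
# Jacobian of the hypothetical constraints cons_1, cons_2 defined above (m = 2, n = 2).
function con_jacobian!(J::AbstractMatrix, x::AbstractVector)
    # 1st row: gradient of cons_1(x) = x1^2 + x2^2
    J[1, 1] = 2x[1];  J[1, 2] = 2x[2]
    # 2nd row: gradient of cons_2(x) = x1*x2
    J[2, 1] = x[2];   J[2, 2] = x[1]
    return J   # the mutated Jacobian is returned as well
end
```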
In the illustrative example, such a Jacobian looks like:
### Hessian of the non-linear constraints

The `IPNewton` algorithm enforces the inequality constraints using log-barrier functions: it optimizes an augmented Lagrangian with barrier terms. The total Hessian therefore includes not only the Hessian of the objective but also the multiplier-weighted Hessians of the constraints,

$$
\nabla^2_{xx} \mathcal{L} = \nabla^2 f(x) + \sum_{i=1}^{m} \lambda_i \nabla^2 c_i(x),
$$

which is why the extra function `con_h!(h::Mat, x::Vec, lam::Vec)::Nothing` is required.
- `h::Mat` is the Hessian of the objective function, allocated and managed by the `IPNewton` API; we users don't need to manage its construction and destruction.
- `lam::Vec` is the vector of Lagrange multipliers associated with the constraints, of length $m$.
The function `con_h!()` looks like:
```julia
function con_h!(h, x, lam)
    # add the Hessian of the 1st constraint on top of `h`, weighted by the 1st multiplier `lam[1]`;
    # use .+= so that `h` is mutated in place rather than rebound
    h .+= lam[1] * hess_cons_1(x)   # hess_cons_1(x)::Matrix, n*n
    # add the Hessian of the 2nd constraint on top of `h`, weighted by `lam[2]`
    h .+= lam[2] * hess_cons_2(x)
    # ...
    return nothing
end
```
In the illustrative example, each constraint's Hessian with respect to the control vector is derived from the corresponding (1st or 2nd) row of the Jacobian above.
Hint: let's be more specific, step by step. The first row of the constraint Jacobian is the Jacobian of the 1st constraint. A gradient is conventionally a column vector, so we transpose this row; then we differentiate again to obtain the Hessian of the 1st constraint.
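If deriving these by hand is tedious, each constraint's Hessian can also be obtained by automatic differentiation; a sketch using the hypothetical `cons_1` and `cons_2` from the earlier sketch:

```julia
import ForwardDiff as fd

# Hessians of the hypothetical constraints via automatic differentiation.
hess_cons_1(x) = fd.hessian(cons_1, x)
hess_cons_2(x) = fd.hessian(cons_2, x)

# These are the matrices that `con_h!` adds onto `h`, weighted by `lam`.
```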
## Appendix: Autodiff of the expected value function

Evaluating the Jacobians and Hessians above requires approximating the Jacobian and Hessian of the expected value function. Suppose we have a guess of the expected value function, represented by a callable interpolant `ev` that is evaluated as:
```julia
xp  = [ap, hp, z]   # the future states a', h', z
val = ev(xp)
```
The ForwardDiff.jl package can perform automatic differentiation on such a callable object. For example, to evaluate the Jacobian and Hessian at the point `x0`:
```julia
import ForwardDiff as fd

x0 = [1.0, 0.9, 0.8]

# Jacobian (gradient): 3-vector
fd.gradient(ev, x0)::Vector{Float64}

# Hessian: 3*3 matrix
fd.hessian(ev, x0)::Matrix{Float64}
```
Users may check the GitHub page of the package for more details.