Specifying the IPNewton method in Optim.jl

Tianhao Zhao

Introduction

Optim.jl is a powerful Julia package that offers a wide range of optimization algorithms. Among these, interior-point Newton (IPNewton) is the only gradient-based method capable of solving non-linearly constrained problems:

$$
\begin{aligned}
\min_{x} \quad & f(x) && (1) \\
\text{s.t.} \quad & l_x \le x \le u_x && \text{(box constraints)} \quad (2) \\
& l_c \le c(x) \le u_c && \text{(non-linear constraints)} \quad (3) \\
& x,\ l_x,\ u_x \in \mathbb{R}^n && (4) \\
& c(x),\ l_c,\ u_c \in \mathbb{R}^p && (5)
\end{aligned}
$$

where $n$ is the number of control variables and $p$ is the number of non-linear inequality constraints (equality constraints are handled by setting $l_c = u_c$).

Despite its flexibility, IPNewton can be accelerated dramatically if the user supplies the Jacobians and Hessians of both the objective function f(x) and the constraint function c(x). Even though automatic differentiation is available today, user-provided derivatives typically give the best performance when they can be derived.

Note: Automatic differentiation is available via the option autodiff = true and is implemented by ForwardDiff.jl.

In this post, I outline the ideas behind, and the exact formulas for, the objects (primarily matrices) that users must supply, to help readers better understand the official documentation and tailor the method to their specific problems.

What to provide

For readability, we define the following aliases:

  • F64 = Float64

  • Vec = AbstractVector

  • Mat = AbstractMatrix

  • V64 = Vector{Float64}

  • M64 = Matrix{Float64}

  • Num = Union{Real, Int, ...}
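In Julia, these aliases can be written as plain const definitions; a minimal sketch (Num is approximated here by Real, since the union in the list collapses to "any real scalar"):

```julia
const F64 = Float64
const Vec = AbstractVector
const Mat = AbstractMatrix
const V64 = Vector{Float64}
const M64 = Matrix{Float64}
const Num = Real   # stand-in for Union{Real, Int, ...}: any real scalar
```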

To solve the problem in Eq (1) with the IPNewton API, one needs to manually specify the following functions:

| Definition | Math | Function declaration | Parameters | In-place? |
| --- | --- | --- | --- | --- |
| Objective function | $f(x) \in \mathbb{R}$ | fun(x::Vec)::Num | control variable vector x of length $n$ | False |
| Jacobian of objective function | $\nabla f(x) \in \mathbb{R}^n$ | fun_grad!(g::Vec, x::Vec)::Nothing | gradient vector g of length $n$ and control vector x | True |
| Hessian of objective function | $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$ | fun_hess!(h::Mat, x::Vec)::Nothing | Hessian matrix h and control vector x | True |
| Non-linear constraints | $c(x) \in \mathbb{R}^p$ | con_c!(c::Vec, x::Vec)::Vec | constraint residual vector c of length $p$ and control vector x | Partially (modifies and returns c) |
| Jacobian of the non-linear constraints | $\nabla c(x) \in \mathbb{R}^{p \times n}$ | con_jacobian!(J::Mat, x::Vec)::Mat | Jacobian matrix J and control vector x | Partially (modifies and returns J) |
| Hessian of the non-linear constraints | $\nabla^2 c(x) \in \mathbb{R}^{n \times n}$ | con_h!(h::Mat, x::Vec, lam::Vec)::Nothing | Hessian matrix h of the objective f (onto which the weighted constraint Hessians are added), control vector x, and λ-weight vector lam of length $p$ | True |
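To see how the six functions fit together, here is a minimal, self-contained sketch that wires placeholder versions of them into the IPNewton API. The toy objective, constraint, bounds, and starting point are assumptions made purely for illustration (sketches for the economic example appear section by section below); the TwiceDifferentiable / TwiceDifferentiableConstraints constructors and the optimize call follow Optim.jl's constrained-optimization interface:

```julia
using Optim

# Toy problem: minimize x₁² + x₂²  s.t.  -1 ≤ xᵢ ≤ 1  and  c₁(x) = x₁² + x₂² ≤ 0.25
fun(x)              = sum(abs2, x)                                       # objective: returns a scalar
fun_grad!(g, x)     = (g .= 2 .* x; nothing)                             # fills g in place
fun_hess!(h, x)     = (h .= 0.0; h[1, 1] = 2.0; h[2, 2] = 2.0; nothing)  # fills h in place
con_c!(c, x)        = (c[1] = x[1]^2 + x[2]^2; c)                        # fills AND returns c
con_jacobian!(J, x) = (J[1, 1] = 2 * x[1]; J[1, 2] = 2 * x[2]; J)        # fills AND returns J
con_h!(h, x, λ)     = (h[1, 1] += 2 * λ[1]; h[2, 2] += 2 * λ[1]; nothing) # adds λ-weighted ∇²c₁

x0     = [0.3, 0.2]                    # strictly feasible starting point
lx, ux = [-1.0, -1.0], [1.0, 1.0]      # box constraints  l_x ≤ x ≤ u_x
lc, uc = [-Inf], [0.25]                # constraint bounds l_c ≤ c(x) ≤ u_c

df  = TwiceDifferentiable(fun, fun_grad!, fun_hess!, x0)
dfc = TwiceDifferentiableConstraints(con_c!, con_jacobian!, con_h!, lx, ux, lc, uc)
res = optimize(df, dfc, x0, IPNewton())
```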

Illustrative example

Let's illustrate how to specify these functions using a non-trivial example of economic modeling:

$$
\begin{aligned}
v(a,h,z) = \max_{C,\,a',\,h',\,\ell}\ & \log C + \gamma \log h - \delta \log \ell + \beta \underbrace{\int_{z'\mid z} v(a',h',z')\,\mathrm{d}\phi(z'\mid z)}_{=:\ \bar{v}(a',h',z)} && (6) \\
\text{s.t. } & C + (1+r)a + p h' = z h^{\alpha} \ell^{1-\alpha} + a' + p h && (7) \\
& C \ge 0 && \text{(Non-negative consumption)} \quad (8) \\
& a' \ge 0 && \text{(No shorting of asset)} \quad (9) \\
& h' \ge 0 && \text{(No shorting of housing)} \quad (10) \\
& \ell \in [0,1] && \text{(Time endowment)} \quad (11) \\
& a' \le \theta p h' && \text{(Collateral constraint)} \quad (12) \\
& (a,h,z) \in [\underline{a},\bar{a}] \times [\underline{h},\bar{h}] \times [\underline{z},\bar{z}] && \text{(State constraints)} \quad (13)
\end{aligned}
$$

Note: To avoid confusion, I use C to denote consumption and lowercase c to denote the inequality constraints.

Note: Box constraints on the control variables are handled separately by the solver API, so there is no need to fold them into the inequality constraints c(x).

Note: The IPNewton API allows custom (but constant) lower and upper bounds for c(x). This can help simplify the definition of some c(x) constraints.

This is a typical "expert" who produces consumption goods using housing wealth and labor. The expert also borrows in the form of a bond a at rate r and faces a collateral constraint that depends on housing. The problem has features that are common to many models. Meanwhile, we also make some technical assumptions regarding the computer implementation:

  1. When solving this problem, the expected value function v̄(a′,h′,z) has already been interpolated. It is wrapped as a callable object, so we do not need to write the expectation over v(a′,h′,z′) explicitly.

  2. The callable interpolant of v̄() can smoothly handle any point in ℝ³. If a point is outside the feasible state space, v̄() still returns a number (e.g. a very negative one).

Reformulating the problem as a minimization and eliminating C using the budget (equality) constraint, we rewrite the problem as:

$$
\begin{aligned}
\min_{x := (a',\,h',\,\ell)} \quad & f(x \mid M) && (14) \\
\text{s.t.} \quad &
\begin{bmatrix} \underline{a} \\ \underline{h} \\ 0 \end{bmatrix}
\le x \le
\begin{bmatrix} \bar{a} \\ \bar{h} \\ 1 \end{bmatrix}
&& \text{(Box constraints on the controls)} \quad (15) \\
&
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
\le
\underbrace{\begin{bmatrix} C(x) - \epsilon \\ \theta p h' - a' \end{bmatrix}}_{c(x)}
\le
\begin{bmatrix} +\infty \\ +\infty \end{bmatrix}
&& \text{(Inequality constraints)} \quad (16)
\end{aligned}
$$

where M is the information set for this problem (current states, parameters, the computational space, and other model details); ϵ is a small positive number that keeps consumption away from exactly zero; and the substituted consumption C(x) is:

$$
C(x) := z h^{\alpha} \ell^{1-\alpha} + a' + p h - (1+r)a - p h' \tag{17}
$$

and the objective function is:

$$
f(x \mid M) := \log C(x) + \gamma \log h - \delta \log \ell + \beta \bar{v}(a',h',z) \tag{18}
$$

Explaining each part

The official documentation of Optim.jl does not explicitly explain the pipeline of IPNewton, so the expected behavior of the required functions can be a little unintuitive. In short, the pipeline is:

  1. User provides a pure objective function f(x) and a pure constraint function c(x)

  2. The algorithm allocates Jacobians and Hessians for f(x) and c(x) internally

  3. Users provide "pipe" functions that receive and modify these internally created Jacobian and Hessian objects in-place without manually managing their construction, use, and destruction.

Objective function

The objective function f(x) is set up as a scalar function fun(x::Vec)::Num that receives an n-vector and returns the scalar function value. This is standard, and it is required even if autodiff is turned on.

Hint: If your objective function requires extra parameters, then consider wrapping it with the information set M as a closure.

In the minimization API, the objective function is the negative of Eq (18).
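As a concrete illustration of the closure hint, here is a sketch for the example above. It assumes the information set M is a NamedTuple holding the current states (a, h, z), the parameters, and the callable interpolant v̄ (the field names are my own choice); the returned closure is the negated Eq (18), with C(x) substituted in from Eq (17):

```julia
# Build the scalar objective fun(x) for x = (a′, h′, ℓ), closing over M.
function make_objective(M)
    (; a, h, z, p, r, α, β, γ, δ, vbar) = M             # unpack states, parameters, and v̄
    C(x) = z * h^α * x[3]^(1 - α) + x[1] + p * h - (1 + r) * a - p * x[2]          # Eq (17)
    fun(x) = -(log(C(x)) + γ * log(h) - δ * log(x[3]) + β * vbar(x[1], x[2], z))   # −Eq (18)
    return fun
end
```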

Jacobian of objective function

The Jacobian ∇f(x) of the objective function (i.e. its gradient) is a column vector:

$$
\nabla f(x) := \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_n} \end{bmatrix} \in \mathbb{R}^n \tag{19}
$$

In the illustrative example:

$$
\nabla f(x) = \begin{bmatrix} \partial f(x)/\partial a' \\ \partial f(x)/\partial h' \\ \partial f(x)/\partial \ell \end{bmatrix}
= \begin{bmatrix}
\dfrac{1}{C(x)} + \beta \dfrac{\partial \bar{v}(a',h',z)}{\partial a'} \\[1.4em]
-\dfrac{p}{C(x)} + \beta \dfrac{\partial \bar{v}(a',h',z)}{\partial h'} \\[1.4em]
\dfrac{z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C(x)} - \dfrac{\delta}{\ell}
\end{bmatrix} \tag{20}
$$

The function fun_grad!(g,x) should modify each element of vector g using the values in the above equation at exactly the same position. This function only updates g without returning it.

Note: One can quickly notice that evaluating ∇f(x) requires the partial derivatives of the expected value function v̄. These can be computed explicitly (e.g. if you use Chebyshev polynomials), by finite-difference approximation (the most common case), or by automatic differentiation of the callable v̄() (see the appendix). Readers may have to write the finite-difference routine themselves and handle points on the boundary of the feasible domain carefully. The same idea applies to all the matrices below.
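A sketch of fun_grad! for the example, filling g with the entries of Eq (20), negated per the minimization convention above. The central finite-difference step size Δ and the NamedTuple layout of M are assumptions:

```julia
function make_grad!(M; Δ = 1e-5)
    (; a, h, z, p, r, α, β, δ, vbar) = M
    C(x) = z * h^α * x[3]^(1 - α) + x[1] + p * h - (1 + r) * a - p * x[2]   # Eq (17)
    function fun_grad!(g, x)
        a′, h′, ℓ = x
        # central finite differences for ∂v̄/∂a′ and ∂v̄/∂h′
        dv_da = (vbar(a′ + Δ, h′, z) - vbar(a′ - Δ, h′, z)) / (2Δ)
        dv_dh = (vbar(a′, h′ + Δ, z) - vbar(a′, h′ - Δ, z)) / (2Δ)
        g[1] = -(1 / C(x) + β * dv_da)
        g[2] = -(-p / C(x) + β * dv_dh)
        g[3] = -(z * (1 - α) * h^α * ℓ^(-α) / C(x) - δ / ℓ)
        return nothing
    end
    return fun_grad!
end
```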

Hessian of objective function

The Hessian ∇²f(x) is an n × n matrix:

$$
\nabla^2 f(x) = \begin{bmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\[1.2em]
\dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\[1.2em]
\vdots & & \ddots & \vdots \\[1.2em]
\dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_n}
\end{bmatrix} \tag{21}
$$

In the illustrative example, this matrix looks like:

$$
\nabla^2 f(x) = \begin{bmatrix}
-\dfrac{1}{C^2(x)} + \beta \dfrac{\partial^2 \bar{v}}{\partial a' \partial a'} &
\dfrac{p}{C^2(x)} + \beta \dfrac{\partial^2 \bar{v}}{\partial a' \partial h'} &
-\dfrac{z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C^2(x)} \\[1.6em]
\dfrac{p}{C^2(x)} + \beta \dfrac{\partial^2 \bar{v}}{\partial h' \partial a'} &
-\dfrac{p^2}{C^2(x)} + \beta \dfrac{\partial^2 \bar{v}}{\partial h' \partial h'} &
\dfrac{p\, z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C^2(x)} \\[1.6em]
-\dfrac{z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C^2(x)} &
\dfrac{p\, z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C^2(x)} &
\dfrac{z(1-\alpha)(-\alpha)h^{\alpha}\ell^{-\alpha-1}}{C(x)} - \left[\dfrac{z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C(x)}\right]^2 + \dfrac{\delta}{\ell^2}
\end{bmatrix} \tag{22}
$$

where the derivatives of $\bar{v}$ are evaluated at $(a', h', z)$.

The function fun_hess!(h::Mat,x::Vec) should modify each element of matrix h using the values in the above equation at exactly the same position. This function updates h without returning it.
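A sketch of fun_hess! for Eq (22), again with signs flipped for the minimization API. Here vbar_hess is a hypothetical helper that returns the 2×2 Hessian of v̄ with respect to (a′, h′), e.g. via finite differences or the ForwardDiff approach in the appendix:

```julia
function make_hess!(M, vbar_hess)
    (; a, h, z, p, r, α, β, δ) = M
    C(x) = z * h^α * x[3]^(1 - α) + x[1] + p * h - (1 + r) * a - p * x[2]   # Eq (17)
    function fun_hess!(H, x)
        ℓ, Cx = x[3], C(x)
        Cℓ = z * (1 - α) * h^α * ℓ^(-α)          # ∂C/∂ℓ
        V  = vbar_hess(x[1], x[2], z)            # 2×2 block [v̄_aa v̄_ah; v̄_ha v̄_hh]
        H[1, 1] = -(-1 / Cx^2 + β * V[1, 1])
        H[1, 2] = -(p / Cx^2 + β * V[1, 2])
        H[1, 3] = -(-Cℓ / Cx^2)
        H[2, 2] = -(-p^2 / Cx^2 + β * V[2, 2])
        H[2, 3] = -(p * Cℓ / Cx^2)
        H[3, 3] = -(-α * z * (1 - α) * h^α * ℓ^(-α - 1) / Cx - (Cℓ / Cx)^2 + δ / ℓ^2)
        H[2, 1], H[3, 1], H[3, 2] = H[1, 2], H[1, 3], H[2, 3]   # fill the lower triangle by symmetry
        return nothing
    end
    return fun_hess!
end
```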

Non-linear constraints

The non-linear constraint function c(x) is programmed as a vector-valued function. The function con_c!(c::Vec,x::Vec)::Vec should modify the passed vector c in place and also return it (a somewhat weird convention, admittedly).
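For the example, a sketch of con_c! implementing the two constraints of Eq (16), again assuming the NamedTuple M from the earlier sketches (with θ and ϵ included):

```julia
function make_con_c!(M)
    (; a, h, z, p, r, α, θ, ϵ) = M
    C(x) = z * h^α * x[3]^(1 - α) + x[1] + p * h - (1 + r) * a - p * x[2]   # Eq (17)
    function con_c!(c, x)
        c[1] = C(x) - ϵ                # consumption strictly positive
        c[2] = θ * p * x[2] - x[1]     # collateral constraint a′ ≤ θ p h′
        return c                       # modify AND return, as the constraint API expects
    end
    return con_c!
end
```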

Jacobian of the non-linear constraints

Since c(x) is a vector-valued function, its Jacobian is now a matrix of the form:

$$
\nabla c(x) = \begin{bmatrix}
\dfrac{\partial c_1(x)}{\partial x_1} & \dfrac{\partial c_1(x)}{\partial x_2} & \cdots & \dfrac{\partial c_1(x)}{\partial x_n} \\[1.2em]
\dfrac{\partial c_2(x)}{\partial x_1} & & \cdots & \dfrac{\partial c_2(x)}{\partial x_n} \\[1.2em]
\vdots & & \ddots & \vdots \\[1.2em]
\dfrac{\partial c_p(x)}{\partial x_1} & & \cdots & \dfrac{\partial c_p(x)}{\partial x_n}
\end{bmatrix} \in \mathbb{R}^{p \times n} \tag{23}
$$

The function con_jacobian!(J::Mat,x::Vec)::Mat, however, not only modifies the passed matrix J in place but also returns it. Returning the modified object is the biggest difference between the constraint-related API and the objective-related API.

In the illustrative example, such a Jacobian looks like:

$$
\nabla c(x) = \begin{bmatrix}
1 & -p & z(1-\alpha)h^{\alpha}\ell^{-\alpha} \\
-1 & \theta p & 0
\end{bmatrix} \tag{24}
$$
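A sketch of con_jacobian! that fills the 2×3 matrix of Eq (24) and returns it (same assumed NamedTuple M as above):

```julia
function make_con_jacobian!(M)
    (; h, z, p, α, θ) = M
    function con_jacobian!(J, x)
        ℓ = x[3]
        J[1, 1] = 1.0                              # ∂c₁/∂a′
        J[1, 2] = -p                               # ∂c₁/∂h′
        J[1, 3] = z * (1 - α) * h^α * ℓ^(-α)       # ∂c₁/∂ℓ
        J[2, 1] = -1.0                             # ∂c₂/∂a′
        J[2, 2] = θ * p                            # ∂c₂/∂h′
        J[2, 3] = 0.0                              # ∂c₂/∂ℓ
        return J                                   # modify AND return
    end
    return con_jacobian!
end
```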

Hessian of the non-linear constraints

The IPNewton algorithm enforces the inequality constraints using log-barrier functions: it optimizes a Lagrangian augmented with barrier terms. Thus, the total Hessian it works with is:

$$
H(x) = \nabla^2 f(x) + \sum_{i=1}^{p} \lambda_i \nabla^2 c_i(x) \tag{25}
$$

which adds the p constraint Hessians, weighted by the multipliers λ, on top of the original Hessian of the objective function. This formula is the key to understanding the behavior of the function con_h!(h::Mat,x::Vec,lam::Vec)::Nothing.

The function con_h!() therefore does not overwrite h; it adds each λ-weighted constraint Hessian onto h in place (a code sketch for the illustrative example is given at the end of this subsection).

In the illustrative example, the Hessian of each constraint with respect to the controls x is:

$$
\nabla^2 c_1(x) = \begin{bmatrix}
0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & z(1-\alpha)(-\alpha)h^{\alpha}\ell^{-\alpha-1}
\end{bmatrix} \tag{26}
$$

$$
\nabla^2 c_2(x) = \begin{bmatrix}
0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0
\end{bmatrix} \tag{27}
$$

which are derived from the 1st and 2nd rows of the Jacobian in Eq (24), respectively.

Hint: Let's be more specific, step by step. The first row of Eq (24) is the Jacobian of the 1st constraint c1(x) := C(x) − ϵ:

$$
\nabla c_1(x) = \begin{bmatrix} 1 & -p & z(1-\alpha)h^{\alpha}\ell^{-\alpha} \end{bmatrix} \tag{28}
$$

The Jacobian of a scalar function is conventionally written as a column vector, so we transpose the row above:

$$
\nabla c_1(x) = \begin{bmatrix} 1 \\ -p \\ z(1-\alpha)h^{\alpha}\ell^{-\alpha} \end{bmatrix}
= \begin{bmatrix} \partial c_1(x)/\partial a' \\ \partial c_1(x)/\partial h' \\ \partial c_1(x)/\partial \ell \end{bmatrix} \tag{29}
$$

Then, we take derivatives once more to obtain the Hessian ∇²c1(x) of the 1st constraint, as shown in Eq (26).
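Finally, the promised sketch of con_h! for the example: it adds the λ-weighted constraint Hessians of Eqs (26)-(27) onto the Hessian buffer in place. The buffer is named hess here to avoid a clash with the housing state h stored in the assumed NamedTuple M; since ∇²c₂(x) is identically zero, only c₁ contributes:

```julia
function make_con_h!(M)
    (; h, z, α) = M
    function con_h!(hess, x, λ)
        ℓ = x[3]
        hess[3, 3] += λ[1] * (-α) * z * (1 - α) * h^α * ℓ^(-α - 1)   # λ₁ · ∇²c₁(x)[3,3], Eq (26)
        return nothing                                               # λ₂ · ∇²c₂(x) = 0: nothing to add
    end
    return con_h!
end
```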

Appendix: Autodiff of v̄

Evaluating the Jacobians and Hessians above requires approximating the gradient and Hessian of the expected value function v̄(a′,h′,z).

Suppose we have a guess of v̄(), have interpolated it, and have wrapped the interpolant as a callable object: it can be called like a function and returns the value of v̄() at any given point x. For example:
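A minimal stand-in (this analytic form is purely an assumption for illustration; in practice vbar would be an interpolant object built from, e.g., Interpolations.jl over a grid of value-function guesses):

```julia
# A smooth, callable stand-in for the interpolated expected value function v̄(a′, h′, z).
vbar(a, h, z) = log(1 + a) + 0.5 * log(1 + h) + 0.1 * z
vbar(x::AbstractVector) = vbar(x[1], x[2], x[3])    # also accept a point given as a 3-vector
```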

The ForwardDiff.jl package can perform automatic differentiation on such a callable object, for example to evaluate the gradient (the "Jacobian" of a scalar-valued function) and the Hessian at the point [1.0, 0.9, 0.8].
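A sketch using the vbar stand-in defined above (ForwardDiff.gradient and ForwardDiff.hessian expect a function of a single vector argument):

```julia
using ForwardDiff

x0 = [1.0, 0.9, 0.8]                    # a point (a′, h′, z)
g = ForwardDiff.gradient(vbar, x0)      # ∇v̄(x0) ∈ ℝ³
H = ForwardDiff.hessian(vbar, x0)       # ∇²v̄(x0) ∈ ℝ³ˣ³
```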

Users may check the GitHub page of the package for more details.