Specifying the IPNewton method in Optim.jl

Tianhao Zhao

Introduction

Optim.jl is a powerful Julia package that offers a wide range of optimization algorithms. Among these, interior-point Newton (IPNewton) is the only gradient-based method capable of solving non-linearly constrained problems:

$$
\begin{aligned}
\min_{x} \quad & f(x) && (1) \\
\text{s.t.} \quad & l_x \le x \le u_x && \text{(box constraints)} \quad (2) \\
& l_c \le c(x) \le u_c && \text{(non-linear constraints)} \quad (3) \\
& x,\ l_x,\ u_x \in \mathbb{R}^n && (4) \\
& c(x),\ l_c,\ u_c \in \mathbb{R}^p && (5)
\end{aligned}
$$

where $n$ is the number of control variables and $p$ is the number of non-linear inequality constraints (equality constraints are handled by setting $l_c = u_c$).

Despite its flexibility, IPNewton can be accelerated dramatically if the user supplies the Jacobians and Hessians of both the objective function f(x) and the constraint function c(x). Even though automatic differentiation is available today, user-provided derivatives typically give the best performance when they can be derived.

Note: Automatic differentiation is available via the option autodiff = true and is implemented by ForwardDiff.jl.

In this post, I outline the ideas behind, and the exact formulas for, the objects (primarily matrices) that users must supply, to help readers better understand the official documentation and tailor the method to their specific problems.

What to provide

For readability, we define the following aliases:

  • F64 = Float64

  • Vec = AbstractVector

  • Mat = AbstractMatrix

  • V64 = Vector{Float64}

  • M64 = Matrix{Float64}

  • Num = Union{Real, Int, ...}
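In Julia, these aliases can be written as plain const definitions; a minimal sketch (Num is approximated here by Real, since the union in the list collapses to "any real scalar"):

```julia
const F64 = Float64
const Vec = AbstractVector
const Mat = AbstractMatrix
const V64 = Vector{Float64}
const M64 = Matrix{Float64}
const Num = Real   # stand-in for Union{Real, Int, ...}: any real scalar
```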

To solve the problem in Eq (1) with the IPNewton API, one needs to manually specify the following functions:

| Definition | Math | Function declaration | Parameters | In-place? |
| --- | --- | --- | --- | --- |
| Objective function | $f(x) \in \mathbb{R}$ | fun(x::Vec)::Num | control variable vector x of length $n$ | False |
| Jacobian of objective function | $\nabla f(x) \in \mathbb{R}^n$ | fun_grad!(g::Vec, x::Vec)::Nothing | gradient vector g of length $n$ and control vector x | True |
| Hessian of objective function | $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$ | fun_hess!(h::Mat, x::Vec)::Nothing | Hessian matrix h and control vector x | True |
| Non-linear constraints | $c(x) \in \mathbb{R}^p$ | con_c!(c::Vec, x::Vec)::Vec | constraint residual vector c of length $p$ and control vector x | Partially (modifies and returns c) |
| Jacobian of the non-linear constraints | $\nabla c(x) \in \mathbb{R}^{p \times n}$ | con_jacobian!(J::Mat, x::Vec)::Mat | Jacobian matrix J and control vector x | Partially (modifies and returns J) |
| Hessian of the non-linear constraints | $\nabla^2 c(x) \in \mathbb{R}^{n \times n}$ | con_h!(h::Mat, x::Vec, lam::Vec)::Nothing | Hessian matrix h of the objective f (onto which the weighted constraint Hessians are added), control vector x, and λ-weight vector lam of length $p$ | True |
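To see how the six functions fit together, here is a minimal, self-contained sketch that wires placeholder versions of them into the IPNewton API. The toy objective, constraint, bounds, and starting point are assumptions made purely for illustration (sketches for the economic example appear section by section below); the TwiceDifferentiable / TwiceDifferentiableConstraints constructors and the optimize call follow Optim.jl's constrained-optimization interface:

```julia
using Optim

# Toy problem: minimize x₁² + x₂²  s.t.  -1 ≤ xᵢ ≤ 1  and  c₁(x) = x₁² + x₂² ≤ 0.25
fun(x)              = sum(abs2, x)                                       # objective: returns a scalar
fun_grad!(g, x)     = (g .= 2 .* x; nothing)                             # fills g in place
fun_hess!(h, x)     = (h .= 0.0; h[1, 1] = 2.0; h[2, 2] = 2.0; nothing)  # fills h in place
con_c!(c, x)        = (c[1] = x[1]^2 + x[2]^2; c)                        # fills AND returns c
con_jacobian!(J, x) = (J[1, 1] = 2 * x[1]; J[1, 2] = 2 * x[2]; J)        # fills AND returns J
con_h!(h, x, λ)     = (h[1, 1] += 2 * λ[1]; h[2, 2] += 2 * λ[1]; nothing) # adds λ-weighted ∇²c₁

x0     = [0.3, 0.2]                    # strictly feasible starting point
lx, ux = [-1.0, -1.0], [1.0, 1.0]      # box constraints  l_x ≤ x ≤ u_x
lc, uc = [-Inf], [0.25]                # constraint bounds l_c ≤ c(x) ≤ u_c

df  = TwiceDifferentiable(fun, fun_grad!, fun_hess!, x0)
dfc = TwiceDifferentiableConstraints(con_c!, con_jacobian!, con_h!, lx, ux, lc, uc)
res = optimize(df, dfc, x0, IPNewton())
```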

Illustrative example

Let's illustrate how to specify these functions using a non-trivial example of economic modeling:

$$
\begin{aligned}
v(a,h,z) = \max_{C,\,a',\,h',\,\ell}\ & \log C + \gamma \log h - \delta \log \ell + \beta \underbrace{\int_{z'\mid z} v(a',h',z')\,\mathrm{d}\phi(z'\mid z)}_{=:\ \bar{v}(a',h',z)} && (6) \\
\text{s.t. } & C + (1+r)a + p h' = z h^{\alpha} \ell^{1-\alpha} + a' + p h && (7) \\
& C \ge 0 && \text{(Non-negative consumption)} \quad (8) \\
& a' \ge 0 && \text{(No shorting of asset)} \quad (9) \\
& h' \ge 0 && \text{(No shorting of housing)} \quad (10) \\
& \ell \in [0,1] && \text{(Time endowment)} \quad (11) \\
& a' \le \theta p h' && \text{(Collateral constraint)} \quad (12) \\
& (a,h,z) \in [\underline{a},\bar{a}] \times [\underline{h},\bar{h}] \times [\underline{z},\bar{z}] && \text{(State constraints)} \quad (13)
\end{aligned}
$$

Note: To avoid confusion, I use C to denote consumption and lowercase c to denote the inequality constraints.

Note: Box constraints on the control variables are handled separately by the solver API, so there is no need to fold them into the inequality constraints c(x).

Note: The IPNewton API allows custom (but constant) lower and upper bounds for c(x). This can help simplify the definition of some c(x) constraints.

This is a typical "expert" who produces consumption goods using housing wealth and labor. The expert also borrows in the form of a bond a at rate r and faces a collateral constraint that depends on housing. The problem has features that are common to many models. Meanwhile, we also make some technical assumptions regarding the computer implementation:

  1. When solving this problem, the expected value function v̄(a′,h′,z) has already been interpolated. It is wrapped as a callable object, so we do not need to write the expectation over v(a′,h′,z′) explicitly.

  2. The callable interpolant of v̄() can smoothly handle any point in ℝ³. If a point is outside the feasible state space, v̄() still returns a number (e.g. a very negative one).

Reformulating the problem as a minimization and eliminating C using the budget (equality) constraint, we rewrite the problem as:

$$
\begin{aligned}
\min_{x := (a',\,h',\,\ell)} \quad & f(x \mid M) && (14) \\
\text{s.t.} \quad &
\begin{bmatrix} \underline{a} \\ \underline{h} \\ 0 \end{bmatrix}
\le x \le
\begin{bmatrix} \bar{a} \\ \bar{h} \\ 1 \end{bmatrix}
&& \text{(Box constraints on the controls)} \quad (15) \\
&
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
\le
\underbrace{\begin{bmatrix} C(x) - \epsilon \\ \theta p h' - a' \end{bmatrix}}_{c(x)}
\le
\begin{bmatrix} +\infty \\ +\infty \end{bmatrix}
&& \text{(Inequality constraints)} \quad (16)
\end{aligned}
$$

where M is the information set for this problem (current states, parameters, the computational space, and other model details); ϵ is a small positive number that keeps consumption away from exactly zero; and the substituted consumption C(x) is:

$$
C(x) := z h^{\alpha} \ell^{1-\alpha} + a' + p h - (1+r)a - p h' \tag{17}
$$

and the objective function is:

$$
f(x \mid M) := \log C(x) + \gamma \log h - \delta \log \ell + \beta \bar{v}(a',h',z) \tag{18}
$$

Explaining each part

The official documentation of Optim.jl does not explicitly explain the pipeline of IPNewton, so the expected behavior of the required functions can be a little unintuitive. In short, the pipeline is:

  1. User provides a pure objective function f(x) and a pure constraint function c(x)

  2. The algorithm allocates Jacobians and Hessians for f(x) and c(x) internally

  3. Users provide "pipe" functions that receive and modify these internally created Jacobian and Hessian objects in-place without manually managing their construction, use, and destruction.

Objective function

The objective function f(x) is set up as a scalar function fun(x::Vec)::Num that receives an n-vector and returns the scalar function value. This is standard, and it is required even if autodiff is turned on.

Hint: If your objective function requires extra parameters, then consider wrapping it with the information set M as a closure.

In the minimization API, the objective function is the negative of Eq (18).
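As a concrete illustration of the closure hint, here is a sketch for the example above. It assumes the information set M is a NamedTuple holding the current states (a, h, z), the parameters, and the callable interpolant v̄ (the field names are my own choice); the returned closure is the negated Eq (18), with C(x) substituted in from Eq (17):

```julia
# Build the scalar objective fun(x) for x = (a′, h′, ℓ), closing over M.
function make_objective(M)
    (; a, h, z, p, r, α, β, γ, δ, vbar) = M             # unpack states, parameters, and v̄
    C(x) = z * h^α * x[3]^(1 - α) + x[1] + p * h - (1 + r) * a - p * x[2]          # Eq (17)
    fun(x) = -(log(C(x)) + γ * log(h) - δ * log(x[3]) + β * vbar(x[1], x[2], z))   # −Eq (18)
    return fun
end
```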

Jacobian of objective function

The Jacobian ∇f(x) of the objective function (i.e. its gradient) is a column vector:

$$
\nabla f(x) := \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_n} \end{bmatrix} \in \mathbb{R}^n \tag{19}
$$

In the illustrative example:

$$
\nabla f(x) = \begin{bmatrix} \partial f(x)/\partial a' \\ \partial f(x)/\partial h' \\ \partial f(x)/\partial \ell \end{bmatrix}
= \begin{bmatrix}
\dfrac{1}{C(x)} + \beta \dfrac{\partial \bar{v}(a',h',z)}{\partial a'} \\[1.4em]
-\dfrac{p}{C(x)} + \beta \dfrac{\partial \bar{v}(a',h',z)}{\partial h'} \\[1.4em]
\dfrac{z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C(x)} - \dfrac{\delta}{\ell}
\end{bmatrix} \tag{20}
$$

The function fun_grad!(g,x) should modify each element of vector g using the values in the above equation at exactly the same position. This function only updates g without returning it.

Note: One can quickly notice that evaluating ∇f(x) requires the partial derivatives of the expected value function v̄. These can be computed explicitly (e.g. if you use Chebyshev polynomials), by finite-difference approximation (the most common case), or by automatic differentiation of the callable v̄() (see the appendix). Readers may have to write the finite-difference routine themselves and handle points on the boundary of the feasible domain carefully. The same idea applies to all the matrices below.
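A sketch of fun_grad! for the example, filling g with the entries of Eq (20), negated per the minimization convention above. The central finite-difference step size Δ and the NamedTuple layout of M are assumptions:

```julia
function make_grad!(M; Δ = 1e-5)
    (; a, h, z, p, r, α, β, δ, vbar) = M
    C(x) = z * h^α * x[3]^(1 - α) + x[1] + p * h - (1 + r) * a - p * x[2]   # Eq (17)
    function fun_grad!(g, x)
        a′, h′, ℓ = x
        # central finite differences for ∂v̄/∂a′ and ∂v̄/∂h′
        dv_da = (vbar(a′ + Δ, h′, z) - vbar(a′ - Δ, h′, z)) / (2Δ)
        dv_dh = (vbar(a′, h′ + Δ, z) - vbar(a′, h′ - Δ, z)) / (2Δ)
        g[1] = -(1 / C(x) + β * dv_da)
        g[2] = -(-p / C(x) + β * dv_dh)
        g[3] = -(z * (1 - α) * h^α * ℓ^(-α) / C(x) - δ / ℓ)
        return nothing
    end
    return fun_grad!
end
```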

Hessian of objective function

The Hessian ∇²f(x) is an n × n matrix:

$$
\nabla^2 f(x) = \begin{bmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\[1.2em]
\dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\[1.2em]
\vdots & & \ddots & \vdots \\[1.2em]
\dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_n}
\end{bmatrix} \tag{21}
$$

In the illustrative example, this matrix looks like:

$$
\nabla^2 f(x) = \begin{bmatrix}
-\dfrac{1}{C^2(x)} + \beta \dfrac{\partial^2 \bar{v}}{\partial a' \partial a'} &
\dfrac{p}{C^2(x)} + \beta \dfrac{\partial^2 \bar{v}}{\partial a' \partial h'} &
-\dfrac{z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C^2(x)} \\[1.6em]
\dfrac{p}{C^2(x)} + \beta \dfrac{\partial^2 \bar{v}}{\partial h' \partial a'} &
-\dfrac{p^2}{C^2(x)} + \beta \dfrac{\partial^2 \bar{v}}{\partial h' \partial h'} &
\dfrac{p\, z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C^2(x)} \\[1.6em]
-\dfrac{z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C^2(x)} &
\dfrac{p\, z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C^2(x)} &
\dfrac{z(1-\alpha)(-\alpha)h^{\alpha}\ell^{-\alpha-1}}{C(x)} - \left[\dfrac{z(1-\alpha)h^{\alpha}\ell^{-\alpha}}{C(x)}\right]^2 + \dfrac{\delta}{\ell^2}
\end{bmatrix} \tag{22}
$$

where the derivatives of $\bar{v}$ are evaluated at $(a', h', z)$.

The function fun_hess!(h::Mat,x::Vec) should modify each element of matrix h using the values in the above equation at exactly the same position. This function updates h without returning it.
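A sketch of fun_hess! for Eq (22), again with signs flipped for the minimization API. Here vbar_hess is a hypothetical helper that returns the 2×2 Hessian of v̄ with respect to (a′, h′), e.g. via finite differences or the ForwardDiff approach in the appendix:

```julia
function make_hess!(M, vbar_hess)
    (; a, h, z, p, r, α, β, δ) = M
    C(x) = z * h^α * x[3]^(1 - α) + x[1] + p * h - (1 + r) * a - p * x[2]   # Eq (17)
    function fun_hess!(H, x)
        ℓ, Cx = x[3], C(x)
        Cℓ = z * (1 - α) * h^α * ℓ^(-α)          # ∂C/∂ℓ
        V  = vbar_hess(x[1], x[2], z)            # 2×2 block [v̄_aa v̄_ah; v̄_ha v̄_hh]
        H[1, 1] = -(-1 / Cx^2 + β * V[1, 1])
        H[1, 2] = -(p / Cx^2 + β * V[1, 2])
        H[1, 3] = -(-Cℓ / Cx^2)
        H[2, 2] = -(-p^2 / Cx^2 + β * V[2, 2])
        H[2, 3] = -(p * Cℓ / Cx^2)
        H[3, 3] = -(-α * z * (1 - α) * h^α * ℓ^(-α - 1) / Cx - (Cℓ / Cx)^2 + δ / ℓ^2)
        H[2, 1], H[3, 1], H[3, 2] = H[1, 2], H[1, 3], H[2, 3]   # fill the lower triangle by symmetry
        return nothing
    end
    return fun_hess!
end
```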

Non-linear constraints

The non-linear constraint function c(x) is programmed as a vector-valued function. The function con_c!(c::Vec,x::Vec)::Vec should modify the passed vector c in place and also return it (a somewhat weird convention, admittedly).
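For the example, a sketch of con_c! implementing the two constraints of Eq (16), again assuming the NamedTuple M from the earlier sketches (with θ and ϵ included):

```julia
function make_con_c!(M)
    (; a, h, z, p, r, α, θ, ϵ) = M
    C(x) = z * h^α * x[3]^(1 - α) + x[1] + p * h - (1 + r) * a - p * x[2]   # Eq (17)
    function con_c!(c, x)
        c[1] = C(x) - ϵ                # consumption strictly positive
        c[2] = θ * p * x[2] - x[1]     # collateral constraint a′ ≤ θ p h′
        return c                       # modify AND return, as the constraint API expects
    end
    return con_c!
end
```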

Jacobian of the non-linear constraints

Since c(x) is a vector-valued function, its Jacobian is now a matrix of the form:

$$
\nabla c(x) = \begin{bmatrix}
\dfrac{\partial c_1(x)}{\partial x_1} & \dfrac{\partial c_1(x)}{\partial x_2} & \cdots & \dfrac{\partial c_1(x)}{\partial x_n} \\[1.2em]
\dfrac{\partial c_2(x)}{\partial x_1} & & \cdots & \dfrac{\partial c_2(x)}{\partial x_n} \\[1.2em]
\vdots & & \ddots & \vdots \\[1.2em]
\dfrac{\partial c_p(x)}{\partial x_1} & & \cdots & \dfrac{\partial c_p(x)}{\partial x_n}
\end{bmatrix} \in \mathbb{R}^{p \times n} \tag{23}
$$

The function con_jacobian!(J::Mat,x::Vec)::Mat, however, not only modifies the passed matrix J in place but also returns it. Returning the modified object is the biggest difference between the constraint-related API and the objective-related API.

In the illustrative example, such a Jacobian looks like:

$$
\nabla c(x) = \begin{bmatrix}
1 & -p & z(1-\alpha)h^{\alpha}\ell^{-\alpha} \\
-1 & \theta p & 0
\end{bmatrix} \tag{24}
$$
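A sketch of con_jacobian! that fills the 2×3 matrix of Eq (24) and returns it (same assumed NamedTuple M as above):

```julia
function make_con_jacobian!(M)
    (; h, z, p, α, θ) = M
    function con_jacobian!(J, x)
        ℓ = x[3]
        J[1, 1] = 1.0                              # ∂c₁/∂a′
        J[1, 2] = -p                               # ∂c₁/∂h′
        J[1, 3] = z * (1 - α) * h^α * ℓ^(-α)       # ∂c₁/∂ℓ
        J[2, 1] = -1.0                             # ∂c₂/∂a′
        J[2, 2] = θ * p                            # ∂c₂/∂h′
        J[2, 3] = 0.0                              # ∂c₂/∂ℓ
        return J                                   # modify AND return
    end
    return con_jacobian!
end
```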

Hessian of the non-linear constraints

The IPNewton algorithm enforces the inequality constraints using log-barrier functions: it optimizes a Lagrangian augmented with barrier terms. Thus, the total Hessian it works with is:

$$
H(x) = \nabla^2 f(x) + \sum_{i=1}^{p} \lambda_i \nabla^2 c_i(x) \tag{25}
$$

which adds the p constraint Hessians, weighted by the multipliers λ, on top of the original Hessian of the objective function. This formula is the key to understanding the behavior of the function con_h!(h::Mat,x::Vec,lam::Vec)::Nothing.

The function con_h!() therefore does not overwrite h; it adds each λ-weighted constraint Hessian onto h in place (a code sketch for the illustrative example is given at the end of this subsection).

In the illustrative example, the Hessian of each constraint with respect to the controls x is:

$$
\nabla^2 c_1(x) = \begin{bmatrix}
0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & z(1-\alpha)(-\alpha)h^{\alpha}\ell^{-\alpha-1}
\end{bmatrix} \tag{26}
$$

$$
\nabla^2 c_2(x) = \begin{bmatrix}
0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0
\end{bmatrix} \tag{27}
$$

which are derived from the 1st and 2nd rows of the Jacobian in Eq (24), respectively.

Hint: Let's be more specific, step by step. The first row of Eq (24) is the Jacobian of the 1st constraint c1(x) := C(x) − ϵ:

$$
\nabla c_1(x) = \begin{bmatrix} 1 & -p & z(1-\alpha)h^{\alpha}\ell^{-\alpha} \end{bmatrix} \tag{28}
$$

The Jacobian of a scalar function is conventionally written as a column vector, so we transpose the row above:

$$
\nabla c_1(x) = \begin{bmatrix} 1 \\ -p \\ z(1-\alpha)h^{\alpha}\ell^{-\alpha} \end{bmatrix}
= \begin{bmatrix} \partial c_1(x)/\partial a' \\ \partial c_1(x)/\partial h' \\ \partial c_1(x)/\partial \ell \end{bmatrix} \tag{29}
$$

Then, we take derivatives once more to obtain the Hessian ∇²c1(x) of the 1st constraint, as shown in Eq (26).
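Finally, the promised sketch of con_h! for the example: it adds the λ-weighted constraint Hessians of Eqs (26)-(27) onto the Hessian buffer in place. The buffer is named hess here to avoid a clash with the housing state h stored in the assumed NamedTuple M; since ∇²c₂(x) is identically zero, only c₁ contributes:

```julia
function make_con_h!(M)
    (; h, z, α) = M
    function con_h!(hess, x, λ)
        ℓ = x[3]
        hess[3, 3] += λ[1] * (-α) * z * (1 - α) * h^α * ℓ^(-α - 1)   # λ₁ · ∇²c₁(x)[3,3], Eq (26)
        return nothing                                               # λ₂ · ∇²c₂(x) = 0: nothing to add
    end
    return con_h!
end
```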

Appendix: Autodiff of v̄

Evaluating the Jacobians and Hessians above requires approximating the gradient and Hessian of the expected value function v̄(a′,h′,z).

Suppose we have a guess of v̄(), have interpolated it, and have wrapped the interpolant as a callable object: it can be called like a function and returns the value of v̄() at any given point x. For example:
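A minimal stand-in (this analytic form is purely an assumption for illustration; in practice vbar would be an interpolant object built from, e.g., Interpolations.jl over a grid of value-function guesses):

```julia
# A smooth, callable stand-in for the interpolated expected value function v̄(a′, h′, z).
vbar(a, h, z) = log(1 + a) + 0.5 * log(1 + h) + 0.1 * z
vbar(x::AbstractVector) = vbar(x[1], x[2], x[3])    # also accept a point given as a 3-vector
```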

The ForwardDiff.jl package can perform automatic differentiation on such a callable object, for example to evaluate the gradient (the "Jacobian" of a scalar-valued function) and the Hessian at the point [1.0, 0.9, 0.8].
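A sketch using the vbar stand-in defined above (ForwardDiff.gradient and ForwardDiff.hessian expect a function of a single vector argument):

```julia
using ForwardDiff

x0 = [1.0, 0.9, 0.8]                    # a point (a′, h′, z)
g = ForwardDiff.gradient(vbar, x0)      # ∇v̄(x0) ∈ ℝ³
H = ForwardDiff.hessian(vbar, x0)       # ∇²v̄(x0) ∈ ℝ³ˣ³
```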

Users may check the GitHub page of the package for more details.