Collocation in the presence of endogenous Markovian risk

Tianhao Zhao

Introduction

The collocation method is widely used in macroeconomic modeling to approximate value functions (or policy functions) through interpolation techniques. This approach offers two key advantages:

  1. Many interpolation methods provide greater smoothness than piecewise linear interpolation while requiring fewer supporting nodes.

  2. By projecting the original problem onto a pseudo-linear one in the space of interpolation coefficients, the collocation method enables analytical evaluation of the Jacobian matrix for the approximated system. This Jacobian facilitates the use of gradient-based algorithms, such as Newton’s method, to accelerate convergence.

Despite its advantages, many textbooks and online materials overlook an important scenario: cases where the transition matrix (often assumed to describe a Markov process governing uncertainty in dynamic programming) drives the exogenous states but itself depends on the current endogenous states of the system. This scenario, involving endogenous risk, is highly relevant in economic applications. For instance, an individual's unemployment risk in general equilibrium may depend on the aggregate unemployment rate, which itself evolves as an aggregate state. Addressing such cases requires specific formulations and methodological adjustments to extend the collocation method.

In this post, I first establish the notation for collocation and outline the construction of the discretized system. I emphasize that endogenous risk, compared to standard exogenous risk, significantly increases the number of equations required to interpolate the expected value function. Specifically, the size of the system expands proportionally to the number of supporting nodes for endogenous states. I then discuss how to solve the resulting pseudo-linear system and compute its Jacobian matrix analytically. Finally, I explore several topics for efficiently implementing the collocation method.

Notation

Hints:

  1. I do my best to label the dimension of each vector and matrix so that one can easily translate the math notation into a program without wasting time figuring out potential dimension mismatches.

  2. A vector in $\mathbb{R}^n$ should always be translated into a column vector in computer programs.

We consider solving the following time-invariant Bellman equation:

$$
\begin{aligned}
v(x) &= \max_{c}\; u(c,x) + \beta\, \mathbb{E}\{v(x') \mid x\} && \text{(Bellman equation)} & (36)\\
\text{s.t. } & c \in \mathcal{C} \subseteq \mathbb{R}^{d_c} && \text{(admissible space)} & (37)\\
& x \in \mathcal{X} \subseteq \mathbb{R}^{d} && \text{(state space \& state constraints)} & (38)\\
& v(x): \mathcal{X} \to \mathbb{R} && \text{(value function)} & (39)\\
& x := (x_a, x_e) && \text{(state sorting)} & (40)\\
& x_a \in \mathbb{R}^{d_a},\ x_e \in \mathbb{R}^{d_e},\ d_a + d_e = d && \text{(endo \& exo state split)} & (41)\\
& x_a' = \mu(x, c) && \text{(endo law of motion)} & (42)\\
& x_e' \sim \text{MarkovProcess}(x) && \text{(exo law of motion)} & (43)\\
& \Pr\{x_e' \mid x_e\} = \Pr\{x_e' \mid x_e\}(x_a) && \text{(Markovian)} & (44)
\end{aligned}
$$

where $x_a$ is the sub-vector of endogenous states and $x_e$ is the sub-vector of exogenous states. The whole law of motion of $x$ consists of $\mu(x, c)$ and the Markov process for $x_e$. The exogenous states $x_e$ follow a $k$-state Markov chain (or any process that can be reasonably approximated by such a chain) with transition matrix $P(x_a) \in \mathbb{R}^{k \times k}$. For algebraic convenience, we sort the dimensions of $x$ so that the $d_e$ exogenous states $x_e$ come last in the state vector.

Remark: The state vector x, in general equilibrium, consists of individual states AND aggregate states simultaneously.

Here we assume the transition probability $\Pr\{x_e' \mid x_e\}(x_a)$ is a function of the endogenous states $x_a$. This captures the generic case in which the "risk" depends on some endogenous states (e.g. general equilibrium with endogenous unemployment risk). For convenience, we split the transition matrix $P(x_a)$ row by row:

$$
P(x_a) := \begin{bmatrix} p_{x_a}(\cdot \mid x_e^1) \\ p_{x_a}(\cdot \mid x_e^2) \\ \vdots \\ p_{x_a}(\cdot \mid x_e^k) \end{bmatrix} \in \mathbb{R}^{k \times k}, \qquad p_{x_a}(\cdot \mid x_e^i) \in \mathbb{R}^{1 \times k} \tag{45}
$$

where $p_{x_a}(\cdot \mid x_e^i)$ is the distribution vector conditional on the current exogenous state $x_e^i$.

Attention: the transition matrix $P(x_a)$ depends on the endogenous states $x_a$, but the matrix itself determines next period's exogenous states $x_e'$.
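To make this object concrete, here is a minimal sketch of a hypothetical state-dependent transition matrix: a 2-state employment process whose job-loss probability rises with an endogenous aggregate state. The functional forms and coefficients are illustrative assumptions of mine, not part of the model above.

```python
import numpy as np

def transition_matrix(x_a: np.ndarray) -> np.ndarray:
    """Hypothetical P(x_a) for a k = 2 chain (employed, unemployed).

    The separation and job-finding rates depend on the first endogenous
    state x_a[0] (say, an aggregate unemployment rate in [0, 1]); the
    coefficients are purely illustrative.
    """
    sep = 0.02 + 0.10 * x_a[0]    # Pr(employed -> unemployed)
    find = 0.80 - 0.30 * x_a[0]   # Pr(unemployed -> employed)
    P = np.array([[1.0 - sep, sep],
                  [find, 1.0 - find]])
    assert np.allclose(P.sum(axis=1), 1.0)  # each row is a distribution
    return P
```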

Notes: the stacking need not be a Cartesian/tensor product, and the supporting nodes need not be uniform or evenly spaced. This flexibility allows for sparse grids and other special grids.

Notation: In the following sections, I keep using $x$ to denote an arbitrary point in the state space $\mathcal{X}$, and $z \in \mathcal{X}$ to denote a supporting node of the interpolation. A node $z$ must be a point, but a point is not necessarily a node.

Remark: Supporting nodes are needed by both local and global interpolation methods. In global methods, these nodes are the sample points at which the function is evaluated to fit the interpolant.

Let $X := [x_1, x_2, \dots, x_h]^T \in \mathbb{R}^{h \times d}$ be an arbitrary stacking of $h$ $d$-dimensional points, where each $x_i \in \mathcal{X}$. A special case of the stacking is $Z := [z_1, z_2, \dots, z_n]^T \in \mathbb{R}^{n \times d}$, which stacks the $n$ (unique) $d$-dimensional supporting nodes.

Let's define the stacking of the value function:

$$
V(X) := \begin{bmatrix} v(x_1) \\ v(x_2) \\ \vdots \\ v(x_h) \end{bmatrix} \in \mathbb{R}^{h \times 1}, \qquad V(Z) \in \mathbb{R}^{n \times 1} \tag{46}
$$

For convenience, let's also define the stackings of the policy $c(x)$, the instantaneous utility $u(c, x)$, and the new endogenous states $x_a'$ implied by the law of motion:

$$
\begin{aligned}
C(X) &:= \begin{bmatrix} c(x_1) \\ \vdots \\ c(x_h) \end{bmatrix} \in \mathbb{R}^{h \times d_c}, && C(Z) \in \mathbb{R}^{n \times d_c} & (47)\\
U(X) &:= \begin{bmatrix} u(c(x_1), x_1) \\ u(c(x_2), x_2) \\ \vdots \\ u(c(x_h), x_h) \end{bmatrix} \in \mathbb{R}^{h \times 1}, && U(Z) \in \mathbb{R}^{n \times 1} & (48)\\
X_a'(X) &:= \begin{bmatrix} \mu(x_1, c(x_1)) \\ \mu(x_2, c(x_2)) \\ \vdots \\ \mu(x_h, c(x_h)) \end{bmatrix} = \mu(X, C(X)) \in \mathbb{R}^{h \times d_a}, && X_a'(Z) \in \mathbb{R}^{n \times d_a} & (49)
\end{aligned}
$$
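For concreteness, a minimal sketch of building $Z$ as a Cartesian stacking; the grid sizes and ranges are arbitrary assumptions, and any fixed ordering works, as noted above:

```python
import itertools
import numpy as np

# Illustrative grids: d_a = 1 endogenous state (e.g. assets) and
# d_e = 1 exogenous state (a k = 2 Markov chain), exogenous dimension last.
assets = np.linspace(0.0, 10.0, 50)   # n_a = 50 endogenous nodes
emp = np.array([0.0, 1.0])            # k = 2 exogenous Markov states

# Cartesian/tensor stacking of supporting nodes: Z has shape (n, d)
# with n = 100 and d = 2; the exogenous dimension varies fastest.
Z = np.array(list(itertools.product(assets, emp)))
n, d = Z.shape
```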

Let $v(\cdot): \mathbb{R}^d \to \mathbb{R}$ be the value function to solve. An interpolant $\hat{v}(\cdot)$ of degree $m - 1$ must have the following form:

$$
\begin{aligned}
\hat{v}(x) &:= \phi(x)\, \theta & (50)\\
\phi(x) &\in \mathbb{R}^{1 \times m}, \quad \theta \in \mathbb{R}^{m \times 1} & (51)
\end{aligned}
$$

where $\phi(\cdot)$ is the basis (row) vector and $\theta$ is the interpolation coefficient vector. Let's define the stacked basis matrix evaluated at $h$ arbitrary points:

$$
\Phi(X) := \begin{bmatrix} \phi(x_1) \\ \phi(x_2) \\ \vdots \\ \phi(x_h) \end{bmatrix} \in \mathbb{R}^{h \times m}, \qquad \Phi(Z) \in \mathbb{R}^{n \times m} \tag{52}
$$

Thus, the stacked interpolated values can be linearly written as:

$$
\begin{aligned}
\hat{V}(X) &= \Phi(X)\, \theta & (53)\\
\hat{V}(Z) &= \Phi(Z)\, \theta & (54)
\end{aligned}
$$

Hint: For (univariate or multivariate) piecewise linear interpolation, $\Phi(Z) = I_n$ and $\theta = V(Z)$, where $I_n$ is the $n \times n$ identity matrix.
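As a univariate sketch of Eqs (50)–(54), the snippet below fits a Chebyshev interpolant with numpy's `chebvander`, so that $\Phi(Z)$ is square ($m = n$) and $\theta$ is exactly identified; the test function $\log(z + 2)$ is an arbitrary stand-in for $V(Z)$:

```python
import numpy as np
from numpy.polynomial import chebyshev

n = 12                                           # number of supporting nodes
z = np.cos(np.pi * (np.arange(n) + 0.5) / n)     # Chebyshev nodes on [-1, 1]
Phi_Z = chebyshev.chebvander(z, n - 1)           # basis matrix Phi(Z), shape (n, m) with m = n

V_Z = np.log(z + 2.0)                            # stand-in for V(Z)
theta = np.linalg.solve(Phi_Z, V_Z)              # exact fit: Phi(Z) theta = V(Z)

x = np.linspace(-1.0, 1.0, 7)                    # arbitrary points in the state space
V_hat = chebyshev.chebvander(x, n - 1) @ theta   # V_hat(X) = Phi(X) theta
```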

Now, let's interpolate and stack the expected value function (or Q-function). The expectation operation is defined as:

$$
\begin{aligned}
\mathbb{E}\{v(x') \mid x\} &:= \sum_{j=1}^{k} \Pr\{x_e^j \mid x_e\}(x_a)\; v(x_a', x_e^j) = v_e(x_a', x_a, x_e) = v_e(x_a', x) : \mathbb{R}^{d_a} \times \mathbb{R}^{d} \to \mathbb{R} & (55)\\
\hat{v}_e(x_a', x) &:= \phi_e(x_a', x)\, \theta_e & (56)\\
\phi_e(\cdot) &\in \mathbb{R}^{1 \times m_e}, \quad \theta_e \in \mathbb{R}^{m_e \times 1} & (57)
\end{aligned}
$$

where the dimensionality of $v_e$ expands to $d_a + d$.

Remark: An alternative way of defining the interpolant is $\phi_e(x_a')\, \theta_e(x)$, which fits many scenarios. However, this alternative requires further interpolation of a vector-valued function $\theta_e(\cdot): \mathbb{R}^d \to \mathbb{R}^{\zeta}$, whose dimensionality $\zeta$, which depends on the number of supporting nodes, is infeasibly high. The implementation of Eq (56) works better, at the cost of $d_a + d$ dimensionality.

Remark: The node set for $x_a'$ can differ from the set of endogenous sub-vectors of the $n$ nodes in $Z$, which allows multiple resolutions of the interpolation.

For convenience, we define the tensor space of endogenous states $\mathcal{A} := \mathcal{X}_a \times \mathcal{X} \subseteq \mathbb{R}^{d_a + d}$ and use $a = (x_a', x) \in \mathcal{A}$ to denote an arbitrary joined point; and use $a_z = (z_a', z)$ to denote a point joined from two nodes, where $z_a'$ is one of the $n_a$ supporting nodes for the endogenous states. There are $n_a n$ such joined nodes. Then, let's define the stacked expected value function and the stacked basis matrix of the expected value function:

$$
\begin{aligned}
& V_e(A): \mathbb{R}^{h \times (d_a + d)} \to \mathbb{R}^{h \times 1}, && V_e(A_z): \mathbb{R}^{n_a n \times (d_a + d)} \to \mathbb{R}^{n_a n \times 1} & (58)\\
& \Phi_e(A) \in \mathbb{R}^{h \times m_e}, && \Phi_e(A_z) \in \mathbb{R}^{n_a n \times m_e} & (59)
\end{aligned}
$$

Thanks to the linearity of the expectation operation, we can define the expectation operator as a row-vector such that:

$$
v_e(x_a', x) = \sum_{j=1}^{k} p_{x_a}(e_j \mid x_e)\, v(x_a', e_j) = p_{x_a}(\cdot \mid x_e) \begin{bmatrix} v(x_a', e_1) \\ v(x_a', e_2) \\ \vdots \\ v(x_a', e_k) \end{bmatrix}, \quad e_j \in \underbrace{\{x_e^1, \dots, x_e^k\}}_{k \text{ Markov states}} \tag{60}
$$

Intuitively, for the following stacked mapping from node stacking Z to node stacking Az:

$$
\underbrace{V_e(A_z)}_{\in\, \mathbb{R}^{n_a n \times 1}} = \mathbb{E}\ \underbrace{V(Z)}_{\in\, \mathbb{R}^{n \times 1}} \tag{61}
$$

one can define an expectation matrix $\mathbb{E} \in \mathbb{R}^{n_a n \times n}$ such that $V_e(A_z) = \mathbb{E}\, V(Z)$. Each row of $\mathbb{E}$ picks the needed nodes from the $n$ nodes of $Z$ and performs a weighted summation using the corresponding distribution vector $p_{x_a}(\cdot \mid x_e)$ according to the current node of $A_z$.

Remark: The construction of $\mathbb{E}$, personally speaking, is the hardest part of collocation. Its structure depends not only on the stacking order of your $Z$ nodes but also on the type of your grid. Thus, it is hard to write down mathematically (except in special cases such as Cartesian nodes with states sorted lexicographically or anti-lexicographically), but it is easy to implement in computer programs using hash maps or other efficient data structures; see the sketch below.
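As one concrete instance, the sketch below builds $\mathbb{E}$ as a sparse matrix for the Cartesian grid and the hypothetical `transition_matrix` from the earlier sketches, using a hash map from node coordinates to their row in $Z$. It assumes $d_a = d_e = 1$, lexicographic ordering with the exogenous state last, and $A_z$ stacking $(z_a', z)$ with $z$ varying fastest; other grids need their own index logic.

```python
import itertools
import numpy as np
from scipy import sparse

assets = np.linspace(0.0, 10.0, 50)             # n_a endogenous nodes
emp = [0.0, 1.0]                                # k exogenous Markov states
Z = list(itertools.product(assets, emp))        # n = n_a * k supporting nodes
row_of = {node: i for i, node in enumerate(Z)}  # hash map: node -> row index in Z

rows, cols, vals = [], [], []
for r, (za_next, z) in enumerate(itertools.product(assets, Z)):
    xa, xe = z
    # The risk depends on the CURRENT endogenous state; rescale the asset
    # level into [0, 1] so the illustrative probabilities stay valid.
    P = transition_matrix(np.array([xa / assets[-1]]))
    p = P[emp.index(xe)]                        # row p_{x_a}(. | x_e)
    for j, e_next in enumerate(emp):            # weight the nodes (za_next, e_next)
        rows.append(r)
        cols.append(row_of[(za_next, e_next)])
        vals.append(p[j])

E = sparse.csr_matrix((vals, (rows, cols)),
                      shape=(len(assets) * len(Z), len(Z)))  # (n_a*n, n)
```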


Discretized system

The discretized system is a pseudo-linear system of $n(n_a + 1)$ equations in total:

$$
\begin{aligned}
& V(Z) = U(Z) + \beta\, V_e(X_a'(Z), Z) && \text{Bellman equations, total: } n & (62)\\
& V_e(A_z) = \mathbb{E}\, V(Z) && \text{Q equations, total: } n_a n & (63)
\end{aligned}
$$

accompanied by $n$ optimization problems (the optimization step):

$$
X_a'(Z) = \mu(Z, C(Z)), \qquad C(Z) = \arg\max_{C}\ \text{Hamiltonian}(Z) \tag{64}
$$

Remark:

  1. The number of equations degenerates to $n + n_a$ if $P(x_a)$ no longer depends on the endogenous states $x_a$. This happens with stationary equilibrium or purely exogenous risk.

  2. The endogeneity of the risk $P$ dramatically increases the number of equations, by approximately a factor of $n_a$ (see the worked count below).
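For a sense of scale, suppose $n_a = 50$ endogenous nodes and $k = 2$ Markov states on a Cartesian grid, so that $n = n_a k = 100$ (grid sizes assumed purely for illustration):

$$
\underbrace{n}_{\text{Bellman}} + \underbrace{n_a n}_{\text{Q}} = 100 + 5000 = 5100 \quad \text{vs.} \quad n + n_a = 150 \ \text{(purely exogenous risk)}
$$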

Plugging the interpolation in:

$$
\begin{aligned}
\Phi(Z)\, \theta &= U(Z) + \beta\, \Phi_e(X_a'(Z), Z)\, \theta_e & (65)\\
\Phi_e(A_z)\, \theta_e &= \mathbb{E}\, \Phi(Z)\, \theta &
\end{aligned}
$$

These are $m + m_e$ equations in the interpolation coefficients $\Theta := (\theta, \theta_e)$.

Remark:

  1. For both global and local interpolation methods, $m + m_e$ is usually equal to $n(n_a + 1)$.

  2. Thus, if $v$ has no discontinuities, global methods are generally preferred.

The system is pseudo-linear in the coefficients. To solve it using gradient-based methods such as Newton's method, we define:

$$
\begin{aligned}
F(\Theta) &:= \text{RHS}(\Theta) - \text{LHS}(\Theta) = 0 & (66)\\
\text{RHS}(\Theta) &:= \begin{bmatrix} U(Z) + \beta\, \Phi_e(X_a'(Z), Z)\, \theta_e \\ \mathbb{E}\, \Phi(Z)\, \theta \end{bmatrix} & (67)\\
\text{LHS}(\Theta) &:= \begin{bmatrix} \Phi(Z)\, \theta \\ \Phi_e(A_z)\, \theta_e \end{bmatrix} & (68)
\end{aligned}
$$

Remark: Eq (65) can be converted into a fixed-point format by left-multiplying the two equations by $\Phi^{-1}(Z)$ and $\Phi_e^{-1}(A_z)$, respectively. The fixed-point format can be used directly to run iterative algorithms.

The Jacobian is:

$$
\begin{aligned}
F'(\Theta) &= \begin{bmatrix} 0_{n \times m} & \beta\, \Phi_e(X_a'(Z), Z) \\ \mathbb{E}\, \Phi(Z) & 0_{n_a n \times m_e} \end{bmatrix} - \begin{bmatrix} \Phi(Z) & 0_{n \times m_e} \\ 0_{n_a n \times m} & \Phi_e(A_z) \end{bmatrix} \\
&= \begin{bmatrix} -\Phi(Z) & \beta\, \Phi_e(X_a'(Z), Z) \\ \mathbb{E}\, \Phi(Z) & -\Phi_e(A_z) \end{bmatrix} & (69)
\end{aligned}
$$

With the analytical Jacobian, one can efficiently solve the whole problem in a reasonable time.

Remark: One can quickly see that most terms of $F'(\Theta)$ can be precomputed, except the basis matrix $\Phi_e(X_a'(Z), Z)$. The reason is that the new endogenous states $X_a'(Z)$ change every iteration, and evaluating them requires the optimization step. The optimization step, which also determines $U(Z)$ in every evaluation of $F(\Theta)$, is the bottleneck of the whole algorithm's performance.
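Putting Eqs (66)–(69) together, here is a minimal sketch of the residual and the analytical Jacobian given precomputed blocks; the argument names (`Phi_Z`, `Phi_e_next`, `Phi_e_Az`, `E`, `U`) are my own stand-ins for the matrices defined above, with `Phi_e_next` $= \Phi_e(X_a'(Z), Z)$ recomputed each iteration as the remark notes:

```python
import numpy as np

def residual_and_jacobian(theta, theta_e, Phi_Z, Phi_e_next, Phi_e_Az, E, U, beta):
    """F(Theta) = RHS - LHS and the analytical Jacobian of Eq (69).

    Phi_Z      : (n, m)        basis at the nodes Z (precomputable)
    Phi_e_next : (n, m_e)      basis of v_e at (X_a'(Z), Z); changes every iteration
    Phi_e_Az   : (n_a*n, m_e)  basis of v_e at the nodes A_z (precomputable)
    E          : (n_a*n, n)    expectation matrix (precomputable)
    U          : (n,)          instantaneous utility at the optimal policy
    """
    F = np.concatenate([
        U + beta * (Phi_e_next @ theta_e) - Phi_Z @ theta,  # n Bellman residuals
        E @ (Phi_Z @ theta) - Phi_e_Az @ theta_e,           # n_a*n Q residuals
    ])
    J = np.block([
        [-Phi_Z,     beta * Phi_e_next],
        [E @ Phi_Z,  -Phi_e_Az],
    ])
    return F, J
```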

Topic: Choice of interpolation methods

The choice of interpolation methods becomes particularly important as the dimensionality of the Bellman equation increases. Since the number of interpolation coefficients must match the number of supporting nodes to exactly determine the coefficients, the projected pseudo-linear system remains subject to the curse of dimensionality (not only the number of states, but also the extra dimensionality due to the endogenous risk!). Thus, one must carefully trade off between different interpolation methods based on their performance characteristics. The following table compares the memory demands of local and global interpolation methods in general, given the same accuracy requirement.

| Method | Required number of nodes | Basis matrix sparsity |
|--------|--------------------------|-----------------------|
| Local  | Large                    | Sparse                |
| Global | Not that large           | Dense (usually)       |

The following table compares some popular interpolation methods:

| Method | Type | Time complexity | Memory complexity | Notes |
|--------|------|-----------------|-------------------|-------|
| Piecewise linear | Local | $O(n)$ | $O(n)$ | Simple, fast, but less smooth; scales well for higher dimensions. |
| Cubic spline | Local | $O(n)$ | $O(n)$ | Provides smoother approximations than linear interpolation but may struggle in very high dimensions. |
| Polynomial basis | Global | $O(n^3)$ | $O(n^2)$ | Highly smooth but computationally expensive; impractical for high dimensions. |
| Chebyshev polynomials | Global | $O(n \log n)$ | $O(n)$ | Offers better numerical stability and smoothness; suitable for moderate dimensions. |
| Radial basis functions (RBF) | Global | $O(n^2)$ | $O(n^2)$ | Flexible and accurate but memory-intensive; more suitable for low-dimensional problems. |

One can quickly realize that this trade-off does not break the curse of dimensionality. To really break it, one can explore dimensionality reduction techniques such as sparse grids (e.g. Smolyak grids), adaptive grids, or machine-learning-based function approximation.

Topic: Recipe of Newton's method

To solve Eq (65), one can call any standard solver (e.g. MATLAB's optimization toolbox) and provide the Jacobian of Eq (69) to the solver. However, when one has to write their own iterator, the following simple updating formula is always good to recall:

$$
\Theta \leftarrow \Theta - \big(F'(\Theta)\big)^{-1} F(\Theta) \tag{70}
$$

To improve efficiency by avoiding an explicit inversion of the Jacobian matrix when $m + m_e$ is large, one can check alternative methods such as the Newton–Krylov method.
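Here is a hand-rolled Newton iteration implementing Eq (70), written as a linear solve rather than an explicit inverse; `F_and_J` is assumed to wrap something like `residual_and_jacobian` above (plus the optimization step). For large systems, SciPy's Jacobian-free `scipy.optimize.newton_krylov` only needs the residual.

```python
import numpy as np
from scipy import optimize

def newton_solve(Theta0, F_and_J, tol=1e-8, maxit=50):
    """Theta <- Theta - F'(Theta)^{-1} F(Theta), via a linear solve."""
    Theta = Theta0.copy()
    for _ in range(maxit):
        F, J = F_and_J(Theta)                  # residual and analytical Jacobian
        if np.linalg.norm(F, np.inf) < tol:    # converged
            break
        Theta = Theta - np.linalg.solve(J, F)  # Newton step, no explicit inverse
    return Theta

# Jacobian-free alternative for large m + m_e (Krylov solver with
# finite-difference Jacobian-vector products):
# Theta = optimize.newton_krylov(lambda t: F_and_J(t)[0], Theta0)
```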

Topic: Optimization step

Even though collocation significantly accelerates the convergence of the entire iteration, 90% of the computational cost is still concentrated in the optimization step, where one must solve $n$ independent ("embarrassingly parallel") optimization problems in every iteration to get the instantaneous utility $U(C(Z), Z)$ and the new endogenous states $X_a'(Z)$.

The performance depends heavily on the number of control variables and the complexity of the admissible space $\mathcal{C}$.

Therefore, one must select the optimizer carefully while checking the properties of the interpolation used in the collocation. For example, MATLAB's fmincon is powerful and robust, but it may fail to converge within a reasonable time if the interpolation method causes numerical oscillations or destroys the convexity of the Hamiltonian.
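To illustrate the per-node problem, here is a minimal sketch of a single optimization step for a one-control consumption-savings node; the budget constraint, log utility, and the `v_e_hat` interface are illustrative assumptions of mine, not the general Hamiltonian above. Since the $n$ node problems are independent, they parallelize trivially (e.g. with multiprocessing).

```python
import numpy as np
from scipy import optimize

def solve_node(z, v_e_hat, beta=0.96):
    """Optimization step at one node z = (x_a, x_e): choose consumption c
    to maximize log(c) + beta * v_e_hat(x_a', z), subject to the
    illustrative budget x_a' = x_a - c, with 0 < c < x_a assumed feasible.
    """
    x_a = z[0]
    def neg_hamiltonian(c):
        return -(np.log(c) + beta * v_e_hat(x_a - c, z))  # minimize the negative
    res = optimize.minimize_scalar(neg_hamiltonian,
                                   bounds=(1e-10, x_a), method="bounded")
    c_opt = res.x
    return c_opt, np.log(c_opt), x_a - c_opt  # C(z), u-contribution to U(Z), X_a'(z)
```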