Collocation in the presence of endogenous Markovian risk

Tianhao Zhao

Introduction

The collocation method is widely used in macroeconomic modeling to approximate value functions (or policy functions) through interpolation techniques. This approach offers two key advantages:

  1. Many interpolation methods provide greater smoothness than piecewise linear interpolation while requiring fewer supporting nodes.

  2. By projecting the original problem onto a pseudo-linear one in the space of interpolation coefficients, the collocation method enables analytical evaluation of the Jacobian matrix for the approximated system. This Jacobian facilitates the use of gradient-based algorithms, such as Newton’s method, to accelerate convergence.

Despite its advantages, many textbooks and online materials overlook an important scenario: cases where the transition matrix (often assumed to describe a Markov process governing uncertainty in dynamic programming) drives the exogenous states but itself depends on the current endogenous states of the system. This scenario, involving endogenous risk, is highly relevant in economic applications. For instance, an individual's unemployment risk in general equilibrium may depend on the aggregate unemployment rate, which itself evolves as an aggregate state. Addressing such cases requires specific formulations and methodological adjustments to extend the collocation method.

In this post, I first establish the notation for collocation and outline the construction of the discretized system. I emphasize that endogenous risk, compared to standard exogenous risk, significantly increases the number of equations required to interpolate the expected value function. Specifically, the size of the system expands proportionally to the number of supporting nodes for endogenous states. I then discuss how to solve the resulting pseudo-linear system and compute its Jacobian matrix analytically. Finally, I explore several topics for efficiently implementing the collocation method.

Notation

Hints:

  1. I do my best to label the dimension of each vector and matrix so that one can easily translate the math notation into a program without wasting time figuring out potential dimension mismatches.

  2. A vector in $\mathbb{R}^n$ should always be translated into a column vector in computer programs.

We consider solving the following time-invariant Bellman equation:

$$
\begin{aligned}
v(x) &= \max_{c}\; u(c,x) + \beta\, \mathbb{E}\{v(x') \mid x\} && \text{(Bellman equation)} & (36)\\
\text{s.t. } & c \in \mathcal{C} \subseteq \mathbb{R}^{d_c} && \text{(admissible space)} & (37)\\
& x \in \mathcal{X} \subseteq \mathbb{R}^{d} && \text{(state space \& state constraints)} & (38)\\
& v(x): \mathcal{X} \to \mathbb{R} && \text{(value function)} & (39)\\
& x := (x_a, x_e) && \text{(state sorting)} & (40)\\
& x_a \in \mathbb{R}^{d_a},\ x_e \in \mathbb{R}^{d_e},\ d_a + d_e = d && \text{(endo \& exo state split)} & (41)\\
& x_a' = \mu(x, c) && \text{(endo law of motion)} & (42)\\
& x_e' \sim \text{MarkovProcess}(x) && \text{(exo law of motion)} & (43)\\
& \Pr\{x_e' \mid x_e\} = \Pr\{x_e' \mid x_e\}(x_a) && \text{(Markovian)} & (44)
\end{aligned}
$$

where $x_a$ is the sub-vector of endogenous states and $x_e$ is the sub-vector of exogenous states. The whole law of motion of $x$ consists of $\mu(x, c)$ and the Markov process for $x_e$. The exogenous states $x_e$ follow a $k$-state Markov chain (or any process that can be reasonably approximated by such a chain) with transition matrix $P(x_a) \in \mathbb{R}^{k \times k}$. For algebraic convenience, we sort the dimensions of $x$ so that the $d_e$ exogenous states $x_e$ come last in the state vector.

Remark: The state vector x, in general equilibrium, consists of individual states AND aggregate states simultaneously.

Here we assume the transition probability $\Pr\{x_e' \mid x_e\}(x_a)$ is a function of the endogenous states $x_a$. This captures the generic case in which the "risk" depends on some endogenous states (e.g. general equilibrium with endogenous unemployment risk). For convenience, we split the transition matrix $P(x_a)$ row by row:

$$
P(x_a) := \begin{bmatrix} p_{x_a}(\cdot \mid x_e^1) \\ p_{x_a}(\cdot \mid x_e^2) \\ \vdots \\ p_{x_a}(\cdot \mid x_e^k) \end{bmatrix} \in \mathbb{R}^{k \times k}, \qquad p_{x_a}(\cdot \mid x_e^i) \in \mathbb{R}^{1 \times k} \tag{45}
$$

where $p_{x_a}(\cdot \mid x_e^i)$ is the distribution vector conditional on the current exogenous state $x_e^i$.

Attention: the transition matrix $P(x_a)$ depends on the endogenous states $x_a$, but the matrix itself determines next period's exogenous states $x_e'$.
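To make this object concrete, here is a minimal sketch of a hypothetical state-dependent transition matrix: a 2-state employment process whose job-loss probability rises with an endogenous aggregate state. The functional forms and coefficients are illustrative assumptions of mine, not part of the model above.

```python
import numpy as np

def transition_matrix(x_a: np.ndarray) -> np.ndarray:
    """Hypothetical P(x_a) for a k = 2 chain (employed, unemployed).

    The separation and job-finding rates depend on the first endogenous
    state x_a[0] (say, an aggregate unemployment rate in [0, 1]); the
    coefficients are purely illustrative.
    """
    sep = 0.02 + 0.10 * x_a[0]    # Pr(employed -> unemployed)
    find = 0.80 - 0.30 * x_a[0]   # Pr(unemployed -> employed)
    P = np.array([[1.0 - sep, sep],
                  [find, 1.0 - find]])
    assert np.allclose(P.sum(axis=1), 1.0)  # each row is a distribution
    return P
```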

Notes: the stacking need not be a Cartesian/tensor product, and the supporting nodes need not be uniform or evenly spaced. This flexibility allows for sparse grids and other special grids.

Notation: In the following sections, I keep using $x$ to denote an arbitrary point in the state space $\mathcal{X}$, and $z \in \mathcal{X}$ to denote a supporting node of the interpolation. A node $z$ must be a point, but a point is not necessarily a node.

Remark: Supporting nodes are needed by both local and global interpolation methods. In global methods, these nodes are the sample points at which the function is evaluated to fit the interpolant.

Let $X := [x_1, x_2, \dots, x_h]^T \in \mathbb{R}^{h \times d}$ be an arbitrary stacking of $h$ $d$-dimensional points, where each $x_i \in \mathcal{X}$. A special case of the stacking is $Z := [z_1, z_2, \dots, z_n]^T \in \mathbb{R}^{n \times d}$, which stacks the $n$ (unique) $d$-dimensional supporting nodes.

Let's define the stacking of the value function:

$$
V(X) := \begin{bmatrix} v(x_1) \\ v(x_2) \\ \vdots \\ v(x_h) \end{bmatrix} \in \mathbb{R}^{h \times 1}, \qquad V(Z) \in \mathbb{R}^{n \times 1} \tag{46}
$$

For convenience, let's also define the stackings of the policy $c(x)$, the instantaneous utility $u(c, x)$, and the new endogenous states $x_a'$ implied by the law of motion:

$$
\begin{aligned}
C(X) &:= \begin{bmatrix} c(x_1) \\ \vdots \\ c(x_h) \end{bmatrix} \in \mathbb{R}^{h \times d_c}, && C(Z) \in \mathbb{R}^{n \times d_c} & (47)\\
U(X) &:= \begin{bmatrix} u(c(x_1), x_1) \\ u(c(x_2), x_2) \\ \vdots \\ u(c(x_h), x_h) \end{bmatrix} \in \mathbb{R}^{h \times 1}, && U(Z) \in \mathbb{R}^{n \times 1} & (48)\\
X_a'(X) &:= \begin{bmatrix} \mu(x_1, c(x_1)) \\ \mu(x_2, c(x_2)) \\ \vdots \\ \mu(x_h, c(x_h)) \end{bmatrix} = \mu(X, C(X)) \in \mathbb{R}^{h \times d_a}, && X_a'(Z) \in \mathbb{R}^{n \times d_a} & (49)
\end{aligned}
$$
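For concreteness, a minimal sketch of building $Z$ as a Cartesian stacking; the grid sizes and ranges are arbitrary assumptions, and any fixed ordering works, as noted above:

```python
import itertools
import numpy as np

# Illustrative grids: d_a = 1 endogenous state (e.g. assets) and
# d_e = 1 exogenous state (a k = 2 Markov chain), exogenous dimension last.
assets = np.linspace(0.0, 10.0, 50)   # n_a = 50 endogenous nodes
emp = np.array([0.0, 1.0])            # k = 2 exogenous Markov states

# Cartesian/tensor stacking of supporting nodes: Z has shape (n, d)
# with n = 100 and d = 2; the exogenous dimension varies fastest.
Z = np.array(list(itertools.product(assets, emp)))
n, d = Z.shape
```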

Let $v(\cdot): \mathbb{R}^d \to \mathbb{R}$ be the value function to solve. An interpolant $\hat{v}(\cdot)$ of degree $m - 1$ must have the following form:

$$
\begin{aligned}
\hat{v}(x) &:= \phi(x)\, \theta & (50)\\
\phi(x) &\in \mathbb{R}^{1 \times m}, \quad \theta \in \mathbb{R}^{m \times 1} & (51)
\end{aligned}
$$

where $\phi(\cdot)$ is the basis (row) vector and $\theta$ is the interpolation coefficient vector. Let's define the stacked basis matrix evaluated at $h$ arbitrary points:

$$
\Phi(X) := \begin{bmatrix} \phi(x_1) \\ \phi(x_2) \\ \vdots \\ \phi(x_h) \end{bmatrix} \in \mathbb{R}^{h \times m}, \qquad \Phi(Z) \in \mathbb{R}^{n \times m} \tag{52}
$$

Thus, the stacked interpolated values can be linearly written as:

$$
\begin{aligned}
\hat{V}(X) &= \Phi(X)\, \theta & (53)\\
\hat{V}(Z) &= \Phi(Z)\, \theta & (54)
\end{aligned}
$$

Hint: For (univariate or multivariate) piecewise linear interpolation, $\Phi(Z) = I_n$ and $\theta = V(Z)$, where $I_n$ is the $n \times n$ identity matrix.
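As a univariate sketch of Eqs (50)–(54), the snippet below fits a Chebyshev interpolant with numpy's `chebvander`, so that $\Phi(Z)$ is square ($m = n$) and $\theta$ is exactly identified; the test function $\log(z + 2)$ is an arbitrary stand-in for $V(Z)$:

```python
import numpy as np
from numpy.polynomial import chebyshev

n = 12                                           # number of supporting nodes
z = np.cos(np.pi * (np.arange(n) + 0.5) / n)     # Chebyshev nodes on [-1, 1]
Phi_Z = chebyshev.chebvander(z, n - 1)           # basis matrix Phi(Z), shape (n, m) with m = n

V_Z = np.log(z + 2.0)                            # stand-in for V(Z)
theta = np.linalg.solve(Phi_Z, V_Z)              # exact fit: Phi(Z) theta = V(Z)

x = np.linspace(-1.0, 1.0, 7)                    # arbitrary points in the state space
V_hat = chebyshev.chebvander(x, n - 1) @ theta   # V_hat(X) = Phi(X) theta
```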

Now, let's interpolate and stack the expected value function (or Q-function). The expectation operation is defined as:

$$
\begin{aligned}
\mathbb{E}\{v(x') \mid x\} &:= \sum_{j=1}^{k} \Pr\{x_e^j \mid x_e\}(x_a)\; v(x_a', x_e^j) = v_e(x_a', x_a, x_e) = v_e(x_a', x) : \mathbb{R}^{d_a} \times \mathbb{R}^{d} \to \mathbb{R} & (55)\\
\hat{v}_e(x_a', x) &:= \phi_e(x_a', x)\, \theta_e & (56)\\
\phi_e(\cdot) &\in \mathbb{R}^{1 \times m_e}, \quad \theta_e \in \mathbb{R}^{m_e \times 1} & (57)
\end{aligned}
$$

where the dimensionality of $v_e$ expands to $d_a + d$.

Remark: An alternative way of defining the interpolant is $\phi_e(x_a')\, \theta_e(x)$, which fits many scenarios. However, this alternative requires further interpolation of a vector-valued function $\theta_e(\cdot): \mathbb{R}^d \to \mathbb{R}^{\zeta}$, whose dimensionality $\zeta$, which depends on the number of supporting nodes, is infeasibly high. The implementation of Eq (56) works better, at the cost of $d_a + d$ dimensionality.

Remark: The node set for $x_a'$ can differ from the set of endogenous sub-vectors of the $n$ nodes in $Z$, which allows multiple resolutions of the interpolation.

For convenience, we define the tensor space of endogenous states $\mathcal{A} := \mathcal{X}_a \times \mathcal{X} \subseteq \mathbb{R}^{d_a + d}$ and use $a = (x_a', x) \in \mathcal{A}$ to denote an arbitrary joined point; and use $a_z = (z_a', z)$ to denote a point joined from two nodes, where $z_a'$ is one of the $n_a$ supporting nodes for the endogenous states. There are $n_a n$ such joined nodes. Then, let's define the stacked expected value function and the stacked basis matrix of the expected value function:

$$
\begin{aligned}
& V_e(A): \mathbb{R}^{h \times (d_a + d)} \to \mathbb{R}^{h \times 1}, && V_e(A_z): \mathbb{R}^{n_a n \times (d_a + d)} \to \mathbb{R}^{n_a n \times 1} & (58)\\
& \Phi_e(A) \in \mathbb{R}^{h \times m_e}, && \Phi_e(A_z) \in \mathbb{R}^{n_a n \times m_e} & (59)
\end{aligned}
$$

Thanks to the linearity of the expectation operation, we can define the expectation operator as a row-vector such that:

$$
v_e(x_a', x) = \sum_{j=1}^{k} p_{x_a}(e_j \mid x_e)\, v(x_a', e_j) = p_{x_a}(\cdot \mid x_e) \begin{bmatrix} v(x_a', e_1) \\ v(x_a', e_2) \\ \vdots \\ v(x_a', e_k) \end{bmatrix}, \quad e_j \in \underbrace{\{x_e^1, \dots, x_e^k\}}_{k \text{ Markov states}} \tag{60}
$$

Intuitively, for the following stacked mapping from node stacking Z to node stacking Az:

$$
\underbrace{V_e(A_z)}_{\in\, \mathbb{R}^{n_a n \times 1}} = \mathbb{E}\ \underbrace{V(Z)}_{\in\, \mathbb{R}^{n \times 1}} \tag{61}
$$

one can define an expectation matrix $\mathbb{E} \in \mathbb{R}^{n_a n \times n}$ such that $V_e(A_z) = \mathbb{E}\, V(Z)$. Each row of $\mathbb{E}$ picks the needed nodes from the $n$ nodes of $Z$ and performs a weighted summation using the corresponding distribution vector $p_{x_a}(\cdot \mid x_e)$ according to the current node of $A_z$.

Remark: The construction of $\mathbb{E}$, personally speaking, is the hardest part of collocation. Its structure depends not only on the stacking order of your $Z$ nodes but also on the type of your grid. Thus, it is hard to write down mathematically (except in special cases such as Cartesian nodes with states sorted lexicographically or anti-lexicographically), but it is easy to implement in computer programs using hash maps or other efficient data structures; see the sketch below.
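As one concrete instance, the sketch below builds $\mathbb{E}$ as a sparse matrix for the Cartesian grid and the hypothetical `transition_matrix` from the earlier sketches, using a hash map from node coordinates to their row in $Z$. It assumes $d_a = d_e = 1$, lexicographic ordering with the exogenous state last, and $A_z$ stacking $(z_a', z)$ with $z$ varying fastest; other grids need their own index logic.

```python
import itertools
import numpy as np
from scipy import sparse

assets = np.linspace(0.0, 10.0, 50)             # n_a endogenous nodes
emp = [0.0, 1.0]                                # k exogenous Markov states
Z = list(itertools.product(assets, emp))        # n = n_a * k supporting nodes
row_of = {node: i for i, node in enumerate(Z)}  # hash map: node -> row index in Z

rows, cols, vals = [], [], []
for r, (za_next, z) in enumerate(itertools.product(assets, Z)):
    xa, xe = z
    # The risk depends on the CURRENT endogenous state; rescale the asset
    # level into [0, 1] so the illustrative probabilities stay valid.
    P = transition_matrix(np.array([xa / assets[-1]]))
    p = P[emp.index(xe)]                        # row p_{x_a}(. | x_e)
    for j, e_next in enumerate(emp):            # weight the nodes (za_next, e_next)
        rows.append(r)
        cols.append(row_of[(za_next, e_next)])
        vals.append(p[j])

E = sparse.csr_matrix((vals, (rows, cols)),
                      shape=(len(assets) * len(Z), len(Z)))  # (n_a*n, n)
```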


Discretized system

The discretized system is a pseudo-linear system of $n(n_a + 1)$ equations in total:

$$
\begin{aligned}
& V(Z) = U(Z) + \beta\, V_e(X_a'(Z), Z) && \text{Bellman equations, total: } n & (62)\\
& V_e(A_z) = \mathbb{E}\, V(Z) && \text{Q equations, total: } n_a n & (63)
\end{aligned}
$$

accompanied by $n$ optimization problems (the optimization step):

$$
X_a'(Z) = \mu(Z, C(Z)), \qquad C(Z) = \arg\max_{C}\ \text{Hamiltonian}(Z) \tag{64}
$$

Remark:

  1. The number of equations degenerates to $n + n_a$ if $P(x_a)$ no longer depends on the endogenous states $x_a$. This happens with stationary equilibrium or purely exogenous risk.

  2. The endogeneity of the risk $P$ dramatically increases the number of equations, by approximately a factor of $n_a$ (see the worked count below).
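For a sense of scale, suppose $n_a = 50$ endogenous nodes and $k = 2$ Markov states on a Cartesian grid, so that $n = n_a k = 100$ (grid sizes assumed purely for illustration):

$$
\underbrace{n}_{\text{Bellman}} + \underbrace{n_a n}_{\text{Q}} = 100 + 5000 = 5100 \quad \text{vs.} \quad n + n_a = 150 \ \text{(purely exogenous risk)}
$$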

Plugging the interpolation in:

$$
\begin{aligned}
\Phi(Z)\, \theta &= U(Z) + \beta\, \Phi_e(X_a'(Z), Z)\, \theta_e & (65)\\
\Phi_e(A_z)\, \theta_e &= \mathbb{E}\, \Phi(Z)\, \theta &
\end{aligned}
$$

These are $m + m_e$ equations in the interpolation coefficients $\Theta := (\theta, \theta_e)$.

Remark:

  1. For both global and local interpolation methods, $m + m_e$ is usually equal to $n(n_a + 1)$.

  2. Thus, if $v$ has no discontinuities, global methods are generally preferred.

The system is pseudo-linear in the coefficients. To solve it using gradient-based methods such as Newton's method, we define:

$$
\begin{aligned}
F(\Theta) &:= \text{RHS}(\Theta) - \text{LHS}(\Theta) = 0 & (66)\\
\text{RHS}(\Theta) &:= \begin{bmatrix} U(Z) + \beta\, \Phi_e(X_a'(Z), Z)\, \theta_e \\ \mathbb{E}\, \Phi(Z)\, \theta \end{bmatrix} & (67)\\
\text{LHS}(\Theta) &:= \begin{bmatrix} \Phi(Z)\, \theta \\ \Phi_e(A_z)\, \theta_e \end{bmatrix} & (68)
\end{aligned}
$$

Remark: Eq (65) can be converted into a fixed-point format by left-multiplying the two equations by $\Phi^{-1}(Z)$ and $\Phi_e^{-1}(A_z)$, respectively. The fixed-point format can be used directly to run iterative algorithms.

The Jacobian is:

$$
\begin{aligned}
F'(\Theta) &= \begin{bmatrix} 0_{n \times m} & \beta\, \Phi_e(X_a'(Z), Z) \\ \mathbb{E}\, \Phi(Z) & 0_{n_a n \times m_e} \end{bmatrix} - \begin{bmatrix} \Phi(Z) & 0_{n \times m_e} \\ 0_{n_a n \times m} & \Phi_e(A_z) \end{bmatrix} \\
&= \begin{bmatrix} -\Phi(Z) & \beta\, \Phi_e(X_a'(Z), Z) \\ \mathbb{E}\, \Phi(Z) & -\Phi_e(A_z) \end{bmatrix} & (69)
\end{aligned}
$$

With the analytical Jacobian, one can efficiently solve the whole problem in a reasonable time.

Remark: One can quickly see that most terms of $F'(\Theta)$ can be precomputed, except the basis matrix $\Phi_e(X_a'(Z), Z)$. The reason is that the new endogenous states $X_a'(Z)$ change every iteration, and evaluating them requires the optimization step. The optimization step, which also determines $U(Z)$ in every evaluation of $F(\Theta)$, is the bottleneck of the whole algorithm's performance.
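Putting Eqs (66)–(69) together, here is a minimal sketch of the residual and the analytical Jacobian given precomputed blocks; the argument names (`Phi_Z`, `Phi_e_next`, `Phi_e_Az`, `E`, `U`) are my own stand-ins for the matrices defined above, with `Phi_e_next` $= \Phi_e(X_a'(Z), Z)$ recomputed each iteration as the remark notes:

```python
import numpy as np

def residual_and_jacobian(theta, theta_e, Phi_Z, Phi_e_next, Phi_e_Az, E, U, beta):
    """F(Theta) = RHS - LHS and the analytical Jacobian of Eq (69).

    Phi_Z      : (n, m)        basis at the nodes Z (precomputable)
    Phi_e_next : (n, m_e)      basis of v_e at (X_a'(Z), Z); changes every iteration
    Phi_e_Az   : (n_a*n, m_e)  basis of v_e at the nodes A_z (precomputable)
    E          : (n_a*n, n)    expectation matrix (precomputable)
    U          : (n,)          instantaneous utility at the optimal policy
    """
    F = np.concatenate([
        U + beta * (Phi_e_next @ theta_e) - Phi_Z @ theta,  # n Bellman residuals
        E @ (Phi_Z @ theta) - Phi_e_Az @ theta_e,           # n_a*n Q residuals
    ])
    J = np.block([
        [-Phi_Z,     beta * Phi_e_next],
        [E @ Phi_Z,  -Phi_e_Az],
    ])
    return F, J
```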

Topic: Choice of interpolation methods

The choice of interpolation methods becomes particularly important as the dimensionality of the Bellman equation increases. Since the number of interpolation coefficients must match the number of supporting nodes to exactly determine the coefficients, the projected pseudo-linear system remains subject to the curse of dimensionality (not only the number of states, but also the extra dimensionality due to the endogenous risk!). Thus, one must carefully trade off between different interpolation methods based on their performance characteristics. The following table compares the memory demands of local and global interpolation methods in general, given the same accuracy requirement.

| Method | Required number of nodes | Basis matrix sparsity |
|--------|--------------------------|-----------------------|
| Local  | Large                    | Sparse                |
| Global | Not that large           | Dense (usually)       |

The following table compares some popular interpolation methods:

| Method | Type | Time complexity | Memory complexity | Notes |
|--------|------|-----------------|-------------------|-------|
| Piecewise linear | Local | $O(n)$ | $O(n)$ | Simple, fast, but less smooth; scales well for higher dimensions. |
| Cubic spline | Local | $O(n)$ | $O(n)$ | Provides smoother approximations than linear interpolation but may struggle in very high dimensions. |
| Polynomial basis | Global | $O(n^3)$ | $O(n^2)$ | Highly smooth but computationally expensive; impractical for high dimensions. |
| Chebyshev polynomials | Global | $O(n \log n)$ | $O(n)$ | Offers better numerical stability and smoothness; suitable for moderate dimensions. |
| Radial basis functions (RBF) | Global | $O(n^2)$ | $O(n^2)$ | Flexible and accurate but memory-intensive; more suitable for low-dimensional problems. |

One can quickly realize that this trade-off does not break the curse of dimensionality. To really break it, one can explore dimensionality reduction techniques such as sparse grids (e.g. Smolyak grids), adaptive grids, or machine-learning-based function approximation.

Topic: Recipe of Newton's method

To solve Eq (65), one can call any standard solver (e.g. MATLAB's optimization toolbox) and provide the Jacobian of Eq (69) to the solver. However, when one has to write their own iterator, the following simple updating formula is always good to recall:

$$
\Theta \leftarrow \Theta - \big(F'(\Theta)\big)^{-1} F(\Theta) \tag{70}
$$

To improve efficiency by avoiding an explicit inversion of the Jacobian matrix when $m + m_e$ is large, one can check alternative methods such as the Newton–Krylov method.
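Here is a hand-rolled Newton iteration implementing Eq (70), written as a linear solve rather than an explicit inverse; `F_and_J` is assumed to wrap something like `residual_and_jacobian` above (plus the optimization step). For large systems, SciPy's Jacobian-free `scipy.optimize.newton_krylov` only needs the residual.

```python
import numpy as np
from scipy import optimize

def newton_solve(Theta0, F_and_J, tol=1e-8, maxit=50):
    """Theta <- Theta - F'(Theta)^{-1} F(Theta), via a linear solve."""
    Theta = Theta0.copy()
    for _ in range(maxit):
        F, J = F_and_J(Theta)                  # residual and analytical Jacobian
        if np.linalg.norm(F, np.inf) < tol:    # converged
            break
        Theta = Theta - np.linalg.solve(J, F)  # Newton step, no explicit inverse
    return Theta

# Jacobian-free alternative for large m + m_e (Krylov solver with
# finite-difference Jacobian-vector products):
# Theta = optimize.newton_krylov(lambda t: F_and_J(t)[0], Theta0)
```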

Topic: Optimization step

Even though collocation significantly accelerates the convergence of the entire iteration, 90% of the computational cost is still concentrated in the optimization step, where one must solve $n$ independent ("embarrassingly parallel") optimization problems in every iteration to get the instantaneous utility $U(C(Z), Z)$ and the new endogenous states $X_a'(Z)$.

The performance depends heavily on the number of control variables and the complexity of the admissible space $\mathcal{C}$.

Therefore, one must select the optimizer carefully while checking the properties of the interpolation used in the collocation. For example, MATLAB's fmincon is powerful and robust, but it may fail to converge within a reasonable time if the interpolation method causes numerical oscillations or destroys the convexity of the Hamiltonian.
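To illustrate the per-node problem, here is a minimal sketch of a single optimization step for a one-control consumption-savings node; the budget constraint, log utility, and the `v_e_hat` interface are illustrative assumptions of mine, not the general Hamiltonian above. Since the $n$ node problems are independent, they parallelize trivially (e.g. with multiprocessing).

```python
import numpy as np
from scipy import optimize

def solve_node(z, v_e_hat, beta=0.96):
    """Optimization step at one node z = (x_a, x_e): choose consumption c
    to maximize log(c) + beta * v_e_hat(x_a', z), subject to the
    illustrative budget x_a' = x_a - c, with 0 < c < x_a assumed feasible.
    """
    x_a = z[0]
    def neg_hamiltonian(c):
        return -(np.log(c) + beta * v_e_hat(x_a - c, z))  # minimize the negative
    res = optimize.minimize_scalar(neg_hamiltonian,
                                   bounds=(1e-10, x_a), method="bounded")
    c_opt = res.x
    return c_opt, np.log(c_opt), x_a - c_opt  # C(z), u-contribution to U(Z), X_a'(z)
```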