Collocation in the presence of endogenous Markovian risk
Tianhao Zhao
Introduction
The collocation method is widely used in macroeconomic modeling to approximate value functions (or policy functions) through interpolation techniques. This approach offers two key advantages:
Many interpolation methods provide greater smoothness than piecewise linear interpolation while requiring fewer supporting nodes.
By projecting the original problem onto a pseudo-linear one in the space of interpolation coefficients, the collocation method enables analytical evaluation of the Jacobian matrix for the approximated system. This Jacobian facilitates the use of gradient-based algorithms, such as Newton’s method, to accelerate convergence.
Despite its advantages, many textbooks and online materials overlook an important scenario: cases where the transition matrix (often assumed to describe a Markov process governing uncertainty in dynamic programming) drives the exogenous states but itself depends on the current endogenous state of the system. This scenario, involving endogenous risk, is highly relevant in economic applications. For instance, an individual's unemployment risk in general equilibrium may depend on the aggregate unemployment rate, which itself evolves as an aggregate state. Addressing such cases requires specific formulations and methodological adjustments to extend the collocation method.
In this post, I first establish the notation for collocation and outline the construction of the discretized system. I emphasize that endogenous risk, compared to standard exogenous risk, significantly increases the number of equations required to interpolate the expected value function. Specifically, the size of the system expands proportionally to the number of supporting nodes for endogenous states. I then discuss how to solve the resulting pseudo-linear system and compute its Jacobian matrix analytically. Finally, I explore several topics for efficiently implementing the collocation method.
Notation
Hint:
- I do my best to label the dimension of each vector and matrix so that one can easily translate the math notation into a program without wasting time figuring out potential dimension mismatches.
- A vector in $\mathbb{R}^{n}$ should always be translated to a column vector in computer programming.
We consider solving the following time-invariant Bellman equation:
$$ \begin{aligned} & v(x) = \max_{c}\ u(c,x) + \beta\, \mathbb{E}\{ v(x') \mid x \} & \text{(Bellman equation)} \\ \text{s.t. } & c \in \mathcal{C} \subseteq \mathbb{R}^{d_c} & \text{(admissible space)} \\ & x \in \mathcal{X} \subseteq \mathbb{R}^{d} & \text{(state space \& state constraints)} \\ & v(x) : \mathcal{X} \to \mathbb{R} & \text{(value function)} \\ & x := (x_a, x_e) & \text{(state sorting)} \\ & x_a \in \mathbb{R}^{d_a},\ x_e \in \mathbb{R}^{d_e},\ d_a + d_e = d & \text{(endo \& exogenous states split)} \\ & x_a' = \mu(x,c) & \text{(endo law of motion)} \\ & x_e' \sim \text{MarkovProcess}(x) & \text{(exogenous law of motion)} \\ & \Pr\{ x_e' \mid x_e \} = \Pr\{ x_e' \mid x_e \}(x_a) & \text{(Markovian)} \end{aligned} $$
where $x_a$ is the sub-vector of endogenous states and $x_e$ is the sub-vector of exogenous states. The whole law of motion of $x$ consists of $\mu(x,c)$ and the Markov process for $x_e$. The exogenous states $x_e$ follow a $k$-state Markov chain (or any process that can be reasonably approximated by such a Markov chain) with transition matrix $P(x_a)\in\mathbb{R}^{k\times k}$. For convenience of algebra, we sort the dimensions of $x$ by placing the $d_e$ exogenous states $x_e$ at the very end of the state vector.
Remark: The state vector $x$, in general equilibrium, consists of individual states AND aggregate states simultaneously.
Here we assume the transition probability $\Pr\{x_e'|x_e\}(x_a)$ is a function of the endogenous states $x_a$. This covers the generic case in which the "risk" depends on some endogenous states (e.g. general equilibrium with endogenous unemployment risk). For convenience, we split the transition matrix $P(x_a)$ row by row:
$$ P(x_a) := \begin{bmatrix} p_{x_a}(\cdot|x_e^1) \\ p_{x_a}(\cdot|x_e^2) \\ \vdots \\ p_{x_a}(\cdot|x_e^k) \end{bmatrix} \in \mathbb{R}^{k\times k},\quad p_{x_a}(\cdot|x_e^i) \in \mathbb{R}^{1\times k} $$
where $p_{x_a}(\cdot|x_e^i)$ is the distribution vector conditional on current exogenous states $x_e^i$.
Attention: the transition matrix $P(x_a)$ depends on the endogenous states $x_a$, but the matrix itself determines the next period's exogenous states $x_e'$.
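To make $P(x_a)$ concrete, here is a minimal sketch in Python of a hypothetical two-state employment process whose separation and job-finding probabilities depend on an endogenous aggregate state; the logistic functional forms are purely illustrative:

```python
import numpy as np

def transition_matrix(x_a: float) -> np.ndarray:
    """Hypothetical 2-state (employed/unemployed) transition matrix P(x_a).

    The separation and job-finding probabilities move with an endogenous
    aggregate state x_a (say, aggregate capital); the logistic forms are
    purely illustrative.
    """
    separation = 0.1 / (1.0 + np.exp(2.0 * (x_a - 1.0)))    # falls in x_a
    finding = 0.6 / (1.0 + np.exp(-2.0 * (x_a - 1.0)))      # rises in x_a
    P = np.array([
        [1.0 - separation, separation],    # row 1: conditional on employed
        [finding, 1.0 - finding],          # row 2: conditional on unemployed
    ])
    assert np.allclose(P.sum(axis=1), 1.0)  # each row is a distribution
    return P
```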
Notes: the stacking need not be a Cartesian/tensor product, and the supporting nodes need not be uniform or evenly spaced. This allows for sparse grids and other special grids.
Notation: In the following sections, I keep using $x$ to denote an arbitrary point in the state space $\mathcal{X}$, and $z\in\mathcal{X}$ to denote a supporting node of the interpolation. A node $z$ must be a point, but a point is not necessarily a node.
Remark: Supporting nodes are needed in both local and global interpolation methods. In global methods, these nodes are where the function is sampled and evaluated to fit the interpolant.
Let $X := [x^1,x^2,\dots,x^h]^T \in \mathbb{R}^{h\times d}$ be an arbitrary stacking of $h$ points in $d$ dimensions, where each $x^i\in\mathcal{X}$. A special case of the stacking is $Z:=[z^1,z^2,\dots,z^n]^T \in\mathbb{R}^{n\times d}$, an $n$-stacking of the (unique) supporting nodes. Let's define the stacking of the value function:
$$ \begin{align} & V(X) := \begin{bmatrix} v(x^1) \\ v(x^2) \\ \vdots \\ v(x^h) \end{bmatrix} \in \mathbb{R}^{h\times 1},\quad V(Z) \in \mathbb{R}^{n\times 1} \end{align} $$
For convenience, let's also define the stacking of the policy $c(x)$, the instantaneous utility $u(c,x)$, and the new endogenous states $x_a'$ implied by the law of motion:
$$ \begin{align} & C(X) := \begin{bmatrix} c(x^1) \\ \vdots \\ c(x^h) \end{bmatrix} \in \mathbb{R}^{h\times d_c},\quad C(Z) \in \mathbb{R}^{n\times d_c} \\ & U(X) := \begin{bmatrix} u(c(x^1),x^1) \\ u(c(x^2),x^2) \\ \vdots \\ u(c(x^h),x^h) \end{bmatrix} \in \mathbb{R}^{h\times 1},\quad U(Z) \in \mathbb{R}^{n\times 1} \\ & X_a'(X) := \begin{bmatrix} \mu(x^1,c(x^1)) \\ \mu(x^2, c(x^2)) \\ \vdots \\ \mu(x^h,c(x^h)) \end{bmatrix} = \vec\mu(X,C(X)) \in \mathbb{R}^{h\times d_a},\quad X_a'(Z) \in \mathbb{R}^{n\times d_a} \end{align} $$
Let $v(\cdot):\mathbb{R}^{d}\to\mathbb{R}$ be the value function to solve. An interpolant $\hat{v}(\cdot)$ with $m$ basis functions (degree $m-1$ in the polynomial case) takes the following form:
$$ \begin{align} & \hat{v}(x) := \phi(x) \cdot \theta \\ & \phi(x) \in \mathbb{R}^{1\times m},\quad \theta \in \mathbb{R}^{m\times 1} \end{align} $$
where $\phi(\cdot)$ is the basis (row) vector and $\theta$ is the interpolation coefficient vector. Let's define the stacked basis matrix evaluated at $h$ arbitrary points:
$$ \Phi(X) := \begin{bmatrix} \phi(x^1) \\ \phi(x^2) \\ \vdots \\ \phi(x^h) \end{bmatrix} \in \mathbb{R}^{h\times m},\quad \Phi(Z) \in \mathbb{R}^{n\times m} $$
Thus, the stacked interpolated values can be linearly written as:
$$ \begin{align} & \hat{V}(X) = \Phi(X) \cdot \theta \\ & \hat{V}(Z) = \Phi(Z) \cdot \theta \end{align} $$
Hint: For (univariate or multivariate) piecewise linear interpolation with the interpolation points placed at the nodes, $\Phi(Z) = I_n$ and $\theta=V(Z)$, where $I_n$ is the $n\times n$ identity matrix.
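As a concrete illustration of $\hat{V}(Z) = \Phi(Z)\cdot\theta$, the sketch below builds $\Phi(Z)$ from a univariate Chebyshev basis (via numpy's `chebvander`) at $n=m$ nodes and recovers $\theta$ by solving the square linear system; with piecewise linear interpolation the solve would be trivial since $\Phi(Z)=I_n$. The value-function values here are stand-ins:

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

n = m = 9                                      # square system: #nodes == #coeffs
Z = np.cos(np.pi * (np.arange(n) + 0.5) / n)   # Chebyshev nodes on [-1, 1]

def Phi(X):
    """Stacked basis matrix Phi(X) in R^{h x m}: one row phi(x) per point."""
    return cheb.chebvander(X, m - 1)

V_Z = np.exp(Z)                        # stand-in for the stacked values V(Z)
theta = np.linalg.solve(Phi(Z), V_Z)   # fit coefficients: Phi(Z) theta = V(Z)

x = np.array([0.3])                    # an arbitrary point (not a node)
v_hat = Phi(x) @ theta                 # hat v(x) = phi(x) . theta
```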
Now, let’s interpolate and stack the expected value function (or Q-function). The expectation operation is defined as:
$$ \begin{align} & \mathbb{E}\{ v(x') \mid x \} := \sum_{j=1}^k \Pr\{ x_e' = x_e^j \mid x_e \}(x_a) \cdot v(x_a', x_e^j) = v_e(x_a',x_a,x_e) = v_e(x_a',x) : \mathbb{R}^{d_a} \times \mathbb{R}^{d} \to \mathbb{R} \nonumber \\ & \approx \hat{v}_e (x_a',x) := \phi_e(x_a',x) \cdot \theta_e \label{eq:017} \\ & \phi_e(\cdot) \in \mathbb{R}^{1\times m_e},\quad \theta_e \in \mathbb{R}^{m_e \times 1} \nonumber \end{align} $$
where the dimensionality of $v_e$ grows to $d_a+d$.
Remark: An alternative way of defining the interpolant is $\phi_e(x_a') \cdot \theta_e(x)$, which fits many scenarios. However, this alternative requires further interpolation of a vector-valued function $\theta_e(\cdot):\mathbb{R}^{d}\to\mathbb{R}^{\zeta}$ whose dimensionality $\zeta$, which scales with the number of supporting nodes, is infeasibly high. The formulation in Eq ($\ref{eq:017}$) works better, at the cost of a $(d_a+d)$-dimensional interpolation.
Remark: The node set of $\mathcal{X}_a$ can differ from the endogenous components of the $n$ nodes of $\mathcal{X}$, which allows multiple resolutions of the interpolation.
For convenience, we define the tensor space of endogenous states $\mathcal{A} := \mathcal{X}_a \times \mathcal{X} \subseteq \mathbb{R}^{d_a+d}$ and use the notation $a=(x_a',x)\in\mathcal{A}$ to denote an arbitrary joined point; and use $a_z = (z_a',z)$ to denote a point joined from one of the $n_a$ supporting nodes $z_a'$ of $\mathcal{X}_a$ and a node $z$ of $Z$. There are $n_a n$ such nodes. Then, let's define the stacked expected value function and the stacked basis matrix of the expected value function:
$$ \begin{align} & V_e(A) : \mathbb{R}^{h\times (d_a+d)} \to \mathbb{R}^{h\times 1},\quad V_e(A_z): \mathbb{R}^{n_a n\times (d_a+d)} \to \mathbb{R}^{n_a n\times 1} \\ & \Phi_e(A) \in \mathbb{R}^{h \times m_e},\quad \Phi_e(A_z) \in \mathbb{R}^{n_a n \times m_e} \end{align} $$
Thanks to the linearity of the expectation operation, we can define the expectation operator as a row-vector such that:
$$ \begin{align} & v_e(x_a',x) = \sum_{j=1}^k p_{x_a}(e^j|x_e) \cdot v(x_a',e^j) = p_{x_a}(\cdot |x_e) \cdot \begin{bmatrix} v(x_a', e^1) \\ v(x_a', e^2) \\ \vdots \\ v(x_a', e^k) \end{bmatrix},\quad e^j\in \underbrace{ \{ x_e^1, \dots, x_e^k \} }_{k \text{ Markov states}} \end{align} $$
Intuitively, for the following stacked mapping from node stacking $Z$ to node stacking $A_z$:
$$ \underbrace{V_e(A_z)}_{\mathbb{R}^{n_a n\times 1}} = \mathbb{E} \circ \underbrace{V(Z)}_{\mathbb{R}^{n\times 1}} $$
one can define an expectation matrix $E\in\mathbb{R}^{n_an \times n}$ such that $V_e(A_z)=E\cdot V(Z)$. Each row of $E$ picks needed nodes from the $n$ nodes of $Z$ and does weighted summation using the corresponding distribution vector $p_{x_a}(\cdot|x_e)$ according to the current node of $A_z$.
Remark: The construction of $E$ is, personally speaking, the hardest part of collocation. Its structure depends not only on the stacking order of your $Z$ nodes but also on the type of your grid. Thus, it is hard to write down mathematically (except in special cases such as Cartesian nodes with states sorted lexicographically or anti-lexicographically), but it is easy to implement in computer programs using hash maps or other efficient data structures.
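To illustrate one of those special cases, the sketch below builds $E$ as a sparse matrix for a Cartesian grid whose $Z$ nodes are stacked lexicographically with the Markov state varying fastest, so that row and column indices can be computed arithmetically. The stacking convention and the toy $P(x_a)$ are my own assumptions; for irregular grids one would replace the index arithmetic with a hash map from node tuples to row indices:

```python
import numpy as np
from scipy.sparse import lil_matrix

# Assumed toy grid: n_a endogenous nodes x k Markov states (Cartesian), with
# Z stacked so that the row of node (i_a, j) is i_a * k + j.
n_a, k = 50, 2
n = n_a * k
grid_a = np.linspace(0.1, 5.0, n_a)

def transition_matrix(x_a):   # any P(x_a) whose rows sum to one
    p = 0.1 / (1.0 + np.exp(2.0 * (x_a - 1.0)))
    q = 0.6 / (1.0 + np.exp(-2.0 * (x_a - 1.0)))
    return np.array([[1.0 - p, p], [q, 1.0 - q]])

# E maps V(Z) in R^n to V_e(A_z) in R^{n_a * n}. Each A_z node joins a future
# endogenous node z_a' (index i_next) with a current node z = (i_a, j); its
# assumed row index is i_next * n + i_a * k + j.
E = lil_matrix((n_a * n, n))
for i_next in range(n_a):                  # future endogenous node z_a'
    for i_a in range(n_a):                 # current endogenous node (drives P)
        P = transition_matrix(grid_a[i_a])
        for j in range(k):                 # current Markov state
            row = i_next * n + i_a * k + j
            for j_next in range(k):        # weighted sum over next Markov states
                E[row, i_next * k + j_next] = P[j, j_next]
E = E.tocsr()                              # (n_a * n) x n, very sparse
```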
Discretized system
The discretized system is a pseudo-linear system of $n(n_a+1)$ equations in total:
$$ \begin{align} & V(Z) = U(Z) + \beta \cdot V_e(X_a'(Z),Z) & \text{Bellman equations, total: } n \\ & V_e(A_z) = E \cdot V(Z) & \text{Q equations, total: } n_a n \end{align} $$
accompanied by $n$ optimization problems (optimization step):
$$ X_a'(Z) = \vec\mu(Z,C(Z)),\quad C(Z) = \arg\max_C \text{Hamiltonian}(Z) $$
Remark:
- The number of equations degenerates to $n+n_a$ if $P$ no longer depends on the endogenous states $x_a$. This happens with stationary equilibrium or purely exogenous risk.
- The endogeneity of the risk $P$ dramatically increases the number of equations, by approximately a factor of $n_a$.
Plugging the interpolation in:
$$ \begin{align} & \Phi(Z) \cdot \theta = U(Z) + \beta \cdot \Phi_e(X_a'(Z),Z) \cdot \theta_e \label{eq:030} \\ & \Phi_e(A_z) \cdot \theta_e = E \cdot \Phi(Z) \cdot \theta \nonumber \end{align} $$
This yields $m+m_e$ equations in the interpolation coefficients $\vec\theta:=(\theta,\theta_e)$.
Remark:
- For both global and local interpolation methods, $m+m_e$ is usually equal to $n(n_a+1)$.
- Thus, if $v$ has no discontinuities, global methods are usually preferable.
The system is pseudo-linear in the coefficients. To solve this system using gradient-based methods such as Newton's method, we define:
$$ \begin{align} & \mathbf{F}(\vec\theta) := \text{RHS}(\vec\theta) - \text{LHS}(\vec\theta) = 0 \\ & \text{RHS}(\vec\theta) := \begin{bmatrix} U(Z) + \beta \cdot \Phi_e(X_a'(Z),Z) \cdot \theta_e \\ E \cdot \Phi(Z) \cdot \theta \end{bmatrix} \\ & \text{LHS}(\vec\theta) := \begin{bmatrix} \Phi(Z) \cdot \theta \\ \Phi_e(A_z) \cdot \theta_e \end{bmatrix} \end{align} $$
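Assembling $\mathbf{F}(\vec\theta)$ is then just a few matrix products. A minimal sketch, assuming hypothetical helpers `optimize_step` (runs the $n$ node-wise optimizations and returns $U(Z)$ and $X_a'(Z)$) and `basis_e` (evaluates $\Phi_e$ at arbitrary joined points):

```python
import numpy as np

def residual(theta, theta_e, Z, E, Phi_Z, Phi_e_Az, beta, optimize_step, basis_e):
    """Stack F(theta) = RHS - LHS as in the two blocks above."""
    U_Z, Xa_next = optimize_step(Z, theta_e)   # the n node-wise optimizations
    Phi_e_next = basis_e(Xa_next, Z)           # Phi_e(X_a'(Z), Z), changes each iter
    F_bellman = U_Z + beta * Phi_e_next @ theta_e - Phi_Z @ theta
    F_q = E @ (Phi_Z @ theta) - Phi_e_Az @ theta_e
    return np.concatenate([F_bellman, F_q]), Phi_e_next
```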
Remark: Eq ($\ref{eq:030}$) can be converted to a fixed-point format by left-multiplying its two equations by $\Phi^{-1}(Z)$ and $\Phi^{-1}_e(A_z)$, respectively. The fixed-point format can be used directly with iterative algorithms.
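A minimal sketch of this fixed-point variant, reusing the hypothetical helpers above and assuming $\Phi(Z)$ and $\Phi_e(A_z)$ are square and invertible; factorizing them once up front avoids repeated inversion:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def solve_fixed_point(theta, theta_e, Z, E, Phi_Z, Phi_e_Az, beta,
                      optimize_step, basis_e, tol=1e-8, maxit=1000):
    lu_v, lu_e = lu_factor(Phi_Z), lu_factor(Phi_e_Az)   # factorize once, reuse
    for _ in range(maxit):
        U_Z, Xa_next = optimize_step(Z, theta_e)
        theta_new = lu_solve(lu_v, U_Z + beta * basis_e(Xa_next, Z) @ theta_e)
        theta_e_new = lu_solve(lu_e, E @ (Phi_Z @ theta_new))
        err = max(np.max(np.abs(theta_new - theta)),
                  np.max(np.abs(theta_e_new - theta_e)))
        theta, theta_e = theta_new, theta_e_new
        if err < tol:
            break
    return theta, theta_e
```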
The Jacobian is:
$$ \begin{align} \nabla \mathbf{F}(\vec\theta) =& \begin{bmatrix} \mathbf{0}_{m\times m} & \beta \cdot \Phi_e(X_a'(Z),Z) \\ E \cdot \Phi(Z) & \mathbf{0}_{m_e \times m_e} \end{bmatrix} - \begin{bmatrix} \Phi(Z) & \mathbf{0}_{m\times m_e} \\ \mathbf{0}_{m_e\times m} & \Phi_e(A_z) \end{bmatrix} \nonumber \\ =& \begin{bmatrix} -\Phi(Z) & \beta \cdot \Phi_e(X_a'(Z),Z) \\ E \cdot \Phi(Z) & -\Phi_e(A_z) \end{bmatrix} \label{eq:034} \end{align} $$
With the analytical Jacobian, one can efficiently solve the whole problem in a reasonable time.
Remark: One can quickly see that most terms of $\nabla \mathbf{F}(\vec\theta)$ can be precomputed, except the basis matrix $\Phi_e(X_a'(Z),Z)$. The reason is that the new endogenous states $X_a'(Z)$ change every iteration, and evaluating them requires the optimization step. The optimization step, which also determines $U(Z)$ in $\mathbf{F}(\vec\theta)$ at every iteration, is the bottleneck of the whole algorithm's performance.
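Concretely, only the top-right block of Eq ($\ref{eq:034}$) must be rebuilt each iteration. A sketch with dense blocks for clarity (same hypothetical inputs as above):

```python
import numpy as np

def jacobian(Phi_Z, Phi_e_Az, E_Phi_Z, Phi_e_next, beta):
    """Assemble Eq (34). Phi_Z, E_Phi_Z = E @ Phi(Z), and Phi_e_Az are fixed
    across iterations; only Phi_e_next = Phi_e(X_a'(Z), Z) must be refreshed."""
    top = np.hstack([-Phi_Z, beta * Phi_e_next])
    bottom = np.hstack([E_Phi_Z, -Phi_e_Az])
    return np.vstack([top, bottom])
```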
Topic: Choice of interpolation methods
The choice of interpolation method becomes particularly important as the dimensionality of the Bellman equation increases. Since the number of interpolation coefficients must match the number of supporting nodes to exactly determine the coefficients, the projected pseudo-linear system remains subject to the curse of dimensionality (not only the number of states, but also the extra dimensionality due to the endogenous risk!). Thus, one must carefully trade off between interpolation methods based on their performance characteristics. The following table compares the memory demand of local and global interpolation methods given the same accuracy requirement.
Method | Required number of nodes | Basis matrix sparsity |
---|---|---|
Local | Large | Sparse |
Global | Not that large | Dense (usually) |
The following table compares some popular interpolation methods:
Method | Type | Time Complexity | Memory Complexity | Notes |
---|---|---|---|---|
Piecewise Linear | Local | $O(n)$ | $O(n)$ | Simple, fast, but less smooth; scales well for higher dimensions. |
Cubic Spline | Local | $O(n)$ | $O(n)$ | Provides smoother approximations than linear interpolation but may struggle in very high dimensions. |
Polynomial Basis | Global | $O(n^3)$ | $O(n^2)$ | Highly smooth but computationally expensive; impractical for high dimensions. |
Chebyshev Polynomials | Global | $O(n \log n)$ | $O(n)$ | Offers better numerical stability and smoothness; suitable for moderate dimensions. |
Radial Basis Functions (RBF) | Global | $O(n^2)$ | $O(n^2)$ | Flexible and accurate but memory-intensive; more suitable for low-dimensional problems. |
One can quickly realize that the trade-off does not break the curse of dimensionality. To really break it, one can explore other dimensionality reduction techniques such as
- Sparse grids: This method optimizes the grid structure. It reduces the number of nodes while keeping the basis matrix sparse. If node adaptation is incorporated, the number of required nodes can be cut further.
- Machine learning: This method employs a highly efficient interpolation/approximation technique, where automatic differentiation and backpropagation can significantly accelerate the iteration process. In fact, it represents a generalized approach to the traditional collocation method.
Topic: Recipe of Newton’s method
To solve Eq ($\ref{eq:030}$), one can call any standard solver (e.g. from MATLAB's optimization toolbox) and supply Eq ($\ref{eq:034}$) as the Jacobian. However, when one has to write their own iterator, the following simple updating formula is always good to recall:
$$ \vec\theta' = \vec\theta - \left(\nabla \mathbf{F}(\vec\theta) \right)^{-1} \cdot \mathbf{F}(\vec\theta) $$
To improve efficiency by avoiding inversion of the Jacobian matrix when $m+m_e$ is large, one can consider alternatives such as the Newton–Krylov method.
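A minimal Newton loop under the same assumptions, using a linear solve rather than an explicit inverse; for large systems, scipy.optimize.newton_krylov accepts the residual function directly and never forms the full Jacobian:

```python
import numpy as np

def newton(theta, F_and_jac, tol=1e-8, maxit=100):
    """theta' = theta - J^{-1} F, implemented with a linear solve."""
    for _ in range(maxit):
        F, J = F_and_jac(theta)                # residual and analytical Jacobian
        if np.max(np.abs(F)) < tol:
            break
        theta = theta - np.linalg.solve(J, F)  # never invert J explicitly
    return theta
```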
Topic: Optimization step
Even though collocation significantly accelerates the convergence of the entire iteration, 90% of the computational cost is still concentrated in the optimization step, where one must solve $n$ independent ("embarrassingly parallel") optimization problems in every iteration to obtain the instantaneous utility $U(C(Z),Z)$ and the new endogenous states $X_a'(Z)$.
The performance depends heavily on the number of control variables and the complexity of the admissible space $\mathcal{C}$:
- It is fast if there is only one control variable (e.g., consumption), where a simple one-dimensional search handles it efficiently (see the sketch after this list).
- It is manageable with two control variables (e.g., consumption and leisure), where the simplex method can still work reasonably well in the unconstrained case or when the admissible space can be rectangularized.
- It becomes very slow when there are more than three control variables, especially if constraints make the admissible space $\mathcal{C}$ non-convex. And often, even when convergence is ensured, the smoothness of $C(Z)$ and $X_a'(Z)$ is not.
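For the single-control case, a bounded scalar solver per node is usually sufficient. A sketch of one node's problem, where `u`, `mu`, `v_e_hat` (the interpolated $\hat{v}_e$), and the bound `c_max` are model-specific inputs supplied by the caller:

```python
from scipy.optimize import minimize_scalar

def solve_node(z, c_max, u, mu, v_e_hat, beta):
    """Optimization step at one node z: max_c u(c,z) + beta * v_e_hat(mu(z,c), z).

    The n node problems are independent, so this parallelizes trivially.
    """
    res = minimize_scalar(
        lambda c: -(u(c, z) + beta * v_e_hat(mu(z, c), z)),
        bounds=(1e-10, c_max), method="bounded")
    c_star = res.x
    return c_star, u(c_star, z), mu(z, c_star)  # c*, U at node, new endo state
```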
Therefore, one must carefully select the optimizer while verifying the interpolation properties used in collocation. For example, the MATLAB function fmincon is powerful and robust, but it may fail to converge within a reasonable time if the interpolation method causes numerical oscillations or destroys the convexity of the Hamiltonian.