The seminal paper by Barles and Souganidis (1991) provides a powerful framework for analyzing a class of 2nd-order non-linear PDEs that are commonly used in economics. Among the required conditions, the monotonicity of the numerical scheme is critical and often violated in practice. This blog post explains how to analyze the monotonicity of a wide class of numerical schemes, with multiple examples that share the structure of commonly used economic models. Readers can follow the examples to design their own numerical schemes.
Notation
A class of 2nd-order non-linear PDEs
Before discussing the monotonicity of numerical schemes, let’s set up the problem and introduce notation. Consider a generic non-linear PDE abstracted from an infinite-horizon stochastic optimal control problem with discounting:
where:
is the vector of state variables where represents the -th dimension element of the vector
is the drift function
is the volatility function that is an element of the volatility matrix
are the Jacobian and Hessian of function
is the flow payoff/utility function
is the discounting rate
is a 1-dimensional Brownian motion
is a collection of boundary conditions as functionals
Remark:
A generic volatility matrix has one row per state variable and one column per dimension of the Brownian motion. WLOG, the above formulation assumes a square (full-size) volatility matrix such that the dimension of the Brownian motion is exactly the dimension of the state vector. Neither a “wide” nor a “long” volatility matrix changes the conclusion.
If is a functional and includes infinite-dimensional objects (e.g. distribution function), then the conclusion of this post still holds. The above formulation can be seen as a finite-moment approximation of such problems.
In the remainder of this post, long function arguments are omitted when they do not affect the interpretation. Meanwhile, for convenience, let’s define the following clamping functions:
Discretization & grid
To numerically solve this PDE, we need to discretize it over a hyper-rectangle . In the case of linear state constraints (such as borrowing constraints), one can always rectangularize the computational domain by redefining the state variables. We do not discuss state constraints that cannot be rectangularized.
A standard evenly spaced dense grid can be defined by the number of grid points along each dimension: . The total number of grid points is . To index a specific grid point, we use a multi-index where for each . Let be a unit vector whose elements are all zero except the -th element, which is 1. Let be the grid mesh size of dimension . Thus, the first right neighbor along dimension of the current grid point can be denoted as .
For a more generic type of grid representation, we use a finite set of discrete supporting nodes . We will discuss generic interpolation supported by .
Stacking
To make the linear algebra used later well defined, we need an ordering of the elements in . For the evenly spaced dense grid , the stacking is done via the Cartesian product. This leads to a lexicographic sorting of the multi-indices . A vector of the sorted multi-indices is denoted by . For the more generic type of grid, one can always fix an order for the stacking; we use the same notation .
Remark: an alternative representation of the stacking is a vector of supporting nodes rather than “set + sorting”. The reason for the split here is: 1. to conveniently represent neighboring nodes in the state space; 2. to accommodate the fact that users must manually define the sorting and keep it consistent in their computer programs.
In general, a grid (structure) can be represented by a 2-tuple of the supporting node set and the stacking order.
Similarly, the stacking of function values at the grid nodes, sorted by , can be represented by , or a vector .
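To make the stacking concrete, here is a minimal NumPy sketch (illustrative names only, not code from any particular package) of the “supporting nodes + stacking order” idea: an evenly spaced grid, a lexicographic (row-major) flattening of the multi-indices, and the indexing of a right neighbor along one dimension.

```python
# A minimal sketch of "grid = supporting nodes + stacking order" for an evenly spaced
# dense grid, using NumPy's C-order (lexicographic) flattening as the stacking rule.
import numpy as np

# number of nodes per dimension and the hyper-rectangle [lo_d, hi_d] in each dimension
shape = (4, 3)                        # N_1 = 4, N_2 = 3, so N = 12 nodes in total
lo, hi = np.array([0.0, -1.0]), np.array([1.0, 1.0])

# supporting nodes along each dimension and their mesh sizes
axes = [np.linspace(lo[d], hi[d], shape[d]) for d in range(len(shape))]
dx = [(hi[d] - lo[d]) / (shape[d] - 1) for d in range(len(shape))]

# lexicographic stacking: multi-index I = (i_1, i_2) <-> flat index via row-major order
def flat(I, shape=shape):
    return np.ravel_multi_index(I, shape)

def coords(I):
    return np.array([axes[d][I[d]] for d in range(len(shape))])

I = (1, 2)                            # a multi-index
e0 = (1, 0)                           # unit step along dimension 0
neighbor = tuple(np.add(I, e0))       # the first right neighbor along dimension 0
print(flat(I), coords(I))             # position of node I in the stacked vector
print(flat(neighbor), coords(neighbor))

# a stacked vector of function values, ordered consistently with flat(.)
V = np.zeros(np.prod(shape))
V[flat(I)] = 1.0                      # set the value at node I
```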
Interpolation & stencil
An interpolation of a function is a linear combination of basis functions
where is the degree of the interpolation; is called an interpolant; and is the interpolation coefficient. In this post, we particularly discuss a class of linear interpolations in which is the function value evaluated at supporting nodes from the set . Readers can quickly verify that most conclusions also hold for more generic types of interpolation in which is not necessarily the value of at the supporting nodes (e.g. orthogonal polynomials).
A stencil is a “template” of linear operations. Consider the following degree interpolation:
where is a constant which can be seen as ; is a supporting node. We call the stacked basis vector a basis stencil, which fully determines the interpolated value at point given the grid . The constant term is introduced to accommodate the strategy of hypothetical (ghost) nodes outside the grid.
For interpolations where the coefficient is not the function value at supporting nodes, we can define a similar structure and also the basis stencil:
Remark: There are two common methods to handle boundary conditions:
Assume the boundary conditions hold exactly on the boundary nodes
Assume there exist hypothetical nodes outside but close to the boundary nodes. Assume boundary conditions hold on these hypothetical nodes.
The 2nd method is usually employed to generate solutions with smoother boundary behaviors.
The stencil representation defined above allows for standard arithmetic such as addition, subtraction, and scalar multiplication.
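As a concrete and purely illustrative sketch of these objects, the snippet below represents a basis stencil as a weight vector over the stacked node values plus a constant term, implements the arithmetic just mentioned, and builds the stencil of a forward difference between two interpolated values under 1D piecewise-linear interpolation. All names are hypothetical.

```python
# A minimal sketch of a basis stencil: weights on the stacked node values V plus a
# constant term, with the "standard arithmetic" (addition, subtraction, scalar multiply).
import numpy as np
from dataclasses import dataclass

@dataclass
class Stencil:
    w: np.ndarray   # weights on the stacked node values V
    c: float = 0.0  # constant term (e.g., from ghost-node boundary handling)

    def __add__(self, other):
        return Stencil(self.w + other.w, self.c + other.c)

    def __sub__(self, other):
        return Stencil(self.w - other.w, self.c - other.c)

    def scale(self, a):
        return Stencil(a * self.w, a * self.c)

    def apply(self, V):
        return self.w @ V + self.c

# 1D piecewise-linear interpolation on nodes x_grid: the stencil of the value at x
def linear_interp_stencil(x, x_grid):
    w = np.zeros(len(x_grid))
    j = np.searchsorted(x_grid, x) - 1
    j = np.clip(j, 0, len(x_grid) - 2)
    t = (x - x_grid[j]) / (x_grid[j + 1] - x_grid[j])
    w[j], w[j + 1] = 1.0 - t, t
    return Stencil(w)

x_grid = np.linspace(0.0, 1.0, 6)
V = x_grid**2                               # stacked node values
h = 0.05
# stencil of the forward difference (v(x+h) - v(x)) / h, built by stencil arithmetic
d = (linear_interp_stencil(0.43 + h, x_grid) - linear_interp_stencil(0.43, x_grid)).scale(1.0 / h)
print(d.w, d.apply(V))
```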
Monotonicity of numerical schemes
For second-order elliptic and parabolic PDEs, the equation's ellipticity determines both the system's tendency to reach a stationary state and its smoothness. In finite difference methods, a numerical scheme for a PDE consists of difference equations applied to discretized nodes and function values, approximating the original PDE. Within this framework, the monotonicity of a scheme is crucial for ensuring numerical stability and smoothness, and these properties are closely related.
In their seminal paper, Barles and Souganidis (1991) outline sufficient conditions for a numerical scheme to converge locally to the true solution, including monotonicity, consistency, and uniform stability. While consistency and uniform stability are typically easier to achieve in economic models, maintaining monotonicity requires special attention, as its absence is a common cause of failure in many proposed methods. In this post, we assume the other conditions are met and focus on the issue of monotonicity.
Consider a numerical scheme of for the boundary problem Eq ()-():
where is the explicitly written time dimension, which is used to update the guess across iterations; and represent the stacked discretized Jacobian and Hessian, respectively, which can be approximated with finite differences.
This equation of consolidates all components of the discretized boundary problem equations onto one side. Monotonicity then requires that be monotonically increasing with respect to but monotonically decreasing with respect to and all other points in the stack , i.e., the neighbors of .
Example (Neo-classical growth model)
The neo-classical growth model:
is stationary (thus the time derivative term is zero). By plugging in the policy function , we get a non-linear elliptic PDE:
By adding back the always-zero time derivative term, we make it a non-linear parabolic PDE:
Suppose now we use a forward difference along the dimension to approximate and a forward difference along the time dimension to approximate at an inner point given the discretization . Then:
which is a difference equation in and its neighboring points along the dimensions. Moving the RHS to the LHS, we get the same format as . The example scheme in Eq () is non-linear.
Remarks:
Monotonicity is also required for the boundary condition Eq (), which is often ignored in practice.
A commonly used alternative (if applicable) is to collect the terms involving on one side of the equation and put the remaining terms involving on the other side. Then both sides of the equation are required to be monotonically increasing. We will use this style of organizing terms in the rest of this post.
The example scheme in Eq () is a one-(time-)layer scheme because, standing at a given moment, we only use information from and . More complex schemes may use information from more moments: for example, Crank–Nicolson (a two-layer implicit method), the “leapfrog” method (a three-layer explicit method), and the alternating direction implicit (ADI) method (a multi-layer implicit method with fractional layers).
Explicit (forward Euler) method & CFL condition
Naive case
The explicit (forward Euler) method is one of the most intuitive schemes, relying solely on the previous iteration's guess to update the solution. It is called "forward" because it applies a forward difference along the time dimension. In the context of the HJB equation, this approach typically avoids the need to solve any linear or nonlinear system, requiring only a loop over the grid nodes, so it is easy to parallelize. However, this method usually demands a small and adaptively adjusted time step to maintain stability, resulting in the so-called Courant–Friedrichs–Lewy (CFL) condition.
We illustrate the forward Euler method using the neo-classical growth model in Eq (). For convenience, we use solely to index the spatial dimension , while a prime mark indicates the forward direction in time. This notation is permissible since the scheme involves only one time layer. For instance, refers to the right/forward neighbor at time of the current node at time , with .
In Eq (), we provided an example of a fully non-linear scheme whose properties are very hard to characterize. However, this is not the only approach. Due to the structure of the HJB equation, a common alternative is to treat the flow utility and drift terms as scalar functions without discretizing them. This renders the PDE linear with respect to the partial derivatives of the unknown function. Consistent with the directional choice in Eq (), we continue to use forward differences along the dimension. Thus,
Rearranging the scheme:
The monotonicity requires:
while there is no special restriction on if we do not discretize this term with and other nodes. The first inequality always holds. The CFL condition comes from the 2nd inequality, which implies:
Eq () is the typical textbook form of the CFL condition. It is easy to see that the admissible time step shrinks quickly as increases, which leads to slow convergence of the iteration.
The monotonicity of the scheme in Eq () depends on the computational domain : the larger the space, the smaller the required time step, and the slower the convergence of the algorithm, up to the point of infeasibility. Meanwhile, the inequality in Eq () says nothing about the case of negative drift, so the scheme is monotonic only in the region of net saving. These poor properties show that Eq () is not a workable scheme; we need a strategy to improve it.
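To make the restriction tangible, here is a small sketch under assumed primitives (Cobb-Douglas production, a constant placeholder consumption level): it computes the largest admissible time step for the naive forward-difference scheme and flags the net-dissaving nodes where no time step can restore monotonicity. Whether the discount rate enters the bound depends on where the discounting term is evaluated; the sketch includes it to be safe.

```python
# A minimal sketch (assumed primitives, not the author's code) of the CFL bound for the
# naive explicit scheme that always uses a forward difference in k. Monotonicity at a
# node roughly requires dt <= 1 / (drift/dk + rho) when drift > 0, and cannot be
# restored by any dt where drift < 0 (the "net de-saving" region).
import numpy as np

def naive_cfl_dt(drift, dk, rho):
    """Largest admissible time step; also report nodes where monotonicity fails."""
    bad = drift < 0.0                          # forward difference has the wrong sign here
    pos = np.maximum(drift, 0.0)
    dt_max = 1.0 / (pos.max() / dk + rho)      # binding constraint from the largest drift
    return dt_max, bad

# illustrative primitives: f(k) = k^alpha, depreciation delta, placeholder consumption c
k = np.linspace(0.1, 10.0, 200)
dk = k[1] - k[0]
alpha, delta, rho = 0.36, 0.05, 0.05
c = np.full_like(k, 1.0)                       # placeholder policy for illustration only
drift = k**alpha - delta * k - c               # net saving

dt_max, bad = naive_cfl_dt(drift, dk, rho)
print(dt_max, bad.sum(), "nodes where no time step restores monotonicity")
```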
Upwind scheme example: flux terms
In the context of second-order elliptic, parabolic, or hyperbolic PDEs, there is a strategy to improve a scheme’s monotonicity called the upwind scheme. This terminology reflects the physical intuition behind the approach, particularly for heat or wave equations. Here, the direction of differencing is set against the direction of wave or heat propagation, ensuring that information from the "origin of the wave/heat" (considered more crucial) is accurately incorporated. Mathematically, this strategy involves clamping the flux (drift) terms, which act as coefficients for the partial derivative terms, to ensure they remain positive or negative as required.
Let’s stick to the growth model example of Eq (), but this time we discretize the flux term as:
For an arbitrary value of the drift , only one direction of differencing is acceptable, or else , indicating a stationary state. But why does Eq () improve monotonicity? Simplifying Eq (), we have:
Notice that all the terms of Eq () appear on the RHS of the discretization. Specifically, let's apply the upwind scheme to the growth model example to get a full picture:
The monotonicity conditions for , , and are automatically satisfied. However, to ensure the monotonicity condition holds for the term, we need to solve for the CFL condition with respect to .
Compared to Eq (), applying the upwind scheme enables us to ensure monotonicity simply by adjusting the time step . The scheme can now handle both the net-saving and the net-dissaving cases. Soon, we will demonstrate that the upwind scheme remains effective in the multi-dimensional case.
Remark: The illustrative examples by Benjamin Moll suggest an estimate of , which works well for small . If you expect your discount rate to be significantly larger than the usual choice (e.g. ), then a safer estimate is .
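Below is a minimal sketch of the upwind treatment of a 1D flux term, with the clamping helpers assumed to be the positive and negative parts max(·, 0) and min(·, 0). The drift and the value-function guess are placeholders; the point is only how the clamped drift is paired with one-sided differences.

```python
# A minimal sketch (assumed clamping helpers and illustrative drift) of the upwind
# treatment of a flux term mu * dV/dk on a 1D evenly spaced grid: pair the positive part
# of the drift with a forward difference and the negative part with a backward
# difference, so the coefficients keep the required signs.
import numpy as np

def pos(x):   # clamping helper, assumed to be max(x, 0)
    return np.maximum(x, 0.0)

def neg(x):   # clamping helper, assumed to be min(x, 0)
    return np.minimum(x, 0.0)

def upwind_flux(V, mu, dk):
    """Upwind approximation of mu * V_k at interior nodes (boundary rows left out)."""
    dV_fwd = (V[2:] - V[1:-1]) / dk    # forward difference at interior nodes
    dV_bwd = (V[1:-1] - V[:-2]) / dk   # backward difference at interior nodes
    return pos(mu[1:-1]) * dV_fwd + neg(mu[1:-1]) * dV_bwd

# illustrative data
k = np.linspace(0.1, 10.0, 200)
dk = k[1] - k[0]
V = np.log(k)                          # placeholder guess of the value function
mu = 0.5 - 0.1 * k                     # placeholder drift: positive for small k, negative for large k
print(upwind_flux(V, mu, dk)[:5])

# With this clamping, the coefficient on V[i+1] is pos(mu)/dk >= 0, on V[i-1] it is
# -neg(mu)/dk >= 0, and the CFL bound only has to control the coefficient on V[i].
```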
Upwind scheme example: diffusion terms
The HJB equation is a 2nd-order PDE which may contain diffusion terms, as in Eq (). We have shown how to apply the upwind scheme to the flux terms. Now, let's discuss how to apply it to the diffusion terms. WLOG, let's consider a stochastic neo-classical growth model:
where represents the diffusion term in this equation. As before, we do not discretize but instead discretize . Similarly, we use a three-point finite difference, given by . By clamping the (co)variance , we have:
Readers can readily verify the unconditional monotonicity of the and terms. Then, collecting the terms here and the terms from the flux terms of state and , we have the following CFL condition:
One may have realized that, in the case of the explicit scheme plus the upwind scheme, the CFL condition has a generic formula as in Eq ():
As we noted at the beginning of this section, the explicit method requires progressively smaller time steps as the dimensionality of the problem and the size of the computational space increase. For a medium-sized problem, this typically means millions of iterations are needed to achieve convergence.
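For reference, here is a small helper that evaluates the generic CFL bound just described, assuming the standard form in which the bound is the reciprocal of the discount rate plus the scaled absolute drifts and diagonal variances, taken at the most restrictive node (cross-derivative terms are ignored for simplicity).

```python
# A minimal sketch (assumed standard form; cross-derivative terms ignored) of the
# generic CFL time-step bound for the explicit + upwind scheme:
#   dt <= 1 / ( rho + sum_i |mu_i|/dx_i + sum_i var_ii/dx_i^2 ),
# taken at the most restrictive grid node.
import numpy as np

def cfl_dt(mu, var, dx, rho):
    """
    mu:  array of shape (N, d)  -- drift of each dimension at each of the N nodes
    var: array of shape (N, d)  -- diagonal (co)variance terms (sigma sigma')_ii
    dx:  array of shape (d,)    -- mesh size of each dimension
    """
    denom = rho + (np.abs(mu) / dx).sum(axis=1) + (var / dx**2).sum(axis=1)
    return 1.0 / denom.max()

# illustrative 2-dimensional example
rng = np.random.default_rng(0)
N, d = 1000, 2
mu = rng.normal(size=(N, d))
var = rng.uniform(0.0, 0.2, size=(N, d))
dx = np.array([0.05, 0.1])
print(cfl_dt(mu, var, dx, rho=0.05))
```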
Implicit (backward Euler) method
Compared to the explicit method, the implicit method, or backward Euler method, is often preferred in practice. For the HJB equation with an evenly spaced dense grid, this scheme is unconditionally monotonic for any size of when the upwind scheme is applied. However, the trade-off with the implicit method is that it requires solving a linear or nonlinear system at each iteration, which can be large but sparse in most cases.
The explicit method is “forward” in the sense of differencing forward along the time dimension (standing at moment ). The implicit method is “backward” in that we difference backward along the time dimension: we no longer stand at moment but at moment , looking back at moment .
A fully implicit scheme is one in which only uses information from moment , while all other discretizations along spatial dimensions use information from moment . This type of scheme typically results in nonlinear schemes (with respect to partial derivatives), which are challenging to analyze. Let’s use the deterministic growth model example again, but this time we have:
Observe that in the fully implicit scheme, both and are evaluated at moment . These two terms are highly nonlinear in . We cannot treat them as scalar functions as before, because we cannot evaluate them without knowing . Their partial derivatives with respect to and its neighboring values are extremely complex. In most economic models, analyzing these derivatives is impractical.
Remark: We did not apply the upwind scheme to Eq (). However, readers can quickly see that the upwind scheme does not work here due to the high non-linearity of and .
Thus, we focus on a class of quasi-implicit methods, where we evaluate the flow utility term, flux terms, and diffusion terms using the estimate at moment , while leaving the partial derivatives at moment . This scheme corresponds precisely to the “implicit method” used by Benjamin Moll and other economists. Under such a quasi-implicit scheme, and applying the upwind scheme, the growth model is discretized as:
All coefficients of , , and its spatial neighbors satisfy the monotonicity condition unconditionally, regardless of the size of the time step . Collecting all difference equations (including the boundary conditions), one obtains a linear system in the nodes at moment . Solving this linear system updates the guess of for the time-invariant (Markovian) HJB equation, or updates the guess at moment for finite-horizon HJB equations.
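The following sketch assembles and solves one such quasi-implicit step for the 1D deterministic growth model (diffusion omitted, boundary rows handled in a simplified state-constraint fashion). The primitives (CRRA utility, Cobb-Douglas production, the placeholder consumption policy) are assumptions for illustration; the scheme itself follows the description above: flow utility and drift are evaluated at the previous guess, the derivatives at the new one.

```python
# A minimal sketch (illustrative, deterministic 1D case; diffusion omitted; not the
# author's code) of one quasi-implicit update: the flow utility u and the drift mu are
# evaluated at the previous guess, while the partial derivatives sit at the new time.
# With upwind clamping, the resulting linear system is unconditionally monotone.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def build_transition_matrix(mu, dk):
    """Sparse A such that (A V)_i approximates mu_i * V_k(k_i) with upwind differences."""
    N = len(mu)
    pos = np.maximum(mu, 0.0)
    neg = np.minimum(mu, 0.0)
    # simplified state-constraint boundaries: drop the difference that leaves the grid
    pos[-1] = 0.0
    neg[0] = 0.0
    lower = -neg[1:] / dk                 # coefficient on V[i-1]  (>= 0)
    diag = -(pos - neg) / dk              # coefficient on V[i]    (<= 0)
    upper = pos[:-1] / dk                 # coefficient on V[i+1]  (>= 0)
    return sp.diags([lower, diag, upper], offsets=[-1, 0, 1], format="csr")

def quasi_implicit_step(V_old, u_flow, mu, dk, dt, rho):
    N = len(V_old)
    A = build_transition_matrix(mu, dk)
    B = (1.0 / dt + rho) * sp.identity(N, format="csr") - A
    rhs = u_flow + V_old / dt
    return spla.spsolve(B, rhs)

# illustrative primitives (CRRA utility, Cobb-Douglas production; placeholder policy)
k = np.linspace(0.1, 10.0, 300)
dk = k[1] - k[0]
alpha, delta, rho, gamma, dt = 0.36, 0.05, 0.05, 2.0, 100.0
c = np.full_like(k, 1.0)                  # placeholder consumption policy for illustration
u_flow = c**(1.0 - gamma) / (1.0 - gamma)
mu = k**alpha - delta * k - c             # net saving evaluated at the previous guess
V = u_flow / rho                          # initial guess
V = quasi_implicit_step(V, u_flow, mu, dk, dt, rho)
print(V[:5])
```

Note that, by construction, the off-diagonal entries of the transition matrix are non-negative and each row sums to zero, so the system matrix is an M-matrix with a non-negative inverse; this is exactly why the update stays monotone for any time step.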
Remark: 1. In later sections of this post, we briefly introduce how to check the monotonicity when a transition matrix is available. The check procedure suggests how to adaptively adjust the time step size per iteration. 2. A transition matrix is a linear approximation of the infinitesimal generator operator such that ; it is basically a stacking of the flux & covariance terms in our previous examples.
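A small check along those lines: given a candidate transition matrix, verify that all off-diagonal entries are non-negative, and report the explicit-scheme time-step bound implied by its diagonal (a hedged sketch; the exact bound depends on how the discount term is treated).

```python
# A minimal sketch of checking monotonicity from a transition matrix A (off-diagonals
# must be non-negative) and of the adaptive time-step bound it implies for an explicit
# step: dt <= 1 / (rho + max_i |A_ii|).
import numpy as np
import scipy.sparse as sp

def check_transition_matrix(A, rho, tol=1e-12):
    A = sp.csr_matrix(A)
    off = A - sp.diags(A.diagonal())
    off_diag_ok = off.min() >= -tol if off.nnz else True   # all off-diagonal entries >= 0
    dt_explicit = 1.0 / (rho + np.abs(A.diagonal()).max())
    return off_diag_ok, dt_explicit

# example: a 3x3 upwind-style transition matrix
A = sp.csr_matrix(np.array([[-1.0, 1.0, 0.0],
                            [0.5, -1.5, 1.0],
                            [0.0, 0.8, -0.8]]))
print(check_transition_matrix(A, rho=0.05))
```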
Non-monotonicity of a class of approximation-based methods
The standard finite difference method discussed above is non-parametric, relying on a dense grid with no dependencies between any two nodes in . However, this dense grid approach is highly susceptible to the curse of dimensionality, becoming quickly infeasible for three-dimensional and higher-dimensional problems. One may consider function interpolation or approximation methods (to approximate ), which offer a wide range of choices. However, it is important to recognize that when these approximations of are combined with finite difference methods, the monotonicity condition often fails to hold for a large class of these function approximation methods, even when the implicit method and upwind scheme are applied. This section is an intuitive illustration of the proof idea in Garcke and Ruttscheidt (2019).
We consider the two types of interpolation as outlined in the notation section.
First type
The first type uses interpolation coefficients that are the function values at supporting nodes from the set . This type of interpolation can be abstracted as Eq (). Examples include local methods such as piecewise polynomials, splines, etc. Multi-dimensional piecewise linear (a.k.a. multi-linear) interpolation fits exactly into the discussion of this subsection.
The key idea here is that any approximated value can be represented as a linear combination of all supporting . Taking this a step further, any linear operation on the approximated function values (e.g., finite difference) among arbitrary points can be represented by a basis stencil. For example:
where , as previously arranged, is the stack of function values at the supporting nodes, formatted as a vector. Notably, all finite difference approximations of partial derivatives fall within this category.
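The stencil of any such linear operation can be recovered mechanically by feeding unit vectors through it. The sketch below does this for a forward difference between a supporting node and an interpolated point under 1D piecewise-linear interpolation (all names illustrative).

```python
# A minimal sketch of the claim that any linear operation on interpolated values is
# "stencil dot stacked node values": feed unit vectors through the operation to recover
# the stencil, here for a forward difference between a supporting node and an
# interpolated point under 1D piecewise-linear interpolation.
import numpy as np

x_grid = np.linspace(0.0, 1.0, 11)      # supporting nodes
h = 0.03                                 # spatial step used by the finite difference
i = 4                                    # index of the supporting node we difference at

def forward_diff_of_interp(V):
    """(interp(x_i + h) - V_i) / h, where interp is piecewise-linear in the node values V."""
    return (np.interp(x_grid[i] + h, x_grid, V) - V[i]) / h

# recover the basis stencil entry by entry with unit vectors
stencil = np.array([forward_diff_of_interp(e) for e in np.eye(len(x_grid))])

V = np.sin(x_grid)                       # any stacked vector of node values
print(np.allclose(stencil @ V, forward_diff_of_interp(V)))   # True: the operation is stencil . V
print(stencil[i - 1 : i + 3])            # only nearby nodes get nonzero weights
```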
To illustrate the non-monotonicity issue when combining the above interpolation with the finite difference method, we use the stochastic growth model in Eq () as an example. Here, we apply finite differences and the quasi-implicit scheme described in the previous section, but with a key modification: the finite difference is now computed between a supporting node and an interpolated neighboring point, rather than between two supporting nodes, for a given spatial mesh step size . For instance, .
Remark: The reasons for not directly differencing two neighboring supporting nodes are:
In some interpolations, a “neighboring” node may not exist for supporting nodes in
Even if neighboring supporting nodes can be defined, there would be a consistency issue that makes the solution diverge from the true solution (Garcke & Ruttscheidt, 2019).
Remark: In this illustration, we set up a difference equation system over the supporting nodes from such that the finite difference happens between a supporting node and an interpolated neighboring point. However, the conclusion of non-monotonicity trivially holds in more generic cases. This implies the failure of the following attempts:
Sample or design an evenly spaced “virtual” grid over
Evaluate the interpolant at these points
Apply the standard finite difference + quasi-implicit scheme
Discretizing Eq () and applying the quasi-implicit scheme and the upwind scheme:
which appears very similar to Eq (). One might mistakenly assume that Eq () shares the same unconditional monotonicity as in the previous section. However, it is crucial to recognize that monotonicity is required for all supporting nodes, not the interpolated points. Therefore, an additional step is needed: substituting the basis stencil and rearranging terms as in Eq (). A trick here is to represent the supporting node as an interpolated value, using a unit vector as its basis stencil.
After simplification, Eq () becomes a linear equation simultaneously involving all supporting nodes in :
where is a complex linear combination of all basis stencils; is the summation of all constant terms. The monotonicity condition, in fact, requires Eq () to satisfy:
All elements in the basis stencil must be non-negative such that the coefficient of every supporting node’s function value of moment is non-negative.
All elements in the basis stencil , except the current node’s coefficient of moment , must be non-positive.
Meanwhile, the node’s coefficient of moment must be non-negative.
These three conditions impose around sign restrictions on and require these restrictions to hold in every iteration. If elements of can be negative, then satisfying these numerous restrictions becomes nearly impossible. This can be verified by manually expanding the term and examining it. A more straightforward example illustrates this vulnerability: if any element of the basis stencil is negative, then monotonicity with respect to (or ) is violated, leaving economists no feasible way to restore it.
Even though we are discussing a specific example, it contains all the ingredients needed to trivially generalize it to an arbitrary-dimensional HJB equation driven by an -dimensional Brownian motion, as shown in Eq (). I leave the formal statement to those who are interested in further exploration.
The issue is evident, but what could be a potential solution? The answer lies in using interpolation methods that represent any point as a convex combination of the function values of supporting nodes in . Examining Eq (), if all elements of the basis stencils used in this equation are non-negative, then applying the upwind scheme to Eq () preserves unconditional monotonicity. There are specific types of interpolation that meet this convexity requirement.
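A quick way to see the difference is to recover the interpolation weights at an off-node point by interpolating unit vectors: piecewise-linear weights form a convex combination, while cubic-spline cardinal weights typically do not stay non-negative. The sketch below illustrates this numerically; it is not a proof.

```python
# A minimal sketch contrasting two interpolations' basis stencils at an off-node point,
# recovered by interpolating unit vectors. Piecewise-linear weights are non-negative
# (a convex combination); cubic-spline cardinal weights typically are not, which is
# what breaks the monotonicity of schemes built on them.
import numpy as np
from scipy.interpolate import CubicSpline

x_grid = np.linspace(0.0, 1.0, 12)       # supporting nodes
x = 0.53                                  # an interpolated point between two nodes

linear_weights = np.array([np.interp(x, x_grid, e) for e in np.eye(len(x_grid))])
cubic_weights = np.array([CubicSpline(x_grid, e)(x) for e in np.eye(len(x_grid))])

print(np.isclose(linear_weights.sum(), 1.0), linear_weights.min() >= 0.0)   # convex weights
print(np.isclose(cubic_weights.sum(), 1.0), cubic_weights.min())            # typically < 0
```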
Remark: The non-negativity of basis stencil or function elements is NOT equivalent to the monotonicity of the interpolant. Some readers may recognize methods like monotonic interpolations (e.g., monotonic cubic splines). These interpolations, while designed to ensure monotonicity of the interpolant, do not necessarily guarantee monotonicity of numerical schemes—two forms of "monotonicity" that are fundamentally different.
More specifically, monotonic interpolation ensures that, within a single iteration, if the guessed solution is monotonic along certain dimensions, then evaluating the interpolant preserves this property. In contrast, the monotonicity of a numerical scheme ensures that, as the guessed solution is updated from one iteration to the next, the monotonicity of the previous iteration's guess is maintained in the next iteration, thereby preserving the solution's shape over successive updates. This is sometimes referred to as the “shape-preserving” property of the scheme.
Second type
The second type of interpolation we consider does not use function values at supporting nodes as interpolation coefficients but instead utilizes a generic . Here, the degree of interpolation is not necessarily equal to the number of supporting nodes . Examples of this type of interpolation include global methods such as orthogonal polynomials (spectral), neural networks, and Gaussian processes, among others.
In short, this type of interpolation suffers from the same issue as the first type. The key to understanding this is to recognize that, after substituting the interpolation into Eq (), we are simply projecting the original space onto the space of interpolation coefficients , while everything else remains unchanged. For example (note that now even is an evaluated value),
which results in a collection of difference equations. If the stencil elements cannot be guaranteed to be non-negative, then the same issue of irreparably broken monotonicity arises once again.
The solution in this case is the same as for the first type of interpolation: employ interpolation methods that ensure non-negative stencils for all . For example, one might consider using ReLU, Sigmoid, or other non-negative activation functions in neural network implementations.
Bonus section: determine which scheme to use
The combined strategy of the quasi-implicit method and the upwind scheme is sufficient for handling a broad class of economic models in research. However, certain models with unique structures may require customization beyond the standard approach. Learning how to design tailored linear or nonlinear schemes for specific problems can be highly beneficial (even though most customized schemes perform worse than the standard approach when the latter is available).
A quick example: the absolute value of a partial derivative appears in some applications (e.g. adjustment costs). The trick to discretize it monotonically is:
i.e. use backward difference when , and use forward difference when .
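One common monotone (Godunov-type) realization of this switching rule is to take the maximum of the backward difference, the negated forward difference, and zero; whether this is the right variant depends on the sign with which the absolute-value term enters the equation. A hedged sketch:

```python
# A minimal sketch (one common Godunov-type realization of the rule above; whether it is
# the appropriate variant depends on the sign with which |dV/dx| enters the equation)
# for a monotone discretization of |dV/dx| on an evenly spaced 1D grid.
import numpy as np

def abs_derivative_upwind(V, dx):
    """max(backward difference, -forward difference, 0) at interior nodes."""
    d_bwd = (V[1:-1] - V[:-2]) / dx      # backward difference; selected where V_x > 0
    d_fwd = (V[2:] - V[1:-1]) / dx       # forward difference; -d_fwd selected where V_x < 0
    return np.maximum.reduce([d_bwd, -d_fwd, np.zeros_like(d_bwd)])

x = np.linspace(-1.0, 1.0, 201)
V = np.abs(x)                            # |V_x| = 1 away from the kink
print(abs_derivative_upwind(V, x[1] - x[0])[:3])
```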
The general steps to determine which numerical scheme to use are:
Define the economic model as a boundary problem and solve for the policy functions in the interior and on the boundary.
Plug the policy functions back in and consider:
How tractable is the problem? If the maximum principle gives an analytical solution, then solve it. If not, consider solving it numerically.
Is finite difference the best choice? If yes, move on. If not, try generalized residual methods (I’ll discuss this in another post)
How non-linear is the problem? If it is linear enough or easy to linearize, then consider linearizing the PDE and solving it analytically or numerically; e.g., some portfolio or risk-neutral problems can be approximated as a linear-quadratic regulator (LQR) problem, which is tractable.
What is the dimensionality of the problem? If it is low, then primarily use an evenly spaced grid plus the standard strategy. If it is high, then choose approximation methods that satisfy the above non-negativity conditions, and potentially combine them with other techniques such as sparse grids.
Does your model have mixed-frequency noise that requires multi-grid methods? If yes, things become more complex and you may need an advanced course in numerical PDEs.
Based on your grid, decide whether to use approximation methods:
Evenly spaced dense grid: No need for approximation; everything is standard.
Unevenly spaced dense grid: No need for approximation, but be careful about the mesh step sizes used in the differences.
(Regular or adaptive) sparse grid: Must use approximation. Carefully choose approximation methods that satisfy the above conditions
Design the discretization strategy, i.e. the scheme. This is the core part.
How many layers do you need? Multiple layers are needed in some special cases, such as dimension reduction using methods like ADI, or when handling special time-dependent (parabolic) problems.
Can a fully implicit scheme be made unconditionally monotonic? If yes, use it; it converges faster than the quasi-implicit scheme. If not, one may need to compare it (after applying the CFL condition) with a quasi-implicit scheme.
Is there any non-linearity that must be discretized? If no, great: just fit your PDE into the standard strategy. If yes, you need more tricks to handle these non-linearities before moving forward.
Do you have an excellent idea that beats the standard strategy even though the latter is applicable? If yes, go ahead. If not, stay with the standard strategy.
Carefully double-check that all Barles–Souganidis sufficient conditions are satisfied.
Carefully verify the chosen scheme on the discretized boundary conditions. PDEs, especially elliptic/parabolic/hyperbolic ones, are pinned down largely by their boundary conditions. For multi-dimensional problems, pay extra attention to “corners”, i.e. where multiple boundary conditions must hold simultaneously.
Write your code, debug, test and run.
References
Garcke, J., & Ruttscheidt, S. (2019). Finite differences on sparse grids for continuous time heterogeneous agent models.
Barles, G., & Souganidis, P. E. (1991). Convergence of approximation schemes for fully nonlinear second order equations. Asymptotic Analysis, 4(3), 271–283.