Analyzing the monotonicity of numerical schemes with examples

Tianhao Zhao

Abstract

The important paper by Barles and Souganidis (1991) provides a powerful framework to analyze a class of 2nd-order non-linear PDEs which are commonly used in economics. Among all the required conditions, the monotonicity of the numerical scheme is critical and often violated in practice. This blog post explains how to analyze the monotonicity of a wide class of numerical schemes with multiple examples that share similar structures with some commonly used economics models. One can follow the examples to design their own numerical schemes.

Notation

A class of 2nd-order non-linear PDE

Before discussing the monotonicity of numerical schemes, let’s set up the problem and introduce the notations. Let’s consider a generic non-linear PDE that is abstracted from an infinite-horizon stochastic optimal control problem with discounting:

(1)ρv(x)=f(x,Dv(x),D2v(x))+j=1kμj(x,Dv(x),D2v(x))Djv(x)+12m=1kj=1kσmj2(x,Dv(x),D2v(x))Dmj2v(x)(2)s.t. j,dxj=μj(x,Dv(x),D2v(x))dt+m=1kσmj(x,Dv(x),D2v(x))dWm(3)B[x,v(),Dv(),D2v()]=0

where:

Remark:

  • A generic volatility matrix is of size k×m in which there are k states and a m-dimensional Brownian motion. WLOG, the above formulation assumes a full-size volatility matrix such that the dimension of the Brownian motion is k which is exactly the dimension of the state variable x. Neither “wide” (m>k) or “long” (m<k) shape of the volatility matrix does not change the conclusion.

  • If v is a functional and x includes infinite-dimensional objects (e.g. distribution function), then the conclusion of this post still holds. The above formulation can be seen as a finite-moment approximation of such problems.

In the left part of this post, the long function arguments are ignored if they don’t affect the interpretation. Meanwhile, for convenience, let’s define the following clamping functions:

(4)x+:=max{x,0},x:=min{x,0},xR

Discretization & grid

To numerically solve this PDE, we need to discretize it over a hyper rectangle X:=j=1k[xj,x¯j]. In the case of linear state constraints (such as borrowing constraints), one can always rectanglize the computational domain by redefining x. We don’t discuss the case of state constraints that cannot be rectanglized.

A standard even-spaced dense grid can be defined by the number of grid points along each dimension: N:=(N1,,Nk). The total number of grid points is N:=j=1kNj. To index a specific grid point, we use multi-index i:=(i1,,ik) where ij=1,,Nj for each j=1,,k. Let ejZk be a unit vector in which all elements are zero except the j-th element being 1. And let Δj:=v¯jvjNj1 be the grid mesh size of dimension j. Thus, the first right neighbor along dimension j of the current grid point x[i] can be denoted as x[i+Δjej].

For a more generic type of grid representation, we use a set of finite N discrete supporting nodes X^NX. We will discuss the generic interpolation supported by X^N.

Stacking

To well-define the linear algebra we will use later, we need an order of the elements in X^N. For even-spaced dense grid (X,N), the stacking is done by Cartesian product. This leads to a lexicographical sorting of the multi-indices {i}. A vector of the sorted multi-indices is denoted by INRN. For the more generic type of grid, one can always determine an order of the stacking. We use the same notation IN.

Remark: an alternative way of stacking representation is to use a vector of supporting nodes rather than “set+sorting”. The reason of the split here is to: 1. represent neighbor nodes in the space; 2. accommodate the fact that users have to manually define the sorting and keep it consistent in their computer programs.

In general, a grid (structure) can be represented by a 2-tuple (X^N,IN) of supporting node set and the stacking order.

Similarly, the stacking of function value v(x[i]) at grid nodes x[i]X^N sorted by IN can be represented by (VN,IN), or a vector VRN.

Interpolation & stencil

An interpolation of function v(x):RkR is the linear combination of basis functions φp(x)R

(5)v(x)v^(x):=p=1Pφp(x)θp

where P is the degree of the interpolation; v^(x) is called an interpolant; and θ:=(θ1,,θP)RP×1 is the interpolation coefficient. In this post, we particularly discuss a class of linear interpolations in which θp is the function value evaluated at supporting nodes from set X^N. Readers can quickly verify that most conclusions also hold for more generic types of interpolations in which θ is not necessarily to be the value of v(x) at the supporting nodes (e.g. orthogonal polynomials).

A stencil is a “template” of linear operations. Consider the following N+1 degree interpolation:

(6)v^(z)=φ,(X^N,IN)(z):=iINφi(z)x^[i]+C(z,(X^N,IN))

where C(z,(X^N,IN))R is a constant which can be seen as 1C(); x^[i]X^N is supporting node. We call the stacking basis vector φ(z) a basis stencil which can fully determine the interpolated value at point zX given the grid (X^N,IN). The introduction of constant term C is to accommodate the strategy of hypothetical (ghost) outside nodes.

For interpolations where the coefficient is not the function value at supporting nodes, we can define a similar structure and also the basis stencil:

(7)v^(z)=φ,θP(z):=i=1Pφi(z)θ[i]+C(z,θ)

Remark: There are two common methods to handle boundary conditions:

  1. Assume the boundary conditions hold exactly on the boundary nodes

  2. Assume there exist hypothetical nodes outside but close to the boundary nodes. Assume boundary conditions hold on these hypothetical nodes.

The 2nd method is usually employed to generate solutions with smoother boundary behaviors.

The defined stencil representation allows for standard arithmetic such as addition, deduction, and scalars multiplication.

Monotonicity of numerical schemes

For second-order elliptic and parabolic PDEs, the equation's ellipticity determines both the system's tendency to reach a stationary state and its smoothness. In finite difference methods, a numerical scheme for a PDE consists of difference equations applied to discretized nodes and function values, approximating the original PDE. Within this framework, the monotonicity of a scheme is crucial for ensuring numerical stability and smoothness, and these properties are closely related.

In their seminal paper, Barles and Souganidis (1991) outline sufficient conditions for a numerical scheme to converge locally to the true solution, including monotonicity, consistency, and uniform stability. While consistency and uniform stability are typically easier to achieve in economic models, maintaining monotonicity requires special attention, as its absence is a common cause of failure in many proposed methods. In this post, we assume the other conditions are met and focus on the issue of monotonicity.

Consider a numerical scheme of for the boundary problem Eq (1)-(3):

(8)S(t,x[i],v[i],VN,DVN,D2VN)=finite difference approximationS(t,x[i],v[i],VN)=0

where t is the explicitly written time dimension which is used to update the guess across iterations; DVN and D2VN represent the stacked discretized Jacobian and Hessian, respectively which can be approximated by VN with finite difference.

This equation of S consolidates all components of the discretized boundary problem equations onto one side. Monotonicity then requires that S be monotonically increasing with respect to v[i+Δt] but monotonically decreasing with respect to v[i] and all other points in the stack VN, i.e., the neighbors of v[i].

Example (Neo-classical growth model)

The neo-classical growth model:

(9)ρv(k)=maxcc1γ1γ+vk(k){Zkαδkc}

is stationary (thus the time derivatives term vt=v/t=0). By plugging in the policy function c=vk1/γ, we get a non-linear elliptic PDE:

(10)ρv=11γvkγ1γ+vk(Zkαδkvk1/γ)

By adding back the always-zero time derivative term, we make it a non-linear parabolic PDE:

(11)ρv=vt+11γvkγ1γ+vk(Zkαδkvk1/γ)

Suppose now we use forward difference along the k dimension to approximate vk and use forward difference along the time t dimension to approximate vt at an inner point (k[i],v[i]) given discretization (X^N,IN,VN), there is:

v[i+Δtet]v[i]Δt+ρv[i]=11γ(v[i+Δkek]v[i]Δk)γ1γ(12)+(v[i+Δkek]v[i]Δk){Zk[i]αδk[i](v[i+Δkek]v[i]Δk)1/γ}

which is a difference equation about v[i] and its neighbor points along (t,k) dimensions. Rearranging the RHS to the LHS, we then get the same format as S()=0. The example scheme in Eq (12) is a non-linear scheme.

Remarks:

  • The monotonicity is required also on the boundary condition Eq (3), which is often ignored in practice.

  • A commonly used alternative way of writing this (if applicable) is to split the terms about v[i+Δtet] on one side of the equation, and put the left terms about VNv[i+Δtet] to the other side. Then, both sides of the equation require monotonically increasing. We will use this style of organizing terms in the left part of this post.

  • The example scheme in Eq (12) is a one (time) layer scheme because by standing at a t moment, we only use t and t+Δt information. Some more complex scheme may use more moments information. For example, Crank-Nicolson (two layer implicit method), "leapfrog" method (three layer explicit method), and Alternating direction implicit (ADI) method (multi layer method implicit method, fraction layers).

Explicit (forward Euler) method & CFL condition

Naive case

The explicit (forward Euler) method is one of the most intuitive schemes, relying solely on the previous iteration's guess to update the solution. It is called "forward" because it applies a forward difference along the time dimension. In the context of HJB equation, this approach typically avoids the need to solve any linear or nonlinear systems, requiring only a loop over each grid node. So it is easy to do parallelization. However, this method usually demands a small and adaptively adjusted time step to maintain stability, resulting in the so-called Courant–Friedrichs–Lewy (CFL) condition.

We illustrate the forward Euler method using the neo-classical growth model in Eq (11). For convenience, we use k[i] solely to index the spatial dimension k, while a prime mark indicates the forward direction in time. This notation is permissible since the scheme involves only one time layer. For instance, v[i] refers to the right/forward neighbor at time t+Δt of the current node v[i] at time t, with vt(v[i]v[i])/Δt.

In Eq (12), we provided an example of a fully non-linear scheme which is very hard to characterize its property. However, this is not the only approach. Due to the structure of the HJB equation, a common alternative is to treat the flow utility and drift terms as scalar functions without discretizing them. This renders the PDE linear with respect to the partial derivatives of the unknown function. Consistent with the directional choice in Eq (12), we continue to use forward differences along the k dimension. Thus,

v[i]v[i]Δt+ρv[i]=u(c[i])+μk(k[i],v[i],VN)1Δk(v[i+Δk]v[i])(13)simplifyv[i]v[i]Δt+ρv[i]=u[i]+μk[i]1Δk(v[i+Δk]v[i])

Rearranging the scheme:

(14)1Δtv[i]={1Δtρμk[i]Δk}v[i]+μk[i]Δkv[i+Δk]+u[i]

The monotonicity requires:

(15)1Δt0(LHS)(16)1Δtρμk[i]Δk0(RHS)(17)μk[i]Δk0(RHS)

while there is no special restriction on u[i] if we do not discretize this term with v[i] and other nodes. The first inequality always holds because Δt>0 for sure. The CFL condition comes from the 2nd inequality which implies iIN:

(18)ΔtR+,if μk[i]<ρΔk(19)ΔtΔkρΔk+μk[i]<Δkμk[i],otherwise

Eq (19) is in the typical form of CFL condition that we could see from the textbook. It is easy to see that Δt0 quickly as μk[i] increasing, which leads to slow convergence of the iteration.

The monotonicity of Eq (14) scheme depends on the computation domain X: larger the space, smaller time step required, then slower convergence of the algorithm, so on until being infeasible. Meanwhile, The inequality of Eq (17) is irrelevant to Δt such that the scheme is monotonic only in the region of net saving. These bad properties show that Eq (14) is not a working scheme. We need to find a strategy to improve Eq (14).

Upwind scheme example: flux terms

In the context of second-order elliptic, parabolic, or hyperbolic PDEs, there is a strategy to improve a scheme’s monotonicity called the upwind scheme. This terminology reflects the physical intuition behind the approach, particularly for heat or wave equations. Here, the direction of differencing is set against the direction of wave or heat propagation, ensuring that information from the "origin of the wave/heat" (considered more crucial) is accurately incorporated. Mathematically, this strategy involves clamping the flux (drift) terms, which act as coefficients for the partial derivative terms, to ensure they remain positive or negative as required.

Let’s stick to the growth model example of Eq (11), but this time we discretize the flux term vkμk as:

(20)vk[i]μk[i]μk+[i]Δk(v[i+Δk]v[i])+μk[i]Δk(v[i]v[iΔk])

For an arbitrary value of the drift μk[i], only one direction of differencing is acceptable, or else vk[i]μk[i]=0, indicating a stationary state. But why Eq (20) improves monotonicity? If we simplify Eq (20), there is:

(21)μk+[i]Δk0v[i+Δk]+μk[i]Δk0v[iΔk](μk+[i]Δk+μk[i]Δk)0v[i]

Noticing that all the terms of Eq (21) appear on the RHS of the discretization. Specifically, let’s apply upwind scheme to the growth model example to get a full picture of this:

(22)v[i]v[i]Δt+ρv[i]=u[i]+μk+[i]Δk0v[i+Δk]+μk[i]Δk0v[iΔk](μk+[i]Δk+μk[i]Δk)0v[i](23)1Δtv[i]=[1Δtρ(μk+[i]Δk+μk[i]Δk)]v[i]+μk+[i]Δk0v[i+Δk]+μk[i]Δk0v[iΔk]

The monotonicity conditions for v[i], v[i+Δk], and v[iΔk] are automatically satisfied. However, to ensure the monotonicity condition holds for the v[i] term, we need to solve for the CFL condition with respect to Δt.

(24)1Δtρ(μk+[i]Δk+μk[i]Δk)0(25)1Δtρ+(μk+[i]Δk+μk[i]Δk)>0(26)ΔtΔkρΔk+(μk+[i]μk[i])=ΔkρΔk+|μk[i]|<Δk|μk[i]|,iIN

Compared to Eq (14), applying the upwind scheme enables us to ensure monotonicity simply by adjusting the time step Δt. The Δt now can handle both net saving and de-saving cases. Soon, we will demonstrate that the upwind scheme is still effective in the multi-dimensional case.

Remark: The illustrative examples by Benjamin Moll suggests an estimate of Δt0.9Δk|μk[i]|, which works well for small ρ. If you expect your discounting rate is significantly larger than the usual choice (e.g. 0.05), then a safer estimate is Δt0.9ΔkρΔk+|μk[i]|.

Upwind scheme example: diffusion terms

HJB equation is a 2nd-order PDE where there could be diffusion terms as Eq (1). We have shown how to apply upwind scheme onto the flux terms. Now, let’s discuss how to apply upwind scheme to the diffusion terms. WLOG, let’s consider a stochastic neo-classical growth model:

(27)ρv(k,z)=maxcc1γ1γ+vk(zkαδkc)+vzκ(z¯z)+vzz12σz2

where vzz12σz2 represents the diffusion term in this equation. As before, we do not discretize σz2 but instead discretize vzz. Similarly, we use a three-point finite difference, given by vzz(v[i+Δzez]2v[i]+v[iΔzez])/(Δz)2. By clamping the (co)variance σz2, we have:

(28)vzz[i]12σz2[i](σz2[i])+2Δz(v[i+Δzez]2v[i]+v[iΔzez])+(σz2[i])2Δz(v[i+Δzez]2v[i]+v[iΔzez])

Readers can instantly figure out the unconditional monotonicity of v[i+Δzez] and v[iΔzez] terms. Then, collecting the v[i] terms here and the v[i] terms from the flux terms of state k and z, we have the following CFL condition:

(29)1Δtρ(μk+[i]Δk+μk[i]Δk+μz+[i]Δz+μz[i]Δz+(σz2[i])+Δz+(σz2[i])Δz)0(30)Δt1ρ+|μkΔk|+|μzΔz|+|σk2Δk|

One may have realized that the CFL condition, in the case of explicit scheme + upwind scheme, has a formula for generic models as Eq (1):

(31)iIN,Δt[ρ+j=1k|μj[i]Δj|+m=1kj=1k|σmj2[i]ΔmΔj|]1

As we noted at the beginning of this section, the explicit method requires progressively smaller time steps as the dimensionality of the problem and the size of the computational space increase. For a medium-sized problem, this typically means millions of iterations are needed to achieve convergence.

Implicit (backward Euler) method

Compared to the explicit method, the implicit method, or backward Euler method, is often preferred in practice. For the HJB equation with an evenly spaced dense grid, this scheme is unconditionally monotonic for any size of Δt when the upwind scheme is applied. However, the trade-off with the implicit method is that it requires solving a linear or nonlinear system at each iteration, which can be large but sparse in most cases.

The explicit method is “forward” in the meaning of forwardly difference along the time dimension (standing at moment t). The implicit method is “backward” such that we backwardly difference along the time dimension, but we are no longer standing at moment t but standing at moment t+Δt and looking back at moment t.

A fully implicit scheme is one in which only vt uses information from moment t, while all other discretizations along spatial dimensions use information from moment t+Δt. This type of scheme typically results in nonlinear schemes (with respect to partial derivatives), which are challenging to analyze. Let’s use the deterministic growth model example again, but this time we have:

v[i]v[i]Δt+ρv[i]=11γ(c[i])γ1γu[i](32)+(v[i+Δkek]v[i]Δk){Zk[i]αδk[i](v[i+Δkek]v[i]Δk)1/γ}μk[i]

Observe that in the fully implicit scheme, both u[i] and μk[i] are evaluated at moment t+Δt. These two terms are highly nonlinear in Dv. We cannot take them as scalar functions as before because we cannot evaluate them without the knowledge about V. Their partial derivatives with respect to v[i] and its neighboring values being extremely complex. In most economic models, analyzing these derivatives is impractical.

Remark: We did not apply upwind scheme to Eq (32). However, readers can quickly figure out that upwind scheme does not work here due to the high non-linearity of u and μk.

Thus, we focus on a class of quasi-implicit methods, where we evaluate the flow utility term, flux terms, and diffusion terms using the estimate at moment t, while leaving the partial derivatives at moment t+Δt. This scheme corresponds precisely to the “implicit method” used by Benjamin Moll and other economists. Under such a quasi-implicit scheme and applying upwind scheme, the growth model is discretized as:

(33)v[i]v[i]Δt+ρv[i]=u[i]+μk+[i]Δk0v[i+Δk]+μk[i]Δk0v[iΔk](μk+[i]Δk+μk[i]Δk)0v[i](34){(1Δt+ρ)+(μk+[i]Δk+μk[i]Δk)}>0v[i]=u[i]+μk+[i]Δk0v[i+Δk]+μk[i]Δk0v[iΔk]+1Δt>0v[i]

All coefficients of v[i], v[i], and its spatial neighbors satisfy the monotonicity condition unconditionally, regardless of the size of the time step Δt. When collecting all N+2k difference equations (including the 2k boundary conditions), one can obtain a linear system about the N nodes at moment t+Δt. Solving this linear system updates the guess of V for time-invariant HJB (Markovian), or update the guess of moment t+Δt for finite-horizon HJB equations.

Remark: 1. In later sections of this post, we briefly introduce how to check the monotonicity when a transition matrix L is available. The check procedure suggests how to adaptively adjust the time step size Δt per iteration. 2. A transition matrix is a linear approximation of the infinitesimal generator operator such that LVLV; it is basically a stacking of the flux & covariance terms in our previous examples.

Non-monotonicity of a class of approximation-based methods

The standard finite difference method discussed above is non-parametric, relying on a dense grid with no dependencies between any two nodes in X^N. However, this dense grid approach is highly susceptible to the curse of dimensionality, becoming quickly infeasible for three-dimensional and higher-dimensional problems. One may consider function interpolation or approximation methods (to approximate v), which offer a wide range of choices. However, it is important to recognize that when these approximations of v are combined with finite difference methods, the monotonicity condition often fails to hold for a large class of these function approximation methods, even when the implicit method and upwind scheme are applied. This section is an intuitive illustration of the proof idea in Garcke and Ruttscheidt (2019).

We consider the two types of interpolation as outlined in the notation section.

First type

The first type uses interpolation coefficients θ that represent the function values at supporting nodes from the set X^N. This type of interpolation can be abstracted as Eq (6). Some examples: local methods such as piecewise polynomial, spline, etc. The multi-dimensional piecewise linear (aka multi-linear) interpolation exactly fits into the discussion of this sub-section.

The key idea here is that any approximated value v^(z),zX can be represented as a linear combination of all supporting v[i]V. Taking this a step further, any linear operation on the approximated function values (e.g., finite difference) among arbitrary points z1,z2,X can be represented by a basis stencil. For example:

for any z1,z2X;a1,a2Rv^(z1)=φ(z1)V+C(z1)v^(z2)=φ(z2)V+C(z2)a1v^(z1)+a2v^(z2)={a1φ(z1)+a2φ(z2)}V+{a1C(z1)+a2C(z2)}(35)=:φ1,2V+C1,2

where V, as previously arranged, is the stack of function values at the supporting nodes, formatted as a vector. Notably, all finite difference approximations of partial derivatives fall within this category.

To illustrate the non-monotonicity issue when combining the above interpolation with the finite difference method, we use the stochastic growth model in Eq (27) as an example. Here, we apply finite differences and the quasi-implicit scheme described in the previous section, but with a key modification: the finite difference is now computed between a supporting node and an interpolated neighboring point, rather than between two supporting nodes, for a given spatial mesh step size (Δk,Δz). For instance, vk[i][v^(k[i]+Δk,z[i])v[i]]/Δk.

Remark: The reason why not directly differencing two neighbor supporting nodes is that:

  1. In some interpolations, there may not exist a “neighboring” node for supporting nodes in X^N

  2. Even neighboring supporting nodes can be defined, there would be inconsistency issue which diverges the solution from the true solution. (Garcke & Ruttscheidt, 2019)

Remark: In this illustration, we set up a difference equation system over the N supporting nodes from X^N such that the finite difference happens between a supporting node and an interpolated neighboring point. However, the conclusion of non-monotonicity trivially holds for a more generic case. This implies the failure of the following tries:

  1. Sampling or designing an even-spaced “virtual” grid over X

  2. Evaluate the interpolants at these points

  3. Apply standard finite difference + quasi implicit scheme

Discretizing Eq (27) and apply quasi implicit scheme and upwind scheme:

v[i]v[i]Δt+ρv[i]=u[i]+μk+[i]Δk{v^[i+Δkek]v[i]}+μk[i]Δk{v[i]v^[iΔkek]}+μz+[i]Δz{v^[i+Δzez]v[i]}+μz[i]Δz{v[i]v^[iΔzez]}(36)+σz2[i]2(Δz)2{v^[i+Δzez]2v[i]+v^[iΔzez]}

which appears very similar to Eq (33). One might mistakenly assume that Eq (36) shares the same unconditional monotonicity as the conclusion from the previous section. However, it is crucial to recognize that monotonicity is required for all supporting nodes, not the interpolated points. Therefore, an additional step is needed: substituting the basis stencil and rearranging terms as in Eq (35). A trick here is to represent the supporting node as an interpolated value but using unit vector eiRN as basis stencil.

1Δt[(φ[i]Vφ[i]V)+(C[i]C[i])]+ρ(φ[i]V+C[i])=u[i]+μk+[i]Δk{[φ[i+Δkek]φ[i]]V+(C[i+Δkek]C[i])}+μk[i]Δk{[φ[i]φ[iΔkek]]V+(C[i]C[iΔkek])}+μz+[i]Δz{[φ[i+Δzez]φ[i]]V+(C[i+Δzez]C[i])}+μz[i]Δz{[φ[i]φ[iΔzez]]V+(C[i]C[iΔzez])}(37)+σz2[i]2(Δz)2{[φ[i+Δzez]2φ[i]+φ[iΔzez]]V+(C[i+Δzez]2C[i]+C[iΔzez])}

After simplification, Eq (37) becomes a linear equation that simultaneously about all supporting nodes in X^N:

(38)AV=1Δtφ[i]V+C+u[i]

where AR1×N is a complex linear combination of all basis stencils; CR is the summation of all constant terms. The monotonicity condition, in fact, requires Eq (38) to satisfy:

  1. All elements in the basis stencil φ[i] must be non-negative such that the coefficient of every supporting node’s function value v[q],qIN of moment t is non-negative.

  2. All elements in the basis stencil A, except the current i node’s coefficient of moment t+Δt, must be non-positive.

  3. Meanwhile, the i node’s coefficient of moment t+Δt must be non-negative.

These three conditions actually impose around (2moments2dim2neighbor)N sign restrictions on {φ,φ} and require these restrictions to hold in each iteration. If elements of φRN can be negative, then satisfying these numerous restrictions becomes nearly impossible. This can be verified by manually summarizing the A term and examining it. A more straightforward example illustrates this vulnerability: if any element of the basis stencil φ[i] is negative, then monotonicity with respect to V (or V) is violated, leaving no feasible solution for economists to restore it.

Even though we are discussing a specific example, it contains all the ingredients needed to trivially generalize it to an arbitrary k dimension HJB equation driven by an m-dimensional Brownian motion, as shown in Eq (1). I leave the formal statement for those who are interested in further exploration.

The issue is evident, but what could be a potential solution? The answer lies in using interpolation methods that represent any point zX as a convex combination of the function values of supporting nodes in X^N. Examining Eq (38), if all elements of the basis stencils used in this equation are non-negative, then applying the upwind scheme to Eq (38) preserves unconditional monotonicity. There are specific types of interpolation that meet this convexity requirement.

Remark: The non-negativity of basis stencil or function elements is NOT equivalent to the monotonicity of the interpolant. Some readers may recognize methods like monotonic interpolations (e.g., monotonic cubic splines). These interpolations, while designed to ensure monotonicity of the interpolant, do not necessarily guarantee monotonicity of numerical schemes—two forms of "monotonicity" that are fundamentally different.

More specifically, monotonic interpolation ensures that within a single iteration, if the guessed solution is monotonic along certain dimensions, then evaluating the interpolant will preserve this property. In contrast, the monotonicity of a numerical scheme ensures that, as the guessed solution is updated from one iteration to the next, the monotonicity of the previous iteration’s guess is maintained in the next iteration, thereby preserving the solution’s shape over successive updates. This is sometimes referred as “shape-preserving” property of the scheme.

Second type

The second type of interpolation we consider does not use function values at supporting nodes as interpolation coefficients but instead utilizes a generic θP. Here, the degree of interpolation P is not necessarily equal to the number of supporting nodes N. Examples of this type of interpolation include global methods such as orthogonal polynomials (spectral), neural networks, and Gaussian processes, among others.

In short, this type of interpolation also suffers from the same issue as the first type. The key to understand this is to recognize that, after substituting the interpolation into Eq (38), we are simply projecting the original space V onto the space of interpolation coefficients θP, while everything else remains unchanged. For example (note that now even v[i] is the evaluated value),

(39)v^[i+Δkek]v^[i]=(φ[i+Δkek]φ[i])θP+(C[i+Δkek]C[i])

which results in a collection of P+2k difference equations. If the stencil elements cannot be guaranteed to be non-negative, then the same issue of irreparable monotonicity arises once again.

The solution in this case is the same as for the first type of interpolation: employ interpolation methods that ensure non-negative stencils for all zX. For example, one might consider using ReLU, Sigmoid, or other non-negative activation functions in neural network implementations.

Bonus section: determine which scheme to use

The combined strategy of the quasi-implicit method and the upwind scheme is sufficient for handling a broad class of economic models in research. However, certain models with unique structures may require customization beyond the standard approach. Learning how to design tailored linear or nonlinear schemes for specific problems can be highly beneficial (even though most customized schemes works less good than the standard approach when the latter is available).

A snack example, the absolute value of a partial derivative |vk| appears in some applications (e.g. adjustment costs). The trick to monotonically discretize it is:

(40)|vk|:=max{vk,vk}=max{v[i]v[iΔkek]Δk,v[i+Δkek]v[i]Δk}

i.e. use backward difference when vk0, and use forward difference when vk<0.

The general steps to determine what numerical scheme to use is:

  1. Define the economic model as a boundary problem and solve policy functions in the interior and on the boundary.

  2. Plugging the policy functions back and observe:

    1. How tractable the problem is? If maximum principle can gives the analytical solution, then solve it. If not, then consider numerically solve it.

    2. Is finite difference the best choice? If yes, move on. If not, try generalized residual methods (I’ll discuss this in another post)

    3. How non-linear the problem is? If linear enough or easy to linearize, then consider linearize the PDE and solve it analytically or numerically. e.g. some portfolio or risk-neutral problems can be approximated as a Linear Quadratic Regulator (LQR) problem which is tractable

    4. Dimensionality of the problem? If the dimensionality is low, then primarily use even-spaced grid + standard strategy. If the dimensionality is high, then choose proper approximation methods that satisfy the above non-negativity conditions, and potentially combine it with other techniques such as sparse grids.

  3. If finite difference is your final choice, then

    1. Design your grid: even-spaced dense grid? uneven-spaced dense grid (e.g. Chebyshev)? sparse grid (e.g. Smolyak)? adaptive sparse grid (e.g. multi-linear ASG)?

    2. Does your model has mixed frequency of noises that require multi-grid methods? If yes, then things go more complex and you may need an advanced course of numerical PDE.

    3. Based on your grid, decide if to use approximation methods

      1. Even-spaced dense grid: No need to use approximation, everything is standard

      2. Uneven-spaced dense grid: No need to use approximation, but needs to be careful about the mesh step size for difference

      3. (Regular or adaptive) sparse grid: Must use approximation. Carefully choose approximation methods that satisfy the above conditions

    4. Design the discretization strategy, i.e. the scheme. This is the heart part.

      1. How many layers do you need? This is needed in some special cases such as doing dimension reduction using methods like ADI, or you are handling some special time-dependent (parabolic) problems.

      2. Is fully implicit scheme possible to be unconditionally monotonic? If yes, then use it. This is faster convergent than the quasi implicit scheme. If not, one may need to compare it (after applying CFL condition) with a quasi implicit scheme.

      3. Any non-linearity that must be discretized? If no, then great and just fit your PDE into the standard strategy. If yes, you need more tricks to handle these non-linearities before moving forward.

      4. Do you get an excellent idea which beats the standard strategy even though the latter is applicable? If yes, then go ahead. If no, stay with the standard strategy.

    5. Carefully double check if all Barles-Souganidis sufficient conditions are satisfied.

    6. Carefully verify the chosen scheme on the discretized boundary conditions. The PDE, esp. elliptic/parabolic/hyperbolic PDEs are pinned down greatly by boundary conditions. For multi-dimensional problems, pay extra attention to “corners” i.e. where multiple boundary conditions need to simultaneously hold.

    7. Write your code, debug, test and run.

    Reference

    1. Garcke, J., & Ruttscheidt, S. (2019). Finite differences on sparse grids for continuous time heterogeneous agent models.

    2. Barles, G., & Souganidis, P. E. (1991). Convergence of approximation schemes for fully nonlinear second order equations. Asymptotic analysis, 4(3), 271-283.