A 15 min tutorial for Adaptive Sparse Grid (ASG)

1. Introduction

The curse of dimensionality challenges economists in modeling complex economic dynamics. For instance, a five-state discrete-time lifecycle model might require millions of grid points, with higher-dimensional problems quickly becoming infeasible due to time and memory constraints. In recent decades, numerical methods have been developed to address this issue. Sparse grids, an extension of standard interpolation theory, provide a natural pathway to high-dimensional modeling.

Recent advances in sparse grid theory introduce adaptation, where nodes are added or removed based on the shape of the local function, further reducing the necessary number of interpolation nodes compared to regular sparse grids (RSG). This adaptive sparse grid (ASG) technique proves especially useful in economic research, where value functions often exhibit power law behaviors.

This blog post heuristically introduces the concept of ASG and its algorithm, without delving into the underlying mathematics, to help a broad audience quickly engage with the literature, begin learning, and apply this technique to their research.

 

2. Where do we need ASG?

In general, ASG works wherever interpolation or function approximation is needed. Typical scenarios include approximating value and policy functions in dynamic programming, and solving continuous-time HJB equations on grids via finite differences, both of which appear later in this post.

3. General idea

Let's begin with standard interpolation theory. In this post, we discuss single-valued real functions $f(x): \mathbb{R}^m \to \mathbb{R}$. An interpolant is a function defined by a set of basis functions $\phi_j: \mathbb{R}^m \to \mathbb{R}$ and a set of interpolation coefficients $c_j \in \mathbb{R}$, where $n$ represents the degree of interpolation.

$$f(x) \approx \hat{f}(x) := \sum_{j=1}^{n} \phi_j(x)\, c_j \tag{1}$$

Different interpolation methods vary in the design of $\phi_j$, but they share the same general structure as shown above. For example, polynomial interpolation uses global polynomial bases (e.g., Lagrange or Chebyshev polynomials), spline interpolation uses piecewise polynomials, and piecewise linear interpolation uses locally supported "hat" functions.

In this post, we discuss multi-dimensional piecewise linear interpolation (also known as multi-linear interpolation). For details on other methods, readers may refer to standard textbooks on numerical analysis. The regular degree-$n$ piecewise linear interpolation uses $n+1$ distinct nodes and a "hat function" $\phi_j(x)$. The following figure provides a 1D illustration:
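To make Equation (1) concrete, here is a minimal Python sketch of 1D hat-function interpolation. It is my illustration rather than code from any cited paper; the power-law-shaped target and the seven nodes mirror the figure below.

```python
import numpy as np

def hat(x, center, h):
    """Hat basis function centered at `center` with support width 2h."""
    return np.maximum(0.0, 1.0 - np.abs(x - center) / h)

f = lambda x: 1.0 / (0.1 + x)      # a power-law-shaped target (illustrative)

n = 6                              # degree n -> n + 1 evenly spaced nodes
nodes = np.linspace(0.0, 1.0, n + 1)
h = nodes[1] - nodes[0]

# For a nodal hat basis, c_j = f(x_j): each hat equals 1 at its own node
# and 0 at every other node.
coeffs = f(nodes)

def f_hat(x):
    """Evaluate Eq. (1): sum_j phi_j(x) * c_j."""
    return sum(c * hat(x, xj, h) for xj, c in zip(nodes, coeffs))

x = np.linspace(0.0, 1.0, 201)
print(np.max(np.abs(f(x) - f_hat(x))))   # maximum interpolation error
```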

[Figure: 1D piecewise linear interpolation with seven evenly spaced hat functions supporting a power-law-shaped target]

In the figure, seven evenly spaced hat functions, weighted by interpolation coefficients, "support" the linear approximation (red line) of the target power-law shaped function. Notably:

  1. There are many unnecessary overlaps among the hat functions.

  2. The interpolant (red line) approximates the target function well in regions that behave approximately linearly, while the approximation is less accurate in regions with sharp changes.

For the first observation, one might ask: Given a desirable tolerance of approximation error, is every node necessary?

The answer is no. The following figure demonstrates that we can remove some of the hat functions while maintaining a similar approximation error. Here, the node distribution is "sparse" compared to the original piecewise linear interpolation. The corresponding numerical technique is known as (regular) sparse grid (RSG) theory, which designs generalized interpolation grids for arbitrary target functions. A wide range of methods, such as the Smolyak method, fall within this category.

[Figure: a sparse node distribution achieving a similar approximation error with fewer hat functions]

Regarding the second observation, a relevant question arises: Given a desirable tolerance of approximation error, how can we improve accuracy in regions with sharp changes?

The solution is to use adaptation, i.e., dynamically assigning nodes according to the local shape of the target function. In regions that change sharply, more supporting nodes are allocated; in regions where a linear approximation suffices, fewer supporting nodes are needed. The following figure illustrates how the adaptation procedure enhances approximation accuracy. Notice the concentration of supporting nodes on the left side of the figure, where changes are more abrupt. The spacing between neighboring nodes increases as the function’s shape becomes more linear. Ultimately, the approximation accuracy is excellent throughout the figure.

[Figure: adaptive node placement concentrating nodes where the target changes sharply]

Combining sparse grid theory with adaptation yields the so-called (multi-linear) Adaptive Sparse Grid (ASG) interpolation. For readers entering the literature, important studies include Brumm & Scheidegger (2017), Schiekofer (1998), Griebel (1998), and Schaab & Zhang (2022); see the references below.

If readers need more related works, I recommend checking the papers by Michael Griebel at the Institute for Numerical Simulation (Institut für Numerische Simulation) at the University of Bonn, Germany.

Remark: The sparse grid technique, whether RSG or ASG, is generally independent of the specific interpolation methods chosen. It is simply an enhancement of the grid structure. Any numerical method that requires a grid can potentially be adapted to work with sparse grids with minimal modifications. For instance, the paper by Judd et al. (2014) provides an excellent example of combining RSG with the spectral method, a global technique.

4. Algorithm

Given the generic definition in Equation (1), an interpolant $\hat{f}(x)$ can be represented by a 3-tuple:

$$\left( X,\; P(X),\; \{c_j\} \right) \tag{2}$$

where $X$ denotes the domain of $x$, $P$ represents the set of supporting nodes or grid structure, and $\{c_j\}$ is the set of interpolation coefficients. There are two primary operations on such an interpolant $\hat{f}(x)$: training and evaluation. Training involves defining the interpolant by determining the grid structure $P$ and computing the coefficients $\{c_j\}$ from evaluations of the target function.

Evaluation is the process of computing the value of $\hat{f}$ at a specific point $x_0 \in X$. This evaluation can be performed straightforwardly by applying the interpolation definition.

Remark: In practice, a technique called residual fitting is used for adaptation. This technique modifies the evaluation process, yet it ultimately still aligns with Equation (1). For further details, see Schaab & Zhang (2022).
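To see residual fitting in one step, here is a tiny illustrative sketch (mine, not the paper's code): when a node is added, its coefficient is set to the residual of the current interpolant there, so evaluation accumulates corrections level by level.

```python
def hat(x, center, h):
    """Hat basis function centered at `center` with support width 2h."""
    return max(0.0, 1.0 - abs(x - center) / h)

f = lambda x: 1.0 / (0.1 + x)                 # illustrative target

# Start with one node at 0.5 whose hat spans the whole domain [0, 1].
nodes = [(0.5, 0.5, f(0.5))]                  # (center, width, coefficient)
f_hat = lambda x: sum(c * hat(x, p, h) for p, h, c in nodes)

# Residual fitting: the new node's coefficient is the residual of the
# *current* interpolant at that node, so after insertion the interpolant
# is exact there while staying exact at the existing node.
x_new = 0.25
nodes.append((x_new, 0.25, f(x_new) - f_hat(x_new)))
print(abs(f_hat(x_new) - f(x_new)) < 1e-12)   # True
```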

To define an adaptive algorithm for the training process, we need to design a rule for proposing candidate nodes around existing ones, an error criterion for accepting or rejecting each candidate, and a stopping rule.

The multi-linear ASG interpolation follows this algorithm (see the code sketch after the list):

  1. Initialize the algorithm with one or more essential nodes.

  2. Repeat until the algorithm converges:

    1. For each node i added in the previous round:

      1. Identify all 2m candidate nodes on the surface of an m-dimensional sphere around node i, where each dimension has left and right neighboring nodes.

      2. For each candidate node:

        1. If the current interpolant (without adding any candidate nodes this round) has a sufficiently small interpolation error at the candidate node, skip this candidate node (since adding it would have minimal impact).

        2. Otherwise, add this candidate node.

    2. If no new candidate nodes are added, the algorithm has converged.

    3. Otherwise, reduce the radius of the spheres and proceed to the next round.
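To fix ideas, below is a minimal, self-contained 1D version of this training loop. It is my sketch under simplifying assumptions (dyadic child nodes, an absolute tolerance, no boundary nodes), not the reference implementation of the cited papers.

```python
def hat(x, center, h):
    """Hat basis function centered at `center` with support width 2h."""
    return max(0.0, 1.0 - abs(x - center) / h)

def train_asg_1d(f, tol=1e-2, max_rounds=12):
    """1D sketch of the loop above; each node stores its hierarchical
    surplus (coefficient) and the support width it was created with."""
    nodes = {0.5: (f(0.5), 0.5)}                  # 1. one essential node
    frontier, radius = [0.5], 0.25

    def f_hat(x):                                 # current interpolant, Eq. (1)
        return sum(c * hat(x, p, h) for p, (c, h) in nodes.items())

    for _ in range(max_rounds):                   # 2. repeat until convergence
        # 2.1 left/right candidates around the nodes added last round
        candidates = {p + d for p in frontier for d in (-radius, radius)
                      if 0.0 <= p + d <= 1.0 and (p + d) not in nodes}
        # 2.1.2 errors are measured before adding any candidate of this round
        surpluses = {c: f(c) - f_hat(c) for c in candidates}
        accepted = [c for c, s in surpluses.items() if abs(s) > tol]
        for c in accepted:
            nodes[c] = (surpluses[c], radius)
        if not accepted:                          # 2.2 converged
            break
        frontier, radius = accepted, radius / 2   # 2.3 shrink the spheres
    return nodes

grid = train_asg_1d(lambda x: 1.0 / (0.1 + x))
print(len(grid), sorted(grid)[:4])  # nodes cluster near x = 0, the sharp region
```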

Remark: Bridging the gap between the above algorithm and its programming implementation requires practical considerations. The concept of an m-dimensional sphere is, in practice, implemented through a technique called hierarchical nodes, which establishes a rule for "growing" child nodes and determining the effective radius of these spheres. For more details, refer to Schaab & Zhang (2022).

Remark: It is worth noting that the algorithm only includes a node-adding operation, with no node-dropping operation. This design choice is made to ensure the completeness of the data structure, especially when the algorithm starts from an RSG. Further explanation on this can be found in later sections.

To illustrate the algorithm, consider a 2D function $f(x,y)$ defined on the closed hypercube $[0,1]^2$. The algorithm begins with an initialization at the center point $(0.5, 0.5)$ only.

[Figure: initialization with a single node at the center point (0.5, 0.5)]

Standing at the blue "parent" point, examine the distant red "children" nodes located on the dashed-line sphere. Decide which of these nodes to add based on the tolerance criteria.

[Figure: four candidate child nodes (red) on the dashed sphere around the parent node (blue)]

Assuming all four red candidate child nodes are accepted, designate these four nodes as the new "parent" nodes and proceed with this round of the algorithm.

[Figure: all four candidates accepted and promoted to new parent nodes]

Let's assume only 3 candidate child nodes are accepted in this round. Reduce the sphere radius and continue the algorithm.

[Figure: next round with reduced radius; 3 candidate child nodes accepted]

Similarly, assume only 2 child nodes are accepted this round.

[Figure: following round; 2 child nodes accepted]

Suppose no new nodes are accepted around $(1/4, 1/4)$; the algorithm then stops. The converged grid structure looks like:

[Figure: the converged adaptive grid structure]

One can see that the spacing between nodes is uneven, with some local regions containing more nodes than others.

Remark:

  1. Readers with a computer science background may recognize that the grid structure essentially forms a 2m-ary tree, and the algorithm described above effectively performs branch pruning.

  2. Dropping a non-leaf node disrupts the tree’s connectivity and alters the depth of every other node, making the data structure unpredictable.

  3. For small values of m, a linked list provides maximum flexibility. However, this becomes inefficient as m increases. For medium values of m, a hash table is recommended (Griebel, 1998), benefiting from strict node hierarchization. For large m, hash tables perform poorly due to an increased probability of hash collisions, and in this case, a database can efficiently manage the grid.

  4. In most scenarios involving economic models (where m ranges from 1 to 20), a hash table performs effectively (see the sketch below).
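For concreteness, here is one illustrative way to key such a hash table, loosely following the hierarchical-node idea in Griebel (1998): identify each node by integer (level, index) pairs per dimension instead of floating-point coordinates. The names and surplus values below are hypothetical.

```python
from typing import Dict, Tuple

# One (level, index) pair per dimension; node (l, i) sits at i / 2**l in [0, 1].
Key = Tuple[Tuple[int, int], ...]

def key_to_point(key: Key) -> Tuple[float, ...]:
    return tuple(i / 2**l for l, i in key)

# Hash table: node key -> hierarchical surplus (values are made up here).
grid: Dict[Key, float] = {
    ((1, 1), (1, 1)): 0.70,    # the 2D center node (0.5, 0.5)
    ((2, 1), (1, 1)): -0.12,   # its left child in dimension 1: (0.25, 0.5)
}
print(key_to_point(((2, 1), (1, 1))))   # (0.25, 0.5)
print(((2, 1), (1, 1)) in grid)         # O(1) membership test
```

Integer keys hash exactly and avoid floating-point comparison issues, which is what makes the strict node hierarchization convenient to exploit.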

5. Example

5.1 Function approximation

Here are some numerical examples for the function:

$$f(x) := \frac{1}{0.1 + \|x\|_1}, \qquad x \in [0,1]^m \tag{3}$$

where $\|x\|_1$ is the $\ell_1$ norm. Such a function mimics an $m$-dimensional CRRA value function $\frac{c^{1-\gamma}}{1-\gamma}$ with $\gamma = 2$. Let's set the relative training tolerance to 1% and check $m = 1$, $2$, and $3$. For comparison with the multi-linear ASG interpolation, the regular piecewise linear (RPL) interpolation on a dense grid is set to have about 50 nodes along each dimension, which is conventional in quantitative macroeconomic modeling.
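For reference, here is Equation (3) in code, a small helper of mine vectorized over a batch of points:

```python
import numpy as np

def f(x):
    """Eq. (3): f(x) = 1 / (0.1 + ||x||_1) for x of shape (m,) or (n, m)."""
    x = np.atleast_2d(x)
    return 1.0 / (0.1 + np.abs(x).sum(axis=-1))

print(f([0.0, 0.0]))         # m = 2 at the origin: [10.], the sharp corner
print(f([[0.5, 0.5],         # a batch of two 2D points:
         [1.0, 1.0]]))       # [0.909..., 0.476...]
```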

5.1.1 1D example

[Figure: 1D example, ASG (first row) vs. RPL (second row)]

The first row displays the result of ASG, while the second row shows the result of RPL. Within the given tolerance, ASG uses only 30% as many supporting nodes as RPL while achieving a comparable approximation error; note that the evenly spaced RPL grid is least accurate near the lower bound of x, where the function changes most sharply.

Remark: The poor interpolation accuracy near the lower bound of x can lead to quantitatively significant issues in specific economic research contexts, such as studies on Hand-to-Mouth households or wealth distribution in the Aiyagari model.

5.1.2 2D example

[Figure: 2D example, ASG (first column) vs. RPL (second column)]

In the two-dimensional scenario, the sparsity advantage of ASG over RPL becomes even more pronounced. In the figure matrix above, the first column displays ASG results, while the second column shows RPL results. Similar patterns emerge as in the one-dimensional case. Here, ASG achieves comparable interpolation accuracy while using only 7.3% as many supporting nodes as RPL.

5.1.3 3D example

[Figure: 3D example: ASG node distribution, error histogram, and error tail quantiles]

If we move to the 3-dimensional scenario, the sparsity of ASG relative to RPL becomes even more pronounced. From m=3 onwards, direct visualization of the function becomes challenging. Here, I provide three illustrations:

  1. Spatial distribution of ASG supporting nodes

  2. Histogram of relative errors, ASG vs. RPL

  3. Tail distribution of relative errors by quantiles, ASG vs. RPL

The spatial distribution of ASG supporting nodes and the histogram of relative errors convey a similar message as in previous scenarios. The new tail distribution figure further illustrates that ASG and RPL have a comparable scale of errors in the most sharply changing regions.

With the same interpolation accuracy, ASG requires only 0.87% as many supporting nodes as RPL.

5.2 HJB equation: stochastic Neo-classical growth model

To illustrate the results of applying ASG to a practical economic model, I solve the following stochastic Neo-classical growth model, which is a standard test case:

$$\rho v(k,z) = \max_{c \geq 0}\; \frac{c^{1-\gamma}}{1-\gamma} + v_k \left\{ z k^{\alpha} - \delta k - c \right\} + v_z\, \xi(\bar{z} - z) + \frac{1}{2} v_{zz} \sigma_z^2 \tag{4}$$

where $k$ is capital and $z$ is stochastic productivity following an Ornstein–Uhlenbeck process $dz = \xi(\bar{z} - z)\,dt + \sigma_z\,dW$; $\gamma$ is the coefficient of relative risk aversion, $\alpha$ the capital share, $\delta$ the depreciation rate, and $\rho$ the discount rate.

The finite difference scheme is the quasi-fully implicit scheme used by Benjamin Moll. The solved value function $v(k,z)$, saving policy $\dot{k}(k,z)$, consumption policy $c(k,z)$, and the spatial distribution of supporting nodes are displayed below. In contrast, a naive dense grid needs 5,000–10,000 nodes to reach the same precision.

[Figure: solved value function, policies, and ASG supporting-node distribution]

6. Pros & Cons: Choose between ASG, RSG, and dense grid

ASG offers significant advantages for high-dimensional modeling. However, there is no free lunch. I compare multi-linear interpolation across ASG, RSG (e.g., Smolyak), and dense grid methods to help readers select the most suitable approach for their applications.

Remark: The comparisons presented in the table may vary significantly depending on the interpolation methods used. For example, the time and space complexity of spectral approximation across the three grid types depends solely on the number of supporting nodes.

| ID | Item | ASG | RSG | Dense grid |
|----|------|-----|-----|------------|
| 1 | Time complexity: training | High | Medium | Medium |
| 2 | Time complexity: evaluation | Medium | Medium | Low |
| 3 | Space complexity: RAM during training | High | Medium | High |
| 4 | Space complexity: result storage | Low | Medium | High |
| 5 | Number of supporting nodes | Low | Medium | Very high |
| 6 | Largest feasible dimensionality on a personal computer | >40 | >20 | 4 |
|  | Discrete time macroeconomic models: |  |  |  |
| 7 | - Compatible? | Yes | Yes | Yes |
|  | Continuous time macroeconomic models: |  |  |  |
| 8 | - Compatible? | Yes | Yes | Yes |
| 9 | - Unconditional monotonicity of implicit method + upwind scheme? | Maybe | Maybe | Yes |

Remark: ASG dynamically grows a tree of nodes in which the final, converged set contains relatively few accepted nodes, but the algorithm must test every possible candidate node; these trials account for 99% of the training time. In contrast, RSG determines the node set before performing residual fitting, which generally makes it faster than ASG. However, for high-dimensional functions, RSG requires significant time to filter through admissible node layers.

Remark: Both ASG and RSG require traversing the entire set of supporting nodes due to the nature of residual fitting. In contrast, a dense grid, in the context of piecewise linear interpolation, only needs to bracket the given point within the state space.
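As an aside, the bracketing step for a dense, evenly spaced grid is just a binary search plus one linear blend; here is a minimal illustration (interior points only, names mine):

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 51)             # 51 dense, evenly spaced nodes
vals = 1.0 / (0.1 + grid)                    # tabulated function values

x0 = 0.37                                    # an interior query point
i = np.searchsorted(grid, x0) - 1            # bracket x0 in O(log n)
t = (x0 - grid[i]) / (grid[i + 1] - grid[i])
print((1 - t) * vals[i] + t * vals[i + 1])   # piecewise linear evaluation
```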

Remark: In the context of solving elliptic/parabolic PDEs using finite difference methods with implicit schemes, Garcke & Ruttscheidt (2019) demonstrate that interpolation methods requiring evaluation of the unknown function at points other than supporting nodes cannot guarantee the monotonicity of the numerical scheme, even when schemes such as the upwind scheme are applied. This limitation arises because the function approximation at these points may not be representable as a convex combination of the supporting nodes, potentially undermining the ellipticity of the approximated PDE under certain parameter values. I discuss this topic further in another blog post.
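In symbols, the monotonicity requirement at stake (my paraphrase of the standard condition, not the paper's notation) is that every off-node evaluation be a convex combination of node values:

$$\hat{f}(x_0) = \sum_j w_j(x_0)\, f(x_j), \qquad w_j(x_0) \ge 0, \qquad \sum_j w_j(x_0) = 1$$

Dense-grid multi-linear interpolation satisfies this, whereas hierarchical sparse grid bases can effectively produce negative weights at some evaluation points, which is what breaks monotonicity.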

Question: If I want to use a global interpolation method, such as a spectral method, which type of grid should I use?

Answer: Global methods use global basis functions, which are barely affected by singular local shapes. However, global methods usually involve solving large dense linear systems. Thus, ASG works best, since it reduces the required number of nodes the most.

References

  1. Schaab, A., & Zhang, A. (2022). Dynamic programming in continuous time with adaptive sparse grids. Available at SSRN 4125702.

  2. Brumm, J., & Scheidegger, S. (2017). Using adaptive sparse grids to solve high‐dimensional dynamic models. Econometrica, 85(5), 1575-1612.

  3. Schiekofer, T. (1998). Die Methode der Finiten Differenzen auf dünnen Gittern zur Lösung elliptischer und parabolischer partieller Differentialgleichungen [The finite difference method on sparse grids for the solution of elliptic and parabolic partial differential equations] (Doctoral dissertation, Universität Bonn).

  4. Garcke, J., & Ruttscheidt, S. (2019). Finite differences on sparse grids for continuous time heterogeneous agent models.

  5. Judd, K. L., Maliar, L., Maliar, S., & Valero, R. (2014). Smolyak method for solving dynamic economic models: Lagrange interpolation, anisotropic grid and adaptive domain. Journal of Economic Dynamics and Control, 44, 92-123.

  6. Griebel, M. (1998). Adaptive sparse grid multilevel methods for elliptic PDEs based on finite differences. Computing, 61, 151-179.