A 15-minute tutorial on Adaptive Sparse Grids (ASG)

1. Introduction

The curse of dimensionality makes it difficult for economists to model complex economic dynamics. For example, a five-state discrete-time lifecycle model might require millions of grid points, with higher-dimensional problems becoming infeasible due to time and memory constraints. In recent decades, numerical methods have been developed to address this issue. Sparse grids, an extension of standard interpolation theory, offer a natural transition to high-dimensional modeling.

Recent advances in sparse grid theory introduce adaptation, where nodes are added or removed based on the local shape of the function, further reducing the required number of interpolation nodes compared to regular sparse grids (RSG). This adaptive sparse grid (ASG) technique is particularly useful in economic research, where value functions often exhibit power-law behavior.

This blog post heuristically introduces the idea of ASG and its algorithm without discussing the underlying math, aiming to help a broad audience quickly engage with the literature, start learning, and apply this technique to their research.

2. Where do we need ASG?

In general, ASG works in any scenario where an interpolation or function approximation is needed. Some typical scenarios are:

  1. Approximating value and policy functions in dynamic programming (discrete or continuous time)

  2. Solving PDEs such as HJB equations with finite difference methods

  3. Any high-dimensional function approximation where a dense grid is infeasible

3. General idea

Let’s start from standard interpolation theory. In this post, we discuss single-valued real functions $f(x): \mathbb{R}^m \to \mathbb{R}$. An interpolant is a function defined by a collection of basis functions $\phi_j: \mathbb{R}^m \to \mathbb{R}$ and a collection of interpolation coefficients $c_j \in \mathbb{R}$, where $n$ is the degree of interpolation:

$$f(x) \approx \hat{f}(x) := \sum_{j=1}^{n} \phi_j(x)\, c_j \tag{1}$$
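To fix ideas, here is a minimal Python sketch of Equation (1); the names `evaluate_interpolant`, `basis_funcs`, and `coeffs` are my own, not from any particular library:

```python
def evaluate_interpolant(x, basis_funcs, coeffs):
    """Equation (1): f_hat(x) = sum_j phi_j(x) * c_j."""
    return sum(c * phi(x) for phi, c in zip(basis_funcs, coeffs))
```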

Different interpolation methods differ in the design of $\phi_j$ but share the same structure as above. For example:

  1. Piecewise linear interpolation uses locally supported "hat" functions

  2. Polynomial (spectral) interpolation uses global basis functions such as Chebyshev polynomials

  3. Spline interpolation uses piecewise polynomials with smoothness constraints

In this post, we discuss multi-dimensional piecewise linear interpolation (aka multi-linear interpolation). Readers may refer to numerical analysis textbooks for details of the other methods. Regular degree-$n$ piecewise linear interpolation uses $n+1$ distinct nodes and "hat" basis functions $\phi_j(x)$. The following figure gives a 1D illustration:

[Figure: a power-law-shaped target function (red) approximated in 1D by 7 evenly spaced hat functions]

In the figure, weighted by the interpolation coefficients, 7 evenly spaced hat functions "support" the linear approximation (red line) of the power-law-shaped target function. One can observe that:

  1. There are many unnecessary overlaps of the hat functions

  2. The interpolant (red line) approximates the target function well in regions where it behaves approximately linearly, while the approximation is less accurate in regions that change sharply
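For concreteness, the figure's construction can be sketched in a few lines of Python. The target function 1/(0.1 + x) is my stand-in for the power-law shape (it reappears as Equation (3) below); everything else follows the figure:

```python
import numpy as np

def hat(x, center, h):
    """Hat function: equals 1 at `center` and 0 beyond distance h."""
    return np.maximum(0.0, 1.0 - np.abs(x - center) / h)

f = lambda x: 1.0 / (0.1 + x)      # power-law-shaped target
nodes = np.linspace(0.0, 1.0, 7)   # 7 evenly spaced nodes, as in the figure
h = nodes[1] - nodes[0]
coeffs = f(nodes)                  # for nodal hat bases, c_j = f(x_j)

def f_hat(x):
    """The piecewise linear interpolant, an instance of Equation (1)."""
    return sum(c * hat(x, xj, h) for c, xj in zip(coeffs, nodes))
```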

Regarding the 1st observation, one may ask the following question: Given a desirable tolerance of approximation error, is every node necessary?

The answer is no. The following figure shows that we can remove some of the hat functions while keeping a similar approximation error. The node distribution here is "sparse" relative to the original piecewise linear interpolation. The corresponding numerical technique is (regular) sparse grid (RSG) theory, which designs generic interpolation grids for arbitrary target functions. A broad class of methods, such as the Smolyak method, falls into this category.

[Figure: a sparse subset of the hat functions achieving a similar approximation error]

Regarding the 2nd observation, the question is: given a desirable tolerance of approximation error, how can we improve the accuracy in those sharp-changing regions?

The solution is to use adaption, i.e. to dynamically assign nodes according to the local shape of the target function. In regions that change sharply, more supporting nodes are assigned; in regions where a linear approximation is already good enough, fewer supporting nodes are assigned. The following figure illustrates how the adaption procedure improves the approximation accuracy. One can see a concentration of supporting nodes on the left part of the figure, and the gap between neighboring nodes widens as the function becomes more linear. As a result, the approximation is accurate everywhere in the figure.

[Figure: adaptive node placement, with nodes concentrated where the target function changes sharply]
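A minimal 1D sketch of this adaption idea (my own illustration, not yet the full ASG algorithm of the next section; the tolerance and recursion cap are arbitrary choices):

```python
import numpy as np

def adapt_1d(f, a, b, tol, max_depth=12):
    """Keep a midpoint node only where the straight line between the
    endpoints misses f by more than tol; then recurse on both halves."""
    nodes = {a, b}

    def refine(lo, hi, depth):
        mid = 0.5 * (lo + hi)
        linear_guess = 0.5 * (f(lo) + f(hi))   # linear interpolant at mid
        if abs(f(mid) - linear_guess) > tol and depth < max_depth:
            nodes.add(mid)
            refine(lo, mid, depth + 1)
            refine(mid, hi, depth + 1)

    refine(a, b, 0)
    return np.array(sorted(nodes))

# Nodes cluster near x = 0, where 1/(0.1 + x) bends sharply:
grid = adapt_1d(lambda x: 1.0 / (0.1 + x), 0.0, 1.0, tol=1e-2)
```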

When combining sparse grid theory and adaption, we obtain the so-called (multi-linear) Adaptive Sparse Grid (ASG) interpolation. For readers who want to start on the literature, here are some important studies:

  1. Brumm & Scheidegger (2017), which brought ASG to high-dimensional dynamic economic models

  2. Schaab & Zhang (2022), which applies ASG to dynamic programming in continuous time

  3. Griebel (1998) and Schiekofer (1998), foundational work on adaptive sparse grids and sparse grid finite differences for PDEs

For more related work, I recommend the papers by Michael Griebel at the Institute for Numerical Simulation (Institut für Numerische Simulation) at the University of Bonn, Germany.

Remark: The sparse grid technique, whether RSG or ASG, is largely orthogonal to the specific choice of interpolation technique; it is simply an improvement of the grid structure. Any numerical method that needs a grid can potentially work with sparse grids without large modifications. For example, Judd et al. (2014) give a great example of combining RSG with a spectral method, which is a global technique.

4. Algorithm

Given the generic definition in Equation (1), an interpolant $\hat{f}(x)$ can be defined by a 3-tuple:

$$\left(X,\; P(X),\; \{c_j\}\right) \tag{2}$$

Where $X$ is the domain of $x$, $P$ is the set of supporting nodes (the grid structure), and $\{c_j\}$ is the set of interpolation coefficients. There are two main operations on such an interpolant $\hat{f}(x)$: training and evaluation. Training is the process of defining the interpolant, including:

  1. Choosing the set of supporting nodes $P(X)$, i.e. the grid structure

  2. Computing the interpolation coefficients $\{c_j\}$

Evaluation is the process of computing the value of $\hat{f}$ at a given point $x_0 \in X$. The evaluation can be done simply by applying the definition of the interpolant.

Remark: In practice, a technique called residual fitting is used for adaption. This technique changes the way the evaluation is done; however, one can show that it eventually reduces to Equation (1) as well. Check Schaab & Zhang (2022) for more details.
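In the hierarchical-basis notation common in this literature (e.g. Griebel, 1998), residual fitting means each coefficient is a hierarchical surplus: the gap between the true function value at a node and the prediction of the interpolant built from all coarser levels,

$$\hat{f}(x) = \sum_{l=0}^{L} \sum_{j} \alpha_{l,j}\, \phi_{l,j}(x), \qquad \alpha_{l,j} = f(x_{l,j}) - \hat{f}_{l-1}(x_{l,j}),$$

where $\hat{f}_{l-1}$ uses levels $0$ through $l-1$ only. This is still an instance of Equation (1), with the coefficients computed level by level rather than all at once.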

Let's now discuss the training process. To define an algorithm with adaption, we need to design:

  1. A rule for proposing candidate nodes around existing nodes

  2. A criterion for accepting or rejecting each candidate (an error indicator and a tolerance)

  3. A stopping rule

The multi-linear ASG interpolation uses the following algorithm:

  1. Initialize the algorithm with one or more must-have nodes

  2. Until the algorithm converges:

    1. Check all the nodes added in the last round. For each node i:

      1. On the surface of an m-dimensional sphere around node i, find all 2m candidate nodes along the coordinate axes (m dimensions in total, each with a left and a right neighbor)

      2. For every candidate node:

        1. If the current interpolant (without adding any candidate nodes this round) already has a small enough interpolation error at the candidate node, then skip this candidate node (adding it would contribute little).

        2. Otherwise, add this candidate node

    2. If no new candidate nodes are added, then the algorithm converges

    3. Otherwise, reduce the radius of spheres and move to the next round
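Below is a naive, radius-based Python sketch of this loop on $[0,1]^m$ for $m \ge 2$. It is an illustration under several assumptions of mine: the grid is initialized with the corners plus the center so that an interpolant exists from round one, and scipy's simplicial linear interpolator stands in for true multi-linear interpolation.

```python
import numpy as np
from itertools import product
from scipy.interpolate import LinearNDInterpolator

def asg_train(f, m, tol, r0=0.25, min_r=1e-3):
    """Radius-based sketch of the adaptive loop (assumes m >= 2)."""
    nodes = {tuple(p): f(np.array(p)) for p in product([0.0, 1.0], repeat=m)}
    center = np.full(m, 0.5)
    nodes[tuple(center)] = f(center)
    last_added, r = [center], r0

    while last_added and r >= min_r:       # min_r is just a safety cap
        # Current interpolant over all accepted nodes
        pts = np.array(list(nodes))
        f_hat = LinearNDInterpolator(pts, np.array(list(nodes.values())))

        added = []
        for parent in last_added:                          # last round's nodes
            for dim, step in product(range(m), (-r, +r)):  # 2m candidates
                cand = parent.copy()
                cand[dim] += step
                if not 0.0 <= cand[dim] <= 1.0 or tuple(cand) in nodes:
                    continue
                if abs(f(cand) - f_hat(cand)) > tol:   # error at the candidate
                    nodes[tuple(cand)] = f(cand)       # accept it
                    added.append(cand)
        last_added, r = added, r / 2.0                 # shrink the spheres
    return nodes

# e.g. the Equation (3) function in m = 2 dimensions:
grid = asg_train(lambda x: 1.0 / (0.1 + np.abs(x).sum()), m=2, tol=1e-2)
```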

Remark: There is still some distance between the algorithm above and an efficient implementation. The idea of the m-dimensional sphere is, in practice, implemented via a technique called hierarchical nodes, which defines a rule for "growing" children nodes and for determining the radius of the spheres. Check Schaab & Zhang (2022) for details.
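For intuition, here is one common 1D convention for such hierarchical nodes (an illustrative snippet of mine; see Schaab & Zhang (2022) for the full multi-dimensional rule). A node is identified by a (level, index) pair, sits at coordinate index / 2**level, and the implied sphere radius at level l is 2**-(l+1):

```python
def children_1d(level, index):
    """Children of the hierarchical node (level, index)."""
    return [(level + 1, 2 * index - 1), (level + 1, 2 * index + 1)]

coord = lambda level, index: index / 2 ** level
# The root (1, 1) sits at 0.5; its children sit at 0.25 and 0.75:
print([coord(*c) for c in children_1d(1, 1)])   # [0.25, 0.75]
```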

Remark: One may notice that the algorithm only adds nodes and never drops them. This is due to data structure completeness considerations; see the remarks at the end of this section for more explanation.

To illustrate the algorithm, consider a 2D function $f(x,y)$ defined on the closed hypercube $[0,1]^2$. We initialize the algorithm with only the center point $(0.5, 0.5)$.

[Figure: the initial grid with the single center node (0.5, 0.5)]

Standing at the blue "parent" point, check the far-end red "children" nodes on the dashed sphere, and determine which of them to add according to the tolerance.

[Figure: the four red candidate children on the dashed sphere around the center node]

Let's assume all four red candidate children nodes are accepted. Treat the four nodes as the new "parent" nodes and start the next round of the algorithm.

[Figure: all four candidates accepted and promoted to parent nodes]

Let's assume only 3 candidate children nodes are accepted. In the next round, reduce the sphere radius and continue the algorithm.

[Figure: three candidates accepted; the sphere radius shrinks]

Similarly, assume only 2 children nodes are accepted this round.

[Figure: two candidates accepted in this round]

Suppose no new nodes are accepted around $(1/4, 1/4)$; then the algorithm stops. The converged grid structure looks like:

[Figure: the converged adaptive grid]

One can see that the distances between nodes are uneven: some local regions have more nodes than others.

Remark:

  1. Readers with a computer science background may recognize that the grid structure is in fact a tree in which every node has up to 2m children, and that the algorithm above effectively performs branch pruning.

  2. Dropping a non-leaf node breaks the tree's connectivity and the depth of every node below it, which makes the data structure unpredictable.

  3. For small m, a linked list provides maximum flexibility. However, this becomes very inefficient as m increases. For medium m, a hash table is recommended (Griebel, 1998), thanks to the strict hierarchization of the nodes. For large m, a hash table does not work well due to the higher probability of hash collisions; in this case, a database can effectively manage the grid.

  4. In most economic-modeling scenarios (m ranging from 1 to 20), a hash table works well.
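For example, a plain Python dict already works as the hash table: keying each node by its per-dimension (level, index) pairs keeps the keys exact integers rather than floating-point coordinates. The surplus value below is hypothetical:

```python
grid = {}                        # node key -> hierarchical surplus
node = ((2, 1), (1, 1))          # 2D node at (1/4, 1/2)
grid[node] = 0.37                # hypothetical surplus value

def coords(node):
    """Recover spatial coordinates from the (level, index) pairs."""
    return tuple(i / 2 ** l for l, i in node)

print(coords(node))              # (0.25, 0.5)
```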

5. Example

5.1 Function approximation

Here are some numerical examples for:

$$f(x) := \frac{1}{0.1 + \|x\|_1}, \qquad x \in [0,1]^m \tag{3}$$

Where $\|x\|_1$ is the $\ell_1$ norm. Such a function mimics an $m$-dimensional CRRA value function $\frac{c^{1-\gamma}}{1-\gamma}$ with $\gamma = 2$. Let's set the relative training tolerance to 1% and check $m = 1, 2, 3$. As a comparison to the multi-linear ASG interpolation, the regular piecewise linear (RPL) interpolation on a dense grid is set to have about 50 nodes along each dimension, which is conventional in quantitative macroeconomic modeling.
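In code, the test function is simply:

```python
import numpy as np

def f(x):
    """Equation (3): 1 / (0.1 + ||x||_1) on [0, 1]^m."""
    return 1.0 / (0.1 + np.abs(x).sum())

print(f(np.zeros(3)))   # 10.0   -- the sharp region near the origin
print(f(np.ones(3)))    # ~0.32  -- the flat region
```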

5.1.1 1D example

[Figure: 1D results, ASG (1st row) vs. RPL (2nd row)]

The 1st row displays the results of ASG, while the 2nd row displays the results of RPL. One can find that, within the given tolerance, ASG uses only about 30% as many supporting nodes as RPL while achieving comparable accuracy over most of the domain.

Remark: The poor interpolation accuracy at the lower bound of x could raise quantitatively significant issues in specific economic research contexts (e.g. hand-to-mouth households, or the wealth distribution in an Aiyagari model).

5.1.2 2D example

[Figure: 2D results, ASG (1st column) vs. RPL (2nd column)]

When we move to the 2-dimensional scenario, the sparsity of ASG relative to RPL becomes more significant. In the figure matrix above, the 1st column shows the results of ASG and the 2nd column those of RPL. One can observe results similar to the 1-dimensional scenario. In this two-dimensional case, ASG reaches similar interpolation accuracy using only 7.3% as many supporting nodes as RPL.

5.1.3 3D example

[Figure: 3D results: ASG node distribution, error histogram, and error tail quantiles]

If we move to the 3-dimensional scenario, the sparsity gain becomes even larger. Starting from $m = 3$, it is hard to visualize the function directly. Here, I give 3 illustrations:

  1. Spatial distribution of ASG supporting nodes

  2. Histogram of relative errors, ASG vs. RPL

  3. Tail distribution of relative errors by quantiles, ASG vs. RPL

The spatial distribution of ASG supporting nodes and the histogram of relative errors deliver a similar message as before. The new tail distribution figure further illustrates that ASG and RPL have errors of a similar scale in the most sharply changing regions.

With the same interpolation accuracy, ASG needs only 0.87% as many supporting nodes as RPL.

5.2 HJB equation: stochastic neoclassical growth model

To illustrate how ASG applies to practical economic models, I solve the following stochastic neoclassical growth model, a standard test case:

$$\rho\, v(k,z) = \max_{c \ge 0}\; \frac{c^{1-\gamma}}{1-\gamma} + v_k \left\{ z k^{\alpha} - \delta k - c \right\} + v_z\, \xi (\bar{z} - z) + \frac{1}{2} v_{zz}\, \sigma_z^2 \tag{4}$$

Where:

  1. $\rho$ is the discount rate and $\gamma$ is the CRRA coefficient

  2. $k$ is capital, with production $z k^{\alpha}$ and depreciation rate $\delta$

  3. $z$ is stochastic productivity, mean-reverting to $\bar{z}$ at speed $\xi$ with volatility $\sigma_z$

  4. $v_k$, $v_z$, and $v_{zz}$ denote the partial derivatives of the value function
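For completeness, the first-order condition of the maximization in Equation (4) pins the consumption policy to the slope of the value function:

$$c^{-\gamma} = v_k \quad \Longrightarrow \quad c(k,z) = v_k(k,z)^{-1/\gamma}.$$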

The finite difference scheme is the quasi-fully implicit scheme used by Benjamin Moll. The solved value function $v(k,z)$, saving policy $\dot{k}(k,z)$, consumption policy $c(k,z)$, and the spatial distribution of supporting nodes are displayed below. In contrast, a naive dense grid needs 5,000 to 10,000 nodes to reach the same precision.

[Figure: value function, saving policy, consumption policy, and the spatial distribution of supporting nodes]

6. Pros & Cons: Choose between ASG, RSG, and dense grid

ASG has significant advantages in high-dimensional modeling. However, there is no free lunch. Below, I compare multi-linear interpolation on ASG, RSG (e.g. Smolyak), and dense grids to help readers choose the best method for their applications.

Remark: The comparison in the table may vary significantly with the interpolation method. For example, the time and space complexity of spectral approximation over the three types of grids depends only on the number of supporting nodes.

| ID | Item | ASG | RSG | Dense grid |
|----|------|-----|-----|------------|
| 1 | Time complexity: training | High | Medium | Medium |
| 2 | Time complexity: evaluation | Medium | Medium | Low |
| 3 | Space complexity: RAM during training | High | Medium | High |
| 4 | Space complexity: result storage | Low | Medium | High |
| 5 | Number of supporting nodes | Low | Medium | Very high |
| 6 | Largest feasible dimensionality on a personal computer | >40 | >20 | 4 |
|   | Discrete-time macroeconomic models: | | | |
| 7 | - Compatible? | Yes | Yes | Yes |
|   | Continuous-time macroeconomic models: | | | |
| 8 | - Compatible? | Yes | Yes | Yes |
| 9 | - Unconditional monotonicity of implicit method + upwind scheme? | Maybe | Maybe | Yes |

Remark: ASG dynamically grows the tree of nodes: the number of accepted nodes in the converged node set is small, but the algorithm must trial every possible candidate node, and such trials take about 99% of the training time. RSG determines the node set before doing the residual fitting, which usually makes it faster than ASG. However, for high-dimensional functions, RSG spends a long time filtering admissible node layers.

Remark: Both ASG and RSG require traversing the whole set of supporting nodes due to the nature of residual fitting. A dense grid, in the context of piecewise linear interpolation, only needs to bracket the given point in the state space.

Remark: In the context of solving elliptic/parabolic PDEs using finite differences with implicit schemes, Garcke & Ruttscheidt (2019) show that any interpolation method that requires evaluating the unknown function at points other than the supporting nodes cannot ensure the monotonicity of the numerical scheme, even when an upwind scheme is applied. The reason is that the function approximation at such points may not be expressible as a convex combination of the supporting nodes, which can break the ellipticity of the approximated PDE under specific parameter values. I discuss this topic in another blog post.

Question: If I want to use global interpolation methods such as spectral methods, which type of grid should I use?

Answer: Global methods use global basis functions, which are barely affected by singular local shapes. However, global methods usually involve solving large dense linear systems. Thus, ASG works best, since it reduces the required number of nodes the most.

Reference

  1. Schaab, A., & Zhang, A. (2022). Dynamic programming in continuous time with adaptive sparse grids. Available at SSRN 4125702.

  2. Brumm, J., & Scheidegger, S. (2017). Using adaptive sparse grids to solve high-dimensional dynamic models. Econometrica, 85(5), 1575-1612.

  3. Schiekofer, T. (1998). Die Methode der finiten Differenzen auf dünnen Gittern zur Lösung elliptischer und parabolischer partieller Differentialgleichungen [The method of finite differences on sparse grids for solving elliptic and parabolic partial differential equations] (Doctoral dissertation, Universität Bonn).

  4. Garcke, J., & Ruttscheidt, S. (2019). Finite differences on sparse grids for continuous time heterogeneous agent models.

  5. Judd, K. L., Maliar, L., Maliar, S., & Valero, R. (2014). Smolyak method for solving dynamic economic models: Lagrange interpolation, anisotropic grid and adaptive domain. Journal of Economic Dynamics and Control, 44, 92-123.

  6. Griebel, M. (1998). Adaptive sparse grid multilevel methods for elliptic PDEs based on finite differences. Computing, 61, 151-179.