A 15 min tutorial for Adaptive Sparse Grid (ASG)1. Introduction2. Where do we need ASG?3. General idea4. Algorithm5. Example5.1 Function approximation5.1.1 1D example5.1.2 2D example5.1.3 3D example5.2 HJB equation: stochastic Neo-classical growth model6. Pros & Cons: Choose between ASG, RSG, and dense gridReference
The curse of dimensionality makes it difficult for economists to model complex economic dynamics. For example, a five-state discrete-time lifecycle model might require millions of grid points, with higher-dimensional problems becoming infeasible due to time and memory constraints. In recent decades, numerical methods have been developed to address this issue. Sparse grids, an extension of standard interpolation theory, offer a natural transition to high-dimensional modeling.
Recent advances in sparse grid theory introduce adaptation, where nodes are added or removed based on the local function shape, further reducing the required number of interpolation nodes compared to regular sparse grids (RSG). This adaptive sparse grid (ASG) technique is particularly useful in economic research, where value functions often exhibit power law behaviors.
This blog post heuristically introduces the idea of ASG and its algorithm without discussing the underlying math, aiming to help a broad audience quickly engage with the literature, start learning, and apply this technique to their research.
In general, ASG works in all scenarios wherever an interpolation or function approximation is needed. Some typical scenarios are:
Lifetime optimization problems, both discrete time and continuous time
High-dimensional non-parametric estimation
Numerical quadrature and other exercises
Let’s start from the standard interpolation theory. In this post, we discuss single-valued real functions
Different interpolation methods differ from the design of
Global interpolation
(Global) polynomial interpolation
Spectral method
(Global) spline
Local interpolation
Piecewise polynomial interpolation
Spline
In this post, we discuss multi-dimensional piecewise linear interpolation (aka multi-linear interpolation). Readers may refer to the textbooks of numerical analysis for the details of the other methods. The regular degree-
In the figure, weighted by the interpolation coefficients, 7 evenly spaced hat functions "support" the linear approximation (red line) of the target power-law shape function. One can observe that:
There are many unnecessary overlaps of the hat functions
The interpolant (read line) approximates the target function well in regions that approximately behave linearly, while the approximation is less accurate in regions that change sharply
Regarding the 1st observation, one may ask the following question: Given a desirable tolerance of approximation error, is every node necessary?
The answer is no. The following figure shows that: we can remove some of the hat functions while keeping a similar approximation error. The nodes distribution here is "sparse" relative to the original piecewise linear interpolation. The corresponding numerical technique is (regular) sparse grid (RSG) theory which designs generic interpolation grids for arbitrary target functions. A broad class of methods such as Smolyak method are classified into this category.
Regarding the 2nd observation, one question is that: Given a desirable tolerance of approximation error, how to improve the accuracy in those sharp-changing regions?
The solution is to use adaption, i.e. dynamically assign nodes according to the local shape of the target function. In regions that change sharply, more supporting nodes are assigned; in other regions where linear approximation is good enough, less supporting nodes are assigned. The following figure illustrates how the adaption procedure improve the approximation accuracy. One can see there is a concentration of supporting nodes on the left part of the figure. The gap between neighbor nodes increases as the function shape becoming more linear. Finally, the approximation accuracy is great everywhere of the figure.
When combining the sparse grid theory and adaption, we have so-called (multi-linear) Adaptive Sparse Grid (ASG) interpolation. For readers to start learning the literature, here are some important studies:
All theory, proof and math
Schiekofer (1998) - NOTE: This is a long dissertation written in German but it deserves a careful reading if readers want to really master the theory.
Practical algorithm
Schaab & Zhang (2022, SSRN) - TIPS: This paper gives detailed explanation about how to solve a household lifetime problem with ASG.
Garcke & Ruttscheidt (2019) - TIPS: Carefully check this paper if you are working with HANK-style model
Griebel (1998) - TIPS: Check this paper if you want to write your own programs
Applications in economics:
Continuous time
Schaab & Zhang (2022, SSRN)
Garcke & Ruttscheidt (2019)
In general
Brumm & Scheidegier (2017, ECTA) - TIPS: this paper also has a benchmarking that compares the number of nodes of dense grid, regular sparse grid, and ASG
Potential issues, tricks and solutions: Garcke & Ruttscheidt (2019)
If readers need more related works, I would recommend to check the papers by Michael Griebel at the Institute for Numerical Simulation (Institutetut fur Numericalsche Simulation) at University of Bonn, Germany.
Remark: The sparse grid technique, either RSG or ASG, is generally parallel to the specific choice of interpolation techniques. It is simply an improvement of the grid structure. Any numerical methods that need a grid can potentially work with sparse grids without large modifications. For example, Judd et al. (2014) paper gives a great example of combining RSG and spectral method which is a global technique.
Given the generic definition in Equation (1), an interpolant
Where
Determining the underlying grid structure
Finding the interpolation coefficients
Evaluation is the process of computing the value of
Remark: In practice, a technique called residual fitting is used for adaption. This technique changes the way of doing the evaluation. However, one can find it is eventually Equation (1) as well. Check Schaab & Zhang (2022) for more details.
Let's discuss the training process then. To define an algorithm with adaption, we need to design:
The rule of adding/dropping a supporting node
What criteria to decide if to add or drop a node?
In the case of adding nodes, how to find all the candidate node(s) to be added?
The formula of computing interpolation coefficients
The multi-linear ASG interpolation uses the following algorithm:
Initialize the algorithm with one or more must-have nodes
Before the algorithm converge:
Check all the nodes added last round. For each
On the surface of an
For every candidate node:
If the current interpolant (without adding any candidate nodes this round) has a small enough interpolation error at the candidate node, then skip this candidate node (because adding the node has little contribution).
Otherwise, add this candidate node
If no new candidate nodes are added, then the algorithm converges
Otherwise, reduce the radius of spheres and move to the next round
Remark: There is a long distance from the above algorithm to programming implementation. The idea of
-dimensional sphere, in practice, is done by the technique called hierarchical nodes which defines a rule of "growing" children nodes and determining the radius of spheres. Check Schaab & Zhang (2022) for details.
Remark: One may notice that there is only node adding operation in the algorithm but no dropping operations. This is because of the data structure completeness consideration. Check later sections for more explanation.
To illustrate the algorithm, consider a 2D function
Standing at the blue "parent" point, check the far-end red "children" nodes on the sphere of dashed line. Determine which of them to be added according to the tolerance.
Let's assume all four red candidate children nodes are accepted. Turn the the four nodes as the new "parent" nodes and start this round's algorithm.
Let's assume only 3 candidate children nodes are accepted. Then in this round, reduce the sphere radius and continue the algorithm.
Similarly, assume only 2 children nodes are accepted this round.
Suppose there is no new nodes are accepted around
One can see that the distance between nodes are uneven while some local regions have more nodes than the others.
Remark:
Readers with computer science background may recognize that the grid structure is in fact a
tree. The algorithm above is in fact a branch pruning. Dropping a non-leaf node breaks the tree's connectivity and every other node's depth, which makes the data structure to be unpredictable.
For small
, linked list can provide the maximum flexibility. However, this becomes very inefficient as increases. For medium , a hash table is recommended (Griebel, 1998) thanks to the strict nodes hierarchization. For large , hash table does not work well due to higher probability of hash collision. In this case, a database can effectively manage the grid. In most scenario of economic models (
ranges from 1 to 20), a hash table works well.
Here are some numerical example for:
Where
The 1st row displays the result of ASG, while the 2nd row displays the result of RPL. One can find that within the given tolerance, ASG uses only 30% many supporting nodes cp. RPL, while:
The large interpolation error of RPL mostly comes from the sharp-changing region close to the left boundary of
Meanwhile, the concentration of nodes in the other areas generates "too-accurate" approximation with respect to the error tolerance.
However, ASG generates a relatively uniform error distribution over
Remark: The bad interpolation accuracy at the lower bound of
could raise quantitatively significant issues in specific context of economic research (e.g. Hand-to-Mouth households, wealth distribution in Aiyagari model.)
When we move to the 2-dimensional scenario, the sparsity of ASG wrt RPL becomes more significant. In the above figure matrix, the 1st column is the results of ASG, and the 2nd column is for RPL. One can observe similar results as the 1-dimensional scenario. In this two-dimensional scenario, ASG reaches similar interpolation accuracy by using only 7.3% many supporting nodes cp. RPL.
If we move to the 3-dimensional scenario, the sparsity becomes even larger. Starting from
Spatial distribution of ASG supporting nodes
Histogram of relative errors, ASG vs. RPL
Tail distribution of relative errors by quantiles, ASG vs. RPL
The spatial distribution of ASG supporting nodes and the histogram of relative errors provide similar message as before. The new tail distribution figure further illustrates that ASG and RPL has similar scale of errors in the most sharp-changing regions.
With the same accuracy of interpolation, ASG needs only 0.87% many supporting nodes cp. RPL.
To illustrate the results of applying ASG to practical economic models. I solve the following stochastic Neo-classical growth model which is a standard testing case:
Where:
The TFP shock follows an Ornstein–Uhlenbeck process with mean-reverting coefficient
All parameters are set by convention.
The numerical scheme of finite difference is the quasi-fully implicit scheme used by Benjamin Moll. The solved value function
ASG has significant advantages in high-dimensional modeling. However, there is no free lunch. I compare multi-linear interpolation using: ASG, RSG (e.g. Smolyak), and dense grid to help readers choose the best method for their applications.
Remark: The comparison in the table may vary significantly by the interpolation methods. e.g. The time and space complexity of spectral approximation over the three types of grids only depend on the number of supporting nodes.
ID | Item | ASG | RSG | Dense grid |
---|---|---|---|---|
1 | Time complexity: training | High | Medium | Medium |
2 | Time complexity: evaluation | Medium | Medium | Low |
3 | Space complexity: RAM during training | High | Medium | High |
4 | Space complexity: result storage | Low | Medium | High |
5 | Number of supporting nodes | Low | Medium | Very high |
6 | Largest feasible dimensionality on personal computer | >40 | >20 | 4 |
Discrete time macroeconomic models: | ||||
7 | - Compatible? | Yes | Yes | Yes |
Continuous time macroeconomic models: | ||||
8 | - Compatible? | Yes | Yes | Yes |
9 | - Unconditional monotonicity of implicit method + upwind scheme? | Maybe | Maybe | Yes |
Remark: ASG dynamically grows the tree of nodes in which the number of accepted nodes in the converged node set is small, but the algorithm must trial all possible candidate nodes. Such trials take 99% time of the training. RSG determines the node set before doing residual fitting, which makes the algorithm usually takes less time than ASG. However, for high dimensional function, RSG takes a long time of filtering admissible node layers.
Remark: Both ASG and RSG requires traversal the whole set of supporting nodes due to the nature of residual fitting. However, the dense grid, in the context of piecewise linear interpolation, only needs to bracket the given point in the state space.
Remark: In the context of solving elliptic/parabolic PDE using finite difference with implicit schemes, Garcke & Ruttscheidt (2019) shows that any interpolation methods, which require evaluating the unknown function at points other than supporting nodes, cannot ensure the monotonicity of the numerical scheme, even schemes such as upwind scheme is applied. The reason is that the function approximation at these points may not be expressed as convex combination of the supporting nodes, which potentially breaks the ellipticity of the approximated PDE under specific parameter values. I discuss this topic in another blog post.
Question: If I want to use global interpolations such as spectral methods, which type of grid should I use?
Answer: Global methods uses global basis functions which are merely affected by some singular local shapes. However, global methods usually involve solving large dense linear system. Thus, ASG works the best since it mostly reduces the required number of nodes.
Schaab, A., & Zhang, A. (2022). Dynamic programming in continuous time with adaptive sparse grids. Available at SSRN 4125702.
Brumm, J., & Scheidegger, S. (2017). Using adaptive sparse grids to solve high‐dimensional dynamic models. Econometrica, 85(5), 1575-1612.
Schiekofer, T. (1998). Die Methode der Finiten Differenzen auf d unnen Gittern zur L osung elliptischer und parabolischer partieller Di erentialgleichungen (Doctoral dissertation, PhD thesis, Universit at Bonn).
Garcke, J., & Ruttscheidt, S. (2019). Finite differences on sparse grids for continuous time heterogeneous agent models.
Judd, K. L., Maliar, L., Maliar, S., & Valero, R. (2014). Smolyak method for solving dynamic economic models: Lagrange interpolation, anisotropic grid and adaptive domain. Journal of Economic Dynamics and Control, 44, 92-123.
Griebel, M. (1998). Adaptive sparse grid multilevel methods for elliptic PDEs based on finite differences. Computing, 61, 151-179.