Jason Hilton and Jakub Bijak, 2016
This is a chapter in Van Bavel and Grow’s book on agent-based modeling in demography. As the title suggests, the chapter provides guidance on best practices for designing and analyzing simulations, and most of the tips apply to simulation work in general. The techniques are meant to handle the complexity of agent-based models, which often combine multiple behavioral functions in ways that produce nonlinearities that are challenging to model.
Section 1: Design
The first topic they discuss is how to design experiments that test how a simulation responds to changes in its inputs. For instance, a typical marriage market simulation might have one parameter determining the number of married individuals at the start of the simulation, two parameters determining the radius of the region within which agents search for partners, and one parameter determining the size of an agent’s social network.
Ideally we would want to perform a complete grid search over these parameters, checking the output for every possible combination of inputs. For continuous inputs this is outright impossible, and even if we discretize every variable, the number of combinations grows combinatorially: ten levels for each of four parameters already requires 10,000 runs.
To get around this, the authors suggest using a Latin hypercube instead of a full grid. A Latin hypercube is a subset of the grid chosen so that no two sample points share the same value of any input: each level of each parameter appears exactly once in the design. This strategy approximates coverage of the full parameter space without checking every single entry in the grid.¹ Doing this can often reduce the number of simulations that need to be run by a factor of ten or more.
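To make this concrete, here is a minimal sketch of drawing a Latin hypercube design for four marriage-market-style inputs, assuming SciPy’s `scipy.stats.qmc` module (the parameter names and bounds are illustrative, not from the chapter):

```python
# Sketch: a full grid vs. a Latin hypercube sample for four inputs.
# Assumes scipy >= 1.7, which provides scipy.stats.qmc.
from scipy.stats import qmc

n_params = 4   # e.g. initial married share, two search radii, network size
levels = 10    # grid resolution per parameter

full_grid_runs = levels ** n_params   # 10,000 simulation runs
print(full_grid_runs)

# A Latin hypercube covers the space with far fewer points: with n samples,
# each of the n strata of each parameter is sampled exactly once.
sampler = qmc.LatinHypercube(d=n_params, seed=0)
sample = sampler.random(n=100)        # 100 points in the unit cube [0, 1)^4

# Rescale the unit cube to the actual parameter ranges (illustrative bounds).
lower = [0.0, 1.0, 1.0, 5.0]
upper = [1.0, 10.0, 10.0, 50.0]
points = qmc.scale(sample, lower, upper)
print(points.shape)                   # (100, 4)
```

Each row of `points` is one input combination at which the simulation would be run, so 100 runs stand in for the 10,000 a full grid would need.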
Section 2: Analysis
This section focuses on how to account for uncertainty in the models, of which the authors note there are many sources. The ‘brute force’ method for this, and the method I’ve seen most in other agent-based modeling papers, is simply to run the simulation many times in a Monte Carlo fashion to get sampling distributions over outputs for each input combination. The authors note that this, too, can be computationally prohibitive, even when researchers have access to remote computing power and large numbers of processors.
They suggest an alternative approach: statistical emulators. Once enough simulations have been run, their results can be used to fit a statistical model that describes how outputs depend on inputs. Once the emulator is trained, the modeler no longer needs to rerun simulations; predictions can be generated from the statistical model instead, and the total uncertainty of the model can be described in terms of the emulator’s spread parameters. The authors present Gaussian process emulators as an example. These treat the distribution of outputs as a joint multivariate normal distribution and assume that similar sets of input parameters yield similar outputs.
This is an interesting idea, and it makes a lot of sense as a solution to the challenge of efficiently capturing uncertainty. However, I have some reservations. Replacing a simulation with a statistical model does get away from the ‘heart and soul’ of the exercise, particularly given that, as the authors note, agent-based models are often full of unexpected nonlinearities that might get lost in the emulation. The whole point of running simulations instead of collecting empirical data is to have more control over inputs and to be able to explore the input space fully. In treating simulation output like empirical data, we seem to lose a lot of that flexibility.
¹ This seems useful more generally. For instance, when developing machine learning models we often perform grid-search cross-validation to obtain the best hyperparameters. Complex modeling pipelines have dozens to hundreds of parameters, and complete grid searches over them are impossible. Latin hypercubes may represent a solution, at least to narrow the grid.↩︎
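A sketch of that footnote’s idea, again assuming SciPy’s `qmc` module; the hyperparameter names and ranges are purely illustrative:

```python
# Sketch: using a Latin hypercube to pick hyperparameter candidates
# instead of enumerating a full grid. Names and ranges are made up.
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=0)
unit = sampler.random(n=20)   # 20 candidates instead of 10**3 grid cells

# Map the unit cube onto hypothetical hyperparameter ranges: a learning
# rate on a log scale, a tree depth, and a number of estimators.
learning_rate = 10 ** (-4 + 3 * unit[:, 0])          # 1e-4 .. 1e-1
max_depth = (2 + 8 * unit[:, 1]).astype(int)         # 2 .. 9
n_estimators = (50 + 450 * unit[:, 2]).astype(int)   # 50 .. 499

candidates = list(zip(learning_rate, max_depth, n_estimators))
print(len(candidates))   # 20
```

Each candidate tuple would then be passed to ordinary cross-validation, so the expensive fitting happens 20 times rather than once per grid cell.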