Computational Demography

I think of computational demography as encapsulating the measurement and analysis of demographic processes via non-traditional data sources and computationally intensive methods. Together with Matt Hall, I maintain a reading list of papers in this field, which I keep available here. I have four ongoing projects in this field:

Small Area Estimates from Consumer Data

I work on this project together with Professors Arthur Acolin and Matthew Hall. Consumer data are large datasets, often containing hundreds of millions of individual-level observations, maintained and sold by commercial vendors. These vendors maintain the data by combining and cross-referencing information about people’s interactions with various private and public institutions, like utility payments, voter registrations, real estate tax assessments, and credit card billing statements. Vendors assemble these datasets for sale to marketers. We have purchased the data to see if we can leverage them for policy and research purposes. We have had success using the data for a few different research questions so far:

  • Measuring the effect of gentrification-driven residential mobility on people’s exposure to contextual determinants of health (link).

  • Estimating the number of households in each Census tract on an annual frequency (link).

  • Estimating annual tract-to-tract migration flows across the United States (In progress).

  • Assessing whether neighborhood politics affect households’ preferences about where to live in the United States (In progress).

On the predictability of individual mortality

During summer 2022 I was part of a collaborative project at the Max Planck Institute for Demographic Research, where we asked how well individual mortality can be predicted. We wrote a paper using longitudinal survey data on the aging US population from the Health and Retirement Survey (HRS), comparing a range of machine learning survival models to quantify exactly how well we can predict mortality.

Political Methodology

During the summer of 2020 I was a Data Science for Social Good Fellow at the University of Washington’s eScience Institute. Our team worked to revamp and publish an R package called eiCompare, which we continue to maintain. The package contains tools for estimating the voting behavior of different race groups using Bayesian statistical methods. We hope this package can help ensure that the 2021 round of electoral redistricting results in fair representation across the country. Out of our work came two additional research projects:

  • On the effect of a researcher’s choice of race/ethnicity proxy on downstream conclusions about the voting preferences of different race/ethnic groups (link).

  • On machine learning methods as an alternative to the standard Bayesian approach to proxying race/ethnicity (link).

Knowledge graphs for social science research

Knowledge graphs represent heterogeneous networks containing multiple types of nodes and edges. Technology companies use them to represent their data. For instance, Amazon might use a knowledge graph to simultaneously represent information about who purchased, viewed, and reviewed which products, which products belong to which brands. Knowledge graphs can be embedded and used to cluster products or predict future links in the data. Amazon might use this to generate product recommendations, or to categorize their products (though this is just speculation). I worked with knowledge graphs at Graphika, and in a couple research projects:

  • Using knowledge graphs to measure partisanship in social media discourse (link).

  • Quantifying lifestyle politics in online marketplaces using an Amazon co-purchase network (preprint)