Scalable Methods for Discovering Latent Structure in Societal-Scale Data



Joshua Blumenstock
Sham M. Kakade

Student and Postdocs:

Weihao Kong
Ramya Korlakai Vinayak
Gabriel Cadamuro
Raza Khan

Other Collaborators:

The Project:

Recently, the rapid proliferation of mobile phones and other digital devices can created an unparalleled opportunity to observe and understand the rapidly changing structure of communities in developing and conflict-affected states. In recent years, Call Detail Records (CDR) from commercial mobile phone networks have been used to study not just the frequency and timing of communication events, but also reflect the intricate structure of an individual's social network, patterns of travel and location choice, as well as the socioeconomic and demographic structure of national and sub-regional populations.

However, current state-of-the-art computational methods used to analyse such data are notoriously ill-suited to answer basic, fundamental questions in the social science and policy arena. While many new, provably efficient algorithms for community detection have been recently developed, these methods have several key limitations: they rarely scale to real-world datasets consisting of millions of interconnected actors; they are not applicable to dynamic contexts where network structure evolves over time; and they are almost never validated.


  • Maximum Likelihood Estimation for Learning Populations of Parameters.
    Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham M. Kakade
    In ICML, 2019
    ArXiv Report, arXiv: 1902.04553.

  • Multi-GCN: Graph Convolutional Networks for Multi-View Networks, with Applications to Global Poverty
    Muhammad Raza Khan and Joshua Blumenstock
    In AAAI, 2019.
    Paper, pdf.

  • The Illusion of Change: Correcting for Bias when Inferring Changes in Sparse, Societal-Scale Data.
    Gabriel Cadamuro, Ramya Korlakai Vinayak, Joshua Blumenstock, Sham Kakade, Jacob N. Shapiro
    The Web Conference 2019.
    Paper, pdf.

  • Recovering Structured Probability Matrices.
    Qingqing Huang, Sham Kakade, Wenhao Kong, Gregory Valiant.
    In ITCS, 2018.
    ArXiv Report, arXiv:1602.06586.

  • Prediction with a Short Memory.
    Sham Kakade, Percy Liang, Vatsal Sharan, Gregory Valiant.
    In STOC, 2018.
    ArXiv Report, arXiv:1612.02526.

  • Determinants of Mobile Money Adoption in Pakistan.
    Raza Khan and Joshua Blumenstock
    In NIPS, 2017, Workshop on Machine Learning for the Developing World
    ArXiv Report, arXiv:1712.01081.

  • Predictors without Borders: Behavioral Modeling of Product Adoption in Three Developing Countries.
    Muhammad Raza Khan and Joshua Blumenstock
    In KDD, 2017.
    ACM Digital Library, #2939710 .

Related technical methods for scalability and time series modeling:
  • Accelerating Stochastic Gradient Descent.
    Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
    In COLT, 2018.
    ArXiv Report, arXiv:1704.08227.

  • On the insufficiency of existing momentum schemes for Stochastic Optimization.
    Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade
    In ICLR, 2018.
    ArXiv Report, arXiv:1803.05591.

  • Learning Overcomplete HMMs.
    Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant.
    In NIPS, 2017.
    ArXiv Report, arXiv:1711.02309.

Related Code:

  • Predicting poverty and wealth with mobile phone metadata
    Open ICPSR.

  • Code for accelerating stochastic optimization.
    github repository.

Support and Funding:

This material is based upon work supported by the National Science Foundation under Grant CCF - 1637360 (Algorithms in the Field). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Sham Kakade also acknowledges funding from Washington Research Foundation Fund for Innovation in Data-Intensive Discovery.

Contact Info

Email: sham [at] cs [dot] washington [dot] edu

Email: jblumenstock [at] berkeley [dot] edu