Scalable Methods for Discovering Latent Structure in Societal-Scale Data



Joshua Blumenstock
Sham M. Kakade

Student and Postdocs:

Ramya Korlakai Vinayak
Gabriel Cadamuro
Raza Khan

Other Collaborators:

The Project:

Recently, the rapid proliferation of mobile phones and other digital devices can created an unparalleled opportunity to observe and understand the rapidly changing structure of communities in developing and conflict-affected states. In recent years, Call Detail Records (CDR) from commercial mobile phone networks have been used to study not just the frequency and timing of communication events, but also reflect the intricate structure of an individual's social network, patterns of travel and location choice, as well as the socioeconomic and demographic structure of national and sub-regional populations.

However, current state-of-the-art computational methods used to analyse such data are notoriously ill-suited to answer basic, fundamental questions in the social science and policy arena. While many new, provably efficient algorithms for community detection have been recently developed, these methods have several key limitations: they rarely scale to real-world datasets consisting of millions of interconnected actors; they are not applicable to dynamic contexts where network structure evolves over time; and they are almost never validated.


  • Recovering Structured Probability Matrices.
    Qingqing Huang, Sham Kakade, Wenhao Kong, Gregory Valiant.
    In ITCS, 2018.
    ArXiv Report, arXiv:1602.06586.

  • Prediction with a Short Memory.
    Sham Kakade, Percy Liang, Vatsal Sharan, Gregory Valiant.
    In STOC, 2018.
    ArXiv Report, arXiv:1612.02526.

  • Determinants of Mobile Money Adoption in Pakistan.
    Raza Khan and Joshua Blumenstock
    In NIPS, 2017, Workshop on Machine Learning for the Developing World
    ArXiv Report, arXiv:1712.01081.

  • Predictors without Borders: Behavioral Modeling of Product Adoption in Three Developing Countries.
    Raza Khan and Joshua Blumenstock
    In KDD, 2017.
    ACM Digital Library, #2939710 .

Related technical methods for scalability and time series modeling:
  • Accelerating Stochastic Gradient Descent.
    Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
    In COLT, 2018.
    ArXiv Report, arXiv:1704.08227.

  • On the insufficiency of existing momentum schemes for Stochastic Optimization.
    Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade
    In ICLR, 2018.
    ArXiv Report, arXiv:1803.05591.

  • Learning Overcomplete HMMs.
    Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant.
    In NIPS, 2017.
    ArXiv Report, arXiv:1711.02309.

Related Code:

  • Predicting poverty and wealth with mobile phone metadata
    Open ICPSR.

  • Code for accelerating stochastic optimization.
    github repository.

Support and Funding:

The primary support for this project comes from National Science Foundation Grant #CCF - 1637360 (Algorithms in the Field). Sham Kakade also acknowledges funding from Washington Research Foundation Fund for Innovation in Data-Intensive Discovery.

Contact Info

Email: sham [at] cs [dot] washington [dot] edu

Email: jblumenstock [at] berkeley [dot] edu