Scalable Methods for Discovering Latent Structure in Societal-Scale Data



Joshua Blumenstock
Sham M. Kakade

Students and Postdocs:

Ramya Korlakai Vinayak
Weihao Kong
Gabriel Cadamuro
Raza Khan
Emily Aiken

Other Collaborators:

COVID19 Reports and Contact Tracing:

Due to the COVID19 pandemic, both PI-Kakade and PI-Blumenstock rapidly pivoted to several COVID-related efforts. The first focused on public health management and consisted of work to improve contact tracing (both manual and digital) and the methods for estimating COVID prevalence and resource requirements. The second used societal-scale digital data from developing countries to help direct emergency humanitarian aid. Each of these efforts was a large interdisciplinary collaboration among theoretical computer scientists, mobile networking experts, social scientists, epidemiologists, and medical doctors.

Our work on the COVID19 pandemic had immediate consequences: it influenced policymakers' guidance on when lockdowns should occur and provided guidelines for the number of tests and contact tracers required to control outbreaks. We brought together theoretical computer scientists (and cryptographers), sensor networking experts, social scientists, epidemiologists, and medical doctors to develop new contact tracing methodologies. This laid the foundations for the new Exposure Notification App now deployed in WA state.

This work was done with support from the National Science Foundation, under NSF Awards CCF-1637360, CCF-1703574, CCF-1740551, and IIS-1942702.

Rapid Response Reports:

  • Machine learning can help get COVID-19 aid to those who need it most
    Joshua E Blumenstock
    Nature, 581 (7807), PDF

  • PACT: Privacy Sensitive Protocols and Mechanisms for Mobile Contact Tracing.
    Justin Chan, Landon Cox, Dean Foster, Shyam Gollakota, Eric Horvitz, Joseph Jaeger, Sham Kakade, Tadayoshi Kohno, John Langford, Jonathan Larson, Puneet Sharma, Sudheesh Singanamalla, Jacob Sunshine, Stefano Tessaro.
    In IEEE Bulletin on Data Engineering, Vol. 43 No. 2, 2020.
    Special Issue on Data Technologies Behind Digital Contact Tracing for COVID19
    ArXiv Report, arXiv: 2004.03544.

  • Privacy Guarantees for Personal Mobility Data in Humanitarian Response
    Nitin Kohli, Emily Aiken, and Joshua E Blumenstock.
    KDD 2020 Workshop on Humanitarian Mapping, San Diego, CA

  • Mitigate/Suppress/Maintain: Local Targets for Victory Over COVID
    Divya Siddarth et al.
    Rapid Response Initiative.
    In The Edmond J. Safra Center at Harvard University, May 2020.
    Report, PDF.

  • Pandemic Resilience: Getting it Done
    Danielle Allen et al.
    Rapid Response Initiative.
    In The Edmond J. Safra Center at Harvard University, May 2020.
    Report, PDF.

  • Outpacing the Virus: Digital Response to Containing the Spread of COVID-19 while Mitigating Privacy Risks.
    Vi Hart et al.
    Rapid Response Initiative.
    In The Edmond J. Safra Center at Harvard University, May 2020.
    Report, PDF.

  • Public Mobility Data Enables COVID-19 Forecasting and Management at Local and Global Scales
    Cornelia Ilin, Sébastien E. Annan-Phan, Xiao Hui Tai, Shikhar Mehra, Solomon M. Hsiang, and Joshua E. Blumenstock
    NBER Working Paper #28120, PDF.

News and the App:

  • The CommonCircle App.

  • In the News: from the WA Governor's Office.

  • The WA Exposure Notification App: WA Notify.

Discovering Latent Structure:

    The rapid proliferation of mobile phones and other digital devices has created an unparalleled opportunity to observe and understand the rapidly changing structure of communities in developing and conflict-affected states. In recent years, Call Detail Records (CDR) from commercial mobile phone networks have been used to study not just the frequency and timing of communication events, but also the intricate structure of an individual's social network, patterns of travel and location choice, and the socioeconomic and demographic structure of national and sub-regional populations.

    However, current state-of-the-art computational methods used to analyze such data are notoriously ill-suited to answering basic, fundamental questions in the social science and policy arenas. While many new, provably efficient algorithms for community detection have recently been developed, these methods have several key limitations: they rarely scale to real-world datasets consisting of millions of interconnected actors; they are not applicable to dynamic contexts where network structure evolves over time; and they are almost never validated.

    Publications and Preprints:

    • Targeting Development Aid with Machine Learning and Mobile Phone Data: Evidence from an Anti-Poverty Intervention in Afghanistan.
      Emily Aiken, Guadalupe Bedoya, Aidan Coville, and Joshua E. Blumenstock
      ACM SIGCAS Computing and Sustainable Societies (COMPASS '20), PDF.

    • Optimal Estimation of Change in a Population of Parameters.
      Ramya Korlakai Vinayak, Weihao Kong, Sham M. Kakade.
      ArXiv Report, arXiv: 1911.12568.

    • Robust Meta-learning for Mixed Linear Regression with Small Batches.
      Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh.
      To appear in NeurIPS.
      ArXiv Report, arXiv:2006.09702.

    • Meta-learning for mixed linear regression.
      Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh.
      In ICML.
      ArXiv Report, arXiv: 2002.08936.

    • Maximum Likelihood Estimation for Learning Populations of Parameters.
      Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham M. Kakade
      In ICML.
      ArXiv Report, arXiv: 1902.04553.

    • The Illusion of Change: Correcting for Bias when Inferring Changes in Sparse, Societal-Scale Data.
      Gabriel Cadamuro, Ramya Korlakai Vinayak, Joshua Blumenstock, Sham Kakade, Jacob N. Shapiro
      The Web Conference 2019.
      Paper, PDF.

    • Recovering Structured Probability Matrices.
      Qingqing Huang, Sham Kakade, Weihao Kong, Gregory Valiant.
      In ITCS, 2018.
      ArXiv Report, arXiv:1602.06586.

    • Prediction with a Short Memory.
      Sham Kakade, Percy Liang, Vatsal Sharan, Gregory Valiant.
      In STOC, 2018.
      ArXiv Report, arXiv:1612.02526.

    • Determinants of Mobile Money Adoption in Pakistan.
      Raza Khan and Joshua Blumenstock
      In NIPS, 2017, Workshop on Machine Learning for the Developing World.
      ArXiv Report, arXiv:1712.01081.

    • Predictors without Borders: Behavioral Modeling of Product Adoption in Three Developing Countries.
      Raza Khan and Joshua Blumenstock
      In KDD, 2016.
      ACM Digital Library, #2939710.

    Related technical methods for scalability, sparsity, and time series modeling:

    • Soft Threshold Weight Reparameterization for Learnable Sparsity.
      Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi.
      In ICML.
      ArXiv Report, arXiv: 2002.03231.

    • Accelerating Stochastic Gradient Descent.
      Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
      In COLT, 2018.
      ArXiv Report, arXiv:1704.08227.

    • On the insufficiency of existing momentum schemes for Stochastic Optimization.
      Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade
      In ICLR, 2018.
      ArXiv Report, arXiv:1803.05591.

    • Learning Overcomplete HMMs.
      Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant.
      In NIPS, 2017.
      ArXiv Report, arXiv:1711.02309.

    Related Code:

    • Predicting poverty and wealth with mobile phone metadata
      Open ICPSR.

    • Code for accelerating stochastic optimization.
      github repository.

    Support and Funding:

    The primary support for this project comes from National Science Foundation Grant #CCF-1637360 (Algorithms in the Field). Sham Kakade also acknowledges funding from the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery.

    Contact Info:

    Email: sham [at] cs [dot] washington [dot] edu

    Email: jblumenstock [at] berkeley [dot] edu