Conference papers

  • DataComp-LM: In search of the next generation of training sets for language models
    Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar
    NeurIPS 2024, Datasets and Benchmarks Track

  • Uncertainty Quantification with User-level Differential Privacy
    Abhradeep Guha Thakurta, Dj Dvijotham, Georgie Evans, Peter Kairouz, Ryan McKenna, Sewoong Oh
    working paper

    • presented at Theory and Practice of Differential Privacy 2023

  • One-shot Empirical Privacy Estimation for Federated Learning
    Galen Andrew, Peter Kairouz, Sewoong Oh, Alina Oprea, H. Brendan McMahan, and Vinith Suriyakumar
    ICLR 2024 (Oral presentation)

    • presented at the International Workshop on Federated Learning in the Age of Foundation Models in Conjunction with NeurIPS 2023

  • Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment
    Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Matthew Jagielski, Yangsibo Huang, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, Wanrong Zhang
    Harvard Data Science Review, 6(1), January 2024

    • In July 2022, we hosted a workshop titled “Differential Privacy (DP): Challenges towards the Next Frontier” with experts from industry, academia, and the public sector to discuss and find solutions to the challenges of differential privacy. This is a report from that workshop.

  • Label Poisoning is All You Need
    Rishi D. Jha, Jonathan Hayase, Sewoong Oh
    NeurIPS, 2023

    • presented at the NeurIPS 2023 workshop on Federated Learning in the Age of Foundation Models

  • Improving multimodal datasets with image captioning
    Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt
    NeurIPS 2023, Datasets and Benchmarks Track

    • presented at the ICML 2023 workshop on Data-centric Machine Learning Research (DMLR)

  • DataComp: In search of the next generation of multimodal datasets
    Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
    NeurIPS 2023, Datasets and Benchmarks Track (Oral presentation)

  • MAML and ANIL Provably Learn Representations
    Liam Collins, Aryan Mokhtari, Sewoong Oh, Sanjay Shakkottai
    ICML, 2022

    • video of a talk in Dec 2022 at C3 Digital Transformation Institute is available here

    • slides from my talk at C3 Digital Transformation Institute are available here

    • 5-minute presentation by Liam Collins at ICML is available here

  • Differential privacy and robust statistics in high dimensions
    Xiyang Liu, Weihao Kong, Sewoong Oh
    COLT, 2022

    • presented at the third AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI-22); the recording of a 12-minute presentation is available here

    • video of a talk in Nov 2021 at SNAPP seminar series is available here

    • slides from my talk at SNAPP seminar are available here

    • slides from my talk at Google are available here

  • Robust and Differentially Private Mean Estimation
    Xiyang Liu, Weihao Kong, Sham Kakade, Sewoong Oh
    NeurIPS 2021

    • presented at the ICML 2021 Workshop on Federated Learning for User Privacy and Data Confidentiality (ICML-FL 2021)

    • presented at the CCS 2021 workshop Privacy Preserving Machine Learning (PPML’21)

    • video of a talk in Oct 2021 at Simons Institute is available here

    • slides from my talk are available here

    • code is available here

  • Gradient Inversion with Generative Image Prior
    Jaechang Kim, Jinwoo Jeon, Kangwook Lee, Sewoong Oh, and Jungseul Ok
    NeurIPS, 2021

    • presented at the ICML 2021 Workshop on Federated Learning for User Privacy and Data Confidentiality (ICML-FL 2021)

  • DeepTurbo: Deep Turbo Decoder
    Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, and Pramod Viswanath
    2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2019

  • Detecting Sponsored Recommendations
    Subhashini Krishnasamy, Rajat Sen, Sewoong Oh, and Sanjay Shakkottai
    SIGMETRICS (short paper) 2015

  • Learning Mixed Multinomial Logit Model from Ordinal Data
    Sewoong Oh and Devavrat Shah
    NIPS 2014

  • What's your choice? Learning the mixed multi-nomial logit model
    Ammar Ammar, Sewoong Oh, Devavrat Shah, and Luis-Filipe Voloch
    SIGMETRICS (short paper) 2014

  • Gossip PCA
    Satish Babu Korada, Andrea Montanari, and Sewoong Oh
    ACM SIGMETRICS 2011

Journal papers

Sequence-to-sequence translation from mass spectra to peptides with a transformer model

Melih Yilmaz, William E. Fondrie, Wout Bittremieux, Carlo F. Melendez, Rowan Nelson, Varun Ananth, Sewoong Oh, William Stafford Noble
Nature Communications, 2024, 15(1), pp.6427

Accounting for digestion enzyme bias in Casanovo

Carlo Melendez, Justin Sanders, Melih Yilmaz, Wout Bittremieux, William E Fondrie, Sewoong Oh, William Stafford Noble
Journal of Proteome Research, 2024

MAUVE scores for generative models: Theory and practice

Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui
Journal of Machine Learning Research, 2023

Towards a Defense Against Federated Backdoor Attacks Under Continuous Training

Shuaiqi Wang, Jonathan Hayase, Giulia Fanti, Sewoong Oh
Transactions on Machine Learning Research (TMLR), 2023

Machine Learning-Aided Efficient Decoding of Reed-Muller Subcodes

Mohammad Vahid Jamali, Xiyang Liu, Ashok Vardhan Makkuva, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath
IEEE Transactions on Selected Areas in Information Theory (JSAIT), 2023

Gradient flows on graphons: existence, convergence, continuity equations

Sewoong Oh, Soumik Pal, Raghav Somani, Raghavendra Tripathi
Journal of Theoretical Probability, 2023
presented at the NeurIPS 2021 workshop on Optimal Transport and Machine Learning

Evaluating proteomics imputation methods with improved criteria

L Harris, WE Fondrie, S Oh, WS Noble
Journal of Proteome Research, 2023, 22 (11), 3427-3438

Reducing peptide sequence bias in quantitative mass spectrometry data with machine learning

AB Dincer, Y Lu, DK Schweppe, S Oh, WS Noble
Journal of Proteome Research, 2022, 21 (7), 1771-1782

Physical Layer Communication via Deep Learning

Hyeji Kim, Sewoong Oh, Pramod Viswanath
IEEE Transactions on Selected Areas in Information Theory (JSAIT), Vol.1, no.1, pp.5-18, 2020

LEARN Codes: Inventing Low-latency Codes via Recurrent Neural Networks

Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, and Pramod Viswanath
IEEE Transactions on Selected Areas in Information Theory (JSAIT), Vol.1, no.1, pp.207-216, 2020

PacGAN: The power of two samples in generative adversarial networks

Zinan Lin, Ashish Khetan, Giulia Fanti, Sewoong Oh
IEEE Transactions on Selected Areas in Information Theory (JSAIT), Vol.1, no.1, pp.324-335, 2020, [ code ], [ project page ]

Deepcode: Feedback Codes via Deep Learning

Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
IEEE Transactions on Selected Areas in Information Theory (JSAIT), Vol.1, no.1, pp.194-206, 2020, [ code by Hyeji Kim ], [ code by Yihan Jiang ]

Spectrum Estimation from a Few Entries

Ashish Khetan, Sewoong Oh
Journal of Machine Learning Research, Vol.20, Issue:21, January 2019

Learning from Comparisons and Choices

Sahand Negahban, Sewoong Oh, Kiran Thekumparampil, and Jiaming Xu
Journal of Machine Learning Research, Vol.19, Issue:40, pp.1-95, September 2018

Generalized Rank-breaking: Computational and Statistical Tradeoffs

Ashish Khetan, Sewoong Oh
Journal of Machine Learning Research, Vol.19, Issue:28, pp.1-42, September 2018 [bibtex]

Optimality of Belief Propagation for Crowdsourced Classification

Jungseul Ok, Sewoong Oh, Jinwoo Shin, Yung Yi
IEEE Transactions on Information Theory, Vol.64, Issue:9, pp.6127-6138, September 2018

Demystifying Fixed k-Nearest Neighbor Information Estimators

Weihao Gao, Sewoong Oh, Pramod Viswanath
IEEE Transactions on Information Theory, Vol.64, Issue:8, pp.5629-5661, February 2018, [bibtex]

Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation

Weihao Gao, Sewoong Oh, Pramod Viswanath
IEEE Transactions on Information Theory, Vol.64, Issue:5, pp.3313-3330, May 2018, [bibtex]

Discovering Potential Correlations via Hypercontractivity

Hyeji Kim, Weihao Gao, Sreeram Kannan, Sewoong Oh, and Pramod Viswanath
Entropy, Vol.19, Issue:11, pp.586, October 2017, [ code ], [bibtex]

Data-driven Rank Breaking for Efficient Rank Aggregation

Ashish Khetan, Sewoong Oh
Journal of Machine Learning Research, Vol.17, no.193, pp.1-54, October 2016 [bibtex]

Hiding the Rumor Source

Giulia Fanti, Peter Kairouz, Sewoong Oh, Kannan Ramchandran, and Pramod Viswanath
IEEE Transactions on Information Theory, Vol.63, Issue:10, pp.6679-6713, October 2017 [bibtex]

Metadata-conscious Anonymous Messaging

Giulia Fanti, Peter Kairouz, Sewoong Oh, Kannan Ramchandran, and Pramod Viswanath
IEEE Transactions on Signal and Information Processing over Networks, Vol.2, Issue:4, pp.582-594, December 2016

Detecting Sponsored Recommendations

Subhashini Krishnasamy, Rajat Sen, Sewoong Oh, and Sanjay Shakkottai
ACM Transactions on Modeling and Performance Evaluation of Computing Systems, Volume 2, Issue 1, pp.6:1–6:29, November 2016

The Composition Theorem for Differential Privacy

Peter Kairouz, Sewoong Oh and Pramod Viswanath
IEEE Transactions on Information Theory, Volume 63, Issue 6, pp.4037-4049, June 2017 [bibtex]

Extremal Mechanisms for Local Differential Privacy

Peter Kairouz, Sewoong Oh, and Pramod Viswanath
Journal of Machine Learning Research, Volume 17, no.17, pp.1-51, April 2016 [bibtex]

RankCentrality: Ranking from Pair-wise Comparisons

Sahand Negahban, Sewoong Oh, and Devavrat Shah
Operations Research, Vol.65, no.1, pp.266-287, October 2016 [bibtex]

The Staircase Mechanism in Differential Privacy

Q. Geng, P. Kairouz, S. Oh, and P. Viswanath
IEEE Journal of Selected Topics in Signal Processing, April 2015

Budget-optimal Task Allocation for Reliable Crowdsourcing Systems

David R. Karger, Sewoong Oh and Devavrat Shah
Operations Research, Volume 62 Issue 1, pp.1-24, January-February 2014 [bibtex]

Robust Localization from Incomplete Local Information

Amin Karbasi and Sewoong Oh
IEEE/ACM Transactions on Networking, Vol 21, pp.1131-1144, August 2013, [bibtex]

Calibration using Matrix Completion with Application to Ultrasound Tomography

Reza Parhizkar, Amin Karbasi, Sewoong Oh and Martin Vetterli
IEEE Transactions on Signal Processing, Vol 61, pp.4923-4933, October 2013, [bibtex]

Counting with the Crowd

Adam Marcus, David Karger, Samuel Madden, Robert Miller, Sewoong Oh
Proceedings of the VLDB Endowment, Vol. 6, issue 2, pp.109-120, December 2012, [bibtex]

Matrix Completion from Noisy Entries

Raghunandan Keshavan, Andrea Montanari and Sewoong Oh
Journal of Machine Learning Research, vol. 11, pp.2057-2078, July 2010, [ bibtex , code ]

Matrix Completion from a Few Entries

Raghunandan Keshavan, Andrea Montanari and Sewoong Oh
IEEE Transactions on Information Theory, vol. 56, no. 6, pp.2980-2998, June 2010, [ bibtex , code ]

Dissertation

Matrix Completion: Fundamental Limits and Efficient Algorithms

Ph.D. Dissertation, Stanford University, December 2010