John Thickstun

Contact: thickstn at

I am a PhD candidate in Computer Science & Engineering at the University of Washington, co-advised by Sham Kakade and Zaid Harchaoui. I completed my undergraduate degree in Applied Mathematics at Brown University, where I was advised by Eugene Charniak and Björn Sandstede. My current research interests include generative models, sampling, time series, and applications to music. My research has been supported by a 2017 NSF graduate fellowship and a 2020 Qualcomm innovation fellowship.

I will be completing my PhD in Summer 2021, and joining Stanford in Autumn as a postdoc working with Percy Liang.

My CV is available here.


Autumn 2020: CSE 599 Generative Models

Research Directions

Source Separation and Conditional Sampling

We can use a generative model as a prior for decomposing a linear mixture of sources into its constituent parts. We demonstrate this on the left for linearly superimposed images of churches and bedrooms. The idea is to train two generative models: one that estimates the likelihood of the distribution over images of churches, and another that does the same for images of bedrooms. We can then separate a mixture by decomposing it into two images that are likely under the priors for churches and bedrooms respectively, and that satisfy the constraint that the two images sum to the given mixture. This can be interpreted as conditional sampling from the posterior distribution over sources given a mixture. Our work on visual source separation has appeared in ICML 2020. Extensions of these results to audio separation and more general conditional sampling problems are in preparation.

[Conference Paper] [Recorded Presentation] [Code]
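
As a toy illustration of the decomposition described above, suppose each prior is an isotropic Gaussian instead of a deep generative model. This is an assumption made here only so that the most likely decomposition has a closed form; it is not the method used in the paper. The function name and its parameters are hypothetical.

```python
import numpy as np

def separate_gaussian(mixture, mu1, var1, mu2, var2):
    """Most likely split of `mixture` into x1 + x2 under two Gaussian priors.

    Assumes x1 ~ N(mu1, var1 * I) and x2 ~ N(mu2, var2 * I), independent,
    with the hard constraint x1 + x2 == mixture.
    """
    # The part of the mixture that the two prior means fail to explain.
    residual = mixture - mu1 - mu2
    # The source with the larger prior variance absorbs more of the residual.
    weight = var1 / (var1 + var2)
    x1 = mu1 + weight * residual
    # Defining x2 by subtraction enforces the sum constraint.
    x2 = mixture - x1
    return x1, x2

mixture = np.array([3.0, 5.0])
x1, x2 = separate_gaussian(mixture, np.zeros(2), 1.0, np.ones(2), 3.0)
print(x1, x2)
```

With learned image priors no such closed form is available, which is why the problem is framed as sampling from the posterior over sources given the mixture.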

Generative Modeling of Musical Scores

Generative models are powerful tools for revealing structure in data. Features learned by fitting an unsupervised generative modeling objective can be transferred to other tasks. Or, as seen in the source separation project above, we can directly leverage these generative models as priors. A fun aspect of these models is that you can sample from them; a generative model over musical scores can be turned into a kind of automatic music composer (see left). Musical scores are highly structured, heterogeneous objects. Their two-dimensional structure is reminiscent of visual data, but their time-series structure and sparsity are reminiscent of language. In contrast to both language and visual imagery, the number of scores in a particular musical genre is limited. This makes score modeling an inherently low-resource learning problem. In work appearing in ISMIR 2019, we discuss domain-specific modeling choices that help maximize what we can learn from limited data.

[Conference Paper] [Demos] [Code]

MusicNet and Music Transcription

Musical scores comprise a dense set of labels on performances of classical Western music. These labelings are analogous to semantic segmentations of visual imagery. However, in order to use a score as a segmentation map, we must first align it to a particular audio recording by warping the precise timings in the score onto the expressive timings chosen by the performers. This is the music alignment problem. Using an alignment algorithm, we created the MusicNet dataset by aligning scores to a collection of freely-licensed recordings. We can use these labels to, for example, train an automatic music transcription system. In work appearing at ICASSP 2018, we describe a state-of-the-art transcription model trained using MusicNet that, to date (2019), is the best-performing algorithm in the MIREX transcription challenge.

[Conference Paper] [Code]
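
To make the alignment problem above concrete, here is a minimal sketch of dynamic time warping, one standard way to warp one sequence onto another. The text above does not say which alignment algorithm was used, so treat this as a generic illustration. The function name and the feature representation (one feature vector per time frame) are hypothetical.

```python
import numpy as np

def dtw_path(score_feats, audio_feats):
    """Align two feature sequences with dynamic time warping.

    Each input is an array of shape (num_frames, num_features).
    Returns a list of (score_index, audio_index) pairs.
    """
    n, m = len(score_feats), len(audio_feats)
    # cost[i, j] = cheapest alignment of the first i score frames
    # with the first j audio frames.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(score_feats[i - 1] - audio_feats[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1, j - 1], i - 1, j - 1),
            (cost[i - 1, j], i - 1, j),
            (cost[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return path

# A three-frame "score" against a performance that lingers on the first and last frames.
score = np.array([[0.0], [1.0], [2.0]])
audio = np.array([[0.0], [0.0], [1.0], [2.0], [2.0]])
print(dtw_path(score, audio))
```

Each pair in the returned path maps a score frame to an audio frame, which is what lets the score's labels be transferred onto the recording.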

Publications and Preprints

MAUVE: Human-Machine Divergence Curves for Evaluating Open-Ended Text Generation.
Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Yejin Choi, Zaid Harchaoui.
ArXiv Preprint, 2021.
Experiments available on GitHub.

Rethinking Evaluation Methodology for Audio-to-Score Alignment.
John Thickstun, Jennifer Brennan, and Harsh Verma.
ArXiv Preprint, 2020.
Experiments available on GitHub.

Faster Policy Learning with Continuous-Time Gradients.
Samuel Ainsworth, Kendall Lowrey, John Thickstun, Zaid Harchaoui, Siddhartha Srinivasa.
Learning for Dynamics & Control (L4DC), 2021.

An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction.
Bhargavi Paranjape, Mandar Joshi, John Thickstun, Hannaneh Hajishirzi, and Luke Zettlemoyer.
Empirical Methods in Natural Language Processing (EMNLP), 2020.
Experiments available on GitHub.

Source Separation with Deep Generative Priors.
Vivek Jayaram* and John Thickstun*.
International Conference on Machine Learning (ICML), 2020.
Experiments available on GitHub. Conference presentation available on YouTube.
* Equal Contribution

Convolutional Composer Classification.
Harsh Verma and John Thickstun.
International Society for Music Information Retrieval (ISMIR), 2019.
Experiments available on GitHub.

Coupled Recurrent Models for Polyphonic Music Composition.
John Thickstun, Zaid Harchaoui, Dean P. Foster, and Sham M. Kakade.
International Society for Music Information Retrieval (ISMIR), 2019.
Experiments available on GitHub. Demos available here.

Invariances and Data Augmentation for Supervised Music Transcription.
John Thickstun, Zaid Harchaoui, Dean P. Foster, and Sham M. Kakade.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
Experiments available on GitHub.

Learning Features of Music from Scratch (MusicNet).
John Thickstun, Zaid Harchaoui, and Sham M. Kakade.
International Conference on Learning Representations (ICLR), 2017.
Experiments available on GitHub.

Notes and Tutorials

A brief mathematical introduction to GANs

The Transformer model in equations

Information theory: an alternative introduction with applications to concentration of measure

Conditional Random Fields as a generalization of logistic regression

Some notes on Hilbert-Schmidt operators

Kernels and Mercer's Theorem

Estimating the Shannon capacity of a graph

Thoughts on proof assistants with companion code


Three perspectives on the Black-Scholes formula:

Heuristics for manipulating stochastic differential equations

Negative probabilities in the binomial option pricing model


The fast Johnson-Lindenstrauss transform

The coin flip martingale

Probability densities from a measure-theoretic perspective

Change of measure

Fun stuff

Some linguistic observations

Quotient sigma-algebras

Climbing a tower of abstractions