I am an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. I'm also an adjunct professor at the Language Technologies Institute at CMU. I work on Natural Language Processing, a subfield of computer science focusing on computational processing of human languages.
I am particularly interested in hybrid solutions at the intersection of machine learning and theoretical or social linguistics, i.e., solutions that combine interesting learning/modeling methods and insights about human languages or about people speaking these languages.
Much of my research group's work focuses on NLP for social good, multilingual NLP, and language generation. This research is motivated by a unified goal: to extend the capabilities of human language technology beyond individual populations and across language boundaries, thereby enabling NLP for diverse and disadvantaged users, the users that need it most.
Here are my CV and Google Scholar page.
Previously, I was an assistant professor in the Language Technologies Institute, School of Computer Science at Carnegie Mellon University, and before that a postdoc in the Stanford NLP Group. I got my PhD from CMU.
Teaching
Algorithms for NLP (undergraduate IITP course; co-teaching with David Mortensen)
Publications
Controlled Analyses of Social Biases in Wikipedia Bios. Proc. TheWebConf'22. PDF
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision. Proc. ICLR'22. PDF
Controlled Text Generation as Continuous Optimization with Multiple Constraints. Proc. NeurIPS'21. PDF
SelfExplain: A Self-Explaining Architecture for Neural Text Classifiers. Proc. EMNLP'21. PDF
Evaluating the Morphosyntactic Well-formedness of Generated Texts. Proc. EMNLP'21. PDF
Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates. Proc. Findings of EMNLP'21. PDF
Detecting Community Sensitive Norm Violations in Online Conversations. Proc. Findings of EMNLP'21. PDF
Efficient Test Time Adapter Ensembling for Low-resource Language Varieties. Proc. Findings of EMNLP'21. PDF
Simple and Efficient ways to Improve REALM. Proc. MRQA'21. PDF
Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs. Proc. MRL'21. PDF
Improving Span Representation for Domain-adapted Coreference Resolution. Proc. CRAC'21. PDF
A Survey of Race, Racism, and Anti-Racism in NLP. Proc. ACL'21. PDF
Machine Translation into Low-resource Language Varieties. Proc. ACL'21. PDF
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation. Proc. Findings of ACL'21. PDF
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics. Proc. NAACL'21. PDF
Controlling Dialogue Generation with Semantic Exemplars. Proc. NAACL'21. PDF
DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues. Proc. ICLR'21. PDF
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models. Proc. ICLR'21. (Spotlight) PDF
StructSum: Incorporating Latent and Explicit Sentence Dependencies for Single Document Summarization. Proc. EACL'21. PDF
Ranking Transfer Languages with Pragmatically-Motivated Features for Multilingual Sentiment Analysis. Proc. EACL'21. PDF
Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia. Proc. ICWSM'21. PDF
An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation. Proc. AfricaNLP'21. PDF
Unsupervised Discovery of Implicit Gender Bias. Proc. EMNLP'20. PDF
On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment. Proc. EMNLP'20. PDF
Fortifying Toxic Speech Detectors Against Veiled Toxicity. Proc. EMNLP'20. PDF
Automatic Extraction of Rules Governing Morphological Agreement. Proc. EMNLP'20. PDF
Understanding Linguistic Accommodation in Code-Switched Human-Machine Dialogues. Proc. CoNLL'20. PDF
LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification. Proc. SemEval'20. PDF
A Computational Analysis of Polarization on Indian and Pakistani Social Media. Proc. SocInfo'20. (Nominated for best paper award) PDF
A framework for the computational linguistic analysis of dehumanization. Frontiers in Artificial Intelligence. PDF
Demoting Racial Bias in Hate Speech Detection. Proc. SocialNLP'20. PDF
A Deep Reinforced Model for Cross-Lingual Summarization with Bilingual Semantic Similarity Reward. Proc. WNGT'20. PDF
Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions. Proc. ACL'20. PDF
Balancing Training for Multilingual Neural Machine Translation. Proc. ACL'20. PDF
Stress and Burnout in Open Source: Toward Finding, Understanding, and Mitigating Unhealthy Interactions. Proc. of International Conference on Software Engineering -- New Ideas Track (ICSE-NIER). PDF
Augmenting Non-Collaborative Dialog Systems with Explicit Semantic and Strategic Dialog History. Proc. ICLR'20. PDF
What Code-Switching Strategies are Effective in Dialog Systems? Proc. SCiL'20. PDF
Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods. Proc. SCiL'20. PDF
Topics to Avoid: Demoting Latent Confounds in Text Classification. Proc. EMNLP'19. PDF
Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts. Proc. EMNLP'19. PDF
Learning to Generate Word- and Phrase-Embeddings for Efficient Phrase-Based Neural Machine Translation. Proc. WNGT'19. PDF
A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation. Proc. WNGT'19. PDF
A Dynamic Strategy Coach for Effective Negotiation. Proc. SIGdial'19. PDF
Entity-Centric Contextual Affective Analysis. Proc. ACL'19. PDF
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology. Proc. SIGMORPHON'19. PDF (Interpretability Prize)
Quantifying Social Biases in Contextual Word Representations. Proc. of Workshop on Gender Bias for NLP. PDF
Contextual Affective Analysis: A Case Study of People Portrayals in Online #MeToo Stories. Proc. ICWSM'19. PDF
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings. Proc. NAACL'19. PDF
Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs. Proc. ICLR'19. PDF
Framing and Agenda-setting in Russian News: a Computational Analysis of Intricate Political Strategies. Proc. EMNLP'18. PDF
Style Transfer Through Back-Translation. Proc. ACL'18. PDF
Native Language Cognate Effects on Second Language Lexical Choice. Proceedings of the Transactions of Association for Computational Linguistics (TACL). 2018. PDF DATA
RtGender: A Corpus for Studying Differential Responses to Gender. Proc. LREC'18. PDF DATA
Incorporating Dialectal Variability for Socially Equitable Language Identification. Proc. ACL'17. PDF CODE
Writer Profiling Without the Writer's Text. Proc. SocInfo'17. PDF
Linguistic Knowledge in Data-Driven Natural Language Processing. PhD thesis, September 2016. PDF
Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning. Proc. ACL'16. PDF
Correlation-based Intrinsic Evaluation of Word Vector Representations. In RepEval'16. PDF CODE
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks. In RepEval'16. PDF
Polyglot Neural Language Models: Case Study in Cross-Lingual Phonetic Representation Learning. Proc. NAACL'16. PDF
Morphological Inflection Generation Using Character Sequence to Sequence Learning. Proc. NAACL'16. PDF
Massively Multilingual Word Embeddings. arXiv preprint. PDF
Cross-Lingual Bridges with Models of Lexical Borrowing. Journal of Artificial Intelligence Research (JAIR). 2016. PDF
Evaluation of Word Vector Representations by Subspace Alignment. In Proc. EMNLP'15. PDF CODE
Not All Contexts Are Created Equal: Better Word Representations with Variable Attention. In Proc. EMNLP'15. PDF
Lexicon Stratification for Translating Out-of-Vocabulary Words. In Proc. ACL'15. PDF
Sparse Overcomplete Word Vector Representations. In Proc. ACL'15. PDF
A Bottom Up Approach to Category Mapping and Meaning Change. In Proc. NetWordS'15. PDF
Constraint-Based Models of Lexical Borrowing. In Proc. NAACL'15. PDF
Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources. Computational Linguistics, 40(2):449-468, 2014. PDF
Metaphor Detection with Cross-Lingual Model Transfer. In Proc. ACL'14. PDF CODE DATA
Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation. In Proc. EACL'14. PDF
Augmenting English Adjective Senses with Supersenses. In Proc. LREC'14. PDF CODE DATA
Unified Annotation Scheme for the Semantic/Pragmatic Components of Definiteness. In Proc. LREC'14. PDF DATA
Automatic Classification of Communicative Functions of Definiteness. In Proc. COLING'14. PDF
The CMU Machine Translation Systems at WMT 2014. In Proc. WMT'14. PDF
Generating English Determiners in Phrase-Based Translation with Synthetic Translation Options. In Proc. WMT'13. PDF
The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References. In Proc. WMT'13. PDF
Identifying the L1 of non-native writers: the CMU-Haifa system. In Proc. the 8th Workshop on Innovative Use of NLP for Building Educational Applications, 2013. PDF
Cross-Lingual Metaphor Detection Using Common Semantic Features. In Proc. Meta4NLP Workshop, 2013. PDF
Identification and Modeling of Word Fragments in Spontaneous Speech. In Proc. ICASSP'13. PDF
Extraction of Multi-word Expressions from Small Parallel Corpora. In Natural Language Engineering 18(4):549-573, 2012. PDF
Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources. In Proc. EMNLP'11. PDF
Extraction of Multi-word Expressions from Small Parallel Corpora. University of Haifa M.Sc. thesis, September 2010. PDF
Extraction of Multi-word Expressions from Small Parallel Corpora. In Proc. COLING'10. PDF
Automatic Acquisition of Parallel Corpora from Websites with Dynamic Content. In Proc. LREC'10. PDF