Semantic matching against a corpus

Domain experts are often interested in the extent to which a particular idea is expressed in a corpus. We explore (a) methods for efficiently identifying semantic matches of a query, and (b) domain-specific analyses enabled through their output.
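To make the task concrete, here is a minimal sketch of matching a query against corpus sentences and measuring how often the idea appears. It is not the method from the papers listed below: it substitutes plain bag-of-words cosine similarity (a lexical stand-in) for a learned semantic similarity model. The function names, the 0.3 threshold, and the toy sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_query(query, sentences, threshold=0.3):
    """Return (score, sentence) pairs scoring at or above threshold, best first.

    The threshold is a hypothetical value; a real system would tune it.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in sentences]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


def match_rate(query, sentences, threshold=0.3):
    """Fraction of sentences matching the query: a crude 'extent of expression'."""
    if not sentences:
        return 0.0
    return len(match_query(query, sentences, threshold)) / len(sentences)


# Made-up toy corpus, for illustration only.
corpus = [
    "Funding for rebuilding was delayed",
    "The weather was sunny today",
    "Rebuilding funding delayed again",
]
query = "rebuilding funding delayed"

for score, sentence in match_query(query, corpus):
    print(f"{score:.2f}  {sentence}")
print(f"match rate: {match_rate(query, corpus):.2f}")
```

A lexical measure like this misses paraphrases that share no words with the query, which is the gap a semantic matching method is meant to close. The match rate at the end hints at the kind of corpus-level statistic a domain expert might track over time.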


Natural disaster recovery

Goal: increased understanding of the factors that go into long-term disaster recovery (e.g., rebuilding, funding, community wellbeing).

Data/domain: recovery after the 2010-2011 Canterbury earthquake sequence; we collected roughly 1,000 articles from local New Zealand news (2011-2015) as our text corpus.

Papers:

Semantic matching against a corpus: New applications and methods
Lucy H. Lin, Scott B. Miles, Noah A. Smith.
Technical report; presented at NW-NLP (2018).
[paper | code being updated]

Natural language processing for analyzing disaster recovery trends expressed in large text corpora

Religious rhetoric & policy attitudes

Goal: understanding how U.S. senators, representatives, and other congressional actors use religious rhetoric in the context of different policy priorities.

Data/domain: more than 100 million website captures from the house.gov and senate.gov websites (1997-2013) obtained by the Internet Archive.

Paper:

Religiosity and public policy in Congress: A text analysis of U.S. federal legislators' religious rhetoric and policy attitudes
Sarah K. Dreier, Lucy H. Lin, Sofia Serrano, Emily K. Gade, Noah A. Smith.
Manuscript in preparation; presented at Text As Data (2018) and Conference on Politics and Computational Social Science (2019).

This work was funded by the National Science Foundation (grant #1541025, graduate fellowship).