Fact Checking:

In this project, we performed linguistic analysis over unreliable news sources as well as fact-checked statements from politifact.com. Resources from our EMNLP'17 short paper are below.

Unreliable News Data

We also compiled a large corpus of unreliable news articles. These articles come from different types of unreliable sources, including satire, propaganda, and hoaxes. See the table below for more details:

News type    Source            # Docs    # Tokens/Doc
Satire       The Onion         14,170    350
Satire       Borowitz Report   627       250
Satire       Clickhole         188       303
Hoax         American News     6,914     204
Hoax         DC Gazette        5,133     582
Propaganda   Natural News      15,580    857
Propaganda   Activist Report   17,869    1,169

Labelled Unreliable News Data Set

Politifact Data

A collection of rated statements from Politifact fact-checkers (as of March 2016) and their connected sites (PunditFact). Statements are rated on a 6-point scale from True to Pants-on-fire-False. More details about the rubric used at Politifact can be found on the Politifact website.

~10,000 graded politifact statements
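
For downstream analysis it can help to treat the 6-point rating as an ordinal value. The following is a minimal sketch in Python, not part of the released resources: the label spellings, the 0-5 numbering, and the helper name rating_index are assumptions made for illustration, so check them against the labels actually used in the data file.

```python
# Ordinal encoding of the six ratings, ordered from most to least truthful.
# Label spellings and the 0-5 numbering are assumptions for illustration.
RATING_SCALE = [
    "True",
    "Mostly True",
    "Half True",
    "Mostly False",
    "False",
    "Pants on Fire",
]


def _normalize(label):
    """Lowercase, turn hyphens into spaces, and collapse whitespace."""
    return " ".join(label.replace("-", " ").lower().split())


RATING_TO_INDEX = {_normalize(name): i for i, name in enumerate(RATING_SCALE)}


def rating_index(label):
    """Return 0 (True) through 5 (Pants on Fire) for a rating label."""
    key = _normalize(label)
    if key not in RATING_TO_INDEX:
        raise ValueError(f"unrecognized rating: {label!r}")
    return RATING_TO_INDEX[key]
```

Raising on unknown labels, rather than guessing, makes any mismatch between these assumed spellings and the real file visible right away.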

Wiktionary Lexicons

We compiled the following lexicons for textual analysis of "dramatic" language found in unreliable (esp. hoax and propaganda) articles. All of the following lists were compiled from Wiktionary and include:

Wiktionary-based Lexicons
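
One simple way to use a lexicon like these is to measure what fraction of an article's tokens appear in it. The sketch below is an illustration under stated assumptions, not the analysis from the paper: the tokenizer is a crude regular expression, the function name lexicon_rate is hypothetical, the toy word set is invented rather than taken from the released lists, and multi-word entries are not handled.

```python
import re


def lexicon_rate(text, lexicon):
    """Fraction of tokens in `text` found in `lexicon` (case-insensitive)."""
    # Crude tokenizer (an assumption): runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in lexicon)
    return hits / len(tokens)


# Hypothetical toy lexicon, NOT taken from the released Wiktionary lists.
toy_lexicon = {"most", "worst", "totally"}
rate = lexicon_rate("This is totally the worst idea", toy_lexicon)
```

Normalizing by token count keeps the rate comparable across sources whose articles differ widely in length.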

Relevant Papers


hrashkin at cs dot washington dot edu