An Atlas of Machine Commonsense for If-Then Reasoning

Quick links:   [download the data]   [read the paper]

Knowledge Graph Browser

Select an event and see our annotations (or type to search):



For some events, the annotations are quite diverse. Does this mean the data is noisy?

Importantly, some events invoke highly selective commonsense anticipations, while others invoke much more diverse anticipations. Knowledge about this varying degree of uncertainty (i.e., a relatively flat distribution over diverse inferences) is itself a natural and important part of our commonsense knowledge. Thus, for some events, diverse annotations are exactly what we should expect to see.
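One way to make this concrete is to measure the Shannon entropy of the empirical distribution over annotations: selective events yield low entropy, while genuinely diverse events yield a flatter, higher-entropy distribution. The sketch below (with made-up annotation strings) is purely illustrative:

```python
from collections import Counter
from math import log2

def annotation_entropy(annotations):
    """Shannon entropy (in bits) of the empirical distribution over annotations.

    Low entropy: the event invokes highly selective anticipations.
    High entropy: anticipations are genuinely diverse (flat distribution).
    """
    counts = Counter(annotations)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A selective event: annotators largely agree.
selective = ["smiles", "smiles", "smiles", "laughs"]
# A diverse event: many plausible inferences, spread evenly.
diverse = ["reads", "relaxes", "learns", "naps"]

assert annotation_entropy(selective) < annotation_entropy(diverse)
```

Under this view, high entropy for an event reflects genuine uncertainty in the world, not annotation noise.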

Can ML models (such as neural networks) learn from potentially diverse annotations?

Yes! The reason this is possible is the same reason it is possible to train a "language model": despite the high variation in natural language, its generalizable patterns can be learned by probabilistic models. We likewise view commonsense as a stochastic modeling problem.
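To sketch what "stochastic modeling" means here: diverse annotations can be folded into a soft target distribution, and a model trained with cross-entropy against that distribution is rewarded for spreading probability mass the way annotators did, rather than for picking a single "correct" answer. The snippet below is a minimal illustration with hypothetical model probabilities, not the actual training setup:

```python
from collections import Counter
from math import log

def soft_targets(annotations):
    """Turn raw (possibly diverse) annotations into a target distribution."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def cross_entropy(target, predicted):
    """Cross-entropy (in nats) of model probabilities vs. the annotation
    distribution. Lower is better: the model matches annotator diversity."""
    return -sum(p * log(predicted[a]) for a, p in target.items())

target = soft_targets(["relaxes", "relaxes", "learns", "naps"])
# A model that mirrors annotator diversity scores better than an
# overconfident model that puts almost all mass on one inference.
diverse_model = {"relaxes": 0.5, "learns": 0.25, "naps": 0.25}
confident_model = {"relaxes": 0.9, "learns": 0.05, "naps": 0.05}

assert cross_entropy(target, diverse_model) < cross_entropy(target, confident_model)
```

So diversity in the annotations is not an obstacle to learning; it is the signal a probabilistic model is meant to capture.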

What is the agreement level anyway?

To shed light on data quality across all dimensions, we ran a separate verification study on a random subset of 100 events, asking five MTurkers to judge whether an individual annotation is correct given an event and dimension. We find that, on average, annotations are valid 86% of the time; a per-dimension breakdown is shown below.
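The arithmetic behind such a study can be sketched as follows. Here each annotation receives five binary judgments and is counted valid by majority vote; the majority-vote rule and the example judgments are illustrative assumptions, not necessarily the exact aggregation used:

```python
def validity_rate(judgments):
    """Fraction of annotations judged valid, where each annotation has
    five binary worker judgments (1 = "valid") and counts as valid
    when at least three of the five workers agree (majority vote)."""
    valid = sum(1 for votes in judgments if sum(votes) >= 3)
    return valid / len(judgments)

# Illustrative judgments for four annotations, five workers each.
judgments = [
    (1, 1, 1, 1, 1),  # unanimously valid
    (1, 1, 1, 0, 0),  # majority valid
    (1, 1, 0, 0, 0),  # majority invalid
    (1, 1, 1, 1, 0),  # majority valid
]

assert validity_rate(judgments) == 0.75
```

Averaging such per-annotation rates over the sampled events yields the overall figure, and grouping by dimension yields the per-dimension breakdown.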

Disclaimer/Content warning: the events in ATOMIC were automatically extracted from blogs, stories, and books written at various times. Some events may depict violent or problematic actions; we left these in the corpus for the sake of learning the (possibly negative but still important) commonsense implications associated with them. We removed a small set of truly outdated events, but may have missed some, so please email us if you have any concerns.