Moe Kayali

ReviewData Dataset

This is a relational dataset of academic peer review. It consists of four main tables and the relations between them. The data covers two thousand submissions to ten computer science conferences and workshops from 2017 to 2019. Importantly, it contains both accepted and rejected submissions.

ReviewData was created by compiling data from OpenReview, Scopus and the Shanghai University Rankings.

If you use this dataset, please cite the paper “Causal Relational Learning” as follows:

@inproceedings{salimi2020causal,
  author    = {Babak Salimi and
               Harsh Parikh and
               Moe Kayali and
               Lise Getoor and
               Sudeepa Roy and
               Dan Suciu},
  title     = {Causal Relational Learning},
  booktitle = {{SIGMOD} Conference},
  pages     = {241--256},
  publisher = {{ACM}},
  year      = {2020}
}

Download .xz format (2.9 MiB zipped, 18.4 MiB unzipped)
Download .zip format (4.3 MiB zipped)

Unzipped SHA1 checksum: 8288413d7e5f9803708ea2244ee3c742e1df6176.
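
To check a download against the checksum above, the unzipped file can be hashed locally. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the file name `reviewdata.db` in the usage note are assumptions, since this page does not state what the unzipped file is called.

```python
import hashlib

# SHA-1 checksum published on this page for the unzipped database file.
EXPECTED_SHA1 = "8288413d7e5f9803708ea2244ee3c742e1df6176"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """True if the file's SHA-1 digest equals the checksum listed on this page."""
    return sha1_of_file(path) == EXPECTED_SHA1
```

For example, `matches_published_checksum("reviewdata.db")` should return `True` for an intact download, where `reviewdata.db` stands in for whatever the unzipped file is actually named.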

Data description

Data is provided in SQLite 3 format. Schemas for the five tables (the four main tables Authors, Conferences, Papers, and Reviews, plus the Contributed relation table) are provided below.
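
Before relying on the schemas listed here, it can be useful to confirm what the downloaded file actually contains. The sketch below uses Python's standard `sqlite3` module to list every table with its columns and declared types; the function name `describe_schema` and the file name in the usage comment are assumptions made for illustration.

```python
import sqlite3


def describe_schema(conn):
    """Map each table name to its list of (column name, declared type) pairs."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Example (the file name is hypothetical):
# conn = sqlite3.connect("reviewdata.db")
# for table, columns in describe_schema(conn).items():
#     print(table, columns)
```

The declared types reported by SQLite may differ in spelling from the generic types used in the tables on this page (for example `TEXT` for string).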

Authors Table Schema

| Attribute | Description | Type |
| --- | --- | --- |
| aid | Author ID, primary key | integer |
| name | Full name of the author | string |
| email | Author’s email | string |
| inst_guess | Guess of the author’s main affiliation (whois lookup against email domain) | string |
| world_rank | Ranking of the author’s main affiliation | integer |
| document_count | Count of papers the author has published | integer |
| citation_count | Sum of citations this author has received | integer |
| h_index | h-index of the author | integer |
| coauthor_count | Total count of lifetime collaborators | integer |
| year_experience | Length of academic publication career | integer |

Conferences Table Schema

| Attribute | Description | Type |
| --- | --- | --- |
| cid | Conference ID, primary key | integer |
| name | Name and year of the conference | string |
| accept_count | Count of papers accepted at the conference | integer |
| reject_count | Count of papers rejected from the conference | integer |
| selectivity | Synthetic (derived) attribute, accept_count / reject_count | real |
| is_workshop | True if a workshop, false if a conference | bool |
| double_blind | True if double-blind reviewing was used, false if single-blind reviewing was used | bool |
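
Since selectivity is described as a derived attribute, it can be recomputed from the two counts. The sketch below compares the stored value with accept_count / reject_count; it assumes the table and column names in the SQLite file match the ones documented above, and `selectivity_check` is a hypothetical helper name.

```python
import sqlite3


def selectivity_check(conn):
    """Return (name, stored selectivity, recomputed accept/reject ratio) rows."""
    query = """
        SELECT name,
               selectivity,
               CAST(accept_count AS REAL) / reject_count AS recomputed
        FROM Conferences
        WHERE reject_count > 0      -- skip venues where the ratio is undefined
        ORDER BY cid
    """
    return conn.execute(query).fetchall()


# Example (the file name is hypothetical):
# conn = sqlite3.connect("reviewdata.db")
# for name, stored, recomputed in selectivity_check(conn):
#     print(name, stored, recomputed)
```

Note that under this definition a lower value means a more selective venue, and the value can exceed 1 when more papers are accepted than rejected.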

Contributed Table Schema

| Attribute | Description | Type |
| --- | --- | --- |
| ctr_aid | ID of the author that contributed; part 1 of 2 of the composite primary key | integer |
| ctr_pid | ID of the paper that was contributed to; part 2 of 2 of the composite primary key | string |
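
Contributed is the many-to-many link between Authors and Papers, so most author-level questions go through it. As a rough sketch, assuming the table and column names in the file match this page (including `pid` and `title` in the Papers table), the papers of a single author could be fetched as follows; `papers_by_author` is a hypothetical helper name.

```python
import sqlite3


def papers_by_author(conn, aid):
    """Return (pid, title) for every paper the author `aid` contributed to."""
    query = """
        SELECT p.pid, p.title
        FROM Contributed AS c
        JOIN Papers AS p ON p.pid = c.ctr_pid
        WHERE c.ctr_aid = ?
        ORDER BY p.pid
    """
    # The author ID is passed as a bound parameter, not formatted into the SQL.
    return conn.execute(query, (aid,)).fetchall()


# Example (the file name and author ID are hypothetical):
# conn = sqlite3.connect("reviewdata.db")
# print(papers_by_author(conn, 1))
```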

Papers Table Schema

| Attribute | Description | Type |
| --- | --- | --- |
| pid | Paper ID, primary key | string |
| title | Paper title | string |
| abstract | Paper abstract | string |
| decision | True if accepted, false if rejected | bool |
| submitted_to | Conference ID of the venue the paper was submitted to | integer |
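
Because submitted_to refers to a conference ID, per-venue outcomes can be tallied by joining Papers to Conferences. The sketch below makes two assumptions that should be checked against the actual file: that table and column names match this page, and that the boolean decision is stored as the integers 1 (accepted) and 0 (rejected). `decisions_per_venue` is a hypothetical helper name.

```python
import sqlite3


def decisions_per_venue(conn):
    """Return (venue name, accepted count, rejected count) for each venue."""
    query = """
        SELECT c.name,
               SUM(p.decision)            AS accepted,  -- assumes decision is 0/1
               COUNT(*) - SUM(p.decision) AS rejected
        FROM Papers AS p
        JOIN Conferences AS c ON c.cid = p.submitted_to
        GROUP BY c.cid, c.name
        ORDER BY c.cid
    """
    return conn.execute(query).fetchall()


# Example (the file name is hypothetical):
# conn = sqlite3.connect("reviewdata.db")
# print(decisions_per_venue(conn))
```

These tallies cover only the submissions present in the dataset, so they need not equal the accept_count and reject_count columns of the Conferences table.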

Reviews Table Schema

| Attribute | Description | Type |
| --- | --- | --- |
| rid | Review ID, primary key | integer |
| review_of | ID of the paper this review is about | string |
| title | Review title | string |
| review | Review body text | string |
| rating | Rating on [0, 1] of paper quality, where 1 is a perfect score, normalized across conferences | real |
| confidence | Rating on [0, 1] of reviewer confidence, where 1 is total certainty, normalized across conferences | real |
| raw_rating | Raw rating string, not comparable across conferences | string |
| raw_confidence | Raw confidence string, not comparable across conferences | string |
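
Since the normalized rating is comparable across conferences, it can be aggregated over the whole dataset. The following sketch averages it separately for reviews of rejected and accepted papers; it assumes the documented table and column names and a 0/1 encoding of decision, and `mean_rating_by_decision` is a hypothetical helper name.

```python
import sqlite3


def mean_rating_by_decision(conn):
    """Return (decision, mean normalized rating, review count) per decision value."""
    query = """
        SELECT p.decision,
               AVG(r.rating) AS mean_rating,
               COUNT(*)      AS review_count
        FROM Reviews AS r
        JOIN Papers AS p ON p.pid = r.review_of
        WHERE r.rating IS NOT NULL   -- ignore reviews without a normalized rating
        GROUP BY p.decision
        ORDER BY p.decision
    """
    return conn.execute(query).fetchall()


# Example (the file name is hypothetical):
# conn = sqlite3.connect("reviewdata.db")
# print(mean_rating_by_decision(conn))
```

The average here is taken over reviews, not papers, so papers with more reviews carry more weight; averaging per paper first would be a reasonable alternative.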