Moe Kayali

ReviewData Dataset

This is a relational dataset of academic peer review. It consists of four main tables and the relations between them. The data covers two thousand submissions to ten computer science conferences and workshops held from 2017 to 2019. Importantly, it contains both accepted and rejected submissions.

ReviewData was created by compiling data from OpenReview, Scopus and the Shanghai University Rankings.

If you use this dataset, kindly cite the paper “Causal Relational Learning” as follows:

@inproceedings{DBLP:conf/sigmod/SalimiPKGRS20,
  author    = {Babak Salimi and
               Harsh Parikh and
               Moe Kayali and
               Lise Getoor and
               Sudeepa Roy and
               Dan Suciu},
  title     = {Causal Relational Learning},
  booktitle = {{SIGMOD} Conference},
  pages     = {241--256},
  publisher = {{ACM}},
  year      = {2020}
}

Download .xz format (2.9 MiB zipped, 18.4 MiB unzipped)
Download .zip format (4.3 MiB zipped)

SHA-1 checksum of the unzipped file: 8288413d7e5f9803708ea2244ee3c742e1df6176
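A download can be checked against the checksum above with a few lines of Python. This is a minimal sketch: the helper names `sha1_of_file` and `matches_published_checksum` are illustrative, and the path passed in is whatever the unzipped file is called on your machine (the file name is not stated here).

```python
import hashlib

# SHA-1 of the unzipped dataset, as published above.
EXPECTED_SHA1 = "8288413d7e5f9803708ea2244ee3c742e1df6176"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """True if the file at `path` has the published SHA-1 checksum."""
    return sha1_of_file(path) == EXPECTED_SHA1
```

Run the check on the unzipped database file, not on the `.xz` or `.zip` archive, since the checksum refers to the unzipped data.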

Data description

Data is provided in SQLite 3 format. Schemas for the four main tables (Authors, Conferences, Papers, and Reviews) and for the Contributed relation, which links authors to papers, are provided below.
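Since the data ships as a SQLite 3 database, Python's standard `sqlite3` module is enough to inspect it. The sketch below lists the tables and their declared columns. The helper names are illustrative, and the file name in the usage comment (`reviewdata.sqlite`) is a placeholder, not the actual name of the download.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def table_columns(conn, table):
    """Return (column_name, declared_type) pairs for `table`.

    PRAGMA table_info yields rows of the form
    (cid, name, type, notnull, dflt_value, pk).
    The table name is interpolated directly, so pass trusted names only.
    """
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


# Example usage (placeholder file name, adjust to your download):
#   conn = sqlite3.connect("reviewdata.sqlite")
#   for table in list_tables(conn):
#       print(table, table_columns(conn, table))
```

Comparing the declared types printed this way with the schemas below is a quick way to confirm how types such as `bool` and `string` are actually stored.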

`Authors` Table Schema

Attribute	Description	Type
`aid`	Author ID, primary key	integer
`name`	Full name of the author	string
`email`	Author’s email	string
`inst_guess`	Guess of the author’s main affiliation (whois lookup against email domain)	string
`world_rank`	Ranking of the author’s main affiliation	integer
`document_count`	Count of papers the author has published	integer
`citation_count`	Sum of citations this author has received	integer
`h_index`	h-index of author	integer
`coauthor_count`	Total count of lifetime collaborators	integer
`year_experience`	Length of academic publication career	integer

`Conferences` Table Schema

Attribute	Description	Type
`cid`	Conference ID, primary key	integer
`name`	Name and year of the conference	string
`accept_count`	Count of papers accepted at the conference	integer
`reject_count`	Count of papers rejected from the conference	integer
`selectivity`	Derived value: `accept_count` / `reject_count`	real
`is_workshop`	True if a workshop, false if a conference	bool
`double_blind`	True if double-blind reviewing used, false if single-blind reviewing used	bool
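As defined in the table, `selectivity` is the ratio of accepted to rejected papers rather than the fraction of submissions accepted. The following sketch recomputes it from the two counts so it can be compared with the stored value. It assumes the table is named `Conferences` exactly as in the heading; the function name is illustrative.

```python
import sqlite3


def recompute_selectivity(conn):
    """Return (name, stored_selectivity, recomputed_selectivity) per conference.

    The recomputed value is accept_count / reject_count, following the schema.
    NULLIF turns a zero reject_count into NULL, so the result is None instead
    of a division by zero.
    """
    query = """
        SELECT name,
               selectivity,
               CAST(accept_count AS REAL) / NULLIF(reject_count, 0)
        FROM Conferences
        ORDER BY cid
    """
    return conn.execute(query).fetchall()
```

For instance, a venue with 30 accepted and 60 rejected papers has a selectivity of 0.5 under this definition.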

`Contributed` Table Schema

Attribute	Description	Type
`ctr_aid`	ID of the author who contributed; first half of the composite primary key	integer
`ctr_pid`	ID of the paper contributed to; second half of the composite primary key	string
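`Contributed` is a many-to-many link between authors and papers, so listing the authors of a paper takes a join. The sketch below assumes `ctr_aid` refers to `Authors.aid`, as the descriptions suggest, and that the tables are named as in the headings. The function name is illustrative.

```python
import sqlite3


def authors_of_paper(conn, pid):
    """Return (aid, name) for every author linked to paper `pid`."""
    query = """
        SELECT a.aid, a.name
        FROM Contributed AS c
        JOIN Authors AS a ON a.aid = c.ctr_aid
        WHERE c.ctr_pid = ?
        ORDER BY a.aid
    """
    # The paper ID is passed as a bound parameter rather than formatted into the SQL.
    return conn.execute(query, (pid,)).fetchall()
```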

`Papers` Table Schema

Attribute	Description	Type
`pid`	Paper ID, primary key	string
`title`	Paper Title	string
`abstract`	Paper Abstract	string
`decision`	True if accepted, false if rejected	bool
`submitted_to`	Conference ID of venue paper was submitted to	integer
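Because `submitted_to` holds a conference ID, accept and reject tallies can be derived per venue by joining `Papers` to `Conferences`. This is a sketch under two assumptions: that `decision` is stored as 0/1 (the usual SQLite encoding of a bool), and that the tables are named as in the headings. The function name is illustrative.

```python
import sqlite3


def decisions_per_conference(conn):
    """Return (conference_name, accepted, rejected) counted from Papers.

    Assumes `decision` is stored as 1 (accepted) or 0 (rejected).
    A NULL decision, if any exist, would be counted as rejected here.
    """
    query = """
        SELECT c.name,
               SUM(CASE WHEN p.decision THEN 1 ELSE 0 END) AS accepted,
               SUM(CASE WHEN p.decision THEN 0 ELSE 1 END) AS rejected
        FROM Papers AS p
        JOIN Conferences AS c ON c.cid = p.submitted_to
        GROUP BY c.cid, c.name
        ORDER BY c.cid
    """
    return conn.execute(query).fetchall()
```

The tallies can then be compared with `accept_count` and `reject_count` in `Conferences`; whether the two agree depends on whether the dataset includes every submission to each venue.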

`Reviews` Table Schema

Attribute	Description	Type
`rid`	Review ID, primary key	integer
`review_of`	ID of the paper this review is about	string
`title`	Review title	string
`review`	Review body text	string
`rating`	Rating on [0, 1] of paper quality, where 1 is a perfect score, normalized across conferences	real
`confidence`	Rating on [0, 1] of reviewer confidence, where 1 is total certainty, normalized across conferences	real
`raw_rating`	Raw rating string, not comparable across conferences	string
`raw_confidence`	Raw confidence string, not comparable across conferences	string
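The normalized `rating` column is the one to use when aggregating across venues, since the raw strings are not comparable. As an illustration, the sketch below averages the normalized rating separately for accepted and rejected papers. It assumes `review_of` refers to `Papers.pid`, that `decision` is stored as 0/1, and that the tables are named as in the headings. The function name is illustrative.

```python
import sqlite3


def mean_rating_by_decision(conn):
    """Return (decision, mean_normalized_rating, review_count) rows.

    Reviews without a normalized rating are skipped. Rows are ordered with
    rejected papers (decision = 0) first, assuming a 0/1 encoding.
    """
    query = """
        SELECT p.decision, AVG(r.rating), COUNT(*)
        FROM Reviews AS r
        JOIN Papers AS p ON p.pid = r.review_of
        WHERE r.rating IS NOT NULL
        GROUP BY p.decision
        ORDER BY p.decision
    """
    return conn.execute(query).fetchall()
```

The same pattern works for `confidence`; `raw_rating` and `raw_confidence` should only be compared within a single conference.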
