Scalable Probabilistic Inference for Large Knowledge Bases

Description

Small: Scalable Probabilistic Inference for Large Knowledge Bases

Acknowledgment:
This material is based upon work supported by the National Science Foundation under Grant No. 1614738.

Disclaimer:
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

NSF Proposal Number: NSF III-1614738

Duration: 2016 - 2020

Amount: $500,000

NSF Abstract page: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1614738

The proposal addresses the query evaluation problem in Probabilistic Databases. It is motivated by Knowledge Base Construction, an important, new problem in data management, which constructs a large structured database from unstructured data. The approach to query evaluation is based on a novel technique, called lifted inference, which allows very efficient evaluation of SQL queries over large probabilistic databases. Lifted inference computes the probability of a SQL query inductively on the structure of the query, without having to first ground the query to compute the large factor graph. While lifted inference is very efficient, it is possible only for some queries. The project pursues several avenues to develop lifted inference, such as combining lifted inference with sampling, extending lifted inference to symmetric databases, and extending lifted inference to queries with negation. It also examines a variety of applications of probabilistic databases and lifted inference.
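
To make the idea of computing a probability "inductively on the structure of the query" concrete, here is a minimal sketch in Python. It is not the project's software. It assumes a tuple-independent database (every tuple is present independently with its own probability) and the Boolean query "exists x, y such that R(x) and S(x, y)", a query for which lifted evaluation is known to apply. The relation names, probabilities, and function names are hypothetical and chosen only for illustration. The brute-force function enumerates all possible worlds and is included only to check the lifted result on a tiny instance.

```python
from itertools import product

# Hypothetical tuple-independent database: each tuple maps to the
# probability that it is present, independently of all other tuples.
R = {"a1": 0.5, "a2": 0.8}
S = {("a1", "b1"): 0.4, ("a1", "b2"): 0.5, ("a2", "b1"): 0.25}


def lifted_probability(R, S):
    """P(exists x, y: R(x) and S(x, y)), computed on the query structure."""
    prob_no_witness = 1.0
    for a, p_r in R.items():
        # Independent project over y: probability that some S(a, y) exists.
        p_none_s = 1.0
        for (x, _y), p_s in S.items():
            if x == a:
                p_none_s *= 1.0 - p_s
        p_some_s = 1.0 - p_none_s
        # Independent join: R(a) and the S(a, .) tuples are independent.
        p_a = p_r * p_some_s
        # Independent project over x: different values of x share no tuples.
        prob_no_witness *= 1.0 - p_a
    return 1.0 - prob_no_witness


def brute_force_probability(R, S):
    """Same probability by enumerating every possible world (exponential)."""
    tuples = [("R", k) for k in R] + [("S", k) for k in S]
    probs = [R[k] for k in R] + [S[k] for k in S]
    total = 0.0
    for world in product([False, True], repeat=len(tuples)):
        weight = 1.0
        present_r, present_s = set(), set()
        for (rel, key), p, present in zip(tuples, probs, world):
            weight *= p if present else 1.0 - p
            if present:
                (present_r if rel == "R" else present_s).add(key)
        # The query holds if some present S(x, y) has a present R(x).
        if any(x in present_r for (x, _y) in present_s):
            total += weight
    return total


if __name__ == "__main__":
    print(lifted_probability(R, S))
    print(brute_force_probability(R, S))
```

The lifted function touches each tuple a bounded number of times, while the brute-force check visits 2^n worlds for n tuples. That gap is the motivation for lifted inference; as the abstract notes, such a decomposition exists only for some queries.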

Principal Investigator: Dan Suciu

Data and software are available here:
Cosette SQL Solver
Pessimistic Cardinality Estimation
HypDB (demonstration at VLDB'2018)


Supported by:

NSF III-1614738

Members

Mahmoud Abo Khamis,
Hung Ngo,
Dan Olteanu,
Dan Suciu,
Walter Cai,
Magdalena Balazinska,
Babak Salimi,
Luke Rodriguez,
Bill Howe,
Shumo Chu,
Alvin Cheung,
Brendan Murphy,
Jared Roesch,
Daniel Li,
Remy Wang,
Chenglong Wang,
Laurel Orr,

Publications

Mahmoud Abo Khamis, Hung Ngo, Dan Olteanu, Dan Suciu,
Boolean Tensor Decomposition for Conjunctive Queries with Negation
In 22nd International Conference on Database Theory, ICDT 2019, March 26-28, 2019, Lisbon, Portugal, pp. 21:1--21:19, 2019
Walter Cai, Magdalena Balazinska, Dan Suciu,
Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities
In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pp. 18--35, 2019
Babak Salimi, Luke Rodriguez, Bill Howe, Dan Suciu,
Interventional Fairness: Causal Database Repair for Algorithmic Fairness
In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pp. 793--810, 2019
Babak Salimi, Dan Suciu,
Bias in OLAP Queries: Detection, Explanation, and Removal
In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, pp. 1021--1035, 2018
Shumo Chu, Alvin Cheung, Dan Suciu,
Axiomatic Foundations and Algorithms for Deciding Semantic Equivalences of SQL Queries
Published in CoRR, vol. abs/1802.02229, 2018
Shumo Chu, Brendan Murphy, Jared Roesch, Alvin Cheung, Dan Suciu,
Axiomatic Foundations and Algorithms for Deciding Semantic Equivalences of SQL Queries
Published in PVLDB, vol. 11, no. 11, pp. 1482--1495, 2018
Mahmoud Abo Khamis, Hung Ngo, Dan Suciu,
What Do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog Have to Do with One Another?
In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017, Chicago, IL, USA, May 14-19, 2017, pp. 429--444, 2017
Shumo Chu, Daniel Li, Remy Wang, Chenglong Wang, Alvin Cheung, Dan Suciu,
Demonstration of the Cosette Automated SQL Prover
In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017, pp. 1591--1594, 2017
Laurel Orr, Magdalena Balazinska, Dan Suciu,
Probabilistic Database Summarization for Interactive Data Exploration
Published in CoRR, vol. abs/1703.03856, 2017
Laurel Orr, Dan Suciu, Magdalena Balazinska,
Probabilistic Database Summarization for Interactive Data Exploration
Published in PVLDB, vol. 10, no. 10, pp. 1154--1165, 2017
Shumo Chu, Alvin Cheung, Dan Suciu,
HoTTSQL: proving query rewrites with univalent SQL semantics
In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, pp. 510--524, 2017