NSF III:Small 1907997: Optimal Query Processing Meets Information Theory: from Proofs to Algorithms
Acknowledgment:
This material is based upon work supported by the National Science Foundation under Grant
No. 1907997.
Disclaimer:
Any opinions, findings, and conclusions or recommendations expressed in this material are
those of the author(s) and do not necessarily reflect the views of the National Science
Foundation.
NSF Proposal Number: NSF III-1907997
The demand to analyze large datasets increases at a rate faster than our ability to compute meaningful insights from the data. Despite the advances in query processing, query optimization, and novel hardware, efficient query evaluation remains a major challenge for the database community to date. This project will explore news ways to design query processing algorithms, by using techniques from information theory. The field of information theory has been developed over the last seven decades, and has found many applications, ranging from communications, to machine learning, but it has not been used yet in a significant way to support query processing. This project transfers techniques from information theory to the task of processing massive amounts of data.
The project introduces a new paradigm for query evaluation, called "from proofs to algorithms", which maps an information-theoretic proof of a query's upper bounds into optimal query evaluation algorithms. The project pursues four thrusts. In the first thrust it uses information theory to derive new bounds on the query answer based on database statistics. The second thrust develops a mapping from the information theoretic inequalities to relational operators, and from the complete proof of the query size to a complete query plan with provably optimal runtime. The third thrust extends these techniques to finer grained statistics, including sketches, distinct values, heavy hitters. Finally, the fourth thrust extends the algorithm to queries with projections, aggregates, and group-by's.
Principal Investigator: Dan Suciu