MystiQ

Description

This is the research page for MystiQ. The MystiQ prototype is downloadable from here.

The MystiQ prototype was redeveloped during 2006 using TGIF funds. If you would like to obtain a copy, please send an email to suciu@cs.washington.edu.

MystiQ is a system that uses a probabilistic data model to find answers in large numbers of data sources exhibiting various kinds of imprecision. Examples of such imprecision: the same data item may have different representations in different sources; the schema alignments needed by a query system are imperfect and noisy; different sources may contain contradictory information and, in particular, their combined data may violate some global integrity constraints; fuzzy matches between objects from different sources may return false positives or false negatives. Even in such an environment, users sometimes want to ask complex, structurally rich queries using constructs typically found in SQL: joins, subqueries, existential/universal quantifiers, aggregates, and group-by. For example, scientists may use such queries over multiple scientific data sources, and a law enforcement agency may use them to find rare associations across multiple data sources. If standard query semantics were applied to such queries, all but the most trivial ones would return an empty answer. The goal of MystiQ is to develop efficient query processing techniques for finding answers in large probabilistic databases.
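This page does not show MystiQ's own query evaluator. As a hedged illustration of the general idea (returning answers ranked by probability instead of an empty answer), the sketch below assumes a tuple-independent probabilistic database, where each tuple is present independently with a given probability, and a hypothetical query q(x) :- R(x, y), S(y). The relation names, tuples, and probabilities are made up for illustration. Each answer's probability is computed by multiplying along the join and then combining independent witnesses for the same x; a brute-force enumeration of possible worlds is included as a cross-check.

```python
from itertools import product

# Hypothetical tuple-independent tables: each tuple maps to its marginal probability.
R = {("a", 1): 0.9, ("a", 2): 0.5, ("b", 2): 0.8}
S = {1: 0.6, 2: 0.7}


def answer_probs(R, S):
    """P(q(x)) for q(x) :- R(x, y), S(y) on independent tuples.

    For a fixed x, each witness y needs both R(x, y) and S(y). Distinct
    witnesses for the same x share no tuples, so they are independent and
    P(q(x)) = 1 - prod_y (1 - p_R(x, y) * p_S(y)).
    """
    fail = {}  # x -> probability that no witness for x is present
    for (x, y), p_r in R.items():
        p_witness = p_r * S.get(y, 0.0)
        fail[x] = fail.get(x, 1.0) * (1.0 - p_witness)
    return {x: 1.0 - f for x, f in fail.items()}


def brute_force(R, S):
    """Same probabilities by summing over all possible worlds (exponential)."""
    tuples = [("R", t, p) for t, p in R.items()] + [("S", t, p) for t, p in S.items()]
    result = {}
    for world in product([True, False], repeat=len(tuples)):
        p_world = 1.0
        present_r, present_s = set(), set()
        for keep, (rel, t, p) in zip(world, tuples):
            p_world *= p if keep else 1.0 - p
            if keep:
                (present_r if rel == "R" else present_s).add(t)
        # Every x with at least one joining witness is an answer in this world.
        for x in {x for (x, y) in present_r if y in present_s}:
            result[x] = result.get(x, 0.0) + p_world
    return result


probs = answer_probs(R, S)
ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Here "a" has two independent witnesses (0.9 × 0.6 and 0.5 × 0.7), giving 1 − 0.46 × 0.65 = 0.701, and "b" has one witness with probability 0.8 × 0.7 = 0.56, so "a" is ranked first. The closed-form step only works because this particular query lets witnesses be separated into independent events; queries without that property need other techniques.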

A tutorial on probabilistic databases is here; the accompanying bibliography is here.


Supported by:

TGIF Fund

Members

Nilesh Dalvi
Chris Re
Dan Suciu
Michael Cafarella
Oren Etzioni
Nodira Khoussainova
Magdalena Balazinska
Jihad Boulos
Bhushan Mandhani
Shobhit Mathur
Gerome Miklau

Publications

Nilesh Dalvi, Chris Re, Dan Suciu,
Queries and Materialized Views on Probabilistic Databases
Unpublished, 2009
Note: to appear in JCSS
Nilesh Dalvi, Chris Re, Dan Suciu,
Probabilistic Databases: Diamonds in the Dirt (Extended Version)
Unpublished, 2009
Nilesh Dalvi, Chris Re, Dan Suciu,
Probabilistic Databases: Diamonds in the Dirt
Published in CACM, vol. 52, no. 7, pp. 86-96, 2009
Chris Re, Nilesh Dalvi, Dan Suciu,
Efficient Top-k Query Evaluation on Probabilistic Data
In ICDE, 2007
Nilesh Dalvi, Dan Suciu,
Management of Probabilistic Data: Foundations and Challenges
In PODS, pp. 1-12, 2007
Note: (invited talk)
Michael Cafarella, Dan Suciu, Oren Etzioni,
Navigating Extracted Data with Schema Discovery
In WebDB, 2007
Michael Cafarella, Chris Re, Dan Suciu, Oren Etzioni,
Structured Querying of Web Text: A Technical Challenge
In CIDR, pp. 225-234, 2007
Nilesh Dalvi, Dan Suciu,
Efficient Query Evaluation on Probabilistic Databases
Published in VLDBJ, vol. 16, no. 4, pp. 523-544, 2007
Chris Re, Dan Suciu,
Efficient Evaluation of HAVING Queries on a Probabilistic Database
In DBPL, 2007
Nilesh Dalvi, Chris Re, Dan Suciu,
Query Evaluation on Probabilistic Databases
Published in IEEE Data Engineering Bulletin, vol. 29, no. 1, pp. 25-31, 2006
Nodira Khoussainova, Magdalena Balazinska, Dan Suciu,
Towards correcting input data errors probabilistically using integrity constraints
In MobiDB, pp. 43-50, 2006
Jihad Boulos, Nilesh Dalvi, Bhushan Mandhani, Shobhit Mathur, Chris Re, Dan Suciu,
MYSTIQ: A system for finding more answers by using probabilities
In SIGMOD, 2005
Note: system demo
Nilesh Dalvi, Dan Suciu,
Answering Queries from Statistics and Probabilistic Views
In VLDB, 2005
Nilesh Dalvi, Gerome Miklau, Dan Suciu,
Asymptotic Conditional Probabilities for Conjunctive Queries
In ICDT, 2005
Nilesh Dalvi, Dan Suciu,
Efficient Query Evaluation on Probabilistic Databases
In VLDB, 2004
Nilesh Dalvi, Dan Suciu,
Efficient Query Evaluation on Probabilistic Databases (extended version)
University of Washington, Technical Report, 04-03-04, 2004
Note: available from www.cs.washington.edu