MystiQ
Description
This is the research page for MystiQ. The MystiQ prototype is
downloadable from here.
The MystiQ prototype was redeveloped during 2006 using TGIF funds.
If you'd like to obtain a copy, please send email to
suciu@cs.washington.edu.
MystiQ is a system that uses a probabilistic data model to find
answers in large numbers of data sources exhibiting various kinds of
imprecisions. Examples of imprecisions: the same data item may have
different representations in different sources; the schema alignments
needed by a query system are imperfect and noisy; different sources
may contain contradictory information, and, in particular, their
combined data may violate some global integrity constraints; fuzzy
matches between objects from different sources may return false
positives or negatives. Even in such an environment, users sometimes
want to ask complex, structurally rich queries, using constructs
typically found in SQL: joins, subqueries,
existential/universal quantifiers, and aggregate and group-by queries. For
example, scientists may use such queries to query multiple scientific
data sources, or a law enforcement agency may use them to find
rare associations across multiple data sources. If standard query
semantics were applied to such queries, all but the most trivial
queries would return an empty answer. The goal of MystiQ is to develop
efficient query processing techniques for finding answers in large
probabilistic databases.
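To make the contrast between standard and probabilistic query semantics concrete, here is a minimal sketch in Python. It is not MystiQ's implementation: the table contents, the match probabilities, and the function names are all hypothetical, and it assumes the simplest possible model, in which each fuzzy-match tuple is an independent event. Under that assumption, the probability that a title has at least one match is one minus the product of the probabilities that each candidate match is wrong.

```python
from collections import defaultdict

# Hypothetical fuzzy-match table (illustrative data, not from MystiQ):
# (title in source A, title in source B, probability that they match).
# Assumption: tuples are independent events.
matches = [
    ("Twelve Monkeys", "12 Monkeys", 0.8),
    ("Twelve Monkeys", "Monkey Business", 0.1),
    ("Star Wars IV", "Star Wars", 0.5),
    ("Star Wars IV", "Star Wars: A New Hope", 0.5),
]


def exact_join(matches):
    """Standard semantics: keep only pairs whose titles are identical."""
    return [(a, b) for a, b, _ in matches if a == b]


def probabilistic_answers(matches):
    """Rank source-A titles by the probability of having at least one match."""
    miss = defaultdict(lambda: 1.0)  # probability that no candidate matches
    for a, _, p in matches:
        miss[a] *= 1.0 - p
    ranked = [(a, 1.0 - q) for a, q in miss.items()]
    return sorted(ranked, key=lambda t: t[1], reverse=True)


print(exact_join(matches))
for title, prob in probabilistic_answers(matches):
    print(f"{title}: {prob:.2f}")
```

With this toy data the exact join is empty, while the probabilistic version returns both titles, ranked by probabilities of about 0.82 and 0.75. Real queries with joins and correlated tuples need more than this single rule.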
A tutorial on probabilistic databases is here;
the accompanying bibliography is here.
Supported by:
TGIF Fund
Members
Nilesh Dalvi,
Chris Re,
Dan Suciu,
Michael Cafarella,
Oren Etzioni,
Nodira Khoussainova,
Magdalena Balazinska,
Jihad Boulos,
Bhushan Mandhani,
Shobhit Mathur,
Gerome Miklau
Publications
Nilesh Dalvi, Chris Re, Dan Suciu,
Queries and Materialized Views on Probabilistic Databases
Unpublished, 2009
Note: to appear in JCSS
Nilesh Dalvi, Chris Re, Dan Suciu,
Probabilistic Databases: Diamonds in the Dirt (Extended Version)
Unpublished, 2009
Nilesh Dalvi, Chris Re, Dan Suciu,
Probabilistic Databases: Diamonds in the Dirt
Published in CACM, vol. 52, no. 7, pp. 86-96, 2009
Chris Re, Nilesh Dalvi, Dan Suciu,
Efficient Top-k Query Evaluation on Probabilistic Data
In ICDE, 2007
Nilesh Dalvi, Dan Suciu,
Management of Probabilistic Data: Foundations and Challenges
In PODS, pp. 1-12, 2007
Note: (invited talk)
Michael Cafarella, Dan Suciu, Oren Etzioni,
Navigating Extracted Data with Schema Discovery
In WebDB, 2007
Michael Cafarella, Chris Re, Dan Suciu, Oren Etzioni,
Structured Querying of Web Text: A Technical Challenge
In CIDR, pp. 225-234, 2007
Nilesh Dalvi, Dan Suciu,
Efficient Query Evaluation on Probabilistic Databases
Published in VLDBJ, vol. 16, no. 4, pp. 523-544, 2007
Chris Re, Dan Suciu,
Efficient Evaluation of HAVING Queries on a Probabilistic Database
In Proceedings of DBPL, 2007
Nilesh Dalvi, Chris Re, Dan Suciu,
Query Evaluation on Probabilistic Databases
Published in IEEE Data Engineering Bulletin, vol. 29, no. 1, pp. 25-31, 2006
Nodira Khoussainova, Magdalena Balazinska, Dan Suciu,
Towards correcting input data errors probabilistically using integrity constraints
In MobiDB, pp. 43-50, 2006
Jihad Boulos, Nilesh Dalvi, Bhushan Mandhani, Shobhit Mathur, Chris Re, Dan Suciu,
MYSTIQ: A system for finding more answers by using probabilities
In SIGMOD, 2005
Note: system demo
Nilesh Dalvi, Dan Suciu,
Answering Queries from Statistics and Probabilistic Views
In VLDB, 2005
Nilesh Dalvi, Gerome Miklau, Dan Suciu,
Asymptotic Conditional Probabilities for Conjunctive Queries
In ICDT, 2005
Nilesh Dalvi, Dan Suciu,
Efficient Query Evaluation on Probabilistic Databases
In VLDB, 2004
Nilesh Dalvi, Dan Suciu,
Efficient Query Evaluation on Probabilistic Databases (extended version)
University of Washington, Technical Report 04-03-04, 2004
Note: available from www.cs.washington.edu