As of August 2019, I'm a Director at Facebook AI in Menlo Park.
I was the CEO of Megagon Labs from November 2015 until December 2018. Prior to Megagon, I headed the Structured Data Group of Google Research in Mountain View, California for a decade (here are a few thoughts about that decade). I joined Google in 2005 with the acquisition of my company, Transformic. Prior to that, I was a professor of Computer Science at the University of Washington, where I founded the UW CSE Database Group in 1998. You can follow me on Twitter for more (in)frequent updates. I used to blog, and maybe I'll return to it some day.
My Google Scholar author page
My most recent thoughts (March, 2019) are about the
In 2011 I published a book about coffee, The Infinite Emotions of Coffee. With tales, photos and data visualizations from 30 countries on 6 continents, the book introduces you to the amazing world of coffee today. I recommend reading it while drinking a macchiatone, a drink known to be the perfect combination of foamed milk and espresso. (In the meantime, I've changed my preferences and drink either espresso or black coffee brewed with an AeroPress.)
The main goals of my work at Google were to make data management tools collaborative and much easier to use, and to leverage the incredible collections of structured data on the Web.
My group was responsible for Google Fusion Tables, a service for managing data in the cloud that focuses on ease of use, collaboration and data integration. Fusion Tables enables users to upload spreadsheets, CSV and KML files and share them with collaborators or with the public. You can easily integrate data from multiple sources (and organizations) and use a collection of visualizations to look at your data. In particular, Fusion Tables is deeply integrated with Google Maps, making it easy to visualize large geographic data sets. To facilitate collaboration, users can conduct fine-grained discussions on the data. You can see some examples of how Fusion Tables is being used. You can interact with Fusion Tables through our UI or our API.
Some other notable projects from my group at Google include our Deep-web crawl and WebTables, the first system to collect and offer search over the collection of HTML tables on the Web. WebTables was incorporated into Google search, and the original 2008 paper on WebTables received the VLDB 2018 10-year Test of Time Award.
Both of these projects are examples of the broader area of research on Dataspace Systems, which provide pay-as-you-go data management based on best-effort services.
In the past I have worked extensively on data integration (and a book on the topic is in preparation), as well as personal information management, XML, query optimization, peer-data management systems, and knowledge representation (relevance reasoning and combining Horn rules with Description Logics). In general, I am very interested in the combination of techniques from Artificial Intelligence and Data Management.
Awards and Funding
I was elected Fellow of the ACM in 2006.
The 2008 WebTables paper received the VLDB 2018 10-year Best Paper Award.
The 2003 ICDE paper on Piazza received the ICDE 2013 Test-of-Time Award.
The Information Manifold paper received the VLDB 2006 10-year Best Paper Award. In honor of the occasion, the original co-authors wrote a 10-year retrospective on data integration research and industry.
In 2000 I received the Presidential Early Career Award for Scientists and Engineers (PECASE), and I was awarded a Sloan Fellowship (1999-2000).
The award I'm most proud of is one that isn't really mine: my former Ph.D. student, AnHai Doan, received the 2003 ACM Distinguished Dissertation Award.
As a faculty member I was funded by grants from the National Science Foundation (PECASE, IDM, ITR, SEIII), DARPA (CALO), and the National Institute of
I have also received research gifts from Microsoft Research, NEC Corporation, NTT Corporation of Japan, and Ford Motor
My Entrepreneurial Activities
I am also an entrepreneur and regularly consult for various companies.
In 2004 I founded Transformic, a company that built search engines for the deep web, i.e., the content on the web that is stored in databases and hidden behind forms. Its first search engine offered access to hundreds of classifieds sites on the web. The following paper describes the problems of schema heterogeneity in today's commercial world and some of the recent work addressing it. Transformic was acquired by Google in September 2005.
In 1999 I co-founded Nimble Technology, one of the first companies in the space of Enterprise Information Integration. Nimble built tools for querying and integrating disparate heterogeneous data sources. Nimble was acquired by Actuate in August, 2003. The following paper describes some of the successes, challenges and controversies surrounding the EII industry today (all presented in an industrial session at SIGMOD 2005).
Among others, I've consulted for SAP, Microsoft and a few startups, and I have also worked on expert witness cases.