ICML-2007 Tutorial on Practical Statistical Relational Learning
Goals and Summary
Statistical relational learning (SRL) focuses on learning when samples are
non-i.i.d. (independent and identically distributed). Domains where data is
non-i.i.d. are widespread; examples include Web search, information
extraction, perception, medical diagnosis/epidemiology, molecular and
systems biology, social science, security, ubiquitous computing, and
others. In all of these domains, modeling dependencies between examples can
greatly improve predictive performance and lead to a better understanding of
the relevant phenomena. However, doing this can be much more complex than
treating examples independently. The goal of this tutorial is to provide
researchers and practitioners with the tools needed to learn from
interdependent examples with no more difficulty than they learn from
isolated examples today.
There have been several previous tutorials on SRL. This tutorial
differs from them in a number of ways:
- It focuses on the practical application of statistical relational
techniques to a broad variety of areas. Previous tutorials focused
mainly on surveying and comparing different representations and
approaches to SRL. While this is important for SRL researchers, a
tutorial focusing on the key ideas and their application is likely
to be of interest, and of more immediate use, to a broader audience.
- It covers inference as well as learning, recognizing that learning
in SRL is inseparable from inference.
- It incorporates the latest advances in the area, which has
seen very rapid progress in recent years.
- It uses Markov logic as the foundation, while also covering other
approaches. Markov logic is both general and simple, making it ideally
suited for a tutorial.
- It uses the Alchemy open-source software and programming language.
Alchemy has the full range of capabilities required for SRL, including
state-of-the-art learning and inference algorithms.
Presenter
Pedro Domingos is an Associate Professor of Computer Science and
Engineering at the University of Washington. His research interests are in
artificial intelligence, machine learning and data mining. He received a
PhD in Information and Computer Science from the University of California
at Irvine, and is the author or co-author of over 100 technical publications.
He is a member of the advisory board of JAIR, a member of the editorial
board of the Machine Learning journal, and a co-founder of the
International Machine Learning Society. He was program co-chair of
KDD-2003, and has served on numerous program committees. He has received
several awards, including a Sloan Fellowship, an NSF CAREER Award, a
Fulbright Scholarship, an IBM Faculty Award, and best paper awards at
KDD-98, KDD-99 and PKDD-2005. He has carried out extensive research in the
area of the tutorial, and served on the program committees of the last three
SRL workshops (at ICML-2006, ICML-2004 and IJCAI-2003). He has taught several
graduate and undergraduate courses in AI and related topics at the
University of Washington, including a course in the specific area of the
tutorial.
Outline
The tutorial will be composed of three parts:
- Foundational areas. The first part will consist of a brief
introduction to each of the four foundational areas of SRL: logical
inference, inductive logic programming, probabilistic inference, and
statistical learning. Given the short time available, no attempt
will be made to comprehensively survey these areas; rather, the focus will
be on providing the key concepts and techniques required for the subsequent
parts. For example, the logical inference part will focus on the basics of
satisfiability testing, and the probabilistic/statistical parts on Markov
networks. The duration of this part will be approximately two hours (half an
hour per subtopic).
- Putting the pieces together. The second part will introduce the
key ideas in SRL and survey major approaches, using Markov logic as the
unifying framework. It will present state-of-the-art algorithms for
statistical relational learning and inference, and give an overview of the
Alchemy open-source software. This part will essentially consist of putting
together the pieces introduced in the first part. Its duration will be
approximately an hour.
- Applications. The third and final part will describe how to
efficiently develop state-of-the-art non-i.i.d. applications in various
areas, including: hypertext classification, link-based information
retrieval, information extraction and integration, natural language
processing, social network modeling, computational biology, and ubiquitous
computing. This part will also include practical tips on using SRL, Markov
logic and Alchemy: the kind of information that is seldom found in
research papers but is key to developing successful applications. The
duration of this part will be approximately an hour.
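As a minimal illustration of the Markov logic semantics that serves as the tutorial's unifying framework (a sketch added here, not material from the tutorial itself): a Markov logic network is a set of weighted first-order formulas, and it defines a distribution over possible worlds x as P(X = x) = (1/Z) exp(sum_i w_i n_i(x)), where n_i(x) is the number of true groundings of formula i in x and Z is the normalizing constant. The code below assumes a hypothetical one-formula network, Smokes(x) => Cancer(x) with weight 1.5 over two constants, and computes probabilities by brute-force enumeration of all worlds. The names, the weight and the helper functions are illustrative assumptions, and this is plain Python, not Alchemy's input language.

```python
import itertools
import math

# Hypothetical toy network (illustrative only, not Alchemy syntax):
#   weight 1.5 : Smokes(x) => Cancer(x)
# over two constants. A world assigns True/False to every ground atom.
CONSTANTS = ["Anna", "Bob"]
WEIGHT = 1.5


def count_true_groundings(world):
    """n(x): number of constants c for which Smokes(c) => Cancer(c) holds."""
    return sum(
        1
        for c in CONSTANTS
        if (not world[("Smokes", c)]) or world[("Cancer", c)]
    )


def world_distribution(weight=WEIGHT):
    """Return [(world, P(world))] with P(x) = exp(w * n(x)) / Z, by enumeration."""
    atoms = [(pred, c) for pred in ("Smokes", "Cancer") for c in CONSTANTS]
    worlds = [
        dict(zip(atoms, values))
        for values in itertools.product([False, True], repeat=len(atoms))
    ]
    scores = [math.exp(weight * count_true_groundings(w)) for w in worlds]
    z = sum(scores)  # partition function Z
    return [(w, s / z) for w, s in zip(worlds, scores)]


def prob(query, evidence, weight=WEIGHT):
    """P(query atom is True | every evidence atom has its given truth value)."""
    num = den = 0.0
    for world, p in world_distribution(weight):
        if all(world[atom] == value for atom, value in evidence.items()):
            den += p
            if world[query]:
                num += p
    return num / den


if __name__ == "__main__":
    print(prob(("Cancer", "Anna"), {("Smokes", "Anna"): True}))
```

For example, prob(("Cancer", "Anna"), {("Smokes", "Anna"): True}) equals exp(1.5) / (1 + exp(1.5)), about 0.82: given the evidence, only Anna's grounding depends on the query atom, and Bob's factor cancels between numerator and denominator. Exhaustive enumeration is exponential in the number of ground atoms, which is why practical SRL systems rely on approximate inference instead.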