CIKM-2013 Tutorial on Statistical Relational Learning
Goals and Summary
Statistical relational learning (SRL) focuses on learning when samples are
non-i.i.d. (independent and identically distributed). Domains where data is
non-i.i.d. are widespread; examples include Web search, information
extraction, perception, medical diagnosis/epidemiology, molecular and
systems biology, social science, security, ubiquitous computing, and
others. In all of these domains, modeling dependencies between examples can
greatly improve predictive performance and lead to a better understanding of
the relevant phenomena. However, doing this can be much more complex than
treating examples independently. The goal of this tutorial is to provide
researchers and practitioners with the tools needed to learn from
interdependent examples with no more difficulty than they learn from
isolated examples today.
There have been several previous tutorials on SRL. This tutorial
differs from them in the following ways:
- It focuses on the practical application of statistical relational
techniques to a broad variety of areas. Previous tutorials focused
mainly on surveying and comparing different representations and
approaches to SRL. While this is important for SRL researchers, a
tutorial focusing on the key ideas and their application is likely
to be of interest, and of more immediate use, to a broader audience.
- It covers inference as well as learning, recognizing that
SRL is inseparable from inference.
- It incorporates the latest advances in the area, which has
seen very rapid progress in recent years.
- It uses Markov logic as the foundation, while also covering other
approaches. Markov logic is both general and simple, making it ideally
suited for a tutorial.
- It uses the Alchemy
open-source software and programming language. Alchemy has the full range
of capabilities required for SRL, including state-of-the-art learning
and inference algorithms.
Presenter
Pedro Domingos
is Professor of Computer Science and Engineering at the University of
Washington. His research interests are in artificial intelligence, machine
learning and data mining. He received a PhD in Information and Computer
Science from the University of California at Irvine, and is the author or
co-author of over 200 technical publications. He is a member of the
editorial board of the Machine Learning journal, co-founder of the
International Machine Learning Society, and past associate editor of
JAIR. He was program co-chair of KDD-2003 and SRL-2009, and has served on
numerous program committees. He is an AAAI Fellow, and has received several
awards, including a Sloan Fellowship, an NSF CAREER Award, a Fulbright
Scholarship, an IBM Faculty Award, and best paper awards at several leading
conferences.
He has carried out extensive research in the area of the tutorial, and has
served on the program committees of most SRL and statistical relational AI
workshops to date. He has taught several graduate and undergraduate courses
in AI and related topics, including courses on the specific topic of the
tutorial at Carnegie Mellon University and the University of Washington.
Outline
The tutorial will consist of three parts:
- Foundational areas. The first part will consist of a brief
introduction to each of the four foundational areas of SRL: logical
inference, inductive logic programming, probabilistic inference, and
statistical learning. Obviously, in the short time available no attempt
will be made to comprehensively survey these areas; rather, the focus will
be on providing the key concepts and techniques required for the subsequent
parts. For example, the logical inference part will focus on the basics of
satisfiability testing, and the probabilistic/statistical parts on Markov
networks. This part will last approximately two hours (half an hour per
subtopic).
- Putting the pieces together. The second part will introduce the
key ideas in SRL and survey major approaches, using Markov logic as the
unifying framework. It will present state-of-the-art algorithms for
statistical relational learning and inference, and give an overview of the
Alchemy open-source software. This part will essentially consist of putting
together the pieces introduced in the first part, and will last approximately
one hour.
- Applications. The third and final part will describe how to
efficiently develop state-of-the-art non-i.i.d. applications in various
areas, including hypertext classification, link-based information
retrieval, information extraction and integration, natural language
processing, social network modeling, computational biology, and ubiquitous
computing. This part will also include practical tips on using SRL, Markov
logic and Alchemy: the kind of information that is seldom found in
research papers, but is key to developing successful applications. This
part will last approximately one hour.