ICML-2007 Tutorial on Practical Statistical Relational Learning

Goals and Summary

Statistical relational learning (SRL) focuses on learning when samples are not i.i.d. (independent and identically distributed). Domains where data is not i.i.d. are widespread; examples include Web search, information extraction, perception, medical diagnosis/epidemiology, molecular and systems biology, social science, security, and ubiquitous computing. In all of these domains, modeling dependencies between examples can greatly improve predictive performance and lead to a better understanding of the relevant phenomena. However, doing this can be much more complex than treating examples independently. The goal of this tutorial is to provide researchers and practitioners with the tools needed to learn from interdependent examples with no more difficulty than they learn from isolated examples today. There have been several previous tutorials on SRL. This tutorial differs from them in a number of ways:

It focuses on the practical application of statistical relational techniques to a broad variety of areas. Previous tutorials focused mainly on surveying and comparing different representations and approaches to SRL. While this is important for SRL researchers, a tutorial focusing on the key ideas and their application is likely to be of interest, and of more immediate use, to a broader audience.
It covers inference as well as learning, recognizing that in SRL, learning is inseparable from inference.
It incorporates the latest advances in the area, which has seen very rapid progress in recent years.
It uses Markov logic as the foundation, while also covering other approaches. Markov logic is both general and simple, making it ideally suited for a tutorial.
It uses Alchemy, an open-source software package and programming language. Alchemy has the full range of capabilities required for SRL, including state-of-the-art learning and inference algorithms.
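The central idea of Markov logic can be illustrated in a few lines. A set of weighted first-order formulas defines a Markov network over ground atoms, in which the probability of a world x is P(x) = exp(Σ_i w_i n_i(x)) / Z, where n_i(x) is the number of true groundings of formula i in x and Z normalizes over all worlds. The sketch below is a minimal illustration under stated assumptions: the single formula Smokes(c) => Cancer(c), the two constants A and B, and the weight 1.5 are hypothetical examples, and the brute-force enumeration of all worlds is feasible only for toy domains. It does not use Alchemy, whose learning and inference algorithms avoid this kind of exhaustive enumeration.

```python
import itertools
import math

CONSTANTS = ["A", "B"]  # hypothetical domain of two people
WEIGHT = 1.5            # illustrative weight, not a learned value


def n_true_groundings(world):
    # Count the constants c for which Smokes(c) => Cancer(c) holds in this world.
    return sum(
        1
        for c in CONSTANTS
        if (not world[f"Smokes({c})"]) or world[f"Cancer({c})"]
    )


def world_distribution(weight):
    # Ground atoms: one Smokes and one Cancer atom per constant.
    atoms = [f"{pred}({c})" for c in CONSTANTS for pred in ("Smokes", "Cancer")]
    # Enumerate every truth assignment (possible world) to the ground atoms.
    worlds = [
        dict(zip(atoms, values))
        for values in itertools.product([False, True], repeat=len(atoms))
    ]
    # Unnormalized score of each world: exp(weight * number of true groundings).
    scores = [math.exp(weight * n_true_groundings(w)) for w in worlds]
    z = sum(scores)  # partition function Z
    return [(w, s / z) for w, s in zip(worlds, scores)]


if __name__ == "__main__":
    for world, prob in world_distribution(WEIGHT):
        print(world, round(prob, 4))
```

Each violated grounding makes a world exp(1.5) times less likely rather than impossible. Setting the weight to 0 yields a uniform distribution over the 16 worlds, and letting it grow without bound approaches a purely logical constraint.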

Presenter

Pedro Domingos is an Associate Professor of Computer Science and Engineering at the University of Washington. His research interests are in artificial intelligence, machine learning and data mining. He received a PhD in Information and Computer Science from the University of California at Irvine, and is the author or co-author of over 100 technical publications. He is a member of the advisory board of JAIR, a member of the editorial board of the Machine Learning journal, and a co-founder of the International Machine Learning Society. He was program co-chair of KDD-2003, and has served on numerous program committees. He has received several awards, including a Sloan Fellowship, an NSF CAREER Award, a Fulbright Scholarship, an IBM Faculty Award, and best paper awards at KDD-98, KDD-99 and PKDD-2005. He has carried out extensive research in the tutorial area, and served on the program committees of the last three SRL workshops (at ICML-2006, ICML-2004 and IJCAI-2003). He has taught several graduate and undergraduate courses in AI and related topics at the University of Washington, including a course in the specific area of the tutorial.

Outline

The tutorial will be composed of three parts:

Foundational areas. The first part will consist of a brief introduction to each of the four foundational areas of SRL: logical inference, inductive logic programming, probabilistic inference, and statistical learning. In the short time available, no attempt will be made to survey these areas comprehensively; rather, the focus will be on providing the key concepts and techniques required for the subsequent parts. For example, the logical inference part will focus on the basics of satisfiability testing, and the probabilistic/statistical parts on Markov networks. The duration of this part will be approximately two hours (half an hour per subtopic).
Putting the pieces together. The second part will introduce the key ideas in SRL and survey major approaches, using Markov logic as the unifying framework. It will present state-of-the-art algorithms for statistical relational learning and inference, and give an overview of the Alchemy open-source software. This part will essentially consist of putting together the pieces introduced in the first part. Its duration will be approximately an hour.
Applications. The third and final part will describe how to efficiently develop state-of-the-art non-i.i.d. applications in various areas, including hypertext classification, link-based information retrieval, information extraction and integration, natural language processing, social network modeling, computational biology, and ubiquitous computing. This part will also include practical tips on using SRL, Markov logic and Alchemy: the kind of information that is seldom found in research papers, but is key to developing successful applications. The duration of this part will be approximately one hour.