This page describes a few recent projects in our group.
GraphLab: Abstraction and System for Parallel Machine Learning
Many efficient machine learning algorithms cannot be naturally decomposed into computations over independent subproblems, and are thus inefficient when implemented under a Map-Reduce abstraction. GraphLab addresses this challenge by providing a natural abstraction for asynchronous, iterative, parallel algorithms over sparse (graph-represented) data. The approach has been applied to a wide range of machine learning tasks, including inference and learning in graphical models, collaborative filtering and matrix factorization, and clustering.
- The first GraphLab paper was presented at UAI 2010: GraphLab: A New Framework for Parallel Machine Learning.
- Code for GraphLab is available at www.graphlab.org.
- PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.
- GraphChi: Large-Scale Graph computation on Just a PC.
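To give a flavor of the graph-parallel abstraction, here is a minimal sketch (not GraphLab's actual API) of an asynchronous vertex update: an update function reads a vertex's neighborhood, recomputes the vertex's value, and reschedules neighbors whose values may now change. The example uses asynchronous PageRank; all names are illustrative.

```python
from collections import deque

def pagerank_update(v, rank, in_nbrs, out_deg, damping=0.85):
    """Recompute one vertex's rank from its in-neighbors' current ranks."""
    new_rank = (1 - damping) + damping * sum(
        rank[u] / out_deg[u] for u in in_nbrs[v])
    changed = abs(new_rank - rank[v]) > 1e-6
    rank[v] = new_rank
    return changed

def run(vertices, in_nbrs, out_nbrs):
    """Asynchronously apply the update until no vertex remains scheduled."""
    out_deg = {v: max(len(out_nbrs[v]), 1) for v in vertices}
    rank = {v: 1.0 for v in vertices}
    queue, scheduled = deque(vertices), set(vertices)
    while queue:
        v = queue.popleft()
        scheduled.discard(v)
        if pagerank_update(v, rank, in_nbrs, out_deg):
            for w in out_nbrs[v]:  # reschedule neighbors affected by the change
                if w not in scheduled:
                    scheduled.add(w)
                    queue.append(w)
    return rank
```

The dynamic scheduling is the key point: rather than sweeping over all vertices in lockstep (as a Map-Reduce implementation would), only vertices whose neighborhoods have changed are recomputed.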
Taming Information Overload
We are overwhelmed by the amount of information available on the web, the volume of opinions expressed in our political process, and the number of scientific papers published every year. Our goal is to discover new ways to connect seemingly disparate pieces of information, and to quickly learn a user's preferences, thus providing personalized recommendations. Recent results include:
- Connecting the Dots Between News Articles
- Beyond Keyword Search: Discovering Relevant Scientific Literature
Parallel Machine Learning Algorithms
As our computer architectures have transitioned to multicore and the cloud, we must develop parallel machine learning algorithms that can exploit such hardware. Our recent results include:
- Parallel Coordinate Descent for L1-Regularized Loss Minimization
- Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees
- Residual Splash for Optimally Parallelizing Belief Propagation
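As a concrete example of the kind of algorithm involved, here is a minimal sketch of coordinate descent for L1-regularized least squares (the lasso), the problem addressed in the parallel coordinate descent work. The updates are shown sequentially for clarity; the parallel variant updates many randomly chosen coordinates simultaneously. All function names here are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrinkage operator induced by the L1 penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, iters=100):
    """min_w 0.5*||Xw - y||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)  # per-coordinate curvature ||X_j||^2
    resid = y - X @ w
    for _ in range(iters):
        for j in range(d):
            # correlation with the residual, excluding coordinate j's own term
            rho = X[:, j] @ resid + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (w[j] - w_new)  # keep residual consistent
            w[j] = w_new
    return w
```

Each coordinate update is cheap and touches only one column of X, which is what makes the updates attractive to run in parallel on multicore hardware.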
Submodular Function Optimization, Optimizing Information Gathering, Sensor Placement, and Experiment Design
We have shown that applications in civil engineering, environmental monitoring, and information gathering on the web require us to find the most relevant information at minimal cost. Although these problems are often solved by heuristics without theoretical guarantees, we demonstrated that they can often be formulated as submodular function optimization problems that are amenable to efficient algorithms with theoretical guarantees. Some of our results in this area include:
- Optimizing Sensing: From Water to the Web
- Robust Submodular Observation Selection
- Cost-effective Outbreak Detection in Networks
- Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies
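The workhorse algorithm in this area is the greedy method: for a monotone submodular objective under a cardinality constraint, greedily adding the element with the largest marginal gain achieves a (1 - 1/e) approximation. The sketch below also shows lazy evaluation, in the spirit of the acceleration used in the outbreak-detection work: since submodularity guarantees marginal gains only shrink as the selected set grows, stale gains serve as upper bounds and most re-evaluations can be skipped. The code and names are illustrative, not a specific implementation from the papers.

```python
import heapq

def lazy_greedy(ground_set, f, k):
    """Pick k elements approximately maximizing a monotone submodular f."""
    selected, val = [], f(set())
    # max-heap (via negation) of marginal-gain upper bounds
    heap = [(-(f({e}) - val), e) for e in ground_set]
    heapq.heapify(heap)
    while len(selected) < k and heap:
        neg_gain, e = heapq.heappop(heap)
        gain = f(set(selected) | {e}) - val   # refresh the stale bound
        if not heap or gain >= -heap[0][0]:   # still the best element: take it
            selected.append(e)
            val += gain
        else:
            heapq.heappush(heap, (-gain, e))  # re-insert with updated gain
    return selected
```

For example, with f a coverage function (the number of locations covered by a set of sensors), this procedure picks near-optimal sensor placements while evaluating f far fewer times than the naive greedy algorithm.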
Representing Probability Distributions over Rankings and Permutations
Permutation problems are ubiquitous in ranking, voting, and data association. However, representing probability distributions over permutations is very challenging: there are n! possibilities, and standard techniques, such as graphical models, are inadequate for capturing structure in these domains. We have developed the notion of riffle independence in rankings, and algorithms using Fourier representations over permutations, to tackle these challenging tasks. Our results include:
- Efficient Probabilistic Inference with Partial Ranking Queries
- Learning Hierarchical Riffle Independent Groupings from Rankings
- Fourier Theoretic Probabilistic Inference over Permutations
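To make the representational challenge concrete, the following sketch contrasts the explicit n!-sized distribution with one standard compact summary: the n-by-n first-order marginal matrix M[i][j] = P(item i is placed at rank j), which corresponds to the lowest-frequency Fourier coefficients of a distribution over the symmetric group. The code is illustrative only.

```python
from itertools import permutations

def first_order_marginals(dist, n):
    """dist maps a permutation tuple (item at each rank) to its probability;
    returns the n x n matrix M with M[i][j] = P(item i at rank j)."""
    M = [[0.0] * n for _ in range(n)]
    for perm, p in dist.items():
        for rank, item in enumerate(perm):
            M[item][rank] += p
    return M

# Explicit representation: the uniform distribution over 3 items already
# needs 3! = 6 numbers, and the count grows factorially with n.
n = 3
uniform = {perm: 1.0 / 6 for perm in permutations(range(n))}
M = first_order_marginals(uniform, n)  # every entry is 1/3 for the uniform case
```

First-order marginals are only a summary (they lose higher-order interactions), which is why richer structured representations such as riffle independence and higher-frequency Fourier terms are needed.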
Probabilistic Graphical Models
We have also developed algorithms for a number of core machine learning problems, especially in the area of graphical models. A core theme in this work is the development of methods with theoretical guarantees that address significant, real-world challenges. Some results include:
- Kernel Belief Propagation
- Inference with Multivariate Heavy-Tails in Linear Models
- Optimal Value of Information in Graphical Models
- Efficient Principled Learning of Thin Junction Trees
- Max-Margin Markov Networks
Distributed Algorithms for Inference in Sensor Networks
Data in a sensor network is highly distributed throughout the environment. We thus need to develop techniques that can fuse the data and make predictions in a distributed fashion, while being robust to communication and node failures. Our results here include:
- A Robust Architecture for Distributed Inference in Sensor Networks
- Model-Driven Data Acquisition in Sensor Networks
- Distributed Regression: an Efficient Framework for Modeling Sensor Network Data
Multiagent Reinforcement Learning and Factored MDPs
My thesis work demonstrated how to exploit problem-specific structure to scale up MDPs and enable the efficient coordination of multiple agents. These results include:
- Generalizing Plans to New Environments in Relational MDPs
- Efficient Solution Algorithms for Factored MDPs
- Coordinated Reinforcement Learning