This page highlights a few recent projects in our group.

## GraphLab: Abstraction and System for Parallel Machine Learning

Many efficient machine learning algorithms cannot be naturally decomposed into computations over independent subproblems, and are thus inefficient when implemented under a Map-Reduce abstraction. GraphLab addresses this challenge by providing a natural abstraction for asynchronous, iterative, parallel algorithms over sparse (graph-represented) data. The approach has been applied to a wide range of machine learning tasks, including inference and learning in graphical models, collaborative filtering and matrix factorization, and clustering.

- The first GraphLab paper was presented at UAI 2010: GraphLab: A New Framework for Parallel Machine Learning.
- Code for GraphLab is available at www.graphlab.org.
- PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.
- GraphChi: Large-Scale Graph Computation on Just a PC.
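To illustrate the vertex-centric, dynamically scheduled style of computation GraphLab targets, here is a toy Python sketch of PageRank with a work queue: each vertex update gathers from its in-neighbors and reschedules its out-neighbors only when its value changed noticeably. This is a simplified illustration, not the GraphLab API; the graph layout, function name, and scheduling policy are all invented for this example.

```python
from collections import deque

def dynamic_pagerank(out_links, damping=0.85, tol=1e-6):
    """Toy dynamically scheduled PageRank in the GraphLab spirit
    (invented for illustration; not the GraphLab API)."""
    # Build in-link lists so each update can gather from its neighbors.
    in_links = {v: [] for v in out_links}
    for u, outs in out_links.items():
        for v in outs:
            in_links[v].append(u)
    rank = {v: 1.0 / len(out_links) for v in out_links}
    queue, queued = deque(out_links), set(out_links)
    while queue:
        v = queue.popleft()
        queued.discard(v)
        # Gather: sum contributions from in-neighbors.
        total = sum(rank[u] / len(out_links[u]) for u in in_links[v])
        new = (1 - damping) / len(out_links) + damping * total
        # Apply + scatter: reschedule out-neighbors only on a real change.
        if abs(new - rank[v]) > tol:
            rank[v] = new
            for w in out_links[v]:
                if w not in queued:
                    queue.append(w)
                    queued.add(w)
        else:
            rank[v] = new
    return rank
```

The point of the sketch is the scheduling pattern: unlike a synchronous Map-Reduce round over all vertices, work concentrates on the parts of the graph that are still changing.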

## Taming Information Overload

We are overwhelmed by the amount of information available on the web, the volume of opinions expressed in our political process, and the number of scientific papers published every year. Our goal is to discover new ways to connect seemingly disparate pieces of information, and to quickly learn a user's preferences, thus providing personalized recommendations. Recent results include:

- Connecting the Dots Between News Articles
- Beyond Keyword Search: Discovering Relevant Scientific Literature

## Parallel Machine Learning Algorithms

As our computer architectures have transitioned to multicore and the cloud, we must develop parallel machine learning algorithms that can exploit such hardware. Our recent results include:

- Parallel Coordinate Descent for L1-Regularized Loss Minimization
- Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees
- Residual Splash for Optimally Parallelizing Belief Propagation
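To make the flavor of parallel coordinate descent concrete, here is a toy, purely sequential Python sketch of a "Shotgun"-style scheme for the Lasso (minimizing 0.5·||Ax − b||² + λ||x||₁): in each round, a small batch of coordinates is updated from the same stale snapshot of x, as P parallel workers would, and the updates are then applied together. The function names and problem setup are invented for illustration; the actual algorithm and its convergence conditions are in the paper above.

```python
import random

def soft_threshold(z, t):
    """Shrinkage operator for the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def shotgun_lasso(A, b, lam, P=2, rounds=100, seed=0):
    """Toy 'Shotgun'-style parallel coordinate descent for the Lasso
    (invented sketch; simulates P parallel workers sequentially)."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(d)]
    for _ in range(rounds):
        # Residual computed once from the shared, stale snapshot of x.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(d)) for i in range(n)]
        batch = rng.sample(range(d), min(P, d))
        updates = {}
        for j in batch:  # conceptually executed in parallel
            g = sum(A[i][j] * r[i] for i in range(n))
            updates[j] = soft_threshold(x[j] + g / col_sq[j], lam / col_sq[j])
        for j, v in updates.items():
            x[j] = v
    return x
```

Because the batch is computed from one stale snapshot, updates to correlated columns can conflict; the analysis in the paper bounds how large P can be as a function of the correlation structure of A.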

## Submodular Function Optimization, Optimizing Information Gathering, Sensor Placement, and Experiment Design

We have shown that applications in civil engineering, environmental monitoring, and information gathering on the web require finding the most relevant information at minimal cost. Although these problems are often solved by heuristics without theoretical guarantees, we demonstrated that they can often be formulated as *submodular function optimization problems* that are amenable to efficient algorithms with theoretical guarantees. Some of our results in this area include:

- Optimizing Sensing: From Water to the Web
- Robust Submodular Observation Selection
- Cost-effective Outbreak Detection in Networks
- Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies
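The classic illustration of such a guarantee is greedy selection for a monotone submodular coverage objective, as in a toy sensor-placement problem: by the result of Nemhauser, Wolsey, and Fisher, the greedy solution attains at least a (1 − 1/e) ≈ 63% fraction of the optimal value. The sketch below is an invented toy instance, not code from the papers above.

```python
def greedy_submodular(candidates, coverage, k):
    """Greedy maximization of a monotone submodular coverage function:
    repeatedly add the candidate covering the most not-yet-covered
    elements. (Toy sketch; names and instance are invented.)"""
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for c in candidates:
            if c in chosen:
                continue
            gain = len(coverage[c] - covered)  # marginal gain of adding c
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no candidate adds anything new
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered
```

For example, with candidate sensor sites `{'A': {1, 2, 3}, 'B': {3, 4}, 'C': {4, 5, 6, 7}}` and a budget of k = 2, greedy first picks `C` (gain 4), then `A` (gain 3), covering all 7 regions.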

## Representing Probability Distributions over Rankings and Permutations

Permutation problems are ubiquitous in ranking, voting, and data association. However, representing probability distributions over such problems is very challenging, because there are n! possibilities, and standard techniques, such as graphical models, are inadequate for representing structure in these domains. We have developed the notion of *riffle independence* in rankings, and algorithms using Fourier representations over permutations, to tackle these challenging tasks. Our results include:

- Efficient Probabilistic Inference with Partial Ranking Queries
- Learning Hierarchical Riffle Independent Groupings from Rankings
- Fourier Theoretic Probabilistic Inference over Permutations
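To see why explicit representation fails, note that 10 items already give 10! = 3,628,800 rankings. One compact summary, related to the first-order Fourier coefficients used in this line of work, is the n × n matrix of first-order marginals M[i][j] = P(item j is placed at rank i). The sketch below (with an invented function name) computes it from an explicit distribution, which is only feasible for very small n.

```python
def first_order_marginals(dist, n):
    """Compress a distribution over rankings (permutation tuple -> prob)
    into its n x n first-order marginal matrix
    M[rank][item] = P(item is placed at rank).
    (Toy sketch; only tractable when the distribution is tiny.)"""
    M = [[0.0] * n for _ in range(n)]
    for perm, p in dist.items():
        for rank, item in enumerate(perm):
            M[rank][item] += p
    return M
```

For the uniform distribution over permutations of n items, every entry of M is 1/n; structured distributions concentrate mass in particular rows and columns, which is exactly the kind of structure that riffle independence and low-order Fourier methods exploit without ever enumerating all n! rankings.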

## Probabilistic Graphical Models

We have also developed algorithms for a number of core machine learning problems, especially in the area of graphical models. A core theme in this work is the development of methods with theoretical guarantees that address significant, real-world challenges. Some results include:

- Kernel Belief Propagation
- Inference with Multivariate Heavy-Tails in Linear Models
- Optimal Value of Information in Graphical Models
- Efficient Principled Learning of Thin Junction Trees
- Max-Margin Markov Networks

## Distributed Algorithms for Inference in Sensor Networks

Data in a sensor network is highly distributed throughout the environment. We thus need to develop techniques that can fuse the data and make predictions in a distributed fashion, while being robust to communication and node failures. Our results here include:

- A Robust Architecture for Distributed Inference in Sensor Networks
- Model-Driven Data Acquisition in Sensor Networks
- Distributed Regression: an Efficient Framework for Modeling Sensor Network Data

## Multiagent Reinforcement Learning and Factored MDPs

My thesis work demonstrated how to exploit problem-specific structure to scale up MDPs and enable the efficient coordination of multiple agents. These results include:

- Generalizing Plans to New Environments in Relational MDPs
- Efficient Solution Algorithms for Factored MDPs
- Coordinated Reinforcement Learning