As a large-scale machine learning researcher, I like to build real things that can be used in production. I have built several widely used machine learning and systems packages, and I initiated the DMLC group to make large-scale machine learning widely available to the community. Here I list projects that I created or am heavily involved in. Some of these projects are also great examples of large-scale machine learning research.

Most of these projects are actively developed and maintained as open source packages. I am honored to work with many outstanding collaborators on these projects.

TVM: An End to End IR Stack for Deploying Deep Learning Workloads on Hardware Platforms

TVM is a tensor intermediate representation (IR) stack for deep learning systems. It is designed to close the gap between productivity-focused deep learning frameworks and performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end-to-end compilation to different backends.
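To give a flavor of the end-to-end story, here is a minimal sketch of loading and running a module that was compiled ahead of time, through the TVM C++ runtime. It assumes a hypothetical library deploy.so exporting a function addone was produced earlier by the compiler; exact device and data type names have shifted across TVM versions.

```cpp
// Sketch only: run a previously compiled TVM module from C++.
// "deploy.so" and "addone" are hypothetical artifacts of an earlier compile step.
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

int main() {
  // Load the compiled library and look up one of its functions.
  tvm::runtime::Module mod = tvm::runtime::Module::LoadFromFile("deploy.so");
  tvm::runtime::PackedFunc addone = mod.GetFunction("addone");

  // Allocate input/output tensors on CPU and invoke the compiled kernel.
  DLDevice dev{kDLCPU, 0};
  tvm::runtime::NDArray x =
      tvm::runtime::NDArray::Empty({1024}, DLDataType{kDLFloat, 32, 1}, dev);
  tvm::runtime::NDArray y =
      tvm::runtime::NDArray::Empty({1024}, DLDataType{kDLFloat, 32, 1}, dev);
  addone(x, y);  // y = x + 1, computed by the compiled code
  return 0;
}
```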

XGBoost: Scalable Tree Boosting

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the gradient boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. XGBoost has been used in many machine learning challenges and deployed in production environments. You can use it from your favorite language, including Python, R, Julia, Java and Scala. The distributed version can be easily deployed on Hadoop, MPI, SGE and, more recently, data flow frameworks such as Flink and Spark.
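As a flavor of the core library that the language bindings wrap, here is a small sketch of training a booster through the C API. The data and parameter values are made up for illustration.

```cpp
// Hedged sketch: train a booster on a tiny made-up dense matrix via the C API.
#include <xgboost/c_api.h>

int main() {
  // 4 rows x 2 features, with binary labels.
  float data[8] = {1, 2,  2, 3,  3, 4,  4, 5};
  float labels[4] = {0, 0, 1, 1};

  DMatrixHandle dtrain;
  XGDMatrixCreateFromMat(data, 4, 2, /*missing=*/-1.0f, &dtrain);
  XGDMatrixSetFloatInfo(dtrain, "label", labels, 4);

  BoosterHandle booster;
  XGBoosterCreate(&dtrain, 1, &booster);
  XGBoosterSetParam(booster, "objective", "binary:logistic");
  XGBoosterSetParam(booster, "max_depth", "3");

  // Each call grows one more round of boosted trees.
  for (int iter = 0; iter < 10; ++iter) {
    XGBoosterUpdateOneIter(booster, iter, dtrain);
  }

  XGBoosterFree(booster);
  XGDMatrixFree(dtrain);
  return 0;
}
```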

Background Story: I created XGBoost while doing research on variants of tree boosting, when I could not find a fast tree boosting package for my experiments. It then became part of my research on building scalable learning systems.

MXNet: Efficient and Flexible Deep Learning

MXNet stands for mix and maximize. The idea is to combine the power of declarative programming with imperative programming. At its core is a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. The library is portable and lightweight, and it scales to multiple GPUs and multiple machines.
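A rough sketch of the mix, using the C++ package: imperative NDArray operations are pushed to the dependency engine and run asynchronously, while a symbolic graph is declared first and then bound and executed. Exact signatures in the mxnet-cpp package differ somewhat between versions, so treat this as illustrative.

```cpp
// Illustrative sketch of mixing imperative and declarative styles in mxnet-cpp.
#include <map>
#include <string>
#include "mxnet-cpp/MxNetCpp.h"
using namespace mxnet::cpp;

int main() {
  Context ctx = Context::cpu();

  // Imperative side: each NDArray operation is issued to the dependency
  // engine and executed asynchronously.
  NDArray a(Shape(2, 3), ctx, /*delay_alloc=*/false);
  NDArray b(Shape(2, 3), ctx, /*delay_alloc=*/false);
  a = 1.0f;
  b = 2.0f;
  NDArray c = a + b;     // scheduled; may not have finished yet
  NDArray::WaitAll();    // block until all pending operations complete

  // Declarative side: describe the computation as a symbolic graph first,
  // then bind real arrays to it and execute.
  Symbol x = Symbol::Variable("x");
  Symbol y = Symbol::Variable("y");
  Symbol z = x * 2.0f + y;

  std::map<std::string, NDArray> args = {{"x", a}, {"y", b}};
  Executor *exe = z.SimpleBind(ctx, args);
  exe->Forward(false);                 // run the optimized graph
  NDArray out = exe->outputs[0];
  NDArray::WaitAll();

  delete exe;
  MXNotifyShutdown();
  return 0;
}
```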

Background Story: MXNet started as a combination of several deep learning projects: CXXNet, which I created together with Bing Xu, Minerva from NYU, and Purine from NUS. We were later joined by Chiyuan, the creator of Mocha.jl. This is a truly collaborative project that combines the wisdom of several deep learning projects and researchers from many different institutes.

MShadow: A Unified CPU/GPU Matrix Template Library in C++/CUDA

MShadow is an efficient, device-invariant and simple tensor library for machine learning projects, aiming for both simplicity and performance. Basically, it lets you write tensor expressions that are translated into CPU/GPU code at compile time. It is the backbone library behind many deep learning platforms, including MXNet, CXXNet and Apache Singa.
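A minimal sketch of the expression-template style, with made-up shapes and values: the same update expression compiles into a single fused loop for CPU tensors here, and into device code when cpu is swapped for gpu.

```cpp
// Sketch of mshadow expression templates; shapes and values are illustrative.
#include "mshadow/tensor.h"
using namespace mshadow;
using namespace mshadow::expr;

int main() {
  InitTensorEngine<cpu>();

  // 2-D float tensors on the CPU (change `cpu` to `gpu` to target the GPU).
  Tensor<cpu, 2> weight = NewTensor<cpu>(Shape2(10, 5), 0.0f);
  Tensor<cpu, 2> gradient = NewTensor<cpu>(Shape2(10, 5), 1.0f);
  float learning_rate = 0.1f;

  // The whole update is one expression; templates fuse it at compile time
  // instead of materializing temporaries.
  weight += learning_rate * gradient;

  FreeSpace(&weight);
  FreeSpace(&gradient);
  ShutdownTensorEngine<cpu>();
  return 0;
}
```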

Background Story: I started writing CUDA code as an undergrad, during my first deep learning project on restricted Boltzmann machines. As much as I liked hacking on GPUs, I hated the repetitive effort of writing similar code over and over again. So we built this library; with the power of C++ template programming, we are finally able to write weight += learning_rate * gradient :)

DMLC-Core: Distributed Machine Learning Common Codebase

When you write distributed machine learning programs, there are many common, non-trivial utilities you need: loading data from a distributed filesystem in a sharded way, parsing input files fast enough that your big-data program does not spend all its time on loading, launching jobs in various environments, and defining and checking parameters. dmlc-core is the C++ library that solves all of these common pains in distributed machine learning.
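As one example of these utilities, here is a hedged sketch of the parameter module: declare the fields once and get string parsing, defaults and range checks. The struct name and fields below are made up for illustration.

```cpp
// Sketch of dmlc-core's parameter module with a hypothetical parameter struct.
#include <dmlc/parameter.h>
#include <map>
#include <string>

struct MyTrainParam : public dmlc::Parameter<MyTrainParam> {
  float learning_rate;
  int num_trees;
  DMLC_DECLARE_PARAMETER(MyTrainParam) {
    DMLC_DECLARE_FIELD(learning_rate).set_default(0.1f).set_range(0.0f, 1.0f)
        .describe("Step size for each boosting round.");
    DMLC_DECLARE_FIELD(num_trees).set_default(100).set_lower_bound(1)
        .describe("Number of trees to grow.");
  }
};
DMLC_REGISTER_PARAMETER(MyTrainParam);

int main() {
  MyTrainParam param;
  std::map<std::string, std::string> kwargs = {
    {"learning_rate", "0.3"}, {"num_trees", "50"}
  };
  // Init parses the strings, fills in defaults, and checks the declared ranges.
  param.Init(kwargs);
  return 0;
}
```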

Background Story: It started when Mu Li and I complained that we kept redoing the same work when building distributed machine learning systems. “Why not work together to build these common parts?” So we got support from our advisors and created the library that now backs many distributed learning systems, including MXNet and XGBoost.

Rabit: Reliable Allreduce and Broadcast Interface

A lightweight library that provides a fault-tolerant interface for Allreduce and Broadcast, for portable, scalable and reliable distributed machine learning programs. Rabit programs can run on various platforms such as Hadoop, MPI and Sun Grid Engine. It backs the communication behind distributed XGBoost. The newest version can also be used within data flow frameworks such as Flink and Spark.
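A small sketch of the Allreduce interface: every worker contributes a local buffer and all workers receive the element-wise sum. The buffer contents here are illustrative.

```cpp
// Sketch of Rabit's Allreduce: sum a local vector across all workers.
#include <rabit/rabit.h>
#include <vector>

int main(int argc, char *argv[]) {
  rabit::Init(argc, argv);

  // Each worker fills in a local partial result (e.g. gradient statistics).
  std::vector<float> local(16, static_cast<float>(rabit::GetRank()));

  // After Allreduce, every worker holds the element-wise sum across workers.
  rabit::Allreduce<rabit::op::Sum>(local.data(), local.size());

  if (rabit::GetRank() == 0) {
    rabit::TrackerPrintf("sum of first element: %f\n", local[0]);
  }
  rabit::Finalize();
  return 0;
}
```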

Background Story: Rabit started as a course project when I was taking the systems course at UW and we wanted to build something related to machine learning. The original goal was simply to beat other frameworks; now the goal is to be able to port to and run on all of them :)

SVDFeature: A Scalable and Flexible Toolkit for Collaborative Filtering

This is a project I created when I was an M.S. student at Shanghai Jiao Tong University. It provides an abstract framework for building new matrix factorization variants simply by defining features. This is the project that helped us win two KDD Cups.
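Roughly, the feature-based matrix factorization idea can be sketched as follows; the notation here is illustrative rather than the toolkit's exact formulation. A prediction combines bias terms from global features (gamma), user features (alpha) and item features (beta) with an interaction between feature-weighted user and item latent factors.

```latex
% Sketch of feature-based matrix factorization (illustrative notation):
% b are bias weights, p_j and q_j are latent factor vectors.
\hat{y} = \mu
  + \sum_{j} b^{(g)}_{j} \gamma_j
  + \sum_{j} b^{(u)}_{j} \alpha_j
  + \sum_{j} b^{(i)}_{j} \beta_j
  + \Big( \sum_{j} \alpha_j \mathbf{p}_j \Big)^{\top}
    \Big( \sum_{j} \beta_j \mathbf{q}_j \Big)
```

Setting the user features to a one-hot user indicator and the item features to a one-hot item indicator recovers plain matrix factorization; adding extra features yields new variants without changing the solver.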