Paul G. Allen School of Computer Science and Engineering, University of Washington
3800 E Stevens Way NE, Seattle WA 98195
xiangyun (at) cs (dot) washington (dot) edu
CV
I obtained my bachelor's degree from the National University of Singapore (NUS). I am currently a PhD candidate at the University of Washington, advised by Prof. Dieter Fox. My interests are in vision and robotics.
[paper] [website] [code] Learning high-level navigation behaviors has important implications: it enables robots to build compact visual memory for repeating demonstrations and to build sparse topological maps for planning in novel environments. Existing approaches only learn discrete, short-horizon behaviors. These standalone behaviors usually assume a discrete action space with simple robot dynamics, so they cannot capture the intricacy and complexity of real-world trajectories. To this end, we propose Composable Behavior Embedding (CBE), a continuous behavior representation for long-horizon visual navigation. CBE is learned in an end-to-end fashion; it effectively captures path geometry and is robust to unseen obstacles. We show that CBE can be used to perform memory-efficient path following and topological mapping, using more than an order of magnitude less memory than behavior-less approaches.
[paper] [website] [code] Visual topological navigation has been revitalized recently thanks to advances in deep learning that substantially improve robot perception. However, scalability and reliability remain challenging due to the complexity and ambiguity of real-world images and the mechanical constraints of real robots. We present an intuitive solution showing that, by accurately measuring the capability of a local controller, large-scale visual topological navigation can be achieved while remaining scalable and robust. Our approach achieves state-of-the-art results in trajectory following and planning in large-scale environments. It also generalizes well to real robots and new environments without finetuning.
[paper] [website] [code] End-to-end learning for autonomous navigation has received substantial attention recently as a promising method for reducing modeling error. However, its data complexity, especially around generalization to unseen environments, is high. We introduce a novel image-based autonomous navigation technique that leverages policy structure, using the Riemannian Motion Policy (RMP) framework for deep learning of vehicular control. We design a deep neural network to predict control point RMPs of the vehicle from visual images, from which the optimal control commands can be computed analytically. We show that our network, trained in the Gibson environment, can be used for indoor obstacle avoidance and navigation on a real RC car, and that our RMP representation generalizes better to unseen environments than predicting local geometry or predicting control commands directly.
A neural policy enabling a 7-DoF robotic arm (on the PR2 robot) to hit a high-speed ball (> 8 m/s) thrown at it, learned without supervision.
The policy is first learned through reinforcement learning in the MuJoCo simulator and later transferred to the real robot.
The real robot uses 30 Hz depth images to estimate the ball's state. From the moment the ball is thrown until it reaches the robot, the robot has only about 0.3 seconds to react.
Surprisingly, the robot also learns to act like a Jedi!
Joint work with Boling Yang and Felix Leeb.
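To put the numbers from the ball-hitting project in perspective, here is a rough back-of-the-envelope sketch of the timing budget. It is not part of the project code: the function names are illustrative, and it uses only the figures quoted above (30 Hz depth images, about 0.3 seconds to react, and 8 m/s as a lower bound on ball speed).

```python
# Back-of-the-envelope timing budget for the ball-hitting task.
# All constants are the approximate figures quoted in the text;
# the helper names are hypothetical and exist only for illustration.

FRAME_RATE_HZ = 30.0     # depth camera frame rate
REACTION_TIME_S = 0.3    # approximate time from throw to impact
BALL_SPEED_M_S = 8.0     # lower bound on ball speed (> 8 m/s)


def frames_available(frame_rate_hz: float, window_s: float) -> int:
    """Approximate number of camera frames captured within the window."""
    return round(frame_rate_hz * window_s)


def distance_travelled(speed_m_s: float, window_s: float) -> float:
    """Distance covered at constant speed during the window (ignores drag and gravity)."""
    return speed_m_s * window_s


if __name__ == "__main__":
    n_frames = frames_available(FRAME_RATE_HZ, REACTION_TIME_S)
    frame_period_ms = 1000.0 / FRAME_RATE_HZ
    min_distance_m = distance_travelled(BALL_SPEED_M_S, REACTION_TIME_S)
    print(f"about {n_frames} depth frames, one every {frame_period_ms:.1f} ms")
    print(f"ball covers at least {min_distance_m:.1f} m in that time")
```

Under these assumptions the robot sees only a handful of depth frames before impact, and both state estimation and arm motion have to fit into that window.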
Augmented reality (AR) systems currently overlay information based on location and general compass orientation. However, interacting virtually with such information in the environment through AR is still in its infancy. With the proliferation of mobile phones and wearables, we foresee applications that will adopt AR as one of their key modes for user interactions.
The key challenge in enabling human-object interaction is reliably detecting and localizing an object, which requires instance-level recognition. Existing image-classification techniques are of limited use because they cannot generalize to new classes and cannot distinguish between two objects of the same class.
I am currently developing new techniques that enable users to "tag" an object and later retrieve that object in a new image.
Here is a demo of how it works. Note that it is for demonstration purposes only. I am improving the underlying detection algorithm to make it work more robustly in the wild.
SkyStitch is a multi-UAV-based video surveillance system. Compared with a traditional video surveillance system that captures video with a single camera, SkyStitch removes the constraints on field of view and resolution by deploying multiple UAVs to cover the target area. Videos captured by all UAVs are streamed to the ground station, where they are stitched together to provide a panoramic view of the area in real time.
This is the biggest project I have ever completed. It consists of 16k lines of highly optimized code running on heterogeneous processors (x86 & ARM CPU, desktop & mobile GPU, microcontroller on the flight controller, etc.). It also involved a great deal of mechanical work: we built 4 generations of prototypes and had to deal with numerous crashes.
More information can be found on the project webpage.
This robotic Segway is capable of balancing on two wheels. It can be controlled wirelessly from a PC or with a PS3 game controller. I started this project back in 2011, when I was in my first year. I built everything from scratch, including all the machining work (my father lent me a hand).
The experience has been very rewarding. The state estimation algorithm later inspired me to adapt it for homography estimation in my SkyStitch project.
More information can be found on the project webpage.
There are many interesting aspects to this project, including how to convert Pascal source code into C and how to make it compatible with WebGL and non-blocking IO. I wrote a report discussing some of the related technical details.
Firefox 17.0: https://ftp.mozilla.org/pub/firefox/releases/17.0b6/
Scaling Local Control to Large Scale Topological Navigation. Xiangyun Meng, Nathan Ratliff, Yu Xiang, and Dieter Fox. ICRA 2020.
Neural Autonomous Navigation with Riemannian Motion Policy. Xiangyun Meng, Nathan Ratliff, Yu Xiang, and Dieter Fox. ICRA 2019.
SkyStitch: a Cooperative Multi-UAV-based Realtime Video Surveillance System with Stitching. Xiangyun Meng, Wei Wang, and Ben Leong. Proceedings of the ACM Multimedia Conference 2015 (ACMMM 2015), Brisbane, Australia. Oct 2015.