Paul G. Allen School of Computer Science and Engineering, University of Washington
3800 E Stevens Way NE, Seattle WA 98195
xiangyun (at) cs (dot) washington (dot) edu
[CV]
I obtained my bachelor's degree from the National University of Singapore (NUS). I am currently a PhD candidate at the University of Washington, advised by Prof. Dieter Fox. My interests are in computer vision and robotics.
[paper] [website] [code] 6D grasping in cluttered scenes is a longstanding problem in robotic manipulation. Open-loop manipulation pipelines may fail due to inaccurate state estimation, while most end-to-end grasping methods have not yet scaled to complex scenes with obstacles. In this work, we propose a new method for end-to-end learning of 6D grasping in cluttered scenes. Our hierarchical framework learns collision-free target-driven grasping based on partial point cloud observations. We learn an embedding space to encode expert grasping plans during training and a variational autoencoder to sample diverse grasping trajectories at test time. Furthermore, we train a critic network for plan selection and an option classifier for switching to an instance grasping policy through hierarchical reinforcement learning. We evaluate our method and compare against several baselines in simulation, as well as demonstrate that our latent planning can generalize to real-world cluttered-scene grasping tasks.
[paper] [website] [code] Producing dense and accurate traversability maps is crucial for autonomous off-road navigation. In this paper, we focus on the problem of classifying terrains into 4 cost classes (free, low-cost, medium-cost, obstacle) for traversability assessment. This requires a robot to reason about both semantics (what objects are present?) and geometric properties (where are the objects located?) of the environment. To achieve this goal, we develop a novel Bird’s Eye View Network (BEVNet), a deep neural network that directly predicts a local map encoding terrain classes from sparse LiDAR inputs. BEVNet processes both geometric and semantic information in a temporally consistent fashion. More importantly, it uses learned prior and history to predict terrain classes in unseen space and into the future, allowing a robot to better appraise its situation. We quantitatively evaluate BEVNet on both on-road and off-road scenarios and show that it outperforms a variety of strong baselines.
[paper] [website] [code] Learning high-level navigation behaviors has important implications: it enables robots to build compact visual memory for repeating demonstrations and to build sparse topological maps for planning in novel environments. Existing approaches only learn discrete, short-horizon behaviors. These standalone behaviors usually assume a discrete action space with simple robot dynamics, thus they cannot capture the intricacy and complexity of real-world trajectories. To this end, we propose Composable Behavior Embedding (CBE), a continuous behavior representation for long-horizon visual navigation. CBE is learned in an end-to-end fashion; it effectively captures path geometry and is robust to unseen obstacles. We show that CBE can be used to perform memory-efficient path following and topological mapping, saving more than an order of magnitude in memory compared to behavior-less approaches.
[paper] [website] [code] Visual topological navigation has been revitalized recently thanks to advances in deep learning that substantially improve robot perception. However, scalability and reliability remain challenging due to the complexity and ambiguity of real-world images and the mechanical constraints of real robots. We present an intuitive solution to show that by accurately measuring the capability of a local controller, large-scale visual topological navigation can be achieved while being scalable and robust. Our approach achieves state-of-the-art results in trajectory following and planning in large-scale environments. It also generalizes well to real robots and new environments without finetuning.
[paper] [website] [code] End-to-end learning for autonomous navigation has received substantial attention recently as a promising method for reducing modeling error. However, its data complexity, especially around generalization to unseen environments, is high. We introduce a novel image-based autonomous navigation technique that leverages policy structure using the Riemannian Motion Policy (RMP) framework for deep learning of vehicular control. We design a deep neural network to predict control point RMPs of the vehicle from visual images, from which the optimal control commands can be computed analytically. We show that our network trained in the Gibson environment can be used for indoor obstacle avoidance and navigation on a real RC car, and our RMP representation generalizes better to unseen environments than predicting local geometry or predicting control commands directly.
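As a rough illustration of why control can be computed analytically from predicted RMPs (a minimal sketch of the standard RMP combination rule, not the paper's actual network or code; the policies and Jacobians below are made up), each task-space policy is pulled back into configuration space and the results are resolved with a metric-weighted pseudoinverse:

```python
import numpy as np

# Each RMP i provides a desired task-space acceleration a_i and an
# importance metric M_i; the Jacobian J_i maps configuration velocities
# into that task space. (Curvature terms are omitted for brevity.)

def pullback(a_i, M_i, J_i):
    """Pull a task-space RMP (a_i, M_i) back to configuration space."""
    M = J_i.T @ M_i @ J_i
    f = J_i.T @ M_i @ a_i
    return f, M

def resolve(policies):
    """Metric-weighted combination: qdd = (sum M_i)^+ (sum f_i)."""
    f_total = sum(f for f, _ in policies)
    M_total = sum(M for _, M in policies)
    return np.linalg.pinv(M_total) @ f_total

# Two toy 1-D policies acting on a 2-D configuration space:
goal = pullback(np.array([1.0]), np.eye(1), np.array([[1.0, 0.0]]))
avoid = pullback(np.array([-0.5]), 4 * np.eye(1), np.array([[0.0, 1.0]]))
qdd = resolve([goal, avoid])  # desired configuration-space acceleration
```

Because each policy here acts on an independent axis, the resolved acceleration simply recovers each policy's command along its own axis; with overlapping task spaces, the metrics weight how strongly each policy dominates.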
A neural policy enabling a 7-DoF robotic arm (the PR2 robot) to hit a high-speed ball (>8m/s) thrown at it, learned without supervision.
The policy is first learned through reinforcement learning in the MuJoCo simulator and later transferred to the real robot.
The real robot uses 30 Hz depth images to estimate the ball's state. From the moment the ball is thrown until it reaches the robot, there are only about 0.3 seconds to react.
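To give a flavor of the state estimation involved (a hedged sketch, not the project's actual tracker), a Kalman filter with known gravity input over 30 Hz depth detections could look like this; the 1-D state and noise values are illustrative:

```python
import numpy as np

dt = 1.0 / 30.0                          # 30 Hz depth camera
F = np.array([[1.0, dt], [0.0, 1.0]])    # state: [position, velocity]
H = np.array([[1.0, 0.0]])               # we only observe position
Q = 1e-4 * np.eye(2)                     # process noise (assumed)
R = np.array([[1e-2]])                   # measurement noise (assumed)
g = np.array([0.0, -9.81 * dt])          # gravity as a known control input

def kf_step(x, P, z):
    # Predict under ballistic motion, then correct with measurement z.
    x = F @ x + g
    P = F @ P @ F.T + Q
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (np.array([z]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 8.0]), np.eye(2)   # ball thrown at ~8 m/s
for k in range(1, 10):                   # ~0.3 s of simulated detections
    z = 8.0 * k * dt - 0.5 * 9.81 * (k * dt) ** 2
    x, P = kf_step(x, P, z)
```

With only ~9 frames before impact, the filter must converge quickly, which is why a good motion model (ballistics) matters more than measurement density here.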
Surprisingly, the robot also learns to act like a Jedi!
Joint work with Boling Yang and Felix Leeb.
SkyStitch is a multi-UAV video surveillance system. Compared with a traditional video surveillance system that captures video with a single camera, SkyStitch removes the constraints of field of view and resolution by deploying multiple UAVs to cover the target area. Videos captured by all the UAVs are streamed to the ground station, where they are stitched together to provide a panoramic view of the area in real time.
This is the biggest project I have ever undertaken. It consists of 16k lines of highly optimized code running on heterogeneous processors (x86 and ARM CPUs, desktop and mobile GPUs, the microcontroller on the flight controller, etc.). Moreover, there was a great deal of mechanical work: we built 4 generations of prototypes and had to deal with numerous crashes.
More information can be found on the project webpage.
Augmented reality (AR) systems currently overlay information based on location and general compass orientation. However, interacting virtually with such information in the environment through AR is still in its infancy. With the proliferation of mobile phones and wearables, we foresee applications that will adopt AR as one of their key modes for user interactions.
The key challenge in enabling human-object interaction is reliably detecting and localizing an object, which requires instance-level recognition. Existing image-classification techniques are of limited use here because they cannot generalize to new classes and cannot distinguish between two objects of the same class.
I am currently developing new techniques that enable users to "tag" an object and later retrieve it in a new image.
Here is a demo of how it works. Note that this is for demonstration purposes only; I am improving the underlying detection algorithm to make it more robust in the wild.
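As a toy illustration of the tag-and-retrieve idea (not the actual algorithm under development), tagging can be thought of as storing an object's feature vector, and retrieval as matching new detections against the stored tags by cosine similarity. The 4-D vectors below stand in for real image descriptors:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "Tagging": store one descriptor per object instance (toy values).
tags = {
    "my_mug":    np.array([0.9, 0.1, 0.0, 0.2]),
    "my_laptop": np.array([0.0, 0.8, 0.5, 0.1]),
}

def retrieve(query, tags, threshold=0.8):
    """Return the best-matching tagged instance, or None if no match."""
    name, score = max(((n, cosine(query, v)) for n, v in tags.items()),
                      key=lambda t: t[1])
    return name if score >= threshold else None

match = retrieve(np.array([0.85, 0.15, 0.05, 0.25]), tags)
```

The threshold is what separates instance retrieval from classification: a query that resembles no stored tag is rejected rather than forced into a class.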
This robotic segway balances on two wheels. It can be controlled wirelessly from a PC or with a PS3 game controller. I started this project back in 2011, in my first year. I built everything from scratch, including all the machining work (my father lent me a hand).
The experience has been very rewarding. The state estimation algorithm inspired me to adopt a similar approach for homography estimation in my SkyStitch project.
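For reference, the core of homography estimation between two overlapping views can be sketched with the standard DLT algorithm (this is the textbook method, not SkyStitch's optimized implementation, which also has to cope with noisy matches):

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: solve A h = 0 from >= 4 correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of A (smallest singular vector) gives H up to scale.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp(H, pt):
    """Apply homography H to a 2-D point."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]

# Recover a known homography from 4 exact correspondences:
H_true = np.array([[1.0, 0.1, 5.0], [0.2, 1.0, -3.0], [0.001, 0.0, 1.0]])
src = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
dst = [tuple(warp(H_true, p)) for p in src]
H_est = estimate_homography(src, dst)
```

In practice the correspondences come from feature matching and contain outliers, so a robust estimator (e.g. RANSAC around this DLT core) is needed.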
More information can be found on the project webpage.
There are many interesting aspects to this project, including how to convert Pascal source code into C and how to make it compatible with WebGL and non-blocking IO. I wrote a report discussing some of the technical details.
Firefox 17.0: https://ftp.mozilla.org/pub/firefox/releases/17.0b6/
Semantic Terrain Classification for Off-road Autonomous Driving Amirreza Shaban∗, Xiangyun Meng∗, JoonHo Lee∗, Byron Boots, Dieter Fox. CoRL 2021.
Learning Composable Behavior Embeddings for Long-horizon Visual Navigation Xiangyun Meng, Yu Xiang and Dieter Fox. RA-L 2021.
Scaling Local Control to Large Scale Topological Navigation Xiangyun Meng, Nathan Ratliff, Yu Xiang and Dieter Fox. ICRA 2020.
Neural Autonomous Navigation with Riemannian Motion Policy Xiangyun Meng, Nathan Ratliff, Yu Xiang and Dieter Fox. ICRA 2019.
SkyStitch: A Cooperative Multi-UAV-based Realtime Video Surveillance System with Stitching Xiangyun Meng, Wei Wang, and Ben Leong. ACM Multimedia 2015 (ACMMM 2015), Brisbane, Australia, Oct 2015.