GOM is an unsupervised reinforcement learning method that models the distribution of all possible outcomes, represented as discounted sums of state-dependent cumulants. The outcome model is paired with a readout policy that produces an action to realize a particular outcome. Assuming rewards depend linearly on the cumulants, transfer to downstream tasks reduces to a linear regression followed by a simple optimization problem for the best achievable outcome.
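The transfer step can be sketched as follows (a minimal numpy illustration; the function name, shapes, and interface here are assumptions for exposition, not GOM's actual implementation):

```python
import numpy as np

def gom_transfer(phi, r, psi):
    """Hypothetical sketch of GOM's downstream transfer (illustrative only).

    phi: (M, d) cumulant features for reward-labeled states
    r:   (M,)   observed rewards, assumed linear in the cumulants
    psi: (N, d) candidate outcomes (discounted cumulant sums)
    """
    # Step 1: linear regression recovers reward weights w with r ~= phi @ w.
    w, *_ = np.linalg.lstsq(phi, r, rcond=None)
    # Step 2: the best outcome maximizes its estimated return psi @ w;
    # the readout policy would then act to realize this outcome.
    return psi[np.argmax(psi @ w)]
```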
We show that in the offline RL setting, Q-learning is limited by the requirement of Bellman completeness, and return-conditioned supervised learning (RCSL) cannot perform trajectory stitching. We therefore propose a new algorithm, model-based return-conditioned supervised learning (MBRCSL), which augments the dataset with model-based rollouts of the behavior policy -- containing potentially optimal trajectories -- and then performs RCSL on the augmented dataset. MBRCSL is free from the Bellman-completeness requirement and able to perform trajectory stitching, recovering optimal behavior from suboptimal datasets.
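The two-stage recipe can be sketched as a skeleton (illustrative only; `dynamics_model`, `behavior_policy`, and `rcsl_train` stand in for learned components and are not the paper's actual interfaces):

```python
def mbrcsl(dataset, dynamics_model, behavior_policy, rcsl_train,
           n_rollouts, horizon):
    """Hypothetical MBRCSL skeleton: augment the dataset, then run RCSL."""
    augmented = list(dataset)
    # Stage 1: roll out the behavior policy inside the learned dynamics model;
    # stitched rollouts may contain trajectories better than any in the data.
    for _ in range(n_rollouts):
        s = dynamics_model.sample_initial_state()
        traj = []
        for _ in range(horizon):
            a = behavior_policy(s)
            s_next, r = dynamics_model.step(s, a)
            traj.append((s, a, r))
            s = s_next
        augmented.append(traj)
    # Stage 2: return-conditioned supervised learning on the augmented dataset,
    # i.e., regress actions on (state, return-to-go) pairs.
    return rcsl_train(augmented)
```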
RePo is a visual model-based reinforcement learning method that learns a minimally task-relevant representation by optimizing an information bottleneck objective. This makes it resilient to spurious variations in the observations, e.g. random distractors and background changes.
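As a rough analogy (not RePo's exact objective), a generic variational information-bottleneck loss keeps the latent predictive of the task signal while penalizing how much observation detail it retains:

```python
import numpy as np

def vib_loss(pred_err, mu, logvar, beta):
    """Generic variational information-bottleneck loss (illustrative only).

    pred_err: per-sample task-prediction error (e.g., reward prediction loss)
    mu, logvar: parameters of a diagonal-Gaussian encoder q(z|o)
    beta: trade-off between task relevance and compression
    """
    # KL(q(z|o) || N(0, I)) in closed form for diagonal Gaussians; this
    # compression term discourages z from carrying spurious observation detail.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return np.mean(pred_err + beta * kl)
```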
We propose RaMP, a self-supervised RL method that learns from unlabelled offline data and quickly transfers to arbitrary rewards specified online. The idea is to capture environment dynamics by modeling the Q-values of random reward functions. These Q-values can then be linearly combined to reconstruct the Q-value corresponding to any test-time reward.
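The linearity argument behind this can be sketched as follows (shapes and names are assumptions for exposition; the real method learns these quantities with function approximators): for a fixed policy the Q-function is linear in the reward, so coefficients fit on rewards transfer directly to Q-values.

```python
import numpy as np

def ramp_transfer(r_rand, q_rand, r_new):
    """Hypothetical sketch of RaMP's transfer step (illustrative only).

    r_rand: (S, K) values of K random reward functions at S states
    q_rand: (S, K) corresponding Q-values, learned offline
    r_new:  (S,)   a new reward specified at test time
    """
    # Fit r_new as a linear combination of the random rewards...
    alpha, *_ = np.linalg.lstsq(r_rand, r_new, rcond=None)
    # ...and reuse the same coefficients on the Q-values: the Bellman backup
    # is linear in the reward, so Q(sum_k a_k r_k) = sum_k a_k Q(r_k).
    return q_rand @ alpha
```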
Our planner, LatCo, solves multi-stage, long-horizon tasks substantially harder than those considered in prior work. By optimizing a sequence of future latent states instead of optimizing actions directly, it quickly discovers high-reward regions and constructs effective plans.
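The collocation idea can be illustrated on a linear toy model (everything below is a simplification with made-up dynamics and reward; the actual method plans in a learned latent space with a constrained optimizer): optimize the latent trajectory itself, trading reward against violations of the dynamics z_{t+1} = A z_t.

```python
import numpy as np

def latco_plan(A, g, z0, T, steps=2000, lr=0.05, lam=1.0):
    """Toy latent collocation: dynamics z' = A z, reward -||z - g||^2.

    Gradient descent on the latent states themselves, not on actions.
    """
    z = np.tile(z0, (T, 1)).astype(float)
    for _ in range(steps):
        prev = np.vstack([z0, z[:-1]])       # z_{t-1} for each step t
        dyn_res = z - prev @ A.T             # dynamics violation z_t - A z_{t-1}
        grad = 2 * (z - g) + 2 * lam * dyn_res
        grad[:-1] -= 2 * lam * (dyn_res[1:] @ A)  # coupling from the z_{t+1} term
        z -= lr * grad
    return z
```

Because the states are free variables, the optimizer can place later waypoints near the goal immediately and only then make the trajectory dynamically consistent, which is what lets collocation find high-reward regions quickly.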