Jae Sung (James) Park

I am a final-year PhD student in Computer Science and Engineering at the University of Washington, advised by Yejin Choi, Ali Farhadi, and Ranjay Krishna. Previously, I received my B.S. degree in EECS from the University of California, Berkeley, where I worked closely with Anna Rohrbach and Trevor Darrell.

I am interested in how machines can use visual perception and language understanding to reason about the visual world the way humans do. Specifically, my research has focused on:

  • Empowering Visual Commonsense Reasoning in AI Models
  • Grounding Objects, Concepts, and Actions to Images and Videos
  • Evaluation of Multimodal Language Models

Email  /  Google Scholar  /  Github

I am looking for full-time industry positions in Multimodal AI/ML starting in summer 2025. Please feel free to reach out if you are interested in working with me.
News
Research
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi ... Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi
arXiv, 2024
arXiv / demo / dataset / code
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions
Anas Awadalla, Le Xue, Manli Shu, An Yan, Jun Wang, Senthil Purushwalkam, Sheng Shen, Hannah Lee, Oscar Lo, Jae Sung Park, Etash Guha, Silvio Savarese, Ludwig Schmidt, Yejin Choi, Caiming Xiong, Ran Xu
arXiv, 2024
arXiv / dataset
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi
arXiv, 2024
arXiv
ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition
Mohammadreza Salehi, Jae Sung Park, Tanush Yadav, Aditya Kusupati, Ranjay Krishna, Yejin Choi, Hannaneh Hajishirzi, Ali Farhadi
NeurIPS Datasets & Benchmarks, 2024
arXiv / website
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
Ethan Shen, Alan Fan, Sarah Pratt, Jae Sung Park, Matthew Wallingford, Sham Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati
NeurIPS, 2024
arXiv
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Jae Sung Park, Jack Hessel, Khyathi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi
NeurIPS, 2023
arXiv
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, Jae Sung Park, Ximing Lu, Prithviraj Ammanabrolu, Rowan Zellers, Ronan Le Bras, Gunhee Kim, Yejin Choi
CVPR, 2023
arXiv
Exposing the Limits of Video-Text Models through Contrast Sets
Jae Sung Park, Sheng Shen, Ali Farhadi, Trevor Darrell, Yejin Choi, Anna Rohrbach
NAACL (short), 2022
arXiv / code
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi
NeurIPS, 2021
arXiv
LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes
Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi
NeurIPS, 2021
arXiv
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A Smith, Yejin Choi
Findings of EMNLP, 2020
arXiv
VisualCOMET: Reasoning about the Dynamic Context of a Still Image
Jae Sung Park, Chandra Bhagavatula, Roozbeh Mottaghi, Ali Farhadi, Yejin Choi
ECCV, 2020 (Spotlight)
project page / arXiv / code
Identity Aware Multi-Sentence Video Description
Jae Sung Park, Trevor Darrell, Anna Rohrbach
ECCV, 2020
project page / arXiv
Adversarial Inference for Multi-Sentence Video Description
Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach
CVPR, 2019 (Oral)
arXiv / code

Service
Teaching