Jae Sung (James) Park
I am interested in how machines can use visual perception and language understanding to reason about the visual world the way humans do.
Specifically, my research has focused on:
- Empowering Visual Commonsense Reasoning in AI Models
- Grounding Objects, Concepts, and Actions in Images and Videos
- Evaluation of Multimodal Language Models
I am looking for full-time industry positions in Multimodal AI/ML starting in summer 2025. Please feel free to reach out if you are interested in working with me.
Research
Molmo and PixMo: Open weights and open data for state-of-the-art multimodal models
Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi ... Ranjay Krishna, Luca Weihs, Noah A Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi
arXiv, 2024
arXiv /
demo /
dataset /
code
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions
Anas Awadalla, Le Xue, Manli Shu, An Yan, Jun Wang, Senthil Purushwalkam, Sheng Shen, Hannah Lee, Oscar Lo, Jae Sung Park, Etash Guha, Silvio Savarese, Ludwig Schmidt, Yejin Choi, Caiming Xiong, Ran Xu
arXiv, 2024
arXiv /
dataset
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi
arXiv, 2024
arXiv
ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition
Mohammadreza Salehi, Jae Sung Park, Tanush Yadav, Aditya Kusupati, Ranjay Krishna, Yejin Choi, Hannaneh Hajishirzi, Ali Farhadi
NeurIPS Datasets & Benchmarks, 2024
arXiv /
website
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
Ethan Shen, Alan Fan, Sarah Pratt, Jae Sung Park, Matthew Wallingford, Sham Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati
NeurIPS, 2024
arXiv
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Jae Sung Park, Jack Hessel, Khyathi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi
NeurIPS, 2023
arXiv
Multimodal knowledge alignment with reinforcement learning
Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, Jae Sung Park, Ximing Lu, Prithviraj Ammanabrolu, Rowan Zellers, Ronan Le Bras, Gunhee Kim, Yejin Choi
CVPR, 2023
arXiv
Exposing the limits of video-text models through contrast sets
Jae Sung Park, Sheng Shen, Ali Farhadi, Trevor Darrell, Yejin Choi, Anna Rohrbach
NAACL (short), 2022
arXiv /
code
MERLOT: Multimodal neural script knowledge models
Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi
NeurIPS, 2021
arXiv
LLC: Accurate, multi-purpose learnt low-dimensional binary codes
Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi
NeurIPS, 2021
arXiv
Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs
Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A Smith, Yejin Choi
Findings of EMNLP, 2020
arXiv
VisualCOMET: Reasoning about the Dynamic Context of a Still Image
Jae Sung Park, Chandra Bhagavatula, Roozbeh Mottaghi, Ali Farhadi, Yejin Choi
ECCV, 2020 (Spotlight)
project page /
arXiv /
code
Identity Aware Multi-Sentence Video Description
Jae Sung Park, Trevor Darrell, Anna Rohrbach
ECCV, 2020
project page /
arXiv
Adversarial Inference for Multi-Sentence Video Description
Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach
CVPR, 2019 (Oral)
arXiv /
code