About me
I am a third-year PhD student in computer science at the University of Washington, where I work
with Hannaneh Hajishirzi and Ali Farhadi.
My research spans natural language processing and computer vision. Prior to UW, I received my B.Sc. in computer engineering from Sharif University of Technology.
I publish under my full name, Mohammadreza, and go by Reza among friends.
To master's and undergraduate students: I'm looking for motivated students to work with us on some cool large-scale multi-modal video projects. Please email me your CV and a link to your code portfolio (e.g., your GitHub account) if you are interested!
News
Publications
For the full list of publications please visit the publications page.
CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement
Mohammadreza Salehi, Mehrdad Farajtabar, Maxwell Horton, Fartash Faghri, Hadi Pouransari, Raviteja Vemulapalli, Oncel Tuzel, Ali Farhadi, Mohammad Rastegari, Sachin Mehta.
UniReps Workshop at NeurIPS'23 (under submission to TMLR)
arXiv / Code (Soon!)
TL;DR
We showed how one can merge any task-specific expert model from open-source model zoos into foundation models (FMs) such as CLIP. This enhances the FM's visual features for dense prediction and localization tasks without collecting any supervised data.
SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks
Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Ali Farhadi, Hannaneh Hajishirzi.
EMNLP'23 Findings
arXiv / Code
TL;DR
We introduced a new sample-adaptive inference method called SHARCS🦈. It routes samples to sub-networks of varying widths within any transformer network, based on the hardness of the input sample.
Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient Multi-task Knowledge Sharing
Akari Asai, Mohammadreza Salehi, Matthew E. Peters, Hannaneh Hajishirzi.
EMNLP'22
arXiv / Code
TL;DR
We introduced a new parameter-efficient fine-tuning method based on prompt tuning. In our method, prompts are first learned for a set of source tasks; then, for each sample in a new target task, an attentional mixture of the source prompts is used as the target prompt.
MERLOT Reserve: Multimodal Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi.
CVPR'22
arXiv / Code
TL;DR
We introduce MERLOT Reserve, which learns from 20 million YouTube videos through all their modalities (audio, vision, and text). Learning from audio helps broadly, even on single-image tasks like VCR. Our model learns state-of-the-art representations that also transfer well to video-based tasks in a zero-shot setting.
Paraphrase Generation by Learning How to Edit from Samples
Amirhossein Kazemnejad, Mohammadreza Salehi, Mahdieh Soleymani Baghshah.
ACL'20
ACL Anthology
TL;DR
We generate paraphrases by retrieving similar paraphrase pairs from a pre-existing corpus and editing them using a multi-head attention mechanism.