Zero-th order algorithm for softmax attention optimization

Publication
pp. 24–33
Yichuan Deng
Ph.D. Student