An Undergrad Is All You Need
Abstract
The mechanism of self-attention has largely displaced the large convolutional neural architectures commonly used for tasks adjacent to natural language understanding. Specifically, Transformer models that exploit self-attention have been leveraged with surprising success in large language models such as LaMDA and GPT-3. However, these large language models are expensive to train, require large amounts of training data, and are prone to hallucination. In this paper, we introduce GPT-UGRD, a novel autoregressive architecture that requires minimal training and comes ready out of the box for multi-modal learning with modest watt-per-token power consumption. We show that it performs equivalently to, or better than, the state of the art, reporting an average BLEU score of 69.420.
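For reference, the scaled dot-product self-attention the abstract alludes to can be sketched in a few lines of NumPy. The sketch below is a generic single-head illustration of that mechanism; the function name, shapes, and toy usage are our own assumptions, and it is not the GPT-UGRD architecture described in the paper.

# Minimal, illustrative single-head scaled dot-product self-attention.
# This is a generic sketch of the mechanism referenced in the abstract,
# not the paper's GPT-UGRD model; all names and shapes are assumptions.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project inputs to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # scaled dot-product similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ v                              # attention-weighted sum of values

# Toy usage: 4 tokens, model width 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)       # (4, 8)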
BibTeX entry
@inproceedings{Yoo22,
  author    = {James Yoo},
  title     = {An undergrad is all you need},
  booktitle = {SIGBOVIK 2023: Proceedings of the ACH Special Interest Group on Harry Q. Bovik},
  pages     = {7--12},
  month     = apr,
  year      = {2023}
}