An Undergrad Is All You Need


Abstract

The mechanism of self-attention has largely displaced the large convolutional neural architectures commonly used for tasks adjacent to natural language understanding. In particular, Transformer models that exploit self-attention have been leveraged with surprising success in large language models such as LaMDA and GPT-3. However, these large language models are expensive to train, require large amounts of training data, and are prone to hallucination. In this paper, we introduce GPT-UGRD, a novel autoregressive architecture that requires minimal training and comes ready out of the box for multi-modal learning with modest watt-per-token power consumption. We show that it performs equivalently to, or better than, the state of the art, reporting an average BLEU score of 69.420.
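For reference, the sketch below illustrates the scaled dot-product self-attention that the abstract refers to as underlying Transformer models; it is a minimal, single-head NumPy illustration of the standard mechanism, not the GPT-UGRD architecture itself, and all sizes and weight names are hypothetical.

  # Minimal sketch of scaled dot-product self-attention (standard Transformer
  # mechanism, not GPT-UGRD). Assumes NumPy; sizes below are illustrative.
  import numpy as np

  def self_attention(x, w_q, w_k, w_v):
      """Single-head self-attention over a sequence x of shape (seq_len, d_model)."""
      q = x @ w_q  # queries, shape (seq_len, d_k)
      k = x @ w_k  # keys,    shape (seq_len, d_k)
      v = x @ w_v  # values,  shape (seq_len, d_v)
      d_k = q.shape[-1]
      scores = q @ k.T / np.sqrt(d_k)  # pairwise attention logits
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
      return weights @ v  # each position is a weighted mix of all values

  # Toy usage: 4 tokens, model width 8, head width 8 (hypothetical sizes).
  rng = np.random.default_rng(0)
  x = rng.normal(size=(4, 8))
  w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
  print(self_attention(x, w_q, w_k, w_v).shape)  # -> (4, 8)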

BibTeX entry

  @inproceedings{Yoo23,
    author = {James Yoo},
    title = {An undergrad is all you need},
    booktitle = {SIGBOVIK 2023: Proceedings of the ACH Special Interest Group
      on Harry Q. Bovik},
    pages = {7--12},
    month = apr,
    year = {2023}
  }