Research: transformers and LLMs. Worth the hype?

GPT-3 and ChatGPT have been fairly popular topics. Having research experience in machine learning, but not in text-based applications, I was curious how "innovative" OpenAI's GPT really is. I was partly inspired by a tweet from Yann LeCun, one of the pioneers of AI and machine learning.

To be clear: I'm not criticizing OpenAI's work nor their claims.

I'm trying to correct a *perception* by the public & the media who see chatGPT as this incredibly new, innovative, & unique technological breakthrough that is far ahead of everyone else.

It's just not.

— Yann LeCun (@ylecun) January 24, 2023

So how innovative is ChatGPT?

Short answer: it isn't.

Longer answer:

This is where I started doing research. The seminal paper in text-based machine learning is "Attention Is All You Need" from 2017, which introduces the concept of attention through a new type of model called the "transformer". Skipping a lot of math and computer science, the simplest understanding of the transformer and attention is "a method for a machine to learn what to pay attention to".
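To make "learning what to pay attention to" concrete, here is a minimal sketch of scaled dot-product attention, the core operation of the transformer paper. The shapes and random inputs are toy examples, not a real model.

```python
import numpy as np

def attention(Q, K, V):
    """Weigh the values V by how well each query in Q matches each key in K."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    # Softmax over each row: these weights are "what to pay attention to"
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # blend of the values, weighted by attention

# Three token vectors of dimension 4 attending to each other (self-attention)
x = np.random.rand(3, 4)
out = attention(x, x, x)
print(out.shape)  # (3, 4): each token becomes a weighted mix of all tokens
```

Each output row is just a weighted average of the inputs, where the weights say how much each token "attends" to every other token.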

I'm going to skip over the technical details and go straight to the results of this (for 2017) new research (I link the technical papers I read at the end). The results showed that the model could translate between English and other languages remarkably well. In simple terms, the model was shown English text, asked to guess what the output should be in another language, and then told whether it was right or wrong.

How do we get from 2017 research to chatGPT?

Researchers discovered that these new transformer models are not only really good at modeling language data (and video data too, but that's a whole other topic), but also a perfect fit for self-supervised learning.

Self-supervised learning is the "secret sauce" of these new chatbots. Earlier, I gave the example of the model being shown a word and its output being compared to the correct translation. The issue with this is that you need to know the translation beforehand. Self-supervised learning gets rid of that requirement completely.

The self-supervised learning method uses the data itself as both the input and the output. To train an English language model, you ask the machine to predict the next word in a sentence rather than a separate answer: part of the input becomes the answer. The Facebook AI Research blog post "Self-supervised learning: The dark matter of intelligence" explains the concept in depth. Combined with the sheer amount of text now available, this led to the birth of LLMs, or Large Language Models. Methods like self-supervised learning are why you keep hearing about the growing importance of data: under self-supervision, data itself is the learning signal. The more data, the more the model learns.
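The trick of "part of the input becomes the answer" can be sketched in a few lines: raw text is turned into (context, next word) training pairs with no human labeling at all. Real LLMs do this over tokens at enormous scale, but the idea is the same.

```python
# Turn a raw sentence into self-supervised training pairs:
# the next word in the sentence *is* the label.
text = "the cat sat on the mat"
words = text.split()

# Each prefix of the sentence is an input; the word that follows is the target.
pairs = [(words[:i], words[i]) for i in range(1, len(words))]
for context, target in pairs:
    print(" ".join(context), "->", target)
# e.g. "the cat sat" -> "on": part of the input became the answer
```

Notice that no translation, annotation, or human judgment was needed to build these pairs; any pile of text can be converted into supervision this way.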

This cake analogy from NIPS illustrates it perfectly

So what does this mean for ChatGPT?

Knowing more about the technology behind ChatGPT, we can now understand that it is simply trying to predict the best next word. ChatGPT isn't grasping any deeper meaning when you ask it to tell a joke, write an email reply, or draft an essay. It's just finding the next word that makes the sentence sound natural.
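"Finding the next best word" can be illustrated with a toy predictor that has no understanding at all: it just counts which word tends to follow which, then greedily picks the most frequent follower. The tiny corpus here is made up for illustration; a real LLM does something vastly more sophisticated, but the generation loop has the same shape.

```python
from collections import Counter, defaultdict

# Count which word tends to follow which (a toy bigram model)
corpus = "the cat sat on the mat the cat ate the fish".split()
bigrams = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    bigrams[a][b] += 1

def next_word(word):
    # Greedy choice: the single most frequent follower
    return bigrams[word].most_common(1)[0][0]

# Generate by repeatedly asking "what's the best next word?"
word, out = "the", ["the"]
for _ in range(4):
    word = next_word(word)
    out.append(word)
print(" ".join(out))  # -> "the cat sat on the"
```

The output sounds locally plausible, yet the model has no idea what a cat is. That gap between fluency and understanding is exactly the point of the paragraph above.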

ChatGPT isn't some Terminator robot that will take over the world, but a tool that produces text that sounds convincing, even when it isn't. The danger isn't in ChatGPT itself, but in misusing and misunderstanding it as something greater than a statistical machine.


Attention Is All You Need
Language Models are Few-Shot Learners
Self-supervised learning: The dark matter of intelligence