GPT Series and Development Process

2023/08/24 | Written By: Sungmin Park

(This content was written based on the “GPT Series and Development Process” lecture from 'ChatGPT UP for Everyone' produced by Upstage.)

How did the GPT series develop into ChatGPT, the most popular and widely known model in the series? We take a look at this five-year journey, from the basic concept of a language model built on recurrent neural networks (RNNs) to the ChatGPT era.

Language Model

Generative Pre-trained Transformer (GPT) is a large-scale language model developed by OpenAI and used for a variety of natural language processing tasks. For this reason, understanding language models first can help you navigate the evolution of GPT. When a language model generates an answer, it relies on its ability to predict the next word. Let's take a look at the example below.

Q. What is the correct word to put in the blank?
”The [ ] I participated in was difficult, but very rewarding.”
(1) Running
(2) Nap
(3) Festival


The problem is simply to figure out which word fits in the blank.

This is exactly how language models work as well. The advantage is that the model can generate countless "correct answers" on its own from the structure of words and sentences, without requiring human-labeled data. Language modeling therefore has the character of self-supervised learning, which makes it well suited to creating pre-trained models.
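To make the self-supervised idea concrete, here is a minimal sketch in plain Python (an illustration only, not any particular model's training code) showing how (context, next-word) training pairs fall out of raw text with no human labeling:

```python
# A minimal sketch of self-supervised language-modeling data:
# every position in a sentence yields a (context, next word) training pair,
# so the text itself supplies the "correct answers" without human labeling.
text = "The festival I participated in was difficult but very rewarding"
tokens = text.split()

training_pairs = []
for i in range(1, len(tokens)):
    context = tokens[:i]          # words seen so far
    target = tokens[i]            # the word the model must predict
    training_pairs.append((context, target))

for context, target in training_pairs[:3]:
    print(" ".join(context), "->", target)
# The -> festival
# The festival -> I
# The festival I -> participated
```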

Recurrent Neural Networks (Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/)

In the early days of deep learning, language processing models were built on the "RNN" architecture (the structure of the model and the way its operations are organized). RNN stands for Recurrent Neural Network, a network in which the connections between nodes form a cycle. This characteristic makes RNNs well suited to sequence data such as natural language.
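As a rough illustration of that recurrence (toy NumPy code, not any real model), an RNN keeps a hidden state that is updated once per token, which is what lets it carry context along a sequence:

```python
import numpy as np

# Minimal sketch of the RNN recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b).
# The same weights are reused at every time step, and the hidden state
# carries information from earlier tokens forward through the sequence.
rng = np.random.default_rng(0)
hidden_size, embed_size = 8, 4
W_x = rng.normal(size=(hidden_size, embed_size))
W_h = rng.normal(size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

sequence = rng.normal(size=(5, embed_size))   # 5 token embeddings (toy data)
h = np.zeros(hidden_size)                     # initial hidden state
for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h + b)      # one recurrent update per token

print(h.shape)  # (8,) -- a summary of the sequence seen so far
```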

So what turned this simple next-word predictor into something as complex as ChatGPT?



Emergence (April 2017)

Sentiment neurons (Source: https://openai.com/research/unsupervised-sentiment-neuron)

In 2017, OpenAI built its language model as a Recurrent Neural Network (RNN). In the process, they discovered that certain neurons were capturing sentiment, which led to the hypothesis that unintended abilities emerge during the language modeling process.

< What is Sentiment Analysis? >

  • The process of analyzing and judging emotions or opinions extracted from text content using artificial intelligence technology.

  • Mainly performed on text data such as movie reviews and online media posts; the AI interprets sentences much as a human would and identifies the emotions they contain, distinguishing positive, negative, neutral, etc.

Transformer (June 2017)

In 2017, the Transformer emerged, a new type of architecture alongside Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). The key to this architecture is an operation called "Attention," which represents the relationships between items. The Google Brain team introduced it in the paper "Attention Is All You Need," whose title highlights the operation's importance. Thanks to its improved computational efficiency and output quality compared to existing RNNs, the Transformer has had a great influence and is now used in other fields such as vision, recommendation systems, and bioinformatics.
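As a rough sketch of that core operation (toy dimensions, self-attention only, without the learned projection matrices, multiple heads, or masking of the full Transformer), attention scores how strongly each item relates to every other item and mixes their representations accordingly:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch of the attention operation: each query is compared with
    every key, the scores are turned into weights with softmax, and the output
    is a weighted sum of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # pairwise relationships between items
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 16))                   # 6 tokens, 16-dim embeddings (toy data)
out = scaled_dot_product_attention(x, x, x)    # self-attention: Q = K = V
print(out.shape)                               # (6, 16)
```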


GPT (June 2018)

A year later, the Generative Pre-trained Transformer (GPT) made its first appearance: a language model trained with the self-supervised learning method described above. GPT became a representative of the pre-training/fine-tuning paradigm, in which large-scale language modeling is used to create a pre-trained model and that model is then fine-tuned on a small dataset suited to each task. With this approach, GPT showed excellent performance on various NLP tasks.

< What is Fine-tuning? >

  • The task of improving performance for a specific domain or task based on a pre-trained model.

  • The key idea is that starting from a model already pre-trained on a large dataset can reduce training time on a new task or domain and improve performance even when data is limited (see the sketch below).
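Below is a minimal sketch of that idea in PyTorch, with a toy module and random data standing in for a real pre-trained GPT and a real labeled dataset: the pre-trained backbone is reused, and only a short training run on a small task dataset adapts it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, hidden_size, num_labels = 100, 32, 2

# Stand-in for a pre-trained model: in reality its weights would already have
# been learned through large-scale language modeling on massive text corpora.
backbone = nn.EmbeddingBag(vocab_size, hidden_size)
head = nn.Linear(hidden_size, num_labels)          # new, task-specific layer (e.g. sentiment)

optimizer = torch.optim.AdamW(
    list(backbone.parameters()) + list(head.parameters()), lr=2e-5
)
loss_fn = nn.CrossEntropyLoss()

# A tiny labeled dataset for the downstream task (random toy data here).
inputs = torch.randint(0, vocab_size, (16, 10))    # 16 examples, 10 tokens each
labels = torch.randint(0, num_labels, (16,))

for _ in range(3):                                 # a few passes over the small dataset
    logits = head(backbone(inputs))                # reuse the pre-trained representation
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print(float(loss))                                 # the model adapts to the new task
```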

GPT-2 (February 2019)

Source: Language Models are Unsupervised Multitask Learners

GPT-2 is a scaled-up version of the existing model, with parameters increased from 117M to 1.5B and training data from 4GB to 40GB. However, OpenAI judged that while GPT-2 had excellent generation ability, it also carried a high risk of generating false information, and therefore initially withheld the full model from public release. GPT-2's emergence represented another major step forward in language generation.


“Emergence”: Zero-shot Learning

What new possibilities did the advent of GPT-2 reveal? The concept of "zero-shot learning," in which a model performs a new task without ever seeing an example, brought new opportunities to the field. Described in the paper as Unsupervised Multitask Learners, these models started out purely as language models, but several experiments were conducted to determine whether they could also perform tasks such as reading comprehension, translation, summarization, and Q&A.

Source: Language Models are Unsupervised Multitask Learners

As the paper above shows, zero-shot performance increased as the number of parameters increased, and on certain tasks the model was even shown to outperform SOTA (state-of-the-art) models.
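As a concrete illustration of zero-shot prompting (hypothetical prompt templates, not taken from the paper), each task is expressed purely as text and the language model is simply asked to continue it:

```python
# Minimal sketch of zero-shot task framing: each task is rewritten as plain text
# so a pure language model can attempt it by predicting the next words,
# without ever having been trained on labeled examples of that task.
def zero_shot_prompt(task: str, text: str) -> str:
    templates = {
        "summarization": f"{text}\n\nTL;DR:",
        "translation": f"Translate English to French:\n{text} =>",
        "qa": f"{text}\nQ: What is the main topic?\nA:",
    }
    return templates[task]

print(zero_shot_prompt("translation", "cheese"))
# The model's continuation of this prompt ("fromage") is taken as its answer.
```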

GPT-3 (June 2020)

After the various abilities of GPT were confirmed through these experiments, GPT-3, which appeared in 2020, scaled up once more. The model grew from 1.5B to 175B parameters, and more than 600GB of data was used for training. Because it was pre-trained on far more data than the earlier models, its generation ability improved significantly. GPT-3 also advanced in other respects, including the ability to "learn" tasks from just a few examples, without prior task-specific training (few-shot learning). Previous versions simply performed the tasks they were given, but GPT-3 showed it could pick up new tasks on its own.

“Emergence”: In-context Learning

(Schematic of the earlier approach, before in-context learning, in which examples for each task are fed to the model through fine-tuning / Source: Language Models are Few-Shot Learners)

The emergence of GPT-3 also brought in-context learning. Before in-context learning, models had to be fine-tuned by feeding in many examples of each task, and the need for a separate model and dataset per task was a limitation. Building on GPT-2's zero-shot learning, however, new tasks could now be performed simply by including a few examples in the prompt (few-shot), with no need to update the model.

(Source: Language Models are Few-Shot Learners)
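Here is a minimal sketch of what an in-context (few-shot) prompt looks like (an invented sentiment example, not taken from the paper); note that nothing in the model is updated, the examples live entirely in the prompt:

```python
# Minimal sketch of in-context (few-shot) learning: a handful of worked examples
# is placed directly in the prompt, and the model infers the task from them.
# No gradient update or fine-tuning is performed.
examples = [
    ("The movie was fantastic", "positive"),
    ("I will never buy this again", "negative"),
    ("Best meal I've had in years", "positive"),
]
query = "The battery died after one day"

prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)
# The model's next-word prediction after "Sentiment:" ("negative") is the answer.
```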

Before the Release of GPT-4 (2021-2022)

After GPT-3, anticipation for GPT-4's release grew. Before GPT-4 officially arrived, four noteworthy models appeared: CLIP and DALL-E, models that connect images and text; Codex, which generates code; and InstructGPT, a language model fine-tuned on instructions. Unlike the existing GPT, InstructGPT could be given direct instructions, and it received great attention as a language model designed to answer according to those instructions and in line with the user's intent.

  • CLIP (January 2021): Zero-shot image classification

  • DALL-E (January 2021): Creating an image from given text

  • Codex (August 2021): Generating Code

  • InstructGPT (January 2022): Emphasized instruction fine-tuning and reinforcement learning. Whereas the existing GPT needed conditions or examples supplied through prompt engineering to perform a specific task well, InstructGPT produced the requested result from a simple natural-language instruction.

    • < Content generated by a plain language model given an instruction >
      💬 ”Tell me about ChatGPT.”
      → EXPLAIN BERT / EXPLAIN GPT

    • < Content generated by an instruction-fine-tuned model given the same instruction >
      💬 ”Tell me about ChatGPT.”
      → ChatGPT is one of the natural language processing models developed by OpenAI. This model is based on the Generative Pre-trained Transformer (GPT) architecture and is used as a dialog-based artificial intelligence model. ChatGPT is pre-trained in advance with a large amount of data, and then optimized through a fine-tuning process based on various conversation data. This allows ChatGPT to generate natural responses in conversation with users and discourse on a variety of topics.

GPT-3.5 (March 2022)

GPT-3.5 added code data and instruction fine-tuning on top of GPT-3. Although it is not known for certain whether this was the direct cause, many researchers speculate that adding code data improved GPT's reasoning ability and its understanding of longer inputs.

In addition, instruction fine-tuning was applied to GPT-3.5, incorporating the experimental methods of InstructGPT (January 2022). This change built on the finding that instruction fine-tuning and reinforcement learning help the model better understand and respond to user intentions and commands.

ChatGPT (November 2022)

ChatGPT, which first appeared in 2022, is one of the models that ushered in the popularization of AI. It is a fine-tuned version of GPT-3.5, and OpenAI also calls it a "sibling model" of InstructGPT because the two were trained in a similar way.

Source: OpenAI Blog

The first step in building ChatGPT was to collect demonstration data consisting of instruction prompts and ideal responses. In this process, human trainers wrote responses suited to each instruction prompt, and the collected dataset was used to fine-tune GPT-3.5 through SFT (Supervised Fine-Tuning).

Next, ChatGPT was updated with reinforcement learning (RL) using a reward model (RM) trained on user preferences. Through this method, ChatGPT could hold more diverse and natural conversations. A rough sketch of this three-step pipeline follows.
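The following is a structural sketch of that pipeline (toy Python stand-ins only, nothing resembling OpenAI's actual training code): supervised fine-tuning on demonstrations, a reward model trained from preference rankings, and a "reinforcement learning" step reduced here to picking the response the reward model prefers.

```python
# A structural sketch of ChatGPT's three training stages (toy stand-ins only).
from typing import Callable, List, Tuple

def supervised_fine_tune(demonstrations: List[Tuple[str, str]]) -> Callable[[str], str]:
    """Step 1: labelers write ideal responses; the model is fine-tuned on them.
    Here a lookup table stands in for the fine-tuned policy."""
    table = dict(demonstrations)
    return lambda prompt: table.get(prompt, "I'm not sure.")

def train_reward_model(rankings: List[Tuple[str, str, str]]) -> Callable[[str, str], float]:
    """Step 2: labelers rank candidate responses; a reward model learns to score them.
    Here we simply score a response by whether it was the preferred one."""
    preferred = {(prompt, better) for prompt, better, worse in rankings}
    return lambda prompt, response: 1.0 if (prompt, response) in preferred else 0.0

def reinforcement_learning(reward_model, prompts, candidates):
    """Step 3: the policy is updated so its responses score highly under the reward
    model. Here 'updating' is reduced to selecting the best-scoring candidate."""
    return {p: max(candidates[p], key=lambda r: reward_model(p, r)) for p in prompts}

demos = [("Tell me about ChatGPT.", "ChatGPT is a dialogue model developed by OpenAI.")]
ranks = [("What is 2+2?", "4", "five-ish")]
policy = supervised_fine_tune(demos)
rm = train_reward_model(ranks)
best = reinforcement_learning(rm, ["What is 2+2?"], {"What is 2+2?": ["4", "five-ish"]})

print(policy("Tell me about ChatGPT."))
print(best["What is 2+2?"])   # "4" -- the response the reward model prefers
```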

Journey from RNN to ChatGPT

From RNN to ChatGPT, we have looked back on the GPT series' long journey. What will the next GPT, the successor to ChatGPT, bring? For more on how ChatGPT will be used in the future and the various aspects needed for its development, check out this webinar replay page.