[Upstage X DeepLearning.AI] “Pre-training LLMs” Course Now Launched in Collaboration with AI Pioneer Andrew Ng!

2024/07/18 | Written by: Sungmin Park
 

Are you an engineer, AI enthusiast, or data scientist interested in large language models (LLMs)? Or a business looking to build cost-effective and powerful LLMs? If so, dive into our informative course on “Pretraining LLMs” to expand your technical knowledge!

We’re excited to announce the launch of our "Pretraining LLMs" course in collaboration with the education technology company DeepLearning.AI. This means a lot to us, since we’re the first Korean company to launch a course with DeepLearning.AI. Founded by machine learning and education pioneer Andrew Ng to fill the need for world-class AI education, DeepLearning.AI is renowned throughout the AI industry.


Why Pre-training?

We planned this collaboration with Andrew Ng because pre-training is the first step in training an LLM, and it can be the key to developing a model with the required capabilities. Pre-training a large language model is a complex process: it involves taking a model, usually a transformer neural network, and training it on a large text corpus to predict the next token that follows a given input. This step comes before fine-tuning or further alignment to human preferences. Upstage also advanced our 'Solar Mini' using 'Depth Up-Scaling' (DUS), which combines depthwise scaling with continued pre-training.
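To make the objective concrete, here is a minimal, hedged sketch of the next-token-prediction loss using the Hugging Face transformers library. The "gpt2" checkpoint is only an illustrative stand-in for any causal language model, not the model used in the course.

```python
# Minimal sketch: next-token prediction, the objective used in pre-training.
# "gpt2" is an illustrative example checkpoint, not the course's model.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Pre-training teaches a model to predict the next token."
inputs = tokenizer(text, return_tensors="pt")

# For causal LMs, passing the input ids as labels makes the library shift them
# internally and compute a cross-entropy loss on predicting each next token.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"next-token prediction loss: {outputs.loss.item():.3f}")
```

During actual pre-training, this loss is computed over a huge corpus and minimized with gradient descent, which is what gradually builds up the model's language ability.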

Think of pre-training as having to learn the alphabet before reading a book. In the world of AI, pre-training helps models understand the basic building blocks of data, such as words, sentences, or images, which are then used to solve complex problems in a more efficient manner.

Here's a concrete example of the impact of pre-training: when you want a model to develop a deep understanding of a new domain, additional pre-training on text from that domain is what it takes to achieve good performance.

(Source: DeepLearning.AI)


What is the Difference Between Training from Scratch and Continued Pre-training?

When it comes to machine learning models, there are two main approaches to training: training from scratch and continued pre-training.

Training from scratch means training a model on a dataset from the very beginning, without any prior knowledge or pre-trained weights. Starting from a blank slate, the model has to learn everything on its own. For example, if you were teaching a model to recognize cats, you would start by showing it assorted images and progressively introduce cat images until the model could accurately identify a cat.


On the other hand, continued pre-training takes a model that has already been pre-trained and keeps training it on a specific dataset to broaden or deepen its knowledge and capabilities. This gives the model a head start, since it already has prior knowledge to build on. For example, if you were teaching a model to recognize cats, you could start from a model already trained on a large dataset of images. That model would already understand what images look like in general, so you would only need to continue training it on a smaller dataset of cat images to improve its accuracy.

The choice between training from scratch or continued pre-training depends on the specific domain and the available resources.
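In code, the difference often comes down to how the model's weights are initialized. Below is a small sketch, assuming the Hugging Face transformers library and using "gpt2" purely as an example checkpoint, of starting from random weights versus starting from an existing pre-trained model.

```python
# Sketch: training from scratch vs. continued pre-training with Hugging Face.
# "gpt2" is just an example of an existing pre-trained checkpoint.
from transformers import AutoConfig, AutoModelForCausalLM

# Training from scratch: build the architecture from a config and start with
# randomly initialized weights -- the model has no prior knowledge at all.
config = AutoConfig.from_pretrained("gpt2")
scratch_model = AutoModelForCausalLM.from_config(config)

# Continued pre-training: load existing pre-trained weights and keep training
# them on new, domain-specific text so the model builds on what it already knows.
continued_model = AutoModelForCausalLM.from_pretrained("gpt2")
```

Everything downstream, from the data pipeline to the training loop, can look the same; what changes is whether the run starts from random weights or from knowledge the model already has.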



Learn More About Pre-training in Our Course!

In our ‘Pre-training LLMs’ course, learners can explore cases where pre-training is the ideal option to achieve good performance.

Throughout the course, learners will:

  • Explore scenarios where pre-training is the optimal choice for model performance. Compare text generation across different versions of the same model to understand the performance differences between base, fine-tuned, and specialized pre-trained models.

  • Learn how to create a high-quality training dataset using web text and existing datasets, crucial for effective model pre-training.

  • Prepare your cleaned dataset for training. Learn how to package your training data for use with the Hugging Face library (see the sketch after this list).

  • Explore ways to configure and initialize a model for training and see how these choices impact the speed of pre-training.

  • Learn how to configure and execute a training run, enabling you to train your own model.

  • Learn how to assess your trained model’s performance and explore common evaluation strategies for LLMs, including important benchmark tasks used to compare different models’ performance.
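As a preview of the data-packaging and training-run steps above, here is a hedged sketch using the Hugging Face datasets and transformers libraries. The file name cleaned_corpus.txt, the "gpt2" checkpoint, and the hyperparameters are illustrative placeholders, not the course's actual settings.

```python
# Sketch: package cleaned text data and launch a small training run with the
# Hugging Face libraries. File name, model, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Load the cleaned corpus (one document per line) and tokenize it.
dataset = load_dataset("text", data_files={"train": "cleaned_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# mlm=False selects the causal (next-token) language-modeling objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="pretrain-demo",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

The course goes into far more detail on each of these steps, from sourcing and cleaning web text to choosing the model configuration and evaluating the result.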


After taking this course, you'll be equipped with the skills to pre-train a model, from data preparation and model configuration to performance evaluation. Enroll for free using the link below!


Unlock Advanced AI Capabilities with Upstage!
Discover Our Innovative Solutions and Educational Resources

Contact Us to Learn More

