2023 Retrospective: Insights from a CTO Entering the World of Large Language Models

2023/12/22 | Written by: Hwalseok Lee

This year has been as shocking and disorienting for me as the arrival of COVID-19, because Large Language Models (LLMs), represented by ChatGPT, took the world by storm. The experience ChatGPT provides spread beyond AI practitioners to the general public, raising both expectations and concerns about LLMs. Many companies are allocating huge budgets to adopt LLMs, while big tech companies such as OpenAI, Google, and Meta, as well as startups such as Anthropic and Cohere, seem to launch new LLMs every week. 2023 was an exciting year, as we all rode the big wave of LLM advancements.

Is it a crisis or an opportunity?

Market turmoil can be a crisis for some companies while presenting an opportunity for others. This is especially true when game-changers like LLMs emerge and open up new markets. With the market not yet firmly established, numerous companies enter with various strategies, and only a handful of products survive this cycle of attrition. Eventually, as the market goes through these transitions, the surviving products define the segments and the market stabilizes.

In a saturated market, it is nearly impossible, at least in theory, for a startup to outperform large corporations. Cases where startups beat established giants therefore tend to occur while a new industry is emerging. This holds even when we look back at the big tech companies that changed the world. Microsoft appeared in 1975, when the personal computer market first opened; Larry Page founded Google in 1998, when the Internet was spreading rapidly and a new industry was taking shape. Facebook appeared in 2004, the year the Internet market recovered from the dot-com bubble, and four years later the iPhone 3G was released, driving explosive growth of the social media industry alongside smartphones.

I thought the current market was a very good opportunity for a startup like Upstage. Although we had a business we had previously focused on, LLMs opened up a whole new market, and we felt we should seize this opportunity. At the same time, I believed the window would not stay open for long. So, starting in June, we formed an LLM task force and took on the challenge of climbing the Open LLM Leaderboard on Hugging Face, the world's largest open-source AI model platform; the leaderboard is often described as the 'Billboard chart' of the AI field.

Achieving First Place on the Hugging Face ‘Open LLM Leaderboard’

Two months after assembling a task force of exceptional engineers, Upstage's 30-billion-parameter model, fine-tuned from Meta's LLaMA, surpassed the 70-billion-parameter LLaMA 2 70B and took first place on the leaderboard, a first for a Korean LLM. Scores improved across all benchmarks, including ARC, HellaSwag, MMLU, and TruthfulQA. I think this was possible because our team of accomplished Kagglers, who also have experience presenting papers at international conferences, built an in-house leaderboard to encourage friendly competition, shared the latest research trends quickly, discussed constantly, and tried many different things in a short period of time.

Many people ask how Upstage engineers who had been working in fields such as Document AI (OCR) and recommendation systems could suddenly do well at LLMs. In the past, when algorithms were rule-based, specific rules were programmed for each field and AI systems were built on top of them. With the advent of deep learning, the end-to-end approach arrived and we entered a paradigm in which images and natural language can be processed with the same model architecture, as long as the training data is well prepared. The representative architecture is the Transformer: both natural language processing and computer vision are now built on Transformers and handle a wide variety of tasks. With the advent of very large language models, there is also a movement to handle every modality with a single architecture. Multimodal research existed before, of course, but the emergence of very large language models has accelerated it. From this perspective, it is essential for AI engineers to be able to move across modalities rather than being confined to one specific field, and that in turn requires strong foundational knowledge of machine learning. At Upstage this was possible because many people had that strong foundation and could apply it well. A tiny illustration of this architectural convergence is sketched below.
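As a small, hedged illustration of that convergence (my own example, not part of the leaderboard work): the snippet below loads a common public text encoder and a common public vision encoder through the Hugging Face transformers library and shows that both are stacks of the same kind of Transformer layers.

```python
# Illustration only: a text model and a vision model built from the same
# kind of stacked Transformer encoder layers.
# Requires: pip install transformers torch
from transformers import AutoModel

text_model = AutoModel.from_pretrained("bert-base-uncased")              # NLP encoder
vision_model = AutoModel.from_pretrained("google/vit-base-patch16-224")  # Vision Transformer (ViT)

for name, model in [("BERT (text)", text_model), ("ViT (vision)", vision_model)]:
    cfg = model.config
    print(f"{name}: {cfg.num_hidden_layers} Transformer layers, "
          f"hidden size {cfg.hidden_size}, {cfg.num_attention_heads} attention heads")
```

The point is not the specific checkpoints but that, once data is prepared as sequences of tokens or patches, the same architecture, and the same engineering skills, carry over between modalities.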

LLM in Real World Business

I've been quite busy lately, getting a lot of messages from people who want to understand and leverage LLMs better. Although researchers keep advancing LLM technology, real-world adoption still raises concerns. I'd like to share a brief conceptual explanation of how we are trying to overcome the two biggest ones: the first is hallucination, and the second is having a model of our own that we can control without security issues.

First, a representative attempt to alleviate hallucination is a technique called Retrieval-Augmented Generation (RAG). RAG uses search technology to gather the information needed for the task a user requests, and then generates an answer grounded in the search results. The link between the LLM and search is typically a vector DB: information is turned into knowledge by an embedding model, which converts a chunk of text into a single vector. The more structured the text is, the better RAG performs. This is driving demand not only for OCR technology, which converts offline documents into text, but also for parsing technology, which converts that text into structured information. Upstage is continuously researching and developing this area and keeps improving its performance. A minimal sketch of the RAG flow is shown below.
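To make that flow concrete, here is a minimal sketch of my own (not Upstage's implementation). The embed_text function is a toy hashed bag-of-words stand-in for a real embedding model, and generate_answer is a stub standing in for an actual LLM call; only the retrieve-then-generate structure is the point.

```python
# Minimal RAG sketch. embed_text() and generate_answer() are toy stand-ins
# for a real embedding model and a real LLM; only the flow matters here.
import numpy as np

def embed_text(text: str, dim: int = 256) -> np.ndarray:
    # Toy embedding: hashed bag-of-words. Replace with a real embedding model.
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def generate_answer(prompt: str) -> str:
    # Stub: a real system would send this prompt to an LLM.
    return f"[LLM answer grounded in the prompt below]\n{prompt}"

def build_index(documents: list[str]) -> np.ndarray:
    # One vector per chunk of (ideally well-structured) text.
    return np.stack([embed_text(doc) for doc in documents])

def retrieve(query: str, documents: list[str], index: np.ndarray, k: int = 2) -> list[str]:
    # Cosine similarity between the query vector and every document vector.
    q = embed_text(query)
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q) + 1e-9)
    return [documents[i] for i in np.argsort(-sims)[:k]]

def rag_answer(query: str, documents: list[str], index: np.ndarray) -> str:
    context = "\n\n".join(retrieve(query, documents, index))
    prompt = (f"Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    return generate_answer(prompt)

docs = ["RAG grounds an LLM's answer in retrieved documents.",
        "A vector DB stores one embedding per text chunk.",
        "Layout analysis turns scanned documents into structured text."]
index = build_index(docs)
print(rag_answer("How does RAG reduce hallucination?", docs, index))
```

In a production setting the vector DB handles the similarity search, and the quality of the chunks, which is exactly where OCR and parsing come in, largely determines retrieval quality.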

Second, we are building small-scale LLMs for business applications. A private (custom) LLM is an LLM trained on your own data. Today, private LLMs are usually implemented with models of roughly 100B parameters or fewer, considering technology accessibility and training and inference costs. Private LLMs offer strengths such as controllability, data security, and cost optimization. What a private-LLM-centered business needs to verify most quickly is whether, when the domain or the way knowledge is used is limited, performance similar to or better than ChatGPT can be achieved; a toy sketch of such a check is shown below. Because the private LLM business has only just begun, there are not many practical cases yet, in Korea or globally. Whether private LLM deployments that satisfy customers emerge next year will therefore determine the direction of the market.
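As a toy sketch of that kind of check (my own illustration, not an actual Upstage evaluation), the loop below scores a model's answers on a small set of domain questions against expected key phrases; ask_private_llm and ask_baseline_llm are hypothetical callables standing in for whichever models you want to compare.

```python
# Toy domain-evaluation sketch. ask_private_llm / ask_baseline_llm are
# hypothetical stand-ins for real model calls (an API or local inference).
from typing import Callable

eval_set = [
    {"question": "What is the notice period for terminating a vendor contract?",
     "expected_keywords": ["30 days", "written notice"]},
    {"question": "Who approves expense reports over 10,000 dollars?",
     "expected_keywords": ["finance team", "approval"]},
]

def keyword_score(answer: str, keywords: list[str]) -> float:
    # Fraction of expected key phrases that appear in the answer.
    answer = answer.lower()
    return sum(kw.lower() in answer for kw in keywords) / len(keywords)

def evaluate(ask_model: Callable[[str], str]) -> float:
    scores = [keyword_score(ask_model(item["question"]), item["expected_keywords"])
              for item in eval_set]
    return sum(scores) / len(scores)

# Example comparison (the model callables are not defined here):
# print("private LLM :", evaluate(ask_private_llm))
# print("baseline    :", evaluate(ask_baseline_llm))
```

A real evaluation would use a much larger question set and better scoring (human review or an LLM judge), but even a crude harness like this answers the key business question quickly: is the private model good enough on the domain that actually matters?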

What I personally consider the company's greatest asset from this year is that, through the Document AI project, we experienced the full cycle of product development, delivery, and maintenance for a B2B enterprise AI product. There are many invisible barriers to solving real-world problems with AI models, and we believe only a small number of AI companies worldwide have gone through this process end to end. Upstage's ability to develop world-class models, combined with its experience solving actual customer problems with those models, will be a great strength when launching LLM-related B2B services in the future.

Reasons to look forward to 2024 - LLM meets Document AI, and Upstage goes global

While we explored new opportunities with LLM technology, our Document AI business also progressed steadily throughout the year. I have often been asked whether we should pivot the business plan entirely to LLMs. Fortunately, we found that Document AI and LLM, the technologies we were already good at, create great synergy together. It is no exaggeration to say that how well a company processes its own data determines the performance of an LLM built for actual business use. As briefly mentioned above in the discussion of RAG, it is important to turn work-related documents into well-structured text, and OCR with layout analysis, which automatically recognizes document structure, is a great complement here. I believe this process of recognizing and storing information with Document AI technology, then retrieving it through an LLM and using it for analysis, will be a strong foundation for increasing work productivity and business value. A rough sketch of that pipeline follows.
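Here is a rough, hypothetical sketch of that pipeline (my own illustration; run_ocr and parse_layout are placeholder functions standing in for Document AI components): recognize the document structure first, keep that structure as metadata on each chunk, and only then hand the chunks to the embedding and retrieval steps sketched earlier.

```python
# Illustrative document-to-knowledge pipeline. run_ocr() and parse_layout()
# are placeholders for real OCR and layout-analysis components.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str      # the chunk's text content
    section: str   # heading the chunk belongs to (from layout analysis)
    page: int      # source page, useful for citing sources in answers

def run_ocr(document_path: str) -> list[dict]:
    # Placeholder OCR: a real engine returns text plus layout coordinates per page.
    return [{"elements": [{"heading": "1. Scope",
                           "text": "This policy applies to all employees."}]}]

def parse_layout(page: dict) -> list[dict]:
    # Placeholder layout analysis: a real parser detects headings, tables,
    # and reading order from the OCR output.
    return page["elements"]

def document_to_chunks(document_path: str) -> list[Chunk]:
    chunks = []
    for page_no, page in enumerate(run_ocr(document_path), start=1):
        for element in parse_layout(page):
            chunks.append(Chunk(text=element["text"],
                                section=element.get("heading", ""),
                                page=page_no))
    return chunks

print(document_to_chunks("internal_policy.pdf"))
```

The resulting chunks, with their structure preserved as metadata, are what gets embedded into the vector DB, so retrieval can return not just relevant text but also where in the original document it came from.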

Recently, Solar Mini once again ranked first on the Hugging Face Open LLM Leaderboard, raising awareness of Upstage in the global market. To gauge the global response to Upstage's technology, we attended many conferences and events this year and met a wide range of people. With still relatively few companies around the world training their own LLMs, we could feel that Solar's strong results had a significant impact on Upstage's external reputation.

The Best Way to Predict the Future is to Create It!

It's a famous quote by Alan Kay, and it has been something of a motto for me this year. Amid a future that looked chaotic, I tried to keep my balance and build toward that future with good colleagues. Writing this makes it sound like bragging, but honestly there were many times I felt overwhelmed and anxious. Even if not every call I made was the right one, I tried to keep faith in how we were going to shape the future, and a big part of that faith was the people I work with. I look back on 2023 with its disruptions and bumps in the road, the lessons learned and insights gained, and the people I worked with along the way. One of my goals for 2024 is to share more and more of these insights in the coming year. Stay tuned for our journey!