
Case Study: Client-specific Large Language Model

2024/04/11 | Written By: YoungHoon Jeon

General-purpose Large Language Models (LLMs) are trained primarily on widely available public documents and often fail to capture the intricacies of specific domains. They also pose risks such as corporate data leakage and the generation of fabricated outputs, commonly known as hallucinations. This creates a need for domain-specific private LLMs, which excel at domain-specific tasks while ensuring robust security and privacy. Upstage has developed such a model, tailored explicitly to the commerce sector through two rounds of post-training. Built on a Mixture-of-Experts architecture, the model is designed to handle tasks such as Attribute Value Extraction.

Since 2023, Upstage has been carrying out enterprise development projects built on a compact Korean-language LLM. Upstage developed Korea's first private LLM of this kind with Connectwave, and has continued to win contracts since. Deploying customized models for each client, fine-tuned on its proprietary foundation model and integrated directly onto client servers, underscores Upstage's position as the leading provider in Korea.

Challenge

Connectwave, one of Korea's leading e-commerce price-comparison marketplaces, relied on an internal team of product listers who devoted a substantial portion of their time to extracting detailed information from the wide array of product catalogs featured on the platform. The thoroughness of this extraction was paramount, as it directly affected the accuracy and completeness of the platform's product listings.

After extraction, the product listers then had to organize and categorize the data, entering each piece of information into the corresponding category in the platform's database. Accurate categorization played a pivotal role in the overall user experience, enabling consumers to discover and compare products efficiently.

Solution

Upstage engineered an e-commerce-specialized LLM for extracting Attribute-Value pairs from product catalogs. The model is also adept at sentiment analysis on customer reviews, applying Natural Language Processing techniques to yield actionable insights for business stakeholders.

Post-training was performed on the base LLM using both public and private commerce datasets. The first post-training phase, using publicly available domain-specific data, yielded the public post-trained LLM. The second phase, incorporating the customer's internal, proprietary data, produced the private post-trained LLM. Finally, the model was fine-tuned on the customer's task-specific data, resulting in the task-specific fine-tuned LLM.
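The staging above can be sketched as a simple pipeline. This is purely illustrative: `post_train` and `fine_tune` are hypothetical stand-ins for the actual training routines, not Upstage's code.

```python
def build_task_llm(base_model, public_data, private_data, task_data,
                   post_train, fine_tune):
    """Illustrative three-stage pipeline; all callables are placeholders."""
    public_llm = post_train(base_model, public_data)     # phase 1: public commerce data
    private_llm = post_train(public_llm, private_data)   # phase 2: customer's proprietary data
    return fine_tune(private_llm, task_data)             # phase 3: task-specific fine-tuning
```

The key design point is that each phase starts from the checkpoint produced by the previous one, so domain knowledge accumulates before the narrow task-specific tuning.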

Overview of how to create a task-specific LLM

To effectively integrate new domain-specific knowledge into the Base LLM, the Mixture-of-Experts (MoE) architecture was employed. This entailed duplicating the MLP layer of the Base LLM's Transformer block into N expert layers through a copy-and-paste operation, adding a Router network, and incorporating a weighted-sum component. Other structural elements (e.g., LayerNorm, Attention) were carried over from the Base LLM and further trained on the provided data.
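A minimal PyTorch sketch of this upcycling step, assuming a generic Transformer MLP. The class, names, and hyperparameters are illustrative, not Upstage's actual architecture:

```python
import copy
import torch
import torch.nn as nn

class MoEMLP(nn.Module):
    """Sketch: upcycle a dense MLP into N experts with a router and weighted sum."""
    def __init__(self, base_mlp: nn.Module, hidden_size: int, num_experts: int = 4):
        super().__init__()
        # N experts start as copies of the base MLP (the "copy-and-paste" step)
        self.experts = nn.ModuleList(copy.deepcopy(base_mlp) for _ in range(num_experts))
        # Router produces one score per expert for every token
        self.router = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, x):                                  # x: (batch, seq, hidden)
        gates = self.router(x).softmax(dim=-1)             # (batch, seq, num_experts)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (b, s, h, E)
        return (expert_outs * gates.unsqueeze(-2)).sum(dim=-1)          # weighted sum
```

In practice the router is trained jointly with the experts, and production systems typically activate only the top-k experts per token rather than running all N as this dense sketch does.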

The rationale behind leveraging the MoE structure stems from the anticipated enhancements in performance attributed to the collaborative nature of multiple expert networks or ensemble effects. Each MLP Layer acts as an expert, specializing in specific tasks or data types. As a result, by collaborating with specialized experts for tasks such as recommendation, explanation, attribute extraction, summarization, etc., the model can provide more accurate and contextually relevant responses across diverse requests and tasks, thereby facilitating more effective customer service.

Overview of MoE structure

Moreover, the MoE structure allows for optimized serving efficiency by activating only specific experts as needed, resulting in a sparse structure. Because only a subset of experts is engaged for each token during inference, serving speed improves significantly compared to dense models of similar size. Consequently, Upstage can offer a model that is notably faster than generic models while being more finely tuned to customer needs.
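The serving benefit can be made concrete with a back-of-the-envelope count: with top-k routing, only k of the N expert MLPs run per token, so the parameters touched per token are far fewer than the parameters stored. The sizes below are made up for illustration, not the actual model's:

```python
def active_vs_total(mlp_params: int, num_experts: int, top_k: int,
                    shared_params: int) -> tuple[int, int]:
    """Parameters active per token (sparse routing) vs. parameters stored."""
    total = shared_params + num_experts * mlp_params   # all experts live in memory
    active = shared_params + top_k * mlp_params        # only top-k run per token
    return active, total

# Hypothetical sizes: 100M-parameter MLP, 8 experts, top-2 routing,
# 200M parameters in shared layers (attention, embeddings, norms).
active, total = active_vs_total(100_000_000, 8, 2, 200_000_000)
print(f"active per token: {active:,} of {total:,}")  # 400M of 1B
```

Under these assumed sizes, each token pays the compute cost of a ~400M-parameter dense model while benefiting from 1B parameters of stored capacity.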

Leveraging Technology Partners

Leveraging Amazon SageMaker for continual post-training during model development proved highly advantageous for Upstage's project. Given the need to train on an extensive dataset within project timelines, SageMaker significantly bolstered efficiency, enabling effective processing of large volumes of data.

Amazon SageMaker's support for distributed training allows large-scale datasets to be processed in parallel across multiple computational resources. This not only reduces training duration but also streamlines the handling of substantial data volumes. Furthermore, the seamless scalability of compute and storage makes it easy to adapt to larger datasets or heightened throughput demands, a significant advantage.
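As a hedged sketch, launching such a distributed post-training job with the SageMaker Python SDK might look roughly like the following. The entry point, role ARN, instance types, and S3 paths are placeholders, not Upstage's actual configuration:

```python
from sagemaker.pytorch import PyTorch  # SageMaker Python SDK

# Hypothetical job configuration -- requires an AWS account and IAM role to run.
estimator = PyTorch(
    entry_point="post_train.py",          # your training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=4,                     # parallelize across 4 nodes
    instance_type="ml.p4d.24xlarge",      # 8x A100 GPUs per node
    framework_version="2.1",
    py_version="py310",
    distribution={"torch_distributed": {"enabled": True}},  # launch via torchrun
)
estimator.fit({"train": "s3://example-bucket/commerce-corpus/"})
```

Raising `instance_count` scales out data-parallel training without changes to the launch code, which is the elasticity the paragraph above describes.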

Results and Benefits

This e-commerce-specialized LLM not only substantially reduces the burden on product listers by cutting the time spent on data-entry tasks, but also sets a standard for product attribute extraction, facilitating data preprocessing and standardization efforts.

This has significantly streamlined the labor-intensive work previously done by in-house Merchandise Directors (MDs), who were responsible for extracting designated attributes and their corresponding values from product metadata managed across multiple e-commerce platforms, and for aligning them with the client's data schema standards.

The model Upstage delivered is a private LLM specialized in Attribute Value Extraction (AVE), tailored to preprocess unstructured data from product metadata and reviews for strategic use in planning. AVE involves extracting the values of specific attributes from given product information and catalog texts. On the AE-110k dataset released by Alibaba, the model demonstrated superior performance to GPT-3.5, establishing an effective commerce-specific private LLM. Its implementation not only makes the shopping experience easier and more distinctive but also drives improvements in business productivity and efficiency.
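To illustrate the shape of the AVE task (not Upstage's actual API), one common framing is to prompt the model for a JSON mapping of attributes to values and parse the response. Here `llm` is any text-in/text-out callable standing in for the private model, and the prompt template is a hypothetical example:

```python
import json

AVE_PROMPT = """Extract the values of the requested attributes from the product text.
Answer with a JSON object mapping each attribute to its value, or null if absent.

Product: {product}
Attributes: {attributes}
JSON:"""

def extract_attributes(product: str, attributes: list[str], llm) -> dict:
    """Hypothetical AVE wrapper: format the prompt, call the model, parse JSON."""
    prompt = AVE_PROMPT.format(product=product, attributes=", ".join(attributes))
    return json.loads(llm(prompt))

# Example with a stubbed model response:
fake_llm = lambda prompt: '{"brand": "Acme", "capacity": "1.5L", "color": null}'
print(extract_attributes("Acme 1.5L electric kettle",
                         ["brand", "capacity", "color"], fake_llm))
# → {'brand': 'Acme', 'capacity': '1.5L', 'color': None}
```

Forcing structured JSON output is what makes the extracted pairs directly loadable into the platform's category database described above.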

Commerce-oriented AVE task evaluation results