
Building end-to-end RAG system using Solar LLM and MongoDB Atlas

2024/07/01 | Written by: Juhyung Son

Introduction

As an enterprise AI startup building large-scale AI solutions for leading enterprises globally, Upstage AI offers a suite of full-stack LLM components, including Document AI (converting unstructured documents into machine-readable data through key information extraction, layout analysis, and OCR), embedding models, Solar LLM, and Groundedness Checker. With these components and a powerful vector DB like MongoDB Atlas, developers can build a Retrieval Augmented Generation (RAG) application.

RAG applications are particularly useful for enterprises like Upstage. They generate answers from LLMs grounded in data retrieved from a private database, which lowers the risk of hallucination and increases productivity.

In this blog, we will explain how enterprises can build an internal RAG application. We will use Solar LLM and MongoDB Atlas to improve employees' access to information and increase work productivity. The RAG system will find relevant information from separate sources like Google Workspace, Notion, GitHub, and Linear, and then use this data to generate answers. The end interface is a chatbot through which employees can ask questions and search for information. At Upstage, we developed the application "Kim Solar" for internal use, which we will reference as an example in this blog.

Now, let’s explore Solar and MongoDB Atlas in action!

MongoDB Atlas and Solar

The core of a RAG system has three parts: a vector database that stores documents as vectors, an embedding model that embeds documents into vectors, and an LLM that generates answers. Upstage and MongoDB each develop key components of RAG. Upstage provides embedding models for document vectorization and an LLM specialized for RAG, while MongoDB has integrated vector search capabilities into its document database, Atlas.

MongoDB Atlas

Atlas enhances a traditional document-based database with vector search capabilities, and this project took full advantage of that combination. A few key features were particularly beneficial, largely because MongoDB is not only a vector DB but also a document database capable of full-text search:

  • Rich features combining benefits from traditional document-based databases and vector search.

  • Supports Lucene-powered full-text search.

  • A fully managed service, which is reliable and saves operational costs.

  • Familiar and flexible queries can reduce switching costs.

Solar Embedding

Solar Embedding is a Solar-based embedding model with 32k context support. It significantly outperforms other embedding models and is strong in English, Korean, and Japanese. More and more customers are adopting Solar Embedding for RAG because of its performance. [→ Learn more about Solar Embedding.]

Solar LLM

Solar LLM is a fast, high-performance LLM. It also offers extra features that are beneficial for RAG use cases, such as Layout Analysis and Groundedness Check. Moreover, LangChain and LlamaIndex have added Solar as an official partner package, which allows seamless, easy use within these frameworks.

Building the RAG system

Open-source integrations for Solar and Atlas

Upstage's Solar Embedding and Solar LLM are compatible with the OpenAI package and integrate seamlessly into LangChain and LlamaIndex. Below is an example of using LlamaIndex for embedding and chat. For more details, see the Upstage package documentation in Upstage LangChain and Upstage LlamaIndex, and the MongoDB documentation in Atlas LangChain and Atlas LlamaIndex.



Setting up Solar and MongoDB in LlamaIndex

Solar and Atlas are compatible with LlamaIndex. To use Solar and Atlas with LlamaIndex, perform the initial setup as shown below.

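Below is a minimal setup sketch in LlamaIndex, assuming the llama-index-llms-upstage, llama-index-embeddings-upstage, and llama-index-vector-stores-mongodb packages are installed. The database, collection, and index names are placeholders, and the parameter names follow recent versions of the integration packages.

import os

import pymongo
from llama_index.core import Settings
from llama_index.embeddings.upstage import UpstageEmbedding
from llama_index.llms.upstage import Upstage
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

# Use Solar for both generation and embedding
# (UPSTAGE_API_KEY is read from the environment).
Settings.llm = Upstage(model="solar-1-mini-chat")
Settings.embed_model = UpstageEmbedding(model="solar-embedding-1-large")

# Connect to MongoDB Atlas; the database, collection, and index
# names here are illustrative.
mongo_client = pymongo.MongoClient(os.environ["MONGODB_URI"])
vector_store = MongoDBAtlasVectorSearch(
    mongo_client,
    db_name="rag_demo",
    collection_name="documents",
    vector_index_name="vector_index",
)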

Embedding with Solar Embedding

In a RAG system, documents are split into smaller chunks for several reasons. First, the embedding model produces a vector with more precise semantics when it embeds text of reasonable length rather than an entire document. Second, for the LLM to give an accurate answer, it is better to provide the most relevant parts of the text as context, without unnecessary parts. Finally, there is a cost benefit to sending fewer tokens.

There are many strategies for splitting a document into chunks. The strategy may vary depending on the document type, format, or semantic content. In Kim Solar, we use semantic chunking, which uses embedding similarity to adaptively select breakpoints between sentences. This method composes chunks of sentences that are semantically similar.

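One way to implement this is LlamaIndex's SemanticSplitterNodeParser, which adaptively picks breakpoints based on embedding similarity between adjacent sentences. A minimal sketch, where the document directory and breakpoint threshold are illustrative:

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.upstage import UpstageEmbedding

embed_model = UpstageEmbedding(model="solar-embedding-1-large")

# Split at points where the embedding similarity between adjacent
# sentences drops, so each chunk stays semantically coherent.
splitter = SemanticSplitterNodeParser(
    buffer_size=1,                       # compare one sentence at a time
    breakpoint_percentile_threshold=95,  # illustrative threshold
    embed_model=embed_model,
)

documents = SimpleDirectoryReader("./docs").load_data()  # illustrative path
nodes = splitter.get_nodes_from_documents(documents)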



Indexing with Atlas

When storing documents in the vector database, we store both vectors and metadata. The metadata carries extra information about the document, such as the title, author, and creation date. Since Atlas is fundamentally a document database, it provides a range of index analyzers powered by Lucene, which work on fields other than the embedding vectors. The two index types, "search" and "vectorSearch," support diverse retrieval strategies. In this project, we used a vector index as well as indexes for document titles and creators, as sketched below.
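As a sketch, the two index types could be created with PyMongo's search-index helpers (available in newer driver versions). The index names and field paths are assumptions for this example; solar-embedding-1-large produces 4096-dimensional vectors.

from pymongo.operations import SearchIndexModel

collection = mongo_client["rag_demo"]["documents"]

# "vectorSearch" index over the embedding field.
vector_index = SearchIndexModel(
    name="vector_index",
    type="vectorSearch",
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 4096,  # solar-embedding-1-large dimensionality
                "similarity": "cosine",
            }
        ]
    },
)

# "search" (Lucene) index over the text and metadata fields,
# e.g. document title and creator.
text_index = SearchIndexModel(
    name="default",
    type="search",
    definition={
        "mappings": {
            "dynamic": False,
            "fields": {
                "text": {"type": "string"},
                "metadata": {
                    "type": "document",
                    "fields": {
                        "title": {"type": "string"},
                        "author": {"type": "string"},
                    },
                },
            },
        }
    },
)

collection.create_search_indexes([vector_index, text_index])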


Retrieval with Atlas

In a RAG system, retrieval plays a crucial role in finding the documents most relevant to a user's query. Atlas offers both text search and vector search at the database level using Lucene, which allows for complex document searches through MongoDB queries. It is also integrated with LLM frameworks like LangChain and LlamaIndex, so you can use it directly from those frameworks.

In this RAG system, we search for documents using a hybrid method. Candidate documents are first gathered through both vector search on the embedded vectors and BM25 full-text search with Lucene in Atlas. Then the Reciprocal Rank Fusion (RRF) algorithm selects the final documents.

We looked at both LlamaIndex and native MongoDB queries, assessing the merits and limits of each method. The examples below show how our RAG system retrieves data using both approaches.

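Through the framework, retrieval takes only a few lines. A minimal sketch of vector retrieval over the Atlas store configured above (the query and top-k are illustrative):

from llama_index.core import VectorStoreIndex

# Build an index handle on top of the existing Atlas vector store
# and fetch the chunks most relevant to a query.
index = VectorStoreIndex.from_vector_store(vector_store)
retriever = index.as_retriever(similarity_top_k=5)

nodes = retriever.retrieve("How do I request access to the GitHub org?")
for n in nodes:
    print(n.score, n.node.metadata.get("title"))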

As such, Atlas is well integrated with most RAG frameworks, making it easy to use. If you need more complex queries or need features not supported by the framework, you can use MongoDB native queries. 

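The sketch below shows what such a native hybrid query could look like: it runs $vectorSearch and $search (BM25) pipelines separately and fuses the two ranked lists with RRF in Python. The index names, field paths, and the RRF constant k are assumptions; the query vector can come from the embedding model, e.g. Settings.embed_model.get_query_embedding(query).

from collections import defaultdict

def hybrid_search(collection, query, query_vector, k=60, top_n=5):
    """Fuse $vectorSearch and $search results with Reciprocal Rank Fusion."""
    vector_hits = collection.aggregate([
        {"$vectorSearch": {
            "index": "vector_index",   # illustrative vector index name
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 20,
        }},
        {"$project": {"_id": 1}},
    ])
    text_hits = collection.aggregate([
        {"$search": {
            "index": "default",        # illustrative Lucene index name
            "text": {"query": query, "path": "text"},
        }},
        {"$limit": 20},
        {"$project": {"_id": 1}},
    ])

    # RRF: score(d) = sum over result lists of 1 / (k + rank(d)).
    scores = defaultdict(float)
    for hits in (vector_hits, text_hits):
        for rank, doc in enumerate(hits, start=1):
            scores[doc["_id"]] += 1.0 / (k + rank)

    # Fetch the top documents and keep them in fused-score order.
    top_ids = sorted(scores, key=scores.get, reverse=True)[:top_n]
    found = {d["_id"]: d for d in collection.find({"_id": {"$in": top_ids}})}
    return [found[i] for i in top_ids if i in found]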



Chat with Solar

Solar receives the user's query along with the chunks selected by the retriever as context. The key is the LLM's ability to respond using only the given information; it must not fall back on its parametric knowledge, which can cause hallucination.

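A minimal sketch of the generation step, reusing the retrieved nodes from above; the system prompt is illustrative and instructs Solar to answer strictly from the provided context:

from llama_index.core.llms import ChatMessage
from llama_index.llms.upstage import Upstage

llm = Upstage(model="solar-1-mini-chat")

# Concatenate the retrieved chunks into a single context string.
context = "\n\n".join(n.node.get_content() for n in nodes)
question = "How do I request access to the GitHub org?"

messages = [
    ChatMessage(
        role="system",
        content=(
            "Answer using ONLY the provided context. "
            "If the answer is not in the context, say you don't know."
        ),
    ),
    ChatMessage(role="user", content=f"Context:\n{context}\n\nQuestion: {question}"),
]

response = llm.chat(messages)
print(response.message.content)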


Examples from Kim Solar

As a result, Kim Solar allows seamless, effortless access to information within Slack. Users can ask the bot for information by mentioning it, and the bot retrieves relevant data from sources like Linear, Google Drive, and GitHub.

Conclusion

This article covered Solar and Atlas, which together handle embedding, storing, and retrieving documents and generating responses. These two great products let us build a stable RAG system that is ready for production. The system accelerates work output and increases productivity. We are excited to keep developing this project toward AGI (Artificial General Intelligence) for Work.
