Building an end-to-end RAG system using Solar LLM and MongoDB Atlas

2024/07/01 | Written by: Juhyung Son
 

Introduction

As an enterprise AI startup building large-scale AI solutions for leading enterprises globally, Upstage AI offers a suite of full-stack LLM components, including Document AI (converting unstructured documents into machine-readable data through key information extraction, layout analysis, and OCR), embedding models, Solar LLM, and Groundedness Checker. With these components and a powerful vector DB like MongoDB Atlas, developers can build a Retrieval Augmented Generation (RAG) application.

RAG applications are particularly useful for enterprise users like Upstage: they generate LLM answers based on data retrieved from a private database, which lowers the risk of hallucination and increases productivity.

In this blog, we will explain how enterprises can build an internal RAG application using Solar LLM and MongoDB Atlas to improve employees' access to information and increase work productivity. The RAG system finds relevant information across separate sources like Google Workspace, Notion, GitHub, and Linear, and then uses this data to generate answers. The end interface is a chatbot that employees can use to ask questions and search for information. At Upstage, we developed such an application, "Kim Solar", for internal use, and we will reference it as an example throughout this blog.

Now, let’s explore Solar and MongoDB Atlas in action!

MongoDB Atlas and Solar

The core of a RAG system has three parts: a vector database that stores documents as vectors, an embedding model that converts documents into vectors, and an LLM that generates answers. Upstage and MongoDB each develop key components of this stack. Upstage provides embedding models for document vectorization and an LLM specialized for RAG, while MongoDB has integrated vector search capabilities for RAG into its document database, Atlas.

MongoDB Atlas

Atlas extends a traditional document database with vector search capabilities, and this project takes full advantage of that combination. Because MongoDB is not only a vector DB but also a document database capable of full-text search, it offers several features that are particularly beneficial here:

  • Rich features combining benefits from traditional document-based databases and vector search.

  • Supports Lucene.

  • A fully managed service, which is reliable and saves operational costs.

  • Familiar and flexible queries can reduce switching costs (see the sketch after this list).
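As a small illustration of that flexibility, the sketch below combines an Atlas $vectorSearch stage with a pre-filter on ordinary document metadata in a single aggregation pipeline. It is a minimal example under stated assumptions: the connection string, collection names, and the metadata.source filter field are illustrative, and the filter only works if that field is indexed as a "filter" field in the vector index.

import pymongo
from langchain_upstage import UpstageEmbeddings

# Hypothetical connection and collection names; adjust to your deployment.
client = pymongo.MongoClient("mongodb+srv://<user>:<password>@cluster.example.mongodb.net")
collection = client["rag_db"]["documents"]

# Embed the user query with Solar Embedding.
query_vector = UpstageEmbeddings(model="solar-embedding-1-large").embed_query(
    "What is our onboarding checklist?"
)

# Vector search restricted to documents from a single source,
# assuming "metadata.source" is indexed as a filter field in the vector index.
results = collection.aggregate([
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 50,
            "limit": 5,
            "filter": {"metadata.source": "notion"},
        }
    },
    {"$project": {"title": 1, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
])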

Solar Embedding

Solar Embedding is a Solar-based embedding model with 32k-token context support. It substantially outperforms comparable embedding models and is particularly strong in English, Korean, and Japanese. A growing number of customers are adopting Solar Embedding for RAG because of this performance. [ → Learn more about Solar Embedding.]

Solar LLM

Solar LLM is a fast, high-performance LLM. Upstage also offers companion features that are beneficial for RAG use cases, such as Layout Analysis and Groundedness Check. Moreover, LangChain and LlamaIndex have added Solar as an official partner package, which allows seamless use within these frameworks.
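As a quick illustration of the partner-package integration, the snippet below calls Solar through the langchain_upstage package for LangChain. This is a minimal sketch: the prompt is illustrative, and it assumes UPSTAGE_API_KEY is set in the environment.

from langchain_upstage import ChatUpstage

# Assumes UPSTAGE_API_KEY is exported in the environment.
llm = ChatUpstage()

# Single-turn call; the response is a LangChain AIMessage.
response = llm.invoke("Summarize what a RAG pipeline does in one sentence.")
print(response.content)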

Building a RAG system


Solar and Atlas in open source integration

Upstage's Solar Embedding and Solar LLM are compatible with the OpenAI package and integrate seamlessly with LangChain and LlamaIndex. Below is an example of using LlamaIndex for embedding and chat. For more details, see the Upstage package documentation (Upstage LangChain and Upstage LlamaIndex) and the MongoDB documentation (Atlas LangChain and Atlas LlamaIndex).



Setting up Solar and MongoDB in LlamaIndex

Solar and Atlas are compatible with LlamaIndex. To use Solar and Atlas with LlamaIndex, perform the initial setup as shown below.

import pymongo

from llama_index.core import StorageContext
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.core.settings import Settings
from llama_index.llms.upstage import Upstage
from llama_index.embeddings.upstage import UpstageEmbedding
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

# Solar Embedding and Solar LLM via the official LlamaIndex packages.
# UPSTAGE_API_KEY holds your Upstage API key.
embed_model = UpstageEmbedding(api_key=UPSTAGE_API_KEY)
llm = Upstage(api_key=UPSTAGE_API_KEY)

Settings.embed_model = embed_model
Settings.node_parser = SemanticSplitterNodeParser(
    buffer_size=1, embed_model=embed_model
)
Settings.llm = llm

# MongoDB Atlas as the vector store.
# mongodb_url, db_name, and collection_name hold your Atlas connection details.
mongodb_client = pymongo.MongoClient(mongodb_url)
vector_store = MongoDBAtlasVectorSearch(
    mongodb_client=mongodb_client,
    db_name=db_name,
    collection_name=collection_name,
    index_name="vector_index",
)

storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
)

Embedding with Solar Embedding

In a RAG system, documents are split into smaller chunks for several reasons. First, the embedding model captures more precise semantics when it embeds a passage of reasonable length rather than an entire document. Second, the LLM gives more accurate answers when its context contains only the most relevant parts of the text, without unnecessary content. In addition, passing fewer tokens reduces cost.

There are many strategies for splitting a document into chunks. The strategy may vary depending on the document type, format, or semantic content. In Kim Solar, we use semantic chunking, which uses embedding similarity to adaptively select breakpoints between sentences. This method composes chunks of sentences that are semantically similar.

# Split the loaded documents into semantically coherent chunks (nodes)
nodes = Settings.node_parser.get_nodes_from_documents(documents, show_progress=True)

# Embed each chunk with Solar Embedding in a single batch call
embeddings = Settings.embed_model.get_text_embedding_batch(
    [node.text for node in nodes],
    show_progress=True,
)
for node, embedding in zip(nodes, embeddings):
    node.embedding = embedding

# Store the embedded chunks in Atlas
storage_context.vector_store.add(nodes)



Indexing with Atlas

When storing documents in the vector database, we store both vectors and metadata. The metadata carries extra information about the document, such as the title, author, and creation date. Because Atlas is fundamentally a document database, it provides a range of Lucene-powered index analyzers that apply to fields other than the embedding vector. Its two index types, "search" and "vectorSearch", support diverse retrieval strategies. In this project, we used a vector index as well as indexes on document titles and creators.
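For reference, both index types can also be created programmatically. The sketch below uses pymongo's SearchIndexModel (pymongo 4.7+) to define a "vectorSearch" index on the embedding field and a "search" (Lucene) index on the title and text fields, reusing the index names from this project. The 4096-dimension value is an assumption about Solar Embedding's vector size, and the author filter field is illustrative; adjust both to your data.

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

collection = MongoClient(mongodb_url)[db_name][collection_name]

# "vectorSearch" index over the embedding field, plus a filterable metadata field.
collection.create_search_index(
    SearchIndexModel(
        name="vector_index",
        type="vectorSearch",
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "embedding",
                    "numDimensions": 4096,  # assumed Solar Embedding dimension
                    "similarity": "cosine",
                },
                {"type": "filter", "path": "author"},
            ]
        },
    )
)

# "search" (Lucene) index over title and text for keyword-style queries.
collection.create_search_index(
    SearchIndexModel(
        name="title_index",
        type="search",
        definition={
            "mappings": {
                "dynamic": False,
                "fields": {
                    "title": {"type": "string"},
                    "text": {"type": "string"},
                },
            }
        },
    )
)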


Retrieval with Atlas

In a RAG system, retrieval plays a crucial role: it finds the documents most relevant to a user's query. Atlas offers both text search and vector search at the database level using Lucene, which allows complex document searches to be expressed as MongoDB queries. Atlas is also integrated with LLM frameworks such as LangChain and LlamaIndex, so you can use it directly from those frameworks.

In this RAG system, we retrieve documents using a hybrid search method. Candidate documents are first gathered through a vector search over the embedded vectors and a BM25 keyword search with Atlas's Lucene engine. The Reciprocal Rank Fusion (RRF) algorithm then merges the two ranked lists and selects the final documents.
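To make the fusion step concrete, here is a minimal, self-contained sketch of RRF scoring that mirrors the formula used later in the aggregation pipeline: each document receives 1 / (rank + penalty + 1) from every ranked list it appears in, and the scores are summed. The document IDs and penalty values are illustrative.

def reciprocal_rank_fusion(ranked_lists, penalties):
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores = {}
    for ranking, penalty in zip(ranked_lists, penalties):
        for rank, doc_id in enumerate(ranking):  # rank starts at 0
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + penalty + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative rankings from a vector search and a BM25 search
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
print(reciprocal_rank_fusion([vector_hits, bm25_hits], penalties=[0, 0])[:3])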

We evaluated both LlamaIndex and native MongoDB queries, weighing the merits and limits of each approach. The sections below show how our RAG system retrieves data using each of them.

from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import QueryFusionRetriever, VectorIndexRetriever

# Wrap the Atlas vector store in a LlamaIndex index and build a vector retriever
index = VectorStoreIndex.from_vector_store(
    vector_store=storage_context.vector_store,
    embed_model=Settings.embed_model,
)
vector_retriever = VectorIndexRetriever(index=index, similarity_top_k=7)

# BM25 keyword retriever backed by the Atlas Lucene index on document titles
bm25_retriever = MongoDBAtlasBM25Retriever(
    mongodb_client=storage_context.vector_store.client,
    db_name=db_name,
    collection_name=collection_name,
    index_name="title_index",
    text_key="text",
    similarity_top_k=7,
)

# Fuse both result lists with Reciprocal Rank Fusion
retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    similarity_top_k=3,
    mode="reciprocal_rerank",
)

As such, Atlas is well integrated with most RAG frameworks, making it easy to use. If you need more complex queries or features not supported by the framework, you can fall back to native MongoDB queries.

results = mongodb_client[db_name][collection_name].aggregate(
    [
        {
            # $vectorSearch stage to search the embedding field for the query,
            # specified as vector embeddings in the queryVector field.
            # The query asks for up to 20 nearest neighbors and limits the
            # results to 7 documents, sorted by semantic similarity.
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": UpstageEmbeddings(
                    model="solar-embedding-1-large"
                ).embed_query(query),
                "numCandidates": 20,
                "limit": 7,
            }
        },
        {
            # $group stage to collect all documents from the semantic search
            # into a field named docs.
            "$group": {"_id": None, "docs": {"$push": "$$ROOT"}}
        },
        {
            # $unwind stage to unwind the docs array and store each document's
            # position in the results in a field named rank.
            "$unwind": {"path": "$docs", "includeArrayIndex": "rank"}
        },
        {
            # $addFields stage to add a vs_score field holding the reciprocal
            # rank score for each document: 1.0 divided by the sum of rank,
            # the vector_penalty weight, and a constant of 1.
            "$addFields": {
                "vs_score": {
                    "$divide": [1.0, {"$add": ["$rank", vector_penalty, 1]}]
                }
            }
        },
        {
            # $project stage to keep only vs_score, _id, title, and text.
            "$project": {
                "vs_score": 1,
                "_id": "$docs._id",
                "title": "$docs.title",
                "text": "$docs.text",
            }
        },
        {
            # Add the text search results.
            # $unionWith stage to combine the results above with the results
            # of the sub-pipeline below.
            "$unionWith": {
                "coll": "documents",
                "pipeline": [
                    {
                        # $search stage to find documents whose text field
                        # contains the query phrase, sorted by keyword relevance.
                        "$search": {
                            "index": "text",
                            "phrase": {"query": query, "path": "text"},
                        }
                    },
                    {"$limit": 7},
                    {
                        "$group": {"_id": None, "docs": {"$push": "$$ROOT"}}
                    },
                    {
                        "$unwind": {"path": "$docs", "includeArrayIndex": "rank"}
                    },
                    {
                        # Reciprocal rank score: 1.0 divided by the sum of rank,
                        # the keyword_penalty weight, and a constant of 1.
                        "$addFields": {
                            "kws_score": {
                                "$divide": [
                                    1.0,
                                    {"$add": ["$rank", keyword_penalty, 1]},
                                ]
                            }
                        }
                    },
                    {
                        "$project": {
                            "kws_score": 1,
                            "_id": "$docs._id",
                            "title": "$docs.title",
                            "text": "$docs.text",
                        }
                    },
                ],
            }
        },
        {
            # $project stage to keep _id, title, text, vs_score, and kws_score,
            # defaulting missing scores to 0.
            "$project": {
                "title": 1,
                "vs_score": {"$ifNull": ["$vs_score", 0]},
                "kws_score": {"$ifNull": ["$kws_score", 0]},
                "text": 1,
            }
        },
        {
            # $project stage to add a score field containing the sum of
            # vs_score and kws_score.
            "$project": {
                "score": {"$add": ["$kws_score", "$vs_score"]},
                "title": 1,
                "vs_score": 1,
                "kws_score": 1,
                "text": 1,
            }
        },
        # $sort stage to sort the results by score in descending order.
        {"$sort": {"score": -1}},
        # $limit stage to keep only the top 3 results.
        {"$limit": 3},
    ]
)



Chat with Solar

Solar receives the user's query along with the chunks selected by the retriever as context. The key is the LLM's ability to answer using only the given information; it must not fall back on its prior knowledge, which can lead to hallucination.

from llama_index.core.query_engine import RetrieverQueryEngine

# Build a query engine that answers from the fused retriever's results
query_engine = RetrieverQueryEngine.from_args(
    retriever,
    llm=Settings.llm,
    response_mode="tree_summarize",
)

response = query_engine.query("What is the max token size of the Solar embedding model?")
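To further guard against ungrounded answers, the generated response can be verified with Upstage's Groundedness Check mentioned earlier. The sketch below is an optional add-on, not part of the original pipeline shown above; it assumes the LangChain langchain_upstage package and that UPSTAGE_API_KEY is set in the environment.

from langchain_upstage import UpstageGroundednessCheck

# Assumes UPSTAGE_API_KEY is set in the environment.
groundedness_check = UpstageGroundednessCheck()

# Verify that the generated answer is supported by the retrieved context.
verdict = groundedness_check.invoke({
    "context": "\n\n".join(node.get_content() for node in response.source_nodes),
    "answer": str(response),
})
print(verdict)  # e.g. "grounded", "notGrounded", or "notSure"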


Examples from Kim Solar

As a result, Kim Solar provides seamless access to information from within Slack. Users mention the bot to ask for information, and the bot retrieves relevant data from sources like Linear, Google Drive, and GitHub, as sketched below.
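Here is a minimal sketch of how such a Slack bot could be wired up with the slack_bolt package; the handler, token names, and socket-mode setup are illustrative assumptions, not the actual Kim Solar implementation.

import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])

@app.event("app_mention")
def answer_mention(event, say):
    # Strip the bot mention and run the question through the RAG query engine
    question = event["text"].split(">", 1)[-1].strip()
    answer = query_engine.query(question)
    say(text=str(answer), thread_ts=event["ts"])

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()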

Conclusion

This article covered how Solar and Atlas work together to embed, store, and retrieve documents and to generate responses. Using these two products, we were able to build a stable, production-ready RAG system that improves access to information and increases productivity. We are excited to keep developing this project toward AGI (Artificial General Intelligence) for Work.
