The dawn of Generative AI makes possible new kinds of capabilities for the applications we build. LLMs can answer the user’s questions with incredible skill. So, why not use them as part of our systems? If the user needs help getting around the app, we can put a chat function where the LLM will answer all the user’s questions. If our app has blog posts explaining important concepts, instead of making the user read all of them to get the knowledge it needs, it could just ask and get an immediate response.
Why RAG?
We decide to integrate an LLM into our app to bring these features to our users. However, we soon find that the model can’t answer the user’s questions. It doesn’t have any information about our application! If the information needed to answer is not in the LLM’s training data, it can’t be answered. Even worse, if it doesn’t know the answer, it might hallucinate a completely wrong fact! This is bad, so how do we fix this? LLMs with the Transformer architecture have shown great in-context learning capabilities. So, we just have to pass all the facts that it needs in the prompt, together with the question! Uh oh, it will definitely be expensive to stuff all the data in every prompt. So, how do we do it?
