Building your Retrieval-Augmented Generation (RAG) for Custom LLMs

Prashanth Basappa
May 21, 2024


While recently playing with RAG and learning why it matters for feeding custom data to your LLM, I found there are plenty of options for doing exactly that. Ragie, a tool I stumbled upon through a podcast, seems to be one of them.

LLMs on their own have limitations when it comes to providing contextually accurate and up-to-date information. This is where Retrieval-Augmented Generation (RAG) comes into play. RAG combines the strengths of retrieval-based and generation-based approaches: the LLM fetches relevant information from an external knowledge store at query time, so its responses are accurate, current, and contextually relevant, and it is less likely to “hallucinate” or make up information. Let’s see how we can build this quickly with a terminal (virtual Python env) and an OpenAI subscription:

Building a Custom LLM with Ragie

Ragie is a tool that simplifies the process of building a custom LLM using RAG. Here’s a step-by-step guide to help you get started; the example below uses LlamaIndex for retrieval and the OpenAI API for generation:

Step 1: Set Up Your Environment

bash
# Example setup (run inside a fresh virtual environment)
pip install openai
pip install llama-index
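
The rest of the walkthrough assumes your OpenAI API key is available to the client; the openai library reads it from the OPENAI_API_KEY environment variable. You can export it in your terminal, or set it from Python (the key value below is a placeholder):

python
import os

# Placeholder key: replace with your own, or export OPENAI_API_KEY in your shell instead
os.environ.setdefault("OPENAI_API_KEY", "sk-your-key-here")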

Step 2: Prepare Your Data

The first step in building a RAG system is to prepare your data. This involves collecting documents or data that you want your LLM to use as a reference.

python
# Your custom data: the documents you want your LLM to use as a reference
documents = ["user_manual_1.txt", "user_manual_2.txt"]
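
If you are starting from scratch, one simple option (a hypothetical sketch; the folder name and file contents are placeholders) is to drop these files into a single folder that the next step can point at:

python
from pathlib import Path

# Write the reference documents into one folder so that
# SimpleDirectoryReader in Step 3 can pick them all up
docs_dir = Path("path_to_your_documents")
docs_dir.mkdir(exist_ok=True)
(docs_dir / "user_manual_1.txt").write_text("To reset your device, hold the power button for 10 seconds.")
(docs_dir / "user_manual_2.txt").write_text("The status LED blinks red when the battery is low.")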

Step 3: Create Embeddings

Next, convert your documents into embeddings. Embeddings are numerical representations of your text data that capture the context and meaning of the content.

python
# llama-index 0.10+ exposes these classes under llama_index.core
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Create embeddings: read the documents and build a vector index over them
# (uses OpenAI embeddings by default, so OPENAI_API_KEY must be set)
reader = SimpleDirectoryReader("path_to_your_documents")
index = VectorStoreIndex.from_documents(reader.load_data())
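
Rebuilding the embeddings on every run can get slow and costly. llama-index can persist the index to disk and reload it later; here is a minimal sketch (the ./storage directory name is arbitrary):

python
from llama_index.core import StorageContext, load_index_from_storage

# Save the index (and its embeddings) to disk after the first build
index.storage_context.persist(persist_dir="./storage")

# On later runs, reload it instead of re-embedding everything
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)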

Step 4: Build the RAG Pipeline

Now, set up the RAG pipeline. This involves creating a query engine that can retrieve relevant information from your embeddings and augment the LLM’s responses.

python
# Initialize the query engine
query_engine = index.as_query_engine(similarity_top_k=3)
# Example query
query = "How do I reset my device?"
response = query_engine.query(query)
print(response)
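
The query engine both retrieves the top-k matching chunks and synthesizes an answer from them. If you want to check what was actually retrieved, the response object exposes the source nodes (a quick sketch using the response from above):

python
# Inspect which chunks were retrieved and their similarity scores
for node in response.source_nodes:
    print(node.score, node.node.get_content()[:80])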

Step 5: Integrate with Your LLM

Finally, integrate the retrieval component with your LLM to generate responses that are both contextually relevant and accurate.

python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Function to generate a response using RAG
def generate_response(query):
    # Retrieve relevant context from the index
    retrieved_info = query_engine.query(query)
    prompt = f"Using the following information: {retrieved_info}, please answer the question: {query}"
    # text-davinci-003 has been retired, so use the chat completions API instead
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=150,
    )
    return response.choices[0].message.content.strip()

# Example usage
query = "How do I reset my device?"
print(generate_response(query))
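
Note that query_engine.query() already synthesizes an answer with an LLM under the hood, so the explicit OpenAI call above is mainly a way to take full control over the final prompt. If you would rather let llama-index handle generation too, you can configure which model it uses (a sketch, assuming the OpenAI integration bundled with pip install llama-index):

python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI as LlamaIndexOpenAI

# Tell llama-index which OpenAI chat model to use when synthesizing answers
Settings.llm = LlamaIndexOpenAI(model="gpt-3.5-turbo")

print(index.as_query_engine(similarity_top_k=3).query("How do I reset my device?"))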

Conclusion

By following these steps, you can build a custom LLM that leverages the power of RAG to provide accurate, contextually relevant responses. This approach will help your LLM stay up to date with the latest information. Think of it as a bridge between static knowledge and real-time information lookup.
