{"id":2044,"date":"2025-02-25T07:03:32","date_gmt":"2025-02-25T07:03:32","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/02\/25\/enhancing-rag-beyond-vanilla-approaches\/"},"modified":"2025-02-25T07:03:32","modified_gmt":"2025-02-25T07:03:32","slug":"enhancing-rag-beyond-vanilla-approaches","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/02\/25\/enhancing-rag-beyond-vanilla-approaches\/","title":{"rendered":"Enhancing RAG: Beyond Vanilla Approaches"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\">Retrieval-Augmented Generation (RAG) is a powerful technique that enhances language models by incorporating external information retrieval mechanisms. While standard RAG implementations improve response relevance, they often struggle in complex retrieval scenarios. This article explores the limitations of a vanilla RAG setup and introduces advanced techniques to enhance its accuracy and efficiency.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Challenge with Vanilla RAG<\/strong><\/h3>\n<p class=\"wp-block-paragraph\">To illustrate RAG\u2019s limitations, consider a simple experiment where we attempt to retrieve relevant information from a set of documents. 
Our dataset includes:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">A primary document discussing best practices for staying healthy, productive, and in good shape.<\/li>\n<li class=\"wp-block-list-item\">Two additional documents on unrelated topics that contain some of the same words used in different contexts.<\/li>\n<\/ul>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\">main_document_text = \"\"\"\nMorning Routine (5:30 AM - 9:00 AM)\n\u2705 Wake Up Early - Aim for 6-8 hours of sleep to feel well-rested.\n\u2705 Hydrate First - Drink a glass of water to rehydrate your body.\n\u2705 Morning Stretch or Light Exercise - Do 5-10 minutes of stretching or a short workout to activate your body.\n\u2705 Mindfulness or Meditation - Spend 5-10 minutes practicing mindfulness or deep breathing.\n\u2705 Healthy Breakfast - Eat a balanced meal with protein, healthy fats, and fiber.\n\u2705 Plan Your Day - Set goals, review your schedule, and prioritize tasks.\n...\n\"\"\"<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Using a standard RAG setup, we query the system with:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><em>What should I do to stay healthy and productive?<\/em><\/li>\n<li class=\"wp-block-list-item\"><em>What are the best practices to stay healthy and productive?<\/em><\/li>\n<\/ol>\n<h3 class=\"wp-block-heading\"><strong>Helper Functions<\/strong><\/h3>\n<p class=\"wp-block-paragraph\">The experiments rely on a small set of helper functions: one queries the OpenAI chat API, one computes the embedding of a text, one computes the cosine similarity between two embeddings, and one retrieves the documents most similar to a query.<\/p>\n<p class=\"wp-block-paragraph\">We define them as follows:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Imports\nimport os\nimport json\nimport openai\nimport numpy as np\nfrom scipy.spatial.distance import cosine\nfrom google.colab import userdata\n\n# Set up the OpenAI API key (read from the Colab secret named 'AiTeam')\nos.environ[\"OPENAI_API_KEY\"] = userdata.get('AiTeam')\n\n# Create the client used by the helpers below; it reads OPENAI_API_KEY from the environment\nclient = openai.OpenAI()<\/code><\/pre>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def query_chatgpt(prompt, model=\"gpt-4o\", response_format=openai.NOT_GIVEN):\n    try:\n        response = client.chat.completions.create(\n            model=model,\n            messages=[{\"role\": \"user\", \"content\": prompt}],\n            temperature=0.0,  # Adjust for more or less creativity\n            response_format=response_format\n        )\n        return response.choices[0].message.content.strip()\n    except Exception as e:\n        return f\"Error: {e}\"<\/code><\/pre>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def get_embedding(text, model=\"text-embedding-3-large\"):  # alternative: \"text-embedding-ada-002\"\n    \"\"\"Fetches the embedding for a given text using OpenAI's API.\"\"\"\n    response = client.embeddings.create(\n        input=[text],\n        model=model\n    )\n    return response.data[0].embedding<\/code><\/pre>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def compute_similarity_metrics(embed1, embed2):\n    \"\"\"Computes the cosine similarity between two embeddings.\"\"\"\n    # scipy's cosine() returns the cosine distance, so subtract it from 1\n    cosine_sim = 1 - cosine(embed1, embed2)\n\n    return cosine_sim<\/code><\/pre>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def fetch_similar_docs(query, docs, threshold=0.55, top=1):\n  query_em = get_embedding(query)\n  data = []\n  for d in docs:\n    # Compute the cosine similarity between the document and the query\n    similarity_results = compute_similarity_metrics(d[\"embedding\"], query_em)\n    # Keep only documents that reach the similarity threshold\n    if similarity_results &gt;= threshold:\n      data.append({\"id\":d[\"id\"], \"ref_doc\":d.get(\"ref_doc\", \"\"), \"score\":similarity_results})\n\n  # Sort by score, highest first, and keep at most `top` results\n  sorted_data = sorted(data, key=lambda x: x[\"score\"], reverse=True)\n  return sorted_data[:top]<\/code><\/pre>\n<h2 class=\"wp-block-heading\"><strong>Evaluating the Vanilla RAG<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">To evaluate the effectiveness of a vanilla RAG setup, we conduct a simple test using predefined queries. 
Our goal is to determine whether the system retrieves the most relevant document based on semantic similarity. We then analyze the limitations and explore possible improvements.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Testing vanilla RAG\n\n# Assumed setup, not shown in the original: each document is embedded once and stored with an id.\n# The two unrelated documents would be appended to docs the same way.\ndocs = [{\"id\": \"main_document_text\", \"embedding\": get_embedding(main_document_text)}]\n\nquery = \"what should I do to stay healthy and productive?\"\nr = fetch_similar_docs(query, docs)\nprint(\"query = \", query)\nprint(\"documents = \", r)\n\nquery = \"what are the best practices to stay healthy and productive?\"\nr = fetch_similar_docs(query, docs)\nprint(\"query = \", query)\nprint(\"documents = \", r)<\/code><\/pre>\n<h2 class=\"wp-block-heading\"><strong>Advanced Techniques for Improved RAG<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">To refine the retrieval process, we introduce functions that generate structured information about each document and each query, giving the retriever more specific text to match against.<\/p>\n<p class=\"wp-block-paragraph\">To address the limitations of the vanilla setup, we implement three enhancements:<\/p>\n<h4 class=\"wp-block-heading\"><strong>1. Generating FAQs<\/strong><\/h4>\n<p class=\"wp-block-paragraph\">By automatically creating a list of frequently asked questions related to a document, we expand the range of potential queries the system can match. These FAQs are generated once and stored alongside the document, providing a richer search space without incurring ongoing generation costs.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def generate_faq(text):\n  prompt = f'''\n  given the following text: \"\"\"{text}\"\"\"\n  Ask relevant simple atomic questions ONLY (don't answer them) to cover all subjects covered by the text. 
Return the result as a json object of the form {{\"questions\": [q1, q2, q3...]}}\n  '''\n  return query_chatgpt(prompt, response_format={ \"type\": \"json_object\" })<\/code><\/pre>\n<h4 class=\"wp-block-heading\"><strong>2. Creating an Overview<\/strong><\/h4>\n<p class=\"wp-block-paragraph\">A short, high-level summary captures the core ideas of the document in everyday vocabulary. By embedding the overview alongside the document, we give broad queries an additional entry point, which improves match rates.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def generate_overview(text):\n  prompt = f'''\n  given the following text: \"\"\"{text}\"\"\"\n  Generate an abstract for it that tells in maximum 3 lines what is it about and use high level terms that will capture the main points,\n  Use terms and words that will be most likely used by average person.\n  '''\n  return query_chatgpt(prompt)<\/code><\/pre>\n<h4 class=\"wp-block-heading\"><strong>3. Query Decomposition<\/strong><\/h4>\n<p class=\"wp-block-paragraph\">Instead of searching with broad user queries, we break them down into smaller, more precise sub-queries. 
Each sub-query is then compared against our enhanced document collection, which now includes:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">The original document<\/li>\n<li class=\"wp-block-list-item\">The generated FAQs<\/li>\n<li class=\"wp-block-list-item\">The generated overview<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">By merging the retrieval results from these multiple sources, we increase the likelihood of finding relevant information.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def decompose_query(query):\n  prompt = f'''\n  Given the user query: \"\"\"{query}\"\"\"\nbreak it down into smaller, relevant subqueries\nthat can retrieve the best information for answering the original query.\nReturn them, ranked, as a json object of the form {{\"subqueries\": [q1, q2, q3...]}}.\n'''\n  return query_chatgpt(prompt, response_format={ \"type\": \"json_object\" })<\/code><\/pre>\n<h2 class=\"wp-block-heading\"><strong>Evaluating the Improved RAG<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">With these techniques in place, we re-run our initial queries. This time, query decomposition generates several sub-queries, each focusing on a different aspect of the original question. 
As a result, our system successfully retrieves relevant information from both the FAQs and the original document, demonstrating a substantial improvement over the vanilla RAG approach.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Testing advanced functions\n\n## Generate overview of the document\noverview_text = generate_overview(main_document_text)\nprint(overview_text)\n# Embed the overview and store it with a reference to its source document\ndocs.append({\"id\":\"overview_text\", \"ref_doc\": \"main_document_text\", \"embedding\":get_embedding(overview_text)})\n\n\n## Generate FAQ for the document\nmain_doc_faq_arr = generate_faq(main_document_text)\nprint(main_doc_faq_arr)\nfaq = json.loads(main_doc_faq_arr)[\"questions\"]\n\n# Embed each generated question as its own entry\nfor i, question in enumerate(faq):\n  docs.append({\"id\": f\"main_doc_faq_{i}\", \"ref_doc\": \"main_document_text\", \"embedding\": get_embedding(question)})\n\n\n## Decompose the 1st query\nquery = \"what should I do to stay healthy and productive?\"\nsubqueries = decompose_query(query)\nprint(subqueries)\n\nsubqueries_list = json.loads(subqueries)['subqueries']\n\n\n## compute the similarities between the subqueries and documents, including FAQ\nfor subq in subqueries_list:\n  print(\"query = \", subq)\n  r = fetch_similar_docs(subq, docs, threshold=.55, top=2)\n  print(r)\n  print('=================================\\n')\n\n\n## Decompose the 2nd query\nquery = \"what are the best practices to stay healthy and productive?\"\nsubqueries = decompose_query(query)\nprint(subqueries)\n\nsubqueries_list = json.loads(subqueries)['subqueries']\n\n\n## compute the similarities between the subqueries and documents, including FAQ\nfor subq in subqueries_list:\n  print(\"query = \", subq)\n  r = fetch_similar_docs(subq, docs, threshold=.55, top=2)\n  print(r)\n  print('=================================\\n')<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Here are some of the FAQs that were generated:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code 
class=\"language-json\">{\n  \"questions\": [\n    \"How many hours of sleep are recommended to feel well-rested?\",\n    \"How long should you spend on morning stretching or light exercise?\",\n    \"What is the recommended duration for mindfulness or meditation in the morning?\",\n    \"What should a healthy breakfast include?\",\n    \"What should you do to plan your day effectively?\",\n    \"How can you minimize distractions during work?\",\n    \"How often should you take breaks during work\/study productivity time?\",\n    \"What should a healthy lunch consist of?\",\n    \"What activities are recommended for afternoon productivity?\",\n    \"Why is it important to move around every hour in the afternoon?\",\n    \"What types of physical activities are suggested for the evening routine?\",\n    \"What should a nutritious dinner include?\",\n    \"What activities can help you reflect and unwind in the evening?\",\n    \"What should you do to prepare for sleep?\",\n    \u2026\n  ]\n}<\/code><\/pre>\n<h2 class=\"wp-block-heading\"><strong>Cost-Benefit Analysis<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">While these enhancements introduce an upfront processing cost\u2014generating FAQs, overviews, and embeddings\u2014this is a one-time cost per document. 
In contrast, a poorly optimized RAG system would lead to two major inefficiencies:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Frustrated users due to low-quality retrieval.<\/li>\n<li class=\"wp-block-list-item\">Increased query costs from retrieving excessive, loosely related documents.<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">For systems handling high query volumes, these inefficiencies compound quickly, making preprocessing a worthwhile investment.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">By integrating document preprocessing (FAQs and overviews) with query decomposition, we create a more intelligent RAG system that balances accuracy and cost-effectiveness. This approach enhances retrieval quality, reduces irrelevant results, and ensures a better user experience.<\/p>\n<p class=\"wp-block-paragraph\">As RAG continues to evolve, these techniques will be instrumental in refining AI-driven retrieval systems. Future research may explore further optimizations, including dynamic thresholding and reinforcement learning for query refinement.<\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/enhancing-rag-beyond-vanilla-approaches\/\">Enhancing RAG: Beyond Vanilla Approaches<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p>Ziad SALLOUM<br \/>\n<a href=\"https:\/\/towardsdatascience.com\/enhancing-rag-beyond-vanilla-approaches\/\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Enhancing RAG: Beyond Vanilla Approaches Retrieval-Augmented Generation (RAG) is a powerful technique that enhances language models by incorporating external information retrieval mechanisms. 
While standard RAG implementations improve response relevance, they often struggle in complex retrieval scenarios. This article explores the limitations of a vanilla RAG setup and introduces advanced techniques to enhance its accuracy and [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,71,1839,70,1841,1771,1648],"tags":[1842,628,362],"class_list":["post-2044","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-large-language-models","category-llms","category-machine-learning","category-model-optimization","category-prompt-engineering","category-retrieval-augmented","tag-healthy","tag-import","tag-rag"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2044"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=2044"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2044\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=2044"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=2044"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=2044"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}