{"id":756,"date":"2024-12-23T07:02:37","date_gmt":"2024-12-23T07:02:37","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2024\/12\/23\/an-agentic-approach-to-reducing-llm-hallucinations-f7ffd6eedcf2\/"},"modified":"2024-12-23T07:02:37","modified_gmt":"2024-12-23T07:02:37","slug":"an-agentic-approach-to-reducing-llm-hallucinations-f7ffd6eedcf2","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2024\/12\/23\/an-agentic-approach-to-reducing-llm-hallucinations-f7ffd6eedcf2\/","title":{"rendered":"An Agentic Approach to Reducing LLM Hallucinations"},"content":{"rendered":"<p>An Agentic Approach to Reducing LLM Hallucinations<\/p>\n<div>\n<h4>Simple techniques to alleviate LLM hallucinations using LangGraph<\/h4>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/0*fNRfvA4TOQuHOzfh\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@grakozy?utm_source=medium&amp;utm_medium=referral\">Greg Rakozy<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<p>If you\u2019ve worked with LLMs, you know they can sometimes hallucinate. This means they generate text that\u2019s either nonsensical or contradicts the input data. It\u2019s a common issue that can hurt the reliability of LLM-powered applications.<\/p>\n<p>In this post, we\u2019ll explore a few simple techniques to reduce the likelihood of hallucinations. By following these tips, you can (hopefully) improve the accuracy of your AI applications.<\/p>\n<p>There are multiple types of hallucinations:<\/p>\n<ul>\n<li>\n<a href=\"https:\/\/arxiv.org\/pdf\/2311.05232\">Intrinsic hallucinations<\/a>: the LLM\u2019s response contradicts the user-provided context. 
This is when the response is verifiably wrong within the current\u00a0context.<\/li>\n<li>\n<a href=\"https:\/\/arxiv.org\/pdf\/2311.05232\">Extrinsic hallucinations<\/a>: the LLM\u2019s response cannot be verified using the user-provided context. This is when the response may or may not be wrong, but we have no way of confirming it using the current\u00a0context.<\/li>\n<li>Incoherent hallucinations: the LLM\u2019s response does not answer the question or does not make sense. This is when the LLM is unable to follow the instructions.<\/li>\n<\/ul>\n<p>In this post, we will target all the types mentioned above.<\/p>\n<p>We will go through a set of tips and tricks, each of which reduces hallucinations in a different way.<\/p>\n<h4>Tip 1: Use Grounding<\/h4>\n<p>Grounding means adding relevant, in-domain context to the LLM\u2019s input when asking it to perform a task. This gives the LLM the information it needs to answer the question correctly and reduces the likelihood of a hallucination. This is one of the reasons we use retrieval-augmented generation (RAG).<\/p>\n<p>For example, asking the LLM a math question on its own and asking it the same question while providing relevant sections of a math book will yield different results, with the second option being more likely to be\u00a0right.<\/p>\n<p>Here is an example of such an implementation from one of my previous tutorials, where I provide document-extracted context when asking a question:<\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/build-a-document-ai-pipeline-for-any-type-of-pdf-with-gemini-9221c8e143db\">Build a Document AI pipeline for ANY type of PDF With Gemini<\/a><\/p>\n<h4>Tip 2: Use structured outputs<\/h4>\n<p>Using structured outputs means forcing the LLM to output valid JSON or YAML text. This cuts down on unnecessary rambling and gets you \u201cstraight-to-the-point\u201d answers containing exactly what you need from the LLM. 
It will also help with the next tips, as it makes the LLM\u2019s responses easier to\u00a0verify.<\/p>\n<p>Here is how you can do this with Gemini\u2019s\u00a0API:<\/p>\n<pre>import json<br><br>import google.generativeai as genai<br>from pydantic import BaseModel, Field<br><br>from document_ai_agents.schema_utils import prepare_schema_for_gemini<br><br><br># Pydantic model describing the JSON answer we expect.<br>class Answer(BaseModel):<br>    answer: str = Field(..., description=\"Your Answer.\")<br><br><br>model = genai.GenerativeModel(\"gemini-1.5-flash-002\")<br><br># Convert the Pydantic schema into a format that Gemini accepts.<br>answer_schema = prepare_schema_for_gemini(Answer)<br><br><br>question = \"List all the reasons why LLMs hallucinate\"<br><br>context = (<br>    \"LLM hallucination refers to the phenomenon where large language models generate plausible-sounding but\"<br>    \" factually incorrect or nonsensical information. This can occur due to various factors, including biases\"<br>    \" in the training data, the inherent limitations of the model's understanding of the real world, and the \"<br>    \"model's tendency to prioritize fluency and coherence over accuracy.\"<br>)<br><br># Grounding: the context comes first, followed by the question and the schema.<br>messages = (<br>    [context]<br>    + [<br>        f\"Answer this question: {question}\",<br>    ]<br>    + [<br>        f\"Use this schema for your answer: {answer_schema}\",<br>    ]<br>)<br><br># Request JSON output that follows the schema; temperature 0.0 minimizes sampling randomness.<br>response = model.generate_content(<br>    messages,<br>    generation_config={<br>        \"response_mime_type\": \"application\/json\",<br>        \"response_schema\": answer_schema,<br>        \"temperature\": 0.0,<br>    },<br>)<br><br># Parse the JSON text back into the Pydantic model.<br>response = Answer(**json.loads(response.text))<br><br>print(f\"{response.answer=}\")<\/pre>\n<p>Here, \u201cprepare_schema_for_gemini\u201d is a utility function that adapts the Pydantic schema to the restricted schema format that Gemini\u2019s API accepts. 
You can find its definition here:\u00a0<a href=\"https:\/\/github.com\/CVxTz\/document_ai_agents\/blob\/498d8ee6e8597f8ba43b336c64178d186461dba0\/document_ai_agents\/schema_utils.py#L38\">code<\/a>.<\/p>\n<p>This code defines a Pydantic schema and sends it as part of the query in the \u201cresponse_schema\u201d field. This forces the LLM to follow the schema in its response and makes its output easier to\u00a0parse.<\/p>\n<h4>Tip 3: Use chain-of-thought and better prompting<\/h4>\n<p>Sometimes, giving the LLM the space to work out its response before committing to a final answer can help produce better-quality responses. This technique is called chain-of-thought (CoT) prompting, and it is widely used because it is effective and very easy to implement.<\/p>\n<p>We can also explicitly ask the LLM to answer with \u201cN\/A\u201d if it can\u2019t find enough context to produce a quality response. This gives it an easy way out instead of forcing a response to a question the context cannot\u00a0answer.<\/p>\n<p>For example, let\u2019s look at this simple question and\u00a0context:<\/p>\n<p><strong>Context<\/strong><\/p>\n<blockquote><p>Thomas Jefferson (April 13 [O.S. April 2], 1743\u200a\u2014\u200aJuly 4, 1826) was an American statesman, planter, diplomat, lawyer, architect, philosopher, and Founding Father who served as the third president of the United States from 1801 to 1809.[6] He was the primary author of the Declaration of Independence. Following the American Revolutionary War and before becoming president in 1801, Jefferson was the nation\u2019s first U.S. secretary of state under George Washington and then the nation\u2019s second vice president under John Adams. Jefferson was a leading proponent of democracy, republicanism, and natural rights, and he produced formative documents and decisions at the state, national, and international levels. 
(Source: Wikipedia)<\/p><\/blockquote>\n<p><strong>Question<\/strong><\/p>\n<blockquote><p>What year did davis jefferson die?<\/p><\/blockquote>\n<p>A naive approach\u00a0yields:<\/p>\n<p><strong>Response<\/strong><\/p>\n<blockquote><p>answer='1826'<\/p><\/blockquote>\n<p>This is obviously wrong: Jefferson Davis is not mentioned in the context at all. It was Thomas Jefferson who died in\u00a01826.<\/p>\n<p>Now let\u2019s change the response schema to use chain-of-thought:<\/p>\n<pre>class AnswerChainOfThoughts(BaseModel):<br>    rationale: str = Field(<br>        ...,<br>        description=\"Justification of your answer.\",<br>    )<br>    answer: str = Field(<br>        ..., description=\"Your Answer. Answer with 'N\/A' if answer is not found\"<br>    )<\/pre>\n<p>We are also spelling out what we expect as output when the question cannot be answered from the context: \u201cAnswer with \u2018N\/A\u2019 if answer is not\u00a0found\u201d.<\/p>\n<p>With this new approach, we get the following <strong>rationale<\/strong> (remember, chain-of-thought):<\/p>\n<blockquote><p>The provided text discusses Thomas Jefferson, not Jefferson Davis. No information about the death of Jefferson Davis is included.<\/p><\/blockquote>\n<p>And the final\u00a0<strong>answer<\/strong>:<\/p>\n<blockquote><p>answer='N\/A'<\/p><\/blockquote>\n<p>Great! 
But can we use a more general approach to hallucination detection?<\/p>\n<p>We can, with\u00a0Agents!<\/p>\n<h4>Tip 4: Use an agentic\u00a0approach<\/h4>\n<p>We will build a simple agent that implements a three-step process:<\/p>\n<ul>\n<li>The first step is to give the LLM the context and the question, in order to get a first candidate response along with the relevant context it used for its\u00a0answer.<\/li>\n<li>The second step is to reformulate the question and the first candidate response as a single declarative statement.<\/li>\n<li>The third step is to ask the LLM to verify whether or not the relevant context <strong>entails<\/strong> the candidate response. This is called \u201cSelf-verification\u201d: <a href=\"https:\/\/arxiv.org\/pdf\/2212.09561\">https:\/\/arxiv.org\/pdf\/2212.09561<\/a>\n<\/li>\n<\/ul>\n<p>To implement this, we define three nodes in LangGraph. The first node asks the question along with the context, the second node uses the LLM to reformulate the question and answer as a statement, and the third node checks whether that statement is entailed by the input\u00a0context.<\/p>\n<p>The first node can be defined as\u00a0follows:<\/p>\n<pre>    def answer_question(self, state: DocumentQAState):<br>        logger.info(f\"Responding to question '{state.question}'\")<br>        assert (<br>            state.pages_as_base64_jpeg_images or state.pages_as_text<br>        ), \"Input text or images\"<br>        # Prompt: page images, page text, the question, then the expected JSON schema.<br>        messages = (<br>            [<br>                {\"mime_type\": \"image\/jpeg\", \"data\": base64_jpeg}<br>                for base64_jpeg in state.pages_as_base64_jpeg_images<br>            ]<br>            + state.pages_as_text<br>            + [<br>                f\"Answer this question: {state.question}\",<br>            ]<br>            + [<br>                f\"Use this schema for your answer: {self.answer_cot_schema}\",<br>            ]<br>        )<br><br>        # Constrain the output to the chain-of-thought answer schema.<br>        response = self.model.generate_content(<br>            messages,<br>  
           generation_config={<br>                \"response_mime_type\": \"application\/json\",<br>                \"response_schema\": self.answer_cot_schema,<br>                \"temperature\": 0.0,<br>            },<br>        )<br><br>        answer_cot = AnswerChainOfThoughts(**json.loads(response.text))<br><br>        return {\"answer_cot\": answer_cot}<\/pre>\n<p>The second node looks like\u00a0this:<\/p>\n<pre>    def reformulate_answer(self, state: DocumentQAState):<br>        logger.info(\"Reformulating answer\")<br>        # Nothing to reformulate if the first node answered \"N\/A\".<br>        if state.answer_cot.answer == \"N\/A\":<br>            return<br><br>        messages = [<br>            {<br>                \"role\": \"user\",<br>                \"parts\": [<br>                    {<br>                        \"text\": \"Reformulate this question and its answer as a single assertion.\"<br>                    },<br>                    {\"text\": f\"Question: {state.question}\"},<br>                    {\"text\": f\"Answer: {state.answer_cot.answer}\"},<br>                ]<br>                + [<br>                    {<br>                        \"text\": f\"Use this schema for your answer: {self.declarative_answer_schema}\"<br>                    }<br>                ],<br>            }<br>        ]<br><br>        response = self.model.generate_content(<br>            messages,<br>            generation_config={<br>                \"response_mime_type\": \"application\/json\",<br>                \"response_schema\": self.declarative_answer_schema,<br>                \"temperature\": 0.0,<br>            },<br>        )<br><br>        answer_reformulation = AnswerReformulation(**json.loads(response.text))<br><br>        return {\"answer_reformulation\": answer_reformulation}<\/pre>\n<p>And the third\u00a0one:<\/p>\n<pre>    def verify_answer(self, state: DocumentQAState):<br>        logger.info(f\"Verifying answer '{state.answer_cot.answer}'\")<br>        if state.answer_cot.answer == \"N\/A\":<br>            return<br><br>        # Ask the LLM whether the relevant context entails the declarative answer.<br>        
messages = [<br>            {<br>                \"role\": \"user\",<br>                \"parts\": [<br>                    {<br>                        \"text\": \"Analyse the following context and the assertion and decide whether the context \"<br>                        \"entails the assertion or not.\"<br>                    },<br>                    {\"text\": f\"Context: {state.answer_cot.relevant_context}\"},<br>                    {<br>                        \"text\": f\"Assertion: {state.answer_reformulation.declarative_answer}\"<br>                    },<br>                    {<br>                        \"text\": f\"Use this schema for your answer: {self.verification_cot_schema}. Be Factual.\"<br>                    },<br>                ],<br>            }<br>        ]<br><br>        response = self.model.generate_content(<br>            messages,<br>            generation_config={<br>                \"response_mime_type\": \"application\/json\",<br>                \"response_schema\": self.verification_cot_schema,<br>                \"temperature\": 0.0,<br>            },<br>        )<br><br>        verification_cot = VerificationChainOfThoughts(**json.loads(response.text))<br><br>        return {\"verification_cot\": verification_cot}<\/pre>\n<p>The full code is available at <a href=\"https:\/\/github.com\/CVxTz\/document_ai_agents\">https:\/\/github.com\/CVxTz\/document_ai_agents<\/a><\/p>\n<p>Notice how each node uses its own prompt and its own schema for structured output. This is possible thanks to the flexibility of both Gemini\u2019s API and LangGraph.<\/p>\n<p>Let\u2019s work through this code using the same example as above \u27a1\ufe0f<br \/><em>(Note: we are not using chain-of-thought on the first prompt so that the verification gets triggered for our\u00a0tests.)<\/em><\/p>\n<p><strong>Context<\/strong><\/p>\n<blockquote><p>Thomas Jefferson (April 13 [O.S. 
April 2], 1743\u200a\u2014\u200aJuly 4, 1826) was an American statesman, planter, diplomat, lawyer, architect, philosopher, and Founding Father who served as the third president of the United States from 1801 to 1809.[6] He was the primary author of the Declaration of Independence. Following the American Revolutionary War and before becoming president in 1801, Jefferson was the nation\u2019s first U.S. secretary of state under George Washington and then the nation\u2019s second vice president under John Adams. Jefferson was a leading proponent of democracy, republicanism, and natural rights, and he produced formative documents and decisions at the state, national, and international levels. (Source: Wikipedia)<\/p><\/blockquote>\n<p><strong>Question<\/strong><\/p>\n<blockquote><p>What year did davis jefferson die?<\/p><\/blockquote>\n<p><strong>First node result (First\u00a0answer):<\/strong><\/p>\n<blockquote><p>\n<strong>relevant_context<\/strong>='Thomas Jefferson (April 13 [O.S. April 2], 1743\u200a\u2014\u200aJuly 4, 1826) was an American statesman, planter, diplomat, lawyer, architect, philosopher, and Founding Father who served as the third president of the United States from 1801 to\u00a01809.'<\/p><\/blockquote>\n<blockquote><p><strong>answer='1826'<\/strong><\/p><\/blockquote>\n<p><strong>Second node result (Answer Reformulation):<\/strong><\/p>\n<blockquote><p>\n<strong>declarative_answer<\/strong>='Davis Jefferson died in\u00a01826'<\/p><\/blockquote>\n<p><strong>Third node result (Verification):<\/strong><\/p>\n<blockquote><p>\n<strong>rationale<\/strong>='The context states that Thomas Jefferson died in 1826. The assertion states that Davis Jefferson died in 1826. 
The context does not mention Davis Jefferson, only Thomas Jefferson.'<\/p><\/blockquote>\n<blockquote><p>\n<strong>entailment<\/strong>='No'<\/p><\/blockquote>\n<p>So the verification step <strong>rejected<\/strong> the initial answer (<em>there is no entailment between the context and the assertion<\/em>), and we can avoid returning a hallucination to the\u00a0user.<\/p>\n<h4>Bonus Tip: Use stronger\u00a0models<\/h4>\n<p>This tip is not always easy to apply due to budget or latency limitations, but stronger LLMs tend to be less prone to hallucination. So, if possible, go for a more powerful LLM for your most sensitive use cases. You can check a hallucination benchmark here: <a href=\"https:\/\/github.com\/vectara\/hallucination-leaderboard\">https:\/\/github.com\/vectara\/hallucination-leaderboard<\/a>. We can see that the top models in this benchmark (those with the fewest hallucinations) also rank at the top of conventional NLP\u00a0leaderboards.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/0%2A0lHK7eiqTV6ESKQQ.png?ssl=1\"><figcaption>Source: <a href=\"https:\/\/github.com\/vectara\/hallucination-leaderboard\">https:\/\/github.com\/vectara\/hallucination-leaderboard<\/a> Source License: Apache\u00a02.0<\/figcaption><\/figure>\n<h3>Conclusion<\/h3>\n<p>In this tutorial, we explored strategies to improve the reliability of LLM outputs by reducing the hallucination rate. 
The main recommendations are careful formatting and prompting to guide LLM calls, and a workflow-based approach in which agents are designed to verify their own\u00a0answers.<\/p>\n<p>This verification workflow involves three\u00a0steps:<\/p>\n<ol>\n<li>Retrieving the exact context elements used by the LLM to generate the\u00a0answer.<\/li>\n<li>Reformulating the answer in declarative form for easier verification.<\/li>\n<li>Instructing the LLM to check for consistency between the context and the reformulated answer.<\/li>\n<\/ol>\n<p>While all these tips can significantly improve accuracy, remember that no method is foolproof. There\u2019s always a risk of rejecting valid answers if the LLM is overly conservative during verification, or of missing real hallucinations. Therefore, rigorous evaluation of your specific LLM workflows is still essential.<\/p>\n<p>The full code is available at <a href=\"https:\/\/github.com\/CVxTz\/document_ai_agents\">https:\/\/github.com\/CVxTz\/document_ai_agents<\/a><\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/an-agentic-approach-to-reducing-llm-hallucinations-f7ffd6eedcf2\">An Agentic Approach to Reducing LLM Hallucinations<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<\/div>\n<p>Youness Mansar<br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fan-agentic-approach-to-reducing-llm-hallucinations-f7ffd6eedcf2\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>An Agentic 
Approach to Reducing LLM Hallucinations Simple techniques to alleviate LLM hallucinations using LangGraph Photo by Greg Rakozy on\u00a0Unsplash If you\u2019ve worked with LLMs, you know they can sometimes hallucinate. This means they generate text that\u2019s either nonsensical or contradicts the input data. It\u2019s a common issue that can hurt the reliability of LLM-powered [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[756,62,240,862,863,87],"tags":[356,864,134],"class_list":["post-756","post","type-post","status-publish","format-standard","hentry","category-agents","category-aimldsaimlds","category-editors-pick","category-hallucinations","category-langgraph","category-llm","tag-context","tag-hallucinations","tag-llm"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/756"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=756"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/756\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=756"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=756"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=756"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}