{"id":3899,"date":"2025-05-17T07:04:47","date_gmt":"2025-05-17T07:04:47","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/05\/17\/agentic-ai-102-guardrails-and-agent-evaluation\/"},"modified":"2025-05-17T07:04:47","modified_gmt":"2025-05-17T07:04:47","slug":"agentic-ai-102-guardrails-and-agent-evaluation","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/05\/17\/agentic-ai-102-guardrails-and-agent-evaluation\/","title":{"rendered":"Agentic AI 102: Guardrails and Agent Evaluation"},"content":{"rendered":"<div>\n<h2 class=\"wp-block-heading\"><mdspan datatext=\"el1747422439456\" class=\"mdspan-comment\">Introduction<\/mdspan><\/h2>\n<p class=\"wp-block-paragraph\">In the first post of this series (<a href=\"https:\/\/towardsdatascience.com\/agentic-ai-101-starting-your-journey-building-ai-agents\/\">Agentic AI 101: Starting Your Journey Building AI Agents<\/a>), we talked about the fundamentals of creating AI Agents and introduced concepts like reasoning, memory, and tools.<\/p>\n<p class=\"wp-block-paragraph\">Of course, that first post only scratched the surface of this new area of the data industry. There is so much more that can be done, and we are going to learn more along the way in this series.<\/p>\n<p class=\"wp-block-paragraph\">So, it is time to take one step further.<\/p>\n<p class=\"wp-block-paragraph\">In this post, we will cover three topics:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Guardrails<\/strong>: these are safety blocks that prevent a Large Language Model (LLM) from responding on certain topics.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Agent Evaluation<\/strong>: Have you ever thought about how accurate the responses from an LLM are? I bet you have. 
So we will see the main ways to measure that.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Monitoring<\/strong>: We will also learn about the built-in monitoring app in Agno\u2019s framework.<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">We shall begin now.<\/p>\n<h2 class=\"wp-block-heading\">Guardrails<\/h2>\n<p class=\"wp-block-paragraph\">Our first topic is the simplest, in my opinion. Guardrails are rules that keep an AI agent from responding about a given topic or list of topics.<\/p>\n<p class=\"wp-block-paragraph\">There is a good chance that you have asked ChatGPT or Gemini something and received a response like \u201cI can\u2019t talk about this topic\u201d or \u201cPlease consult a professional specialist\u201d. Usually, that occurs with sensitive topics like health advice, psychological conditions, or financial advice.<\/p>\n<p class=\"wp-block-paragraph\">Those blocks are safeguards to prevent people from hurting themselves, whether that means harming their health or their finances. As we know, LLMs are trained on massive amounts of text and so inherit a lot of bad content along with it, which could easily lead to bad advice in those areas. And I didn\u2019t even mention hallucinations!<\/p>\n<p class=\"wp-block-paragraph\">Think about how many stories there are of people who lost money by following investment tips from online forums. Or how many people took the wrong medicine because they <em>read about it on the internet<\/em>.<\/p>\n<p class=\"wp-block-paragraph\">Well, I guess you get the point. We must prevent our agents from talking about certain topics or taking certain actions. For that, we will use guardrails.<\/p>\n<p class=\"wp-block-paragraph\">The best framework I found to impose those blocks is Guardrails AI <a href=\"https:\/\/www.guardrailsai.com\/\">[1]<\/a>. 
There, you will see a hub full of predefined rules that a response must follow in order to pass and be displayed to the user.<\/p>\n<p class=\"wp-block-paragraph\">To get started quickly, first go to this link [2] and get an API key. Then, install the package. Next, type the guardrails setup command. It will ask you a couple of questions that you can respond n (for No), and it will ask you to enter the API Key generated.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">pip install guardrails-ai\nguardrails configure<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Once that is completed, go to the Guardrails AI Hub <a href=\"https:\/\/hub.guardrailsai.com\/\">[3]<\/a> and choose one that you need. Every guardrail has instructions on how to implement it. Basically, you install it via the command line and then use it like a module in Python.<\/p>\n<p class=\"wp-block-paragraph\">For this example, we\u2019re choosing one called <em>Restrict to Topic<\/em> [4], which, as its name says, lets the user talk only about what\u2019s in the list. So, go back to the terminal and install it using the code below.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">guardrails hub install hub:\/\/tryolabs\/restricttotopic<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Next, let\u2019s open our Python script and import some modules.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Imports\nfrom agno.agent import Agent\nfrom agno.models.google import Gemini\nimport os\n\n# Import Guard and Validator\nfrom guardrails import Guard\nfrom guardrails.hub import RestrictToTopic\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Next, we create the guard. We will restrict our agent to talk only about <em>sports<\/em> or the <em>weather<\/em>. 
And we are forbidding it from talking about <em>stocks<\/em>.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Setup Guard\nguard = Guard().use(\n    RestrictToTopic(\n        valid_topics=[\"sports\", \"weather\"],\n        invalid_topics=[\"stocks\"],\n        disable_classifier=True,\n        disable_llm=False,\n        on_fail=\"filter\"\n    )\n)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Now we can run the agent and the guard.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Create agent\nagent = Agent(\n    model= Gemini(id=\"gemini-1.5-flash\",\n                  api_key = os.environ.get(\"GEMINI_API_KEY\")),\n    description= \"An assistant agent\",\n    instructions= [\"Be succinct. Reply in maximum two sentences\"],\n    markdown= True\n    )\n\n# Run the agent\nresponse = agent.run(\"What's the ticker symbol for Apple?\").content\n\n# Validate the agent's response with the guard\nvalidation_step = guard.validate(response)\n\n# Print validated response\nif validation_step.validation_passed:\n    print(response)\nelse:\n    print(\"Validation Failed\", validation_step.validation_summaries[0].failure_reason)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">This is the response when we ask about a stock symbol.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">Validation Failed Invalid topics found: ['stocks']<\/code><\/pre>\n<p class=\"wp-block-paragraph\">If I ask about a topic that is not on the <code>valid_topics<\/code> list, the response is also blocked.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\">\"What's the number one soda drink?\"\nValidation Failed No valid topic was found.<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Finally, let\u2019s ask about sports.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\">\"Who is Michael Jordan?\"\nMichael Jordan is a former professional basketball player widely considered one of \nthe greatest 
of all time.  He won six NBA championships with the Chicago Bulls.<\/code><\/pre>\n<p class=\"wp-block-paragraph\">And we saw a response this time, as it is a valid topic.<\/p>\n<p class=\"wp-block-paragraph\">Let\u2019s move on to the evaluation of agents now.<\/p>\n<h2 class=\"wp-block-heading\">Agent Evaluation<\/h2>\n<p class=\"wp-block-paragraph\">Since I started studying LLMs and <a href=\"https:\/\/towardsdatascience.com\/tag\/agentic-ai\/\" title=\"Agentic Ai\">Agentic AI<\/a>, one of my main questions has been about model evaluation. Unlike traditional data science modeling, where you have well-established metrics suited to each case, evaluation for AI agents is blurrier.<\/p>\n<p class=\"wp-block-paragraph\">Fortunately, the developer community is pretty quick in finding solutions for almost everything, and so they created this nice package for LLM evaluation: <code>deepeval<\/code>.<\/p>\n<p class=\"wp-block-paragraph\">DeepEval <a href=\"https:\/\/www.deepeval.com\/docs\/getting-started\">[5]<\/a> is a library created by Confident AI that gathers many methods to evaluate LLMs and AI Agents. In this section, let\u2019s learn a couple of the main methods, just so we can build some intuition on the subject, and also because the library is quite extensive.<\/p>\n<p class=\"wp-block-paragraph\">The first evaluation is the most basic we can use, and it is called <code>G-Eval<\/code>. As AI tools like ChatGPT become more common in everyday tasks, we have to make sure they\u2019re giving helpful and accurate responses. That\u2019s where G-Eval from the DeepEval Python package comes in. <\/p>\n<p class=\"wp-block-paragraph\"><strong>G-Eval<\/strong> is like a smart reviewer that uses another AI model to evaluate how well a chatbot or AI assistant is performing. For example, my agent runs Gemini, and I am using OpenAI to assess it. 
Rather than relying on human review, this method asks an AI to \u201cgrade\u201d another AI\u2019s answers based on things like <em>relevance<\/em>, <em>correctness<\/em>, and <em>clarity<\/em>. <\/p>\n<p class=\"wp-block-paragraph\">It\u2019s a scalable way to test and improve generative AI systems. Let\u2019s quickly code an example. We will import the modules, create a prompt and a simple chat agent, and ask it to describe the weather for the month of May in NYC.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Imports\nfrom agno.agent import Agent\nfrom agno.models.google import Gemini\nimport os\n# Evaluation Modules\nfrom deepeval.test_case import LLMTestCase, LLMTestCaseParams\nfrom deepeval.metrics import GEval\n\n# Prompt\nprompt = \"Describe the weather in NYC for May\"\n\n# Create agent\nagent = Agent(\n    model= Gemini(id=\"gemini-1.5-flash\",\n                  api_key = os.environ.get(\"GEMINI_API_KEY\")),\n    description= \"An assistant agent\",\n    instructions= [\"Be succinct\"],\n    markdown= True,\n    monitoring= True\n    )\n\n# Run agent\nresponse = agent.run(prompt)\n\n# Print response\nprint(response.content)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">It responds: \u201c<em>Mild, with average highs in the 60s\u00b0F and lows in the 50s\u00b0F. Expect some rain<\/em>\u201d. <\/p>\n<p class=\"wp-block-paragraph\">Nice. Seems pretty good to me.<\/p>\n<p class=\"wp-block-paragraph\">But how can we put a number on it and show a potential manager or client how our agent is doing?<\/p>\n<p class=\"wp-block-paragraph\">Here is how:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Create a test case passing the <code>prompt<\/code> and the <code>response<\/code> to the <code>LLMTestCase<\/code> class.<\/li>\n<li class=\"wp-block-list-item\">Create a metric. 
We will use the <code>GEval<\/code> class and give it a prompt asking the model to test for <strong>coherence<\/strong>, along with my own definition of what coherence means.<\/li>\n<li class=\"wp-block-list-item\">Pass the actual output as <code>evaluation_params<\/code>.<\/li>\n<li class=\"wp-block-list-item\">Run the <code>measure<\/code> method and get the <code>score<\/code> and <code>reason<\/code> from it.<\/li>\n<\/ol>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Test Case (actual_output must be the response text, not the response object)\ntest_case = LLMTestCase(input=prompt, actual_output=response.content)\n\n# Setup the Metric\ncoherence_metric = GEval(\n    name=\"Coherence\",\n    criteria=\"Coherence. The agent can answer the prompt and the response makes sense.\",\n    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT]\n)\n\n# Run the metric\ncoherence_metric.measure(test_case)\nprint(coherence_metric.score)\nprint(coherence_metric.reason)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The output looks like this.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\">0.9\nThe response directly addresses the prompt about NYC weather in May, \nmaintains logical consistency, flows naturally, and uses clear language. \nHowever, it could be slightly more detailed.<\/code><\/pre>\n<p class=\"wp-block-paragraph\">0.9 seems pretty good, given that the default threshold is 0.5.<\/p>\n<p class=\"wp-block-paragraph\">If you want to check the logs, use this next snippet.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Check the logs\nprint(coherence_metric.verbose_logs)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Here\u2019s the response.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\">Criteria:\nCoherence. 
The agent can answer the prompt and the response makes sense.\n\nEvaluation Steps:\n[\n    \"Assess whether the response directly addresses the prompt; if it aligns,\n it scores higher on coherence.\",\n    \"Evaluate the logical flow of the response; responses that present ideas\n in a clear, organized manner rank better in coherence.\",\n    \"Consider the relevance of examples or evidence provided; responses that \ninclude pertinent information enhance their coherence.\",\n    \"Check for clarity and consistency in terminology; responses that maintain\n clear language without contradictions achieve a higher coherence rating.\"\n]<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Very nice. Now let us learn about another interesting use case, which is the evaluation of <strong>task completion for AI Agents<\/strong>. In other words, we measure how well our agent performs when it is asked to carry out a task, and how much of that task it actually delivers.<\/p>\n<p class=\"wp-block-paragraph\">First, we create a simple agent that can access Wikipedia and summarize the topic of the query.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Imports\nfrom agno.agent import Agent\nfrom agno.models.google import Gemini\nfrom agno.tools.wikipedia import WikipediaTools\nimport os\nfrom deepeval.test_case import LLMTestCase, ToolCall\nfrom deepeval.metrics import TaskCompletionMetric\nfrom deepeval import evaluate\n\n# Prompt\nprompt = \"Search wikipedia for 'Time series analysis' and summarize the 3 main points\"\n\n# Create agent\nagent = Agent(\n    model= Gemini(id=\"gemini-2.0-flash\",\n                  api_key = os.environ.get(\"GEMINI_API_KEY\")),\n    description= \"You are a researcher specialized in searching the wikipedia.\",\n    tools= [WikipediaTools()],\n    show_tool_calls= True,\n    markdown= True,\n    read_tool_call_history= True\n    )\n\n# Run agent\nresponse = agent.run(prompt)\n\n# Print 
response\nprint(response.content)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The result looks very good. Let\u2019s evaluate it using the <code>TaskCompletionMetric<\/code> class.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Create a Metric\nmetric = TaskCompletionMetric(\n    threshold=0.7,\n    model=\"gpt-4o-mini\",\n    include_reason=True\n)\n\n# Test Case\ntest_case = LLMTestCase(\n    input=prompt,\n    actual_output=response.content,\n    tools_called=[ToolCall(name=\"wikipedia\")]\n    )\n\n# Evaluate\nevaluate(test_cases=[test_case], metrics=[metric])<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Output, including the agent\u2019s response.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-markup\">======================================================================\n\nMetrics Summary\n\n  - \u2705 Task Completion (score: 1.0, threshold: 0.7, strict: False, \nevaluation model: gpt-4o-mini, \nreason: The system successfully searched for 'Time series analysis' \non Wikipedia and provided a clear summary of the 3 main points, \nfully aligning with the user's goal., error: None)\n\nFor test case:\n\n  - input: Search wikipedia for 'Time series analysis' and summarize the 3 main points\n  - actual output: Here are the 3 main points about Time series analysis based on the\n Wikipedia search:\n\n1.  **Definition:** A time series is a sequence of data points indexed in time order,\n often taken at successive, equally spaced points in time.\n2.  **Applications:** Time series analysis is used in various fields like statistics,\n signal processing, econometrics, weather forecasting, and more, wherever temporal \nmeasurements are involved.\n3.  
**Purpose:** Time series analysis involves methods for extracting meaningful \nstatistics and characteristics from time series data, and time series forecasting \nuses models to predict future values based on past observations.\n\n  - expected output: None\n  - context: None\n  - retrieval context: None\n\n======================================================================\n\nOverall Metric Pass Rates\n\nTask Completion: 100.00% pass rate\n\n======================================================================\n\n\u2713 Tests finished \ud83c\udf89! Run 'deepeval login' to save and analyze evaluation results\n on Confident AI.<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Our agent passed the test with honors: 100%!<\/p>\n<p class=\"wp-block-paragraph\">You can learn much more about the <strong>DeepEval<\/strong> library at this link <a href=\"https:\/\/www.confident-ai.com\/blog\/llm-evaluation-metrics-everything-you-need-for-llm-evaluation\">[8]<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">Finally, in the next section, we will learn the capabilities of Agno\u2019s library for monitoring agents.<\/p>\n<h2 class=\"wp-block-heading\">Agent Monitoring<\/h2>\n<p class=\"wp-block-paragraph\">As I mentioned in my previous post <a href=\"https:\/\/towardsdatascience.com\/agentic-ai-101-starting-your-journey-building-ai-agents\/\">[9]<\/a>, I chose <strong>Agno<\/strong> to learn more about Agentic AI. Just to be clear, this is not a sponsored post. 
It is just that I think this is the best option for those starting their journey learning about this topic.<\/p>\n<p class=\"wp-block-paragraph\">So, one of the cool things we can take advantage of using Agno\u2019s framework is the app they make available for model monitoring.<\/p>\n<p class=\"wp-block-paragraph\">Take this agent that can search the internet and write Instagram posts, for example.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Imports\nimport os\nfrom agno.agent import Agent\nfrom agno.models.google import Gemini\nfrom agno.tools.file import FileTools\nfrom agno.tools.googlesearch import GoogleSearchTools\n\n\n# Topic\ntopic = \"Healthy Eating\"\n\n# Create agent\nagent = Agent(\n    model= Gemini(id=\"gemini-1.5-flash\",\n                  api_key = os.environ.get(\"GEMINI_API_KEY\")),\n                  description= f\"\"\"You are a social media marketer specialized in creating engaging content.\n                  Search the internet for 'trending topics about {topic}' and use them to create a post.\"\"\",\n                  tools=[FileTools(save_files=True),\n                         GoogleSearchTools()],\n                  expected_output=\"\"\"A short post for instagram and a prompt for a picture related to the content of the post.\n                  Don't use emojis or special characters in the post. 
If you find an error in the character encoding, remove the character before saving the file.\n                  Use the template:\n                  - Post\n                  - Prompt for the picture\n                  Save the post to a file named 'post.txt'.\"\"\",\n                  show_tool_calls=True,\n                  monitoring=True)\n\n# Write and save the post (f-string so that {topic} is interpolated)\nagent.print_response(f\"\"\"Write a short post for instagram with tips and tricks that positions me as \n                     an authority in {topic}.\"\"\",\n                     markdown=True)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">To monitor its performance, follow these steps:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Go to <a href=\"https:\/\/app.agno.com\/settings\">https:\/\/app.agno.com\/settings<\/a> and get an API Key. <\/li>\n<li class=\"wp-block-list-item\">Open a terminal and type <code>ag setup<\/code>.<\/li>\n<li class=\"wp-block-list-item\">If it is the first time, it might ask for the API Key. Copy and paste it into the terminal prompt.<\/li>\n<li class=\"wp-block-list-item\">You will see the <strong>Dashboard <\/strong>tab open in your browser.<\/li>\n<li class=\"wp-block-list-item\">To monitor your agent, add the argument <code>monitoring=True<\/code> when creating it.<\/li>\n<li class=\"wp-block-list-item\">Run your agent.<\/li>\n<li class=\"wp-block-list-item\">Go to the Dashboard in the web browser.<\/li>\n<li class=\"wp-block-list-item\">Click on <strong>Sessions<\/strong>. 
Because it is a single agent, you will see it under the Agents tab at the top of the page.<\/li>\n<\/ol>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"487\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/image-136-1024x487.png?resize=1024%2C487&#038;ssl=1\" alt=\"\" class=\"wp-image-604254\"><figcaption class=\"wp-element-caption\">Agno Dashboard after running the agent. Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The cool features we can see there are:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Info about the model<\/li>\n<li class=\"wp-block-list-item\">The response<\/li>\n<li class=\"wp-block-list-item\">Tools used<\/li>\n<li class=\"wp-block-list-item\">Tokens consumed<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"375\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/image-137-1024x375.png?resize=1024%2C375&#038;ssl=1\" alt=\"\" class=\"wp-image-604255\"><figcaption class=\"wp-element-caption\">This is the resulting token consumption while saving the file. Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Pretty neat, huh?<\/p>\n<p class=\"wp-block-paragraph\">This is useful for us to know where the agent is spending more or fewer tokens, and where it is taking more time to perform a task, for example.<\/p>\n<p class=\"wp-block-paragraph\">Well, let\u2019s wrap up then.<\/p>\n<h2 class=\"wp-block-heading\">Before You Go<\/h2>\n<p class=\"wp-block-paragraph\">We have learned a lot in this second round. 
In this post, we covered:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Guardrails for AI<\/strong> are essential safety measures and ethical guidelines implemented to prevent unintended harmful outputs and ensure responsible AI behavior.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Model evaluation<\/strong>, exemplified by <code>GEval<\/code> for broad assessment and <code>TaskCompletionMetric<\/code> with DeepEval for agent output quality, is crucial for understanding AI capabilities and limitations.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Model monitoring<\/strong> with Agno\u2019s app, including tracking token usage and response time, is vital for managing costs, ensuring performance, and identifying potential issues in deployed AI systems.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\">Contact &amp; Follow Me<\/h3>\n<p class=\"wp-block-paragraph\">If you liked this content, find more of my work on my website.<\/p>\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/gustavorsantos.me\/\">https:\/\/gustavorsantos.me<\/a><\/p>\n<h3 class=\"wp-block-heading\">GitHub Repository<\/h3>\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/github.com\/gurezende\/agno-ai-labs\">https:\/\/github.com\/gurezende\/agno-ai-labs<\/a><\/p>\n<h2 class=\"wp-block-heading\">References<\/h2>\n<p class=\"wp-block-paragraph\">[1. <a href=\"https:\/\/towardsdatascience.com\/tag\/guardrails-ai\/\" title=\"Guardrails Ai\">Guardrails AI<\/a>] <a href=\"https:\/\/www.guardrailsai.com\/docs\/getting_started\/guardrails_server\">https:\/\/www.guardrailsai.com\/docs\/getting_started\/guardrails_server<\/a><\/p>\n<p class=\"wp-block-paragraph\">[2. Guardrails AI Auth Key] <a href=\"https:\/\/hub.guardrailsai.com\/keys\">https:\/\/hub.guardrailsai.com\/keys<\/a><\/p>\n<p class=\"wp-block-paragraph\">[3. 
Guardrails AI Hub] <a href=\"https:\/\/hub.guardrailsai.com\/\">https:\/\/hub.guardrailsai.com\/<\/a><\/p>\n<p class=\"wp-block-paragraph\">[4. Guardrails Restrict to Topic] <a href=\"https:\/\/hub.guardrailsai.com\/validator\/tryolabs\/restricttotopic\">https:\/\/hub.guardrailsai.com\/validator\/tryolabs\/restricttotopic<\/a><\/p>\n<p class=\"wp-block-paragraph\">[5. DeepEval.] <a href=\"https:\/\/www.deepeval.com\/docs\/getting-started\">https:\/\/www.deepeval.com\/docs\/getting-started<\/a><\/p>\n<p class=\"wp-block-paragraph\">[6. DataCamp \u2013 DeepEval Tutorial] <a href=\"https:\/\/www.datacamp.com\/tutorial\/deepeval\">https:\/\/www.datacamp.com\/tutorial\/deepeval<\/a><\/p>\n<p class=\"wp-block-paragraph\">[7. DeepEval. TaskCompletion] <a href=\"https:\/\/www.deepeval.com\/docs\/metrics-task-completion\">https:\/\/www.deepeval.com\/docs\/metrics-task-completion<\/a><\/p>\n<p class=\"wp-block-paragraph\">[8. <a href=\"https:\/\/towardsdatascience.com\/tag\/llm\/\" title=\"Llm\">Llm<\/a> Evaluation Metrics: The Ultimate LLM Evaluation Guide] <a href=\"https:\/\/www.confident-ai.com\/blog\/llm-evaluation-metrics-everything-you-need-for-llm-evaluation\">https:\/\/www.confident-ai.com\/blog\/llm-evaluation-metrics-everything-you-need-for-llm-evaluation<\/a><\/p>\n<p class=\"wp-block-paragraph\">[9. 
Agentic AI 101: Starting Your Journey Building AI Agents] <a href=\"https:\/\/towardsdatascience.com\/agentic-ai-101-starting-your-journey-building-ai-agents\/\">https:\/\/towardsdatascience.com\/agentic-ai-101-starting-your-journey-building-ai-agents\/<\/a><\/p>\n<p class=\"wp-block-paragraph\">\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/agentic-ai-102-guardrails-and-agent-evaluation\/\">Agentic AI 102: Guardrails and Agent Evaluation<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Gustavo Santos<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/agentic-ai-102-guardrails-and-agent-evaluation\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Agentic AI 102: Guardrails and Agent Evaluation Introduction In the first post of this series (Agentic AI 101: Starting Your Journey Building AI Agents), we talked about the fundamentals of creating AI Agents and introduced concepts like reasoning, memory, and tools. 
Of course, that first post touched only the surface of this new area of [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[678,799,62,69,2707,87,1500],"tags":[320,98,1494],"class_list":["post-3899","post","type-post","status-publish","format-standard","hentry","category-agentic-ai","category-ai-agent","category-aimldsaimlds","category-artificial-intelligence","category-guardrails-ai","category-llm","category-model-evaluation","tag-about","tag-ai","tag-guardrails"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3899"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=3899"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3899\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=3899"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=3899"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=3899"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}