{"id":3274,"date":"2025-04-23T07:02:29","date_gmt":"2025-04-23T07:02:29","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/04\/23\/ai-agents-processing-timeseries-and-large-dataframes\/"},"modified":"2025-04-23T07:02:29","modified_gmt":"2025-04-23T07:02:29","slug":"ai-agents-processing-timeseries-and-large-dataframes","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/04\/23\/ai-agents-processing-timeseries-and-large-dataframes\/","title":{"rendered":"AI Agents Processing Time Series and Large Dataframes"},"content":{"rendered":"<div>\n<h2 class=\"wp-block-heading\"><mdspan datatext=\"el1745296097126\" class=\"mdspan-comment\">Intro<\/mdspan><\/h2>\n<p class=\"wp-block-paragraph\">Agents are AI systems, powered by LLMs, that can reason about their objectives and take actions to achieve a final goal. They are designed not just to respond to queries, but to orchestrate a sequence of operations, including processing data (e.g. dataframes and time series). 
This ability unlocks numerous real-world applications for democratizing access to data analysis, such as automated reporting, no-code queries, and support for data cleaning and manipulation.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Agents can interact with dataframes in two different ways:\u00a0<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">with <strong>natural language<\/strong>: the LLM reads the table as a string and tries to make sense of it based on its knowledge base<\/li>\n<li class=\"wp-block-list-item\">by <strong>generating and executing code<\/strong>: the Agent activates tools to process the dataset as an object.\u00a0<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXctE3JRXxkBRROerRqJ8CM7k0N6J8crUjavJJzLR72Nv0def0NS35LXVyhbyigpTaacvB8rRN-yHZxUeGOUpHoE2vlfn0zAz9MIdT6x_63ylc-0XVg125HkncpfG7M8r-50pXH3HA?key=MCyGNqQxDXoyqd9WC2WjrRH-\" width=\"602\" height=\"289\">So, by combining the power of NLP with the precision of code execution, AI Agents enable a broader range of users to interact with complex datasets and derive insights.<\/p>\n<p class=\"wp-block-paragraph\">In this tutorial, I\u2019m going to show how to <strong>process dataframes and time series with AI Agents<\/strong>. 
I will present some useful Python code that can be easily applied in other similar cases (just copy, paste, run) and walk through every line of code with comments so that you can replicate this example (link to full code at the end of the article).<\/p>\n<h2 class=\"wp-block-heading\">Setup<\/h2>\n<p class=\"wp-block-paragraph\">Let\u2019s start by setting up <a href=\"https:\/\/ollama.com\/\"><strong><em>Ollama<\/em><\/strong><\/a><strong><em> <\/em><\/strong>(<code>pip install ollama==0.4.7<\/code>), a library that allows users to run open-source LLMs locally, without needing cloud-based services, giving more control over data privacy and performance. Since it runs locally, any conversation data does not leave your machine.<\/p>\n<p class=\"wp-block-paragraph\">First of all, you need to download <em>Ollama<\/em> from the website.\u00a0<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXewKS3mE3HGQ5ABk-VsU3m7ycB3-xG-kDYSXsTxN6uY94cEEFKxKC03StgFVYkLohoZdTBNZCnJb-qYM-K8SKKsJEil7QBQnIV5O5awXwOzSgQ4AFe-IE8b1ewEr-Q78Ev2A2dn?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">Then, on the prompt shell of your laptop, use the command to download the selected LLM. 
I\u2019m going with Alibaba\u2019s <strong><em>Qwen<\/em><\/strong>, as it\u2019s both smart and light.<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXdSofGioHmo2AmV_VimTUzmlsWLhRSZqVQ8frZAVmHySVdpG6cjbOMb0vj0L2HiD-O6KXOrxt-W-W3ffNxQ_-U54iAqKcvY2SruOWSjV6QQvnyc7ezly-aB9AjzxQNuc8op7AK9Qg?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">After the download is completed, you can move on to Python and start writing code.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import ollama\nllm = \"qwen2.5\"<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Let\u2019s test the LLM:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">stream = ollama.generate(model=llm, prompt='''what time is it?''', stream=True)\nfor chunk in stream:\n\u00a0 \u00a0 print(chunk['response'], end='', flush=True)<\/code><\/pre>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXdl1dgcxkb56m_smeUZDDZ0akGbomEgUpqIsaweH7qDZVKNYofJmV_RDxSWgnPdsCd18YjQ3mur9GTNsEnNcyQEfd2yyjuPAPD9HrNTDxpxbrNR3WTuDx1_virjxynNffjg9bV3?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<h2 class=\"wp-block-heading\">Time Series<\/h2>\n<p class=\"wp-block-paragraph\">A time series is a sequence of data points measured over time, often used for analysis and forecasting. It allows us to see how variables change over time, and it\u2019s used to identify trends and seasonal patterns. 
<\/p>\n<p class=\"wp-block-paragraph\">I\u2019m going to generate a fake time series dataset to use as an example.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n## create data\nnp.random.seed(1) #&lt;--for reproducibility\nlength = 30\nts = pd.DataFrame(data=np.random.randint(low=0, high=15, size=length),\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 columns=['y'],\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 index=pd.date_range(start='2023-01-01', freq='MS', periods=length).strftime('%Y-%m'))\n\n## plot\nts.plot(kind=\"bar\", figsize=(10,3), legend=False, color=\"black\").grid(axis='y')<\/code><\/pre>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXfoRLNByHpN3jX1bhmK-XRRLMHs3irRlC4VkvFVj4wYxsLKLk9hq-UiUBg8YNhi1BtRSXgFQR_1vugZP1V4_wQQgr7ByodnjIhLsOadpyGOZTafFhSmC3pnBASrzyYMsiyoajOI5A?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">Usually, time series datasets have a really simple structure with the main variable as a column and the time as the index.<\/p>\n<figure class=\"wp-block-image is-resized\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXdFVOnSXA_Q0UHiY6pno2dK4Q2D5JMRCifT_wvCSjKGulfklonaIMkzXCcmsGiEYpOJSSFLBWoan0Y6uJ3ZXa-oSUDieJnwLie8X2NttuQMDhysCcp_CuzKa6h9KB6ZUpiYEMK_?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\" style=\"width:156px;height:auto\"><\/figure>\n<p class=\"wp-block-paragraph\">Before transforming it into a string, I want to make sure that everything is placed under a column, so that we don\u2019t lose any piece of information.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">dtf = ts.reset_index().rename(columns={\"index\":\"date\"})\ndtf.head()<\/code><\/pre>\n<figure class=\"wp-block-image is-resized\"><img decoding=\"async\" 
src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXe_j3L2kNweWWWCvvrSRaKPbIIKmulsSeeiAVzOPT5a1-ZzmT5ApECW9mYaYkDA4k4KAExYvvZMZ0oql8xG1tFXb1LVHfNoalgzY4B98HhsyvyCoD6HT0UHqUusseWD-KELg-ud?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\" style=\"width:176px;height:auto\"><\/figure>\n<p class=\"wp-block-paragraph\">Then, I shall change the data type <strong>from dataframe to a list of dictionaries<\/strong>, one per row.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">data = dtf.to_dict(orient='records') #&lt;--list of dictionaries, one per row\ndata[0:5]<\/code><\/pre>\n<figure class=\"wp-block-image is-resized\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXdWvnvtLLUahbr2fSY0M6xy7y5xWBDo2IODONiq8gkD5E3BCc59oH85BRhc42k1e73gOTN4dFyrZf2iPCsa-2QICy94wydaINKvoSfEQ01atLUf-HDSWhTDQdUc6xue93EwMBBE4w?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\" style=\"width:329px;height:auto\"><\/figure>\n<p class=\"wp-block-paragraph\">Finally, <strong>from dictionaries to string<\/strong>.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">str_data = \"\\n\".join([str(row) for row in data]) #&lt;--one row per line\nstr_data<\/code><\/pre>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXeNnLLprfggjRVWGQcysr08V6NNLdJdE4wzTd5OI8WjEZtuJ5uh3ZwxD99po-Mg_fe6_k7Y9BhnEyHG-mNltt2wz2EMvSOnvdclCI6N7THCd3Jpaqg9CqLfDhKh-VPNUdOOuHgdbA?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">Now that we have a string, it can be <strong>included in a prompt<\/strong> that any language model is able to process. 
When you paste a dataset into a prompt, the LLM reads the data as plain text, but can still understand the structure and meaning based on patterns seen during training.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">prompt = f'''\nAnalyze this dataset, it contains monthly sales data of an online retail product:\n{str_data}\n'''<\/code><\/pre>\n<p class=\"wp-block-paragraph\">We can easily start a chat with the LLM. Please note that, right now, this is not an Agent, as it doesn\u2019t have any Tool: we\u2019re just using the language model. While it doesn\u2019t process numbers like a computer, the LLM can recognize column names, time-based patterns, trends, and outliers, especially with smaller datasets. It can simulate analysis and explain findings, but it won\u2019t perform precise calculations independently, as it\u2019s not executing code like an Agent.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">messages = [{\"role\":\"system\", \"content\":prompt}]\n\nwhile True:\n\u00a0 \u00a0 ## User\n\u00a0 \u00a0 q = input('\ud83d\ude42 &gt;')\n\u00a0 \u00a0 if q == \"quit\":\n\u00a0 \u00a0 \u00a0 \u00a0 break\n\u00a0 \u00a0 messages.append( {\"role\":\"user\", \"content\":q} )\n\u00a0 \u00a0\n\u00a0 \u00a0 ## Model\n\u00a0 \u00a0 agent_res = ollama.chat(model=llm, messages=messages, tools=[])\n\u00a0 \u00a0 res = agent_res[\"message\"][\"content\"]\n\u00a0 \u00a0\n\u00a0 \u00a0 ## Response\n\u00a0 \u00a0 print(\"\ud83d\udc7d &gt;\", f\"\\x1b[1;30m{res}\\x1b[0m\")\n\u00a0 \u00a0 messages.append( 
{\"role\":\"assistant\", \"content\":res} )<\/code><\/pre>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXeJqw64GqxKJiOzrLvW18yvJsGw1OXXyd_e-Nwi-cTL5YNJcNQKDJJKSsyB0JWDP9j_k2_O1BHzTgbmVZOvKEOCLzQ6hQqfdXWsw-wkDxKrkgRLk3HfiUJIJ6yvzw2X2MZ8_lvCnA?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">The LLM recognizes numbers and understands the<em> <\/em>general context, the same way it might understand a recipe or a line of code.\u00a0<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXcGYGyzSKdOJ05cJtesiQZK0N5gOX1LX54DhKvvSEtjctEY66DbpU7ZUwrjVeuERfiMOy7Ia76MiwAmIYLGcKZynMwOPHOkVnkMXcAPIiCDlphAatfwF0Tt4_gs3WseCFkCFM3w?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">As you can see, using LLMs to analyze time series is great for quick and conversational insights.<\/p>\n<h2 class=\"wp-block-heading\">Agent<\/h2>\n<p class=\"wp-block-paragraph\">LLMs are good for brainstorming and lite exploration, while an Agent can run code. Therefore, it can handle more complex tasks like plotting, forecasting, and anomaly detection. So, let\u2019s create the Tools.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Sometimes, it can be more effective to treat the <strong>\u201cfinal answer\u201d as a Tool<\/strong>. For example, if the Agent does multiple actions to generate intermediate results, the final answer can be thought of as the Tool that integrates all of this information into a cohesive response. 
By designing it this way, you have more customization and control over the results.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def final_answer(text:str) -&gt; str:\n\u00a0 \u00a0 return text\n\ntool_final_answer = {'type':'function', 'function':{\n\u00a0 'name': 'final_answer',\n\u00a0 'description': 'Returns a natural language response to the user',\n\u00a0 'parameters': {'type': 'object',\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'required': ['text'],\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'properties': {'text': {'type':'str', 'description':'natural language response'}}\n}}}\n\nfinal_answer(text=\"hi\")<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Then, the <strong>coding Tool<\/strong>.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import io\nimport contextlib\n\ndef code_exec(code:str) -&gt; str:\n\u00a0 \u00a0 output = io.StringIO()\n\u00a0 \u00a0 with contextlib.redirect_stdout(output):\n\u00a0 \u00a0 \u00a0 \u00a0 try:\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 exec(code)\n\u00a0 \u00a0 \u00a0 \u00a0 except Exception as e:\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(f\"Error: {e}\")\n\u00a0 \u00a0 return output.getvalue()\n\ntool_code_exec = {'type':'function', 'function':{\n\u00a0 'name': 'code_exec',\n\u00a0 'description': 'Execute python code. 
Use always the function print() to get the output.',\n\u00a0 'parameters': {'type': 'object',\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'required': ['code'],\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'properties': {\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'code': {'type':'str', 'description':'code to execute'},\n}}}}\n\ncode_exec(\"from datetime import datetime; print(datetime.now().strftime('%H:%M'))\")<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Moreover, I shall add a couple of <strong>utils functions<\/strong> for Tool usage and to run the Agent.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">dic_tools = {\"final_answer\":final_answer, \"code_exec\":code_exec}\n\n# Utils\ndef use_tool(agent_res:dict, dic_tools:dict) -&gt; dict:\n\u00a0 \u00a0 ## use tool\n\u00a0 \u00a0 if \"tool_calls\" in agent_res[\"message\"].keys():\n\u00a0 \u00a0 \u00a0 \u00a0 for tool in agent_res[\"message\"][\"tool_calls\"]:\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 t_name, t_inputs = tool[\"function\"][\"name\"], tool[\"function\"][\"arguments\"]\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if f := dic_tools.get(t_name):\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 ### calling tool\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print('<img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f527.png?ssl=1\" alt=\"\ud83d\udd27\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> &gt;', f\"x1b[1;31m{t_name} -&gt; Inputs: {t_inputs}x1b[0m\")\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 ### tool output\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 t_output = f(**tool[\"function\"][\"arguments\"])\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(t_output)\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 ### final res\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 
\u00a0 \u00a0 res = t_output\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 else:\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print('<img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f92c.png?ssl=1\" alt=\"\ud83e\udd2c\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> &gt;', f\"x1b[1;31m{t_name} -&gt; NotFoundx1b[0m\")\n\u00a0 \u00a0 ## don't use tool\n\u00a0 \u00a0 if agent_res['message']['content'] != '':\n\u00a0 \u00a0 \u00a0 \u00a0 res = agent_res[\"message\"][\"content\"]\n\u00a0 \u00a0 \u00a0 \u00a0 t_name, t_inputs = '', ''\n\u00a0 \u00a0 return {'res':res, 'tool_used':t_name, 'inputs_used':t_inputs}<\/code><\/pre>\n<p class=\"wp-block-paragraph\">When the Agent is trying to solve a task, I want it to keep track of the Tools that have been used, the inputs that it tried, and the results it gets. The iteration should stop only when the model is ready to give the final answer.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def run_agent(llm, messages, available_tools):\n\u00a0 \u00a0 tool_used, local_memory = '', ''\n\u00a0 \u00a0 while tool_used != 'final_answer':\n\u00a0 \u00a0 \u00a0 \u00a0 ### use tools\n\u00a0 \u00a0 \u00a0 \u00a0 try:\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 agent_res = ollama.chat(model=llm,\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 
messages=messages,\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0tools=[v for v in available_tools.values()])\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 dic_res = use_tool(agent_res, dic_tools)\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 res, tool_used, inputs_used = dic_res[\"res\"], dic_res[\"tool_used\"], dic_res[\"inputs_used\"]\n\u00a0 \u00a0 \u00a0 \u00a0 ### error\n\u00a0 \u00a0 \u00a0 \u00a0 except Exception as e:\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(\"<img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/26a0.png?ssl=1\" alt=\"\u26a0\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> &gt;\", e)\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 res = f\"I tried to use {tool_used} but didn't work. 
I will try something else.\"\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(\"\ud83d\udc7d &gt;\", f\"\\x1b[1;30m{res}\\x1b[0m\")\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 messages.append( {\"role\":\"assistant\", \"content\":res} )\n\u00a0 \u00a0 \u00a0 \u00a0 ### update memory\n\u00a0 \u00a0 \u00a0 \u00a0 if tool_used not in ['','final_answer']:\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 local_memory += f\"\\nTool used: {tool_used}.\\nInput used: {inputs_used}.\\nOutput: {res}\"\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 messages.append( {\"role\":\"assistant\", \"content\":local_memory} )\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 available_tools.pop(tool_used)\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if len(available_tools) == 1:\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 messages.append( {\"role\":\"user\", \"content\":\"now activate the tool final_answer.\"} )\n\u00a0 \u00a0 \u00a0 \u00a0 ### tools not used\n\u00a0 \u00a0 \u00a0 \u00a0 if tool_used == '':\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 break\n\u00a0 \u00a0 return res<\/code><\/pre>\n<p class=\"wp-block-paragraph\">With regard to the coding Tool, I\u2019ve noticed that Agents tend to recreate the dataframe at every step. So I will use a <strong>memory reinforcement<\/strong> to remind the model that the dataset already exists, a trick commonly used to get the desired behaviour. 
Ultimately, memory reinforcements help you to get more meaningful and effective interactions.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Start a chat\nmessages = [{\"role\":\"system\", \"content\":prompt}]\nmemory = '''\nThe dataset already exists and it's called 'dtf', don't create a new one.\n'''\nwhile True:\n\u00a0 \u00a0 ## User\n\u00a0 \u00a0 q = input('<img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f642.png?ssl=1\" alt=\"\ud83d\ude42\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> &gt;')\n\u00a0 \u00a0 if q == \"quit\":\n\u00a0 \u00a0 \u00a0 \u00a0 break\n\u00a0 \u00a0 messages.append( {\"role\":\"user\", \"content\":q} )\n\n\u00a0 \u00a0 ## Memory\n\u00a0 \u00a0 messages.append( {\"role\":\"user\", \"content\":memory} )\u00a0 \u00a0 \u00a0\n\u00a0 \u00a0\n\u00a0 \u00a0 ## Model\n\u00a0 \u00a0 available_tools = {\"final_answer\":tool_final_answer, \"code_exec\":tool_code_exec}\n\u00a0 \u00a0 res = run_agent(llm, messages, available_tools)\n\u00a0 \u00a0\n\u00a0 \u00a0 ## Response\n\u00a0 \u00a0 print(\"<img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f47d.png?ssl=1\" alt=\"\ud83d\udc7d\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> &gt;\", f\"x1b[1;30m{res}x1b[0m\")\n\u00a0 \u00a0 messages.append( {\"role\":\"assistant\", \"content\":res} )<\/code><\/pre>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXevzr4D_hC0uXS-IHzk0y7bKDjZI_mZnZyBR8eljyk9fxkLSuayay39eQvzoH39rZiPOIteRqRLx43s77iB_6YuDl7oM1UoId53Swj9cAtDqjK6HExUCX0T-sKol8ZCN8MrNCVDAg?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">Creating a plot is something that the LLM alone can\u2019t do. 
But keep in mind that even if Agents can create images, they can\u2019t see them, because after all, the engine is still a language model. So the user is the only one who actually sees the plot.<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXfrvxtPZXAA-ExCRRjbjGw15vRREHH2zrwuP3H7SaVOprby09koe4VsZl2NYoryLOrXm-J5SjbJPz8cvWeTsKtRyhVnpuZmST9-VxI0_LlRvxskdBWWbribUnIIjMgKLN5eVtzNSQ?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">The Agent is using the library <a href=\"https:\/\/www.statsmodels.org\/stable\/index.html\"><em>statsmodels<\/em><\/a><em> <\/em>to train a model and forecast the time series.\u00a0<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXdDtjrkAY0K3yGZVmUMRKnyilMk8WIytg4NRY439ksSMcuDe8IRtS-mDiST_1NkKS9jKXhFLf-eM4S-4bS1iIgppFZhvq282D6yhHtIQe26YlRWPXqvMIu9uvYknWfRwV2enOGh?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<h2 class=\"wp-block-heading\">Large Dataframes<\/h2>\n<p class=\"wp-block-paragraph\">LLMs have a limited context window, which restricts how much information they can process at once: even the most advanced models have token limits (roughly a few hundred pages of text). Additionally, LLMs don\u2019t retain memory across sessions unless a retrieval system is integrated. 
In practice, to effectively work with large dataframes, developers often use strategies like chunking, RAG, vector databases, and summarizing content before feeding it into the model.<\/p>\n<p class=\"wp-block-paragraph\">Let\u2019s create a big dataset to play with.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import random\nimport string\n\nlength = 1000\n\ndtf = pd.DataFrame(data={\n\u00a0 \u00a0 'Id': [''.join(random.choices(string.ascii_letters, k=5)) for _ in range(length)],\n\u00a0 \u00a0 'Age': np.random.randint(low=18, high=80, size=length),\n\u00a0 \u00a0 'Score': np.random.uniform(low=50, high=100, size=length).round(1),\n\u00a0 \u00a0 'Status': np.random.choice(['Active','Inactive','Pending'], size=length)\n})\n\ndtf.tail()<\/code><\/pre>\n<figure class=\"wp-block-image is-resized\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXfqcbgJCvzGHG2gDg_mA9_0L4oAaVCxUhwcjPrdu4aZuE2gXqIYV9fMLtMMnWNgoMN6xeFGqLDi_huG1_ra48-PpOceLTXFeyae0JSX-3yXjSDVTRhFE3-meg-uLPkaM8sCCVi_tg?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\" style=\"width:406px;height:auto\"><\/figure>\n<p class=\"wp-block-paragraph\">I\u2019ll add a <strong>web-searching Tool<\/strong>, so that, with the ability to execute Python code and search the internet, the Agent can draw on information available online and make data-driven decisions.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">In Python, the easiest way to create a web-searching Tool is with the well-known privacy-focused search engine <a href=\"https:\/\/pypi.org\/project\/duckduckgo-search\/\"><em>DuckDuckGo<\/em><\/a> (<code>pip install duckduckgo-search==6.3.5<\/code>). 
You can directly use the original library or import the <a href=\"https:\/\/www.langchain.com\/\"><em>LangChain<\/em><\/a> wrapper (<code>pip install langchain-community==0.3.17<\/code>).<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from langchain_community.tools import DuckDuckGoSearchResults\n\ndef search_web(query:str) -&gt; str:\n\u00a0 return DuckDuckGoSearchResults(backend=\"news\").run(query)\n\ntool_search_web = {'type':'function', 'function':{\n\u00a0 'name': 'search_web',\n\u00a0 'description': 'Search the web',\n\u00a0 'parameters': {'type': 'object',\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'required': ['query'],\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'properties': {\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'query': {'type':'str', 'description':'the topic or subject to search on the web'},\n}}}}\n\nsearch_web(query=\"nvidia\")<\/code><\/pre>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXejc51pYTVnpttlJpPQKo3XJJUn0s7J9bucFRZFPKyShyO_hzE_aq7xEAA1sV5hLKm84FQU4ukEXnRZIiHiS1nnaDxdttumbw1bAKVCG7OjNfsNh4T7F4I5Mw-ntBSM_izaY4Ly7g?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">In total, the Agent now has 3 tools.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">dic_tools = {'final_answer':final_answer,\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0  'search_web':search_web,\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0  'code_exec':code_exec}<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Since I can\u2019t add the full dataframe in the prompt, I shall feed only the first 10 rows so that the LLM can understand the general context of the dataset. 
Additionally, I will specify where to find the full dataset.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">str_data = \"\\n\".join([str(row) for row in dtf.head(10).to_dict(orient='records')])\n\nprompt = f'''\nYou are a Data Analyst, you will be given a task to solve as best you can.\nYou have access to the following tools:\n- tool 'final_answer' to return a text response.\n- tool 'code_exec' to execute Python code.\n- tool 'search_web' to search for information on the internet.\n\nIf you use the 'code_exec' tool, remember to always use the function print() to get the output.\nThe dataset already exists and it's called 'dtf', don't create a new one.\n\nThis dataset contains credit score for each customer of the bank. Here's the first rows:\n{str_data}\n'''<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Finally, we can run the Agent.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">messages = [{\"role\":\"system\", \"content\":prompt}]\nmemory = '''\nThe dataset already exists and it's called 'dtf', don't create a new one.\n'''\nwhile True:\n\u00a0 \u00a0 ## User\n\u00a0 \u00a0 q = input('\ud83d\ude42 &gt;')\n\u00a0 \u00a0 if q == \"quit\":\n\u00a0 \u00a0 \u00a0 \u00a0 break\n\u00a0 \u00a0 messages.append( {\"role\":\"user\", \"content\":q} )\n\n\u00a0 \u00a0 ## Memory\n\u00a0 \u00a0 messages.append( {\"role\":\"user\", \"content\":memory} )\u00a0 \u00a0 \u00a0\n\u00a0 \u00a0\n\u00a0 \u00a0 ## Model\n\u00a0 \u00a0 available_tools = {\"final_answer\":tool_final_answer, \"code_exec\":tool_code_exec, \"search_web\":tool_search_web}\n\u00a0 \u00a0 res = run_agent(llm, messages, available_tools)\n\u00a0 \u00a0\n\u00a0 \u00a0 ## Response\n\u00a0 \u00a0 print(\"<img data-recalc-dims=\"1\" decoding=\"async\" 
src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f47d.png?ssl=1\" alt=\"\ud83d\udc7d\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> &gt;\", f\"x1b[1;30m{res}x1b[0m\")\n\u00a0 \u00a0 messages.append( {\"role\":\"assistant\", \"content\":res} )<\/code><\/pre>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXeFJ773JObfczgA0wSU23kX3XcYAcwjIhXjpetOK9b_8xq69XdiuGyJThMOk-s820QpQZoFqn-bb_BIkv6rBR0F-YFn_B9XWfTExTAPqvbYEJt47haItrXOLTujRTRoLM269eTOFQ?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">In this interaction, the Agent used the coding Tool properly. Now, I want to make it utilize the other tool as well.<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXf3tKZXFQSSdjudlb9jfy83Ud47OhlobPpY48pB0dnTX5kUlMS1fIulE6T19TsNQOs7BlHKFJ8sFoa01ca42G25n5kEE-PMzYuo2zvC-SLpk_QGH4eZKS88VEYZO-yFI1u1Qso_KA?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">At last, I need the Agent to put together all the pieces of information obtained so far from this chat.\u00a0<\/p>\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p class=\"wp-block-paragraph\">This article has been a tutorial to demonstrate <strong>how to build from scratch Agents that process time series and large dataframes<\/strong>. We covered both ways that models can interact with the data: through natural language, where the LLM interprets the table as a string using its knowledge base, and by generating and executing code, leveraging tools to process the dataset as an object.<\/p>\n<p class=\"wp-block-paragraph\">Full code for this article: <a href=\"https:\/\/github.com\/mdipietro09\/GenerativeAI\/blob\/main\/Agents_ZeroToHero\/notebook.ipynb\"><strong>GitHub<\/strong><\/a><\/p>\n<p class=\"wp-block-paragraph\">I hope you enjoyed it! 
Feel free to contact me for questions and feedback, or just to share your interesting projects.<\/p>\n<p class=\"has-text-align-center wp-block-paragraph\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f449.png?ssl=1\" alt=\"\ud83d\udc49\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\">\u00a0<a href=\"https:\/\/maurodp.carrd.co\/\"><strong>Let\u2019s Connect<\/strong><\/a>\u00a0<img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f448.png?ssl=1\" alt=\"\ud83d\udc48\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXdgOCU7Qi5PrdCBaZvLdupah64sFhrNrtOKua-OukkT-48c06bTqRcLfIZ83RuzZxQ5C2RLZOe1zHR38AxosXiSfT3DlMTBp6lmFXyJkD_9JpGeiOm2FiPSmLnsnl6NRmzWyEU6?key=MCyGNqQxDXoyqd9WC2WjrRH-\" alt=\"\"><\/figure>\n<p class=\"wp-block-paragraph\">\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/ai-agents-processing-timeseries-and-large-dataframes\/\">AI Agents Processing Time Series and Large Dataframes<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Mauro Di Pietro<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/ai-agents-processing-timeseries-and-large-dataframes\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI Agents Processing Time Series and Large Dataframes Intro Agents are AI systems, powered by LLMs, that can reason about their objectives and take actions to achieve a final goal. They are designed not just to respond to queries, but to orchestrate a sequence of operations, including processing data (i.e. dataframes and time series). 
This [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[678,62,69,938,83,240,354],"tags":[1140,98,368],"class_list":["post-3274","post","type-post","status-publish","format-standard","hentry","category-agentic-ai","category-aimldsaimlds","category-artificial-intelligence","category-data-processing","category-data-science","category-editors-pick","category-time-series-analysis","tag-agents","tag-ai","tag-code"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3274"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=3274"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3274\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=3274"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=3274"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=3274"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}