{"id":3509,"date":"2025-05-02T07:05:20","date_gmt":"2025-05-02T07:05:20","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/05\/02\/step-by-step-guide-to-build-and-deploy-an-llm-powered-chat-with-memory-in-streamlit\/"},"modified":"2025-05-02T07:05:20","modified_gmt":"2025-05-02T07:05:20","slug":"step-by-step-guide-to-build-and-deploy-an-llm-powered-chat-with-memory-in-streamlit","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/05\/02\/step-by-step-guide-to-build-and-deploy-an-llm-powered-chat-with-memory-in-streamlit\/","title":{"rendered":"Step-by-Step Guide to Build and Deploy an LLM-Powered Chat with Memory in Streamlit"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\">In this post, I\u2019ll show you step by step how to build and deploy an LLM-powered chat app in Streamlit, using <strong><a href=\"https:\/\/towardsdatascience.com\/tag\/gemini\/\" title=\"Gemini\">Gemini<\/a><\/strong> as the model, and how to monitor the API usage on Google Cloud Console. Streamlit is a Python framework that makes it easy to turn Python scripts into interactive web apps with almost no front-end work.<\/p>\n<p class=\"wp-block-paragraph\">Recently, I built a project, <a href=\"https:\/\/medium.com\/p\/c01cdc077bfa\" target=\"_blank\" rel=\"noreferrer noopener\">bordAI<\/a>, a chat assistant powered by an LLM and integrated with tools I developed to support embroidery projects. 
After that, I decided to start this series of posts to share tips I\u2019ve learned along the way.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Here\u2019s a quick summary of the post:<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><em>1 to 6\u200a\u2014\u200aProject Setup<\/em><\/p>\n<\/blockquote>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><em>7 to 13\u200a\u2014\u200aBuilding the Chat<\/em><\/p>\n<\/blockquote>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><em>14 to 15\u2014 Deploy and Monitor the app<\/em><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">1. Create a New GitHub repository<\/h2>\n<p class=\"wp-block-paragraph\">Go to <a href=\"https:\/\/github.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub<\/a> and create a new repository.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">2. Clone the repository locally<\/h2>\n<p class=\"wp-block-paragraph\">\u2192 Execute this command in your terminal to clone it:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">git clone &lt;your-repository-url&gt;<\/code><\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">3. Set Up a Virtual Environment (optional)<\/h2>\n<p class=\"wp-block-paragraph\">A Virtual Environment is like a separate space on your computer where you can install a specific version of Python and libraries without affecting the rest of your system. 
This is useful because different projects might need different versions of the same libraries.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u2192 To create a virtual environment:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">pyenv virtualenv 3.9.14 chat-streamlit-tutorial<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\u2192 To activate it:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">pyenv activate chat-streamlit-tutorial<\/code><\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">4. Project Structure<\/h2>\n<p class=\"wp-block-paragraph\">A project structure is just a way to organize all the files and folders for your project. Ours will look like this:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">chat-streamlit-tutorial\/\n\u2502\n\u251c\u2500\u2500 .env\n\u251c\u2500\u2500 .gitignore\n\u251c\u2500\u2500 app.py\n\u251c\u2500\u2500 functions.py\n\u251c\u2500\u2500 requirements.txt\n\u2514\u2500\u2500 README.md<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<code>.env<\/code>\u2192 file where you store your API key (not pushed to GitHub)<\/li>\n<li class=\"wp-block-list-item\">\n<strong><code>.gitignore<\/code><\/strong> \u2192 file where you list the files or folders for git to ignore\u00a0<\/li>\n<li class=\"wp-block-list-item\">\n<strong><code>app.py<\/code><\/strong> \u2192 main streamlit app<\/li>\n<li class=\"wp-block-list-item\">\n<strong><code>functions.py<\/code><\/strong> \u2192 custom functions to better organize the code<\/li>\n<li class=\"wp-block-list-item\">\n<strong><code>requirements.txt<\/code><\/strong> \u2192 list of libraries your project needs<\/li>\n<li class=\"wp-block-list-item\">\n<strong><code>README.md<\/code><\/strong> \u2192 file that explains what your project is about<\/li>\n<\/ul>\n<p 
class=\"wp-block-paragraph\">\u2192 Execute this inside your project folder to create these files:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">touch .env .gitignore app.py functions.py requirements.txt<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\u2192 Inside the file\u00a0<code>.gitignore<\/code>, add:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">.env\n__pycache__\/<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\u2192 Add this to the <code>requirements.txt<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">streamlit\ngoogle-generativeai\npython-dotenv<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\u2192 Install dependencies:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">pip install -r requirements.txt<\/code><\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">5. Get API\u00a0Key<\/h2>\n<p class=\"wp-block-paragraph\">An API Key is like a password that tells a service you have permission to use it. In this project, we\u2019ll use the Gemini API because they have a free tier, so you can play around with it without spending money.\u00a0<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Go to <a href=\"https:\/\/aistudio.google.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/aistudio.google.com\/<\/a>\n<\/li>\n<li class=\"wp-block-list-item\">Create or log in to your account.<\/li>\n<li class=\"wp-block-list-item\">Click on \u201c<strong>Create API Key<\/strong>\u201c, create it, and copy it.\u00a0<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Don\u2019t set up billing if you just want to use the free tier. 
It should say \u201cFree\u201d under \u201cPlan\u201d, just like here:<\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1zbTahsgJnMtFcp3_JKfI5Q.png?ssl=1\" alt=\"\" class=\"wp-image-603048\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">We\u2019ll use <strong>gemini-2.0-flash<\/strong> in this project. It offers a free tier, as you can see in the table below:<\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1JscOlE-46RiOLzl5SE6ckg.png?ssl=1\" alt=\"\" class=\"wp-image-603043\"><figcaption class=\"wp-element-caption\">Screenshot by the author from <a href=\"https:\/\/aistudio.google.com\/plan_information\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/aistudio.google.com\/plan_information<\/a><\/figcaption><\/figure>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">15 RPM = 15 Requests per minute<\/li>\n<li class=\"wp-block-list-item\">1,000,000 TPM = 1 Million Tokens Per Minute<\/li>\n<li class=\"wp-block-list-item\">1,500 RPD = 1,500 Requests Per Day<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\"><em>Note: These limits are accurate as of April 2025 and may change over time.\u00a0<\/em><\/p>\n<p class=\"wp-block-paragraph\">Just a heads up: if you are using the free tier, Google may use your prompts to improve their products, including human reviews, so it\u2019s not recommended to send sensitive information. If you want to read more about this, check this <a href=\"https:\/\/ai.google.dev\/gemini-api\/terms\" rel=\"noreferrer noopener\" target=\"_blank\">link<\/a>.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">6. 
Store your API\u00a0Key<\/h2>\n<p class=\"wp-block-paragraph\">We\u2019ll store our API Key inside a\u00a0<code>.env<\/code> file. A\u00a0<strong><code>.env<\/code> <\/strong>file is a simple text file where you store secret information, so you don\u2019t write it directly in your code. We don\u2019t want it going to GitHub, so we have to add it to our\u00a0<strong><code>.gitignore<\/code> <\/strong>file. This file determines which files git should literally ignore when you push your changes to the repository. I\u2019ve already mentioned this in part 4, \u201cProject Structure\u201d, but just in case you missed it, I\u2019m repeating it here.<\/p>\n<p class=\"wp-block-paragraph\">This step is really important, don\u2019t forget it!<br \/><strong>\u2192 Add this to\u00a0<code>.gitignore<\/code>:\u00a0<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">.env\n__pycache__\/<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">\u2192 Add the API Key to\u00a0<code>.env<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">API_KEY= \"your-api-key\"<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">If you\u2019re running locally,\u00a0<code>.env<\/code> works fine. However, if you\u2019re deploying in Streamlit later, you will have to use <code>st.secrets<\/code>. 
Here I\u2019ve included a function that works in both scenarios.<\/p>\n<p class=\"wp-block-paragraph\">\u2192 Add this function to your <code>functions.py<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import os\n\nimport streamlit as st\nfrom dotenv import load_dotenv\n\ndef get_secret(key):\n    \"\"\"\n    Get a secret from Streamlit, or fall back to .env for local development.\n\n    This allows the app to run both on Streamlit Cloud and locally.\n    \"\"\"\n    try:\n        # Available when the app runs with Streamlit secrets configured\n        return st.secrets[key]\n    except Exception:\n        # No Streamlit secrets found: read the key from the local .env file\n        load_dotenv()\n        return os.getenv(key)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\u2192 Add this to your <code>app.py<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import streamlit as st\nimport google.generativeai as genai\nfrom functions import get_secret\n\napi_key = get_secret(\"API_KEY\")<\/code><\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">7. Choose the model<\/h2>\n<p class=\"wp-block-paragraph\">I chose <strong>gemini-2.0-flash<\/strong> for this project because I think it\u2019s a great model with a generous free tier. 
However, you can explore other models that also offer free tiers and pick the one you prefer.<\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1SMQaULV1V9S9OPgXKZe5fA.png?ssl=1\" alt=\"\" class=\"wp-image-603041\"><figcaption class=\"wp-element-caption\">Screenshot by the author from <a href=\"https:\/\/aistudio.google.com\/plan_information\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/aistudio.google.com\/plan_information<\/a><\/figcaption><\/figure>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Pro<\/strong>: models designed for <strong>high-quality<\/strong> outputs, including reasoning and creativity. Generally used for complex tasks, problem-solving, and content generation. They are multimodal, meaning they can process text, image, video, and audio for input and output.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Flash<\/strong>: models designed for <strong>speed<\/strong> and <strong>cost efficiency<\/strong>. Their answers can be lower in quality than Pro\u2019s on complex tasks. Generally used for chatbots, assistants, and real-time applications like automatic phrase completion. They accept multimodal input, but their output is currently text only; other output types are in development.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Lite<\/strong>: even faster and cheaper than Flash, but with some reduced capabilities: input is multimodal, while output is text only. 
Its main characteristic is that it is <strong>more economical<\/strong> than the Flash, ideal for generating large amounts of text within cost restrictions.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">This <a href=\"https:\/\/ai.google.dev\/gemini-api\/docs\/models#gemini-2.0-flash\" rel=\"noreferrer noopener\" target=\"_blank\">link<\/a> has plenty of details about the models and their differences.<\/p>\n<p class=\"wp-block-paragraph\">Here we are setting up the model. Just replace \u201cgemini-2.0-flash\u201d with the model you\u2019ve chosen.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u2192 Add this to your <code>app.py<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">genai.configure(api_key=api_key)\nmodel = genai.GenerativeModel(\"gemini-2.0-flash\")<\/code><\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">8. Build the\u00a0chat<\/h2>\n<p class=\"wp-block-paragraph\">First, let\u2019s discuss the key concepts we\u2019ll use:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong><code>st.session_state<\/code><\/strong>: this works like a <strong>memory<\/strong> for your app. Streamlit reruns your script from top to bottom every time something changes\u200a\u2014\u200awhen you send a message or click a button\u200a\u2014\u200a so normally, all the variables would be reset. This allows Streamlit to remember values between reruns. However, if you refresh your web page you\u2019ll lose the <code>session_state<\/code>.\u00a0<\/li>\n<li class=\"wp-block-list-item\">\n<strong><code>st.chat_message(name, avatar)<\/code><\/strong>: Creates a chat bubble for a message in the interface. The first parameter is the <strong>name<\/strong> of the message author, which<strong> <\/strong>can be <em>\u201cuser\u201d, \u201chuman\u201d, \u201cassistant\u201d, \u201cai\u201d, or str. 
<\/em>If you use user\/human or assistant\/ai, Streamlit applies default <strong>avatars<\/strong> (a user icon and a bot icon). You can change them if you want to. Check out the <a href=\"https:\/\/docs.streamlit.io\/develop\/api-reference\/chat\/st.chat_message\" target=\"_blank\" rel=\"noreferrer noopener\">documentation<\/a> for more details.<\/li>\n<li class=\"wp-block-list-item\">\n<strong><code>st.chat_input(placeholder)<\/code><\/strong>: Displays an input box at the bottom for the user to type messages. It has many parameters, so I recommend you check out the <a href=\"https:\/\/docs.streamlit.io\/develop\/api-reference\/chat\/st.chat_input\" target=\"_blank\" rel=\"noreferrer noopener\">documentation<\/a>.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">First, I\u2019ll explain each part of the code separately, and afterwards I\u2019ll show you the complete code.<\/p>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">The first step initializes your <strong><code>session_state<\/code><\/strong>, the app\u2019s \u201cmemory\u201d, so that it keeps all the messages within one session.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">if \"chat_history\" not in st.session_state:\n    st.session_state.chat_history = []<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">Next, we\u2019ll set a default first message. This is optional, but I like to add it. You could use it to give some initial instructions if that suits your context. Whenever Streamlit runs the page and <code>st.session_state.chat_history<\/code> is empty, it appends this message to the history with the role \u201cassistant\u201d.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">if not st.session_state.chat_history:\n    st.session_state.chat_history.append((\"assistant\", \"Hi! 
How can I help you?\"))<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">In my app bordAI, I added this initial message giving context and instructions for my app:<\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1I3oSFxnUVuDnjQPyQMfSOA.png?ssl=1\" alt=\"\" class=\"wp-image-603053\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">For the <strong>user<\/strong> part, the first line creates the input box. If <code>user_message<\/code> contains content, it writes it to the interface and then appends it to <code>chat_history<\/code>.\u00a0<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">user_message = st.chat_input(\"Type your message...\")\n\nif user_message:\n    st.chat_message(\"user\").write(user_message)\n    st.session_state.chat_history.append((\"user\", user_message))<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">Now let\u2019s add the <strong>assistant<\/strong> part:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong><code>system_prompt<\/code><\/strong> is the prompt sent to the model. You could just send the <code>user_message<\/code> in place of <code>full_input<\/code> (look at the code below). However, the output might not be precise. A prompt provides context and instructions about <strong>how<\/strong> you want the model to behave, not just <strong>what<\/strong> you want it to answer. 
<strong>A good prompt makes the model\u2019s response more accurate, consistent, and aligned with your goals.<\/strong> In addition, if we don\u2019t tell the model how it should behave, it is more vulnerable to <strong>prompt injections<\/strong>.<\/li>\n<\/ul>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><em><strong>Prompt injection<\/strong> is when someone tries to manipulate the model\u2019s prompt in order to alter its behavior. One way to mitigate this is to structure prompts clearly and delimit the user\u2019s message within triple quotes.<\/em><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">We\u2019ll start with a simple and vague <strong><code>system_prompt<\/code><\/strong>, and in the next section we\u2019ll improve it and compare the results.<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong><code>full_input<\/code><\/strong>: here, we\u2019re organizing the input, delimiting the user message with triple quotes (<code>\"\"\"<\/code>). 
This doesn\u2019t prevent all prompt injections, but it is one way to create better and more reliable interactions.<\/li>\n<li class=\"wp-block-list-item\">\n<strong><code>response<\/code><\/strong>: sends a request to the API and stores the output in <code>response<\/code>.<\/li>\n<li class=\"wp-block-list-item\">\n<strong><code>assistant_reply<\/code><\/strong>: extracts the text from the response.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">Finally, we use <code>st.chat_message()<\/code> combined with <code>write()<\/code> to display the assistant reply and append it to <code>st.session_state.chat_history<\/code>, just like we did with the user message.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">if user_message:\n    st.chat_message(\"user\").write(user_message)\n    st.session_state.chat_history.append((\"user\", user_message))\n\n    system_prompt = \"\"\"\n    You are an assistant.\n    Be nice and kind in all your responses.\n    \"\"\"\n    # Wrap the user message in triple quotes to separate it from the instructions\n    full_input = f'{system_prompt}\\n\\nUser message:\\n\"\"\"{user_message}\"\"\"'\n\n    response = model.generate_content(full_input)\n    assistant_reply = response.text\n\n    st.chat_message(\"assistant\").write(assistant_reply)\n    st.session_state.chat_history.append((\"assistant\", assistant_reply))<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">Now let\u2019s see everything together!<\/p>\n<p class=\"wp-block-paragraph\">\u2192 Add this to your <code>app.py<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import streamlit as st\nimport google.generativeai as genai\nfrom functions import get_secret\n\napi_key = get_secret(\"API_KEY\")\ngenai.configure(api_key=api_key)\nmodel = genai.GenerativeModel(\"gemini-2.0-flash\")\n\nif \"chat_history\" not in st.session_state:\n    st.session_state.chat_history = []\n\nif not st.session_state.chat_history:\n    
st.session_state.chat_history.append((\"assistant\", \"Hi! How can I help you?\"))\n\nuser_message = st.chat_input(\"Type your message...\")\n\nif user_message:\n    st.chat_message(\"user\").write(user_message)\n    st.session_state.chat_history.append((\"user\", user_message))\n\n    system_prompt = \"\"\"\n    You are an assistant.\n    Be nice and kind in all your responses.\n    \"\"\"\n    # Wrap the user message in triple quotes to separate it from the instructions\n    full_input = f'{system_prompt}\\n\\nUser message:\\n\"\"\"{user_message}\"\"\"'\n\n    response = model.generate_content(full_input)\n    assistant_reply = response.text\n\n    st.chat_message(\"assistant\").write(assistant_reply)\n    st.session_state.chat_history.append((\"assistant\", assistant_reply))<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">To run and test your app locally, first navigate to the project folder, then execute the following command.<\/p>\n<p class=\"wp-block-paragraph\">\u2192 Execute in your terminal:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">cd chat-streamlit-tutorial\nstreamlit run app.py<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\"><strong>Yay!<\/strong> You now have a chat running in Streamlit!<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">9. Prompt Engineering<\/h2>\n<p class=\"wp-block-paragraph\">Prompt engineering is the process of writing instructions to get the best possible output from an AI model.<\/p>\n<p class=\"wp-block-paragraph\">There are plenty of techniques for prompt engineering. 
Here are 5 tips:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Write clear and specific instructions.<\/li>\n<li class=\"wp-block-list-item\">Define a role, expected behavior, and rules for the assistant.<\/li>\n<li class=\"wp-block-list-item\">Give the right amount of context.<\/li>\n<li class=\"wp-block-list-item\">Use the delimiters to indicate user input (as I explained in part 8).<\/li>\n<li class=\"wp-block-list-item\">Ask for the output in a specified format.<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">These tips can be applied to the <code>system_prompt<\/code> or when you\u2019re writing a prompt to interact with the chat assistant.<\/p>\n<p class=\"wp-block-paragraph\">Our current system prompt is:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">system_prompt = f\"\"\"\nYou are an assistant.\nBe nice and kind in all your responses.\n\"\"\"<\/code><\/pre>\n<p class=\"wp-block-paragraph\">It is super vague and provides no guidance to the model.\u00a0<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">No clear direction for the assistant, what kind of help it should provide<\/li>\n<li class=\"wp-block-list-item\">No specification of the role or what is the topic of the assistance<\/li>\n<li class=\"wp-block-list-item\">No guidelines for structuring the output<\/li>\n<li class=\"wp-block-list-item\">No context on whether it should be technical or casual<\/li>\n<li class=\"wp-block-list-item\">Lack of boundaries\u00a0<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">We can improve our prompt based on the tips above. 
Here\u2019s an example.<\/p>\n<p class=\"wp-block-paragraph\">\u2192 Change the <code>system_prompt<\/code> in <code>app.py<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">system_prompt = \"\"\"\nYou are a friendly and knowledgeable programming tutor.\nAlways explain concepts in a simple and clear way, using examples when possible.\nIf the user asks something unrelated to programming, politely bring the conversation back to programming topics.\n\"\"\"\n# Wrap the user message in triple quotes to separate it from the instructions\nfull_input = f'{system_prompt}\\n\\nUser message:\\n\"\"\"{user_message}\"\"\"'<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">If we ask \u201cWhat is python?\u201d with the old prompt, the model just gives a short, generic answer:<\/p>\n<figure class=\"wp-block-image aligncenter is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1tCPim-xZ4OWWiiGsEki66Q.png?ssl=1\" alt=\"\" class=\"wp-image-603046\" style=\"width:446px;height:auto\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">With the new prompt, it provides a more detailed response with examples:<\/p>\n<figure class=\"wp-block-image aligncenter is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1ZKyWt6Lwqx_uwluIlM13eg.png?ssl=1\" alt=\"\" class=\"wp-image-603054\" style=\"width:344px;height:auto\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<figure class=\"wp-block-image aligncenter is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1d4quNkJ2_gbWqeFwwim5hg.png?ssl=1\" alt=\"\" class=\"wp-image-603040\" style=\"width:339px;height:auto\"><figcaption class=\"wp-element-caption\">Image by 
the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Try changing the <strong><code>system_prompt<\/code><\/strong> yourself to see the difference in the model outputs and craft the ideal prompt for your context!<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">10. Choose Generate Content Parameters<\/h2>\n<p class=\"wp-block-paragraph\">There are many parameters you can configure when generating content. Here I\u2019ll demonstrate how <strong><code>temperature<\/code><\/strong> and <strong><code>maxOutputTokens<\/code><\/strong> work. Check the <a href=\"https:\/\/ai.google.dev\/api\/generate-content#v1beta.GenerationConfig\" target=\"_blank\" rel=\"noreferrer noopener\">documentation<\/a> for more details.<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong><code>temperature<\/code><\/strong>: controls the randomness of the output, ranging from 0 to 2. The default is 1. Lower values produce more deterministic outputs, while higher values produce more creative ones.<\/li>\n<li class=\"wp-block-list-item\">\n<strong><code>maxOutputTokens<\/code><\/strong>: the maximum number of tokens that can be generated in the output. 
A token is approximately four characters.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">To change the temperature dynamically and test it, you can create a sidebar slider to control this parameter.<\/p>\n<p class=\"wp-block-paragraph\">\u2192 Add this to <code>app.py<\/code>, before the <code>if user_message<\/code> block so the value is defined when the request is sent:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">temperature = st.sidebar.slider(\n    label=\"Select the temperature\",\n    min_value=0.0,\n    max_value=2.0,\n    value=1.0\n)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\u2192 Change the <code>response<\/code> variable to the following (in the Python SDK, <code>maxOutputTokens<\/code> is written as <code>max_output_tokens<\/code>):<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">response = model.generate_content(\n    full_input,\n    generation_config={\n        \"temperature\": temperature,\n        \"max_output_tokens\": 1000\n    }\n)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">The sidebar will look like this:<\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1Bl37GvT_iGo6g43p68eH2g.png?ssl=1\" alt=\"\" class=\"wp-image-603044\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Try adjusting the temperature to see how the output changes!<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">11. Display chat history<\/h2>\n<p class=\"wp-block-paragraph\">This step ensures that you keep track of all the exchanged messages in the chat, so you can see the chat history. 
Without this, you\u2019d only see the latest messages from the assistant and user each time you send something.<\/p>\n<p class=\"wp-block-paragraph\">This code loops over everything appended to <code>chat_history<\/code> and displays it in the interface.<\/p>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">\u2192 Add this before the <code>if user_message<\/code> in <code>app.py<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">for role, message in st.session_state.chat_history:\n    st.chat_message(role).write(message)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">Now, all the messages within one session are kept visible in the interface:<\/p>\n<figure class=\"wp-block-image aligncenter is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1hyXbUt2kAaIFwsInIJJQnw.png?ssl=1\" alt=\"\" class=\"wp-image-603051\" style=\"width:420px;height:auto\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Note: I asked a non-programming question, and the assistant steered the conversation back to programming. Our prompt is working!<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">12. Chat with memory<\/h2>\n<p class=\"wp-block-paragraph\">Even though the messages are stored in <code>chat_history<\/code>, our model isn\u2019t aware of the context of our conversation. 
It is stateless: each request is independent.\u00a0<\/p>\n<figure class=\"wp-block-image aligncenter is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/17Lmg9EiuPb61IQLeNTLV-g.png?ssl=1\" alt=\"\" class=\"wp-image-603052\" style=\"width:460px;height:auto\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">To solve this, we have to pass all this context inside our prompt so the model can reference previous messages exchanged.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Create <strong><code>context<\/code><\/strong>, a list containing all the messages exchanged so far, and append the most recent user message last so it doesn\u2019t get lost in the context.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">system_prompt = \"\"\"\nYou are a friendly and knowledgeable programming tutor.\nAlways explain concepts in a simple and clear way, using examples when possible.\nIf the user asks something unrelated to programming, politely bring the conversation back to programming topics.\n\"\"\"\nfull_input = f'{system_prompt}\\n\\nUser message:\\n\"\"\"{user_message}\"\"\"'\n\n# The Gemini API expects the roles \"user\" and \"model\", so map \"assistant\" to \"model\".\n# The latest user message is already the last item in chat_history, so skip it here\n# and send it once, wrapped in full_input.\ncontext = [\n    *[\n        {\"role\": \"model\" if role == \"assistant\" else role, \"parts\": [{\"text\": msg}]}\n        for role, msg in st.session_state.chat_history[:-1]\n    ],\n    {\"role\": \"user\", \"parts\": [{\"text\": full_input}]}\n]\n\nresponse = model.generate_content(\n    context,\n    generation_config={\n        \"temperature\": temperature,\n        \"max_output_tokens\": 1000\n    }\n)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">Now, I told the assistant that I was working on a project to analyze weather data. 
Then I asked what the theme of my project was, and it correctly answered \u201cweather data analysis\u201d, as it now has the context of the previous messages.\u00a0<\/p>\n<figure class=\"wp-block-image aligncenter is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1s8bXPGMVOBAkHddXsubvfw.png?ssl=1\" alt=\"\" class=\"wp-image-603049\" style=\"width:468px;height:auto\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">If your context gets too long, <strong>you can consider summarizing it<\/strong> to save costs, since the more tokens you send to the API, the more you\u2019ll pay.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">13. Create a Reset Button (optional)\u00a0<\/h2>\n<p class=\"wp-block-paragraph\">I like adding a reset button in case something goes wrong or the user just wants to clear the conversation.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">You just need to create a function that sets <code>chat_history<\/code> to an empty list. 
If you created other session states, you should set them here as False or empty, too.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">\u2192 Add this to <code>functions.py<\/code>:\u00a0<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def reset_chat():\n    \"\"\"\n    Reset the Streamlit chat session state.\n    \"\"\"\n    st.session_state.chat_history = []\n    st.session_state.example = False # Add others if needed<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\u2192 And if you want it in the sidebar, add this to <code>app.py<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from functions import get_secret, reset_chat\n\nif st.sidebar.button(\"Reset chat\"):\n    reset_chat()<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">It will look like this:<\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1J2nOtboanbzSScQw1NOFuA.png?ssl=1\" alt=\"\" class=\"wp-image-603045\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">Everything together:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import streamlit as st\nimport google.generativeai as genai\nfrom functions import get_secret, reset_chat\n\napi_key = get_secret(\"API_KEY\")\ngenai.configure(api_key=api_key)\nmodel = genai.GenerativeModel(\"gemini-2.0-flash\")\n\ntemperature = st.sidebar.slider(\n    label=\"Select the temperature\",\n    min_value=0.0,\n    max_value=2.0,\n    value=1.0\n)\n\nif st.sidebar.button(\"Reset chat\"):\n    reset_chat()\n\nif \"chat_history\" not in st.session_state:\n    st.session_state.chat_history = []\n\nif not st.session_state.chat_history:\n    
st.session_state.chat_history.append((\"assistant\", \"Hi! How can I help you?\"))\n\nfor role, message in st.session_state.chat_history:\n    st.chat_message(role).write(message)\n\nuser_message = st.chat_input(\"Type your message...\")\n\nif user_message:\n    st.chat_message(\"user\").write(user_message)\n    st.session_state.chat_history.append((\"user\", user_message))\n\n    system_prompt = \"\"\"\n    You are a friendly and knowledgeable programming tutor.\n    Always explain concepts in a simple and clear way, using examples when possible.\n    If the user asks something unrelated to programming, politely bring the conversation back to programming topics.\n    \"\"\"\n    full_input = f'{system_prompt}\\n\\nUser message:\\n\"\"\"{user_message}\"\"\"'\n\n    # The Gemini API expects the roles \"user\" and \"model\", so map \"assistant\" to \"model\".\n    # The latest user message is already the last item in chat_history, so skip it here\n    # and send it once, wrapped in full_input.\n    context = [\n        *[\n            {\"role\": \"model\" if role == \"assistant\" else role, \"parts\": [{\"text\": msg}]}\n            for role, msg in st.session_state.chat_history[:-1]\n        ],\n        {\"role\": \"user\", \"parts\": [{\"text\": full_input}]}\n    ]\n\n    response = model.generate_content(\n        context,\n        generation_config={\n            \"temperature\": temperature,\n            \"max_output_tokens\": 1000\n        }\n    )\n    assistant_reply = response.text\n\n    st.chat_message(\"assistant\").write(assistant_reply)\n    st.session_state.chat_history.append((\"assistant\", assistant_reply))<\/code><\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">14. 
Deploy<\/h2>\n<p class=\"wp-block-paragraph\">If your repository is public, you can deploy it for free on Streamlit Community Cloud.\u00a0<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">MAKE SURE YOU DO NOT HAVE API KEYS IN YOUR PUBLIC REPOSITORY.<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">First, save and push your code to the repository.<\/p>\n<p class=\"wp-block-paragraph\">\u2192 Execute in your terminal:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">git add .\ngit commit -m \"tutorial chat streamlit\"\ngit push origin main<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Pushing directly to <code>main<\/code> isn\u2019t best practice, but since this is just a simple tutorial, we\u2019ll do it for convenience.\u00a0<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Go to your Streamlit app that is running locally.<\/li>\n<li class=\"wp-block-list-item\">Click on \u201cDeploy\u201d at the top right.<\/li>\n<li class=\"wp-block-list-item\">In Streamlit Community Cloud, click \u201cDeploy now\u201d.<\/li>\n<li class=\"wp-block-list-item\">Fill out the information.<\/li>\n<\/ol>\n<figure class=\"wp-block-image aligncenter\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/16tWJ-hH-6ra7pfM_jqHzSQ.png?ssl=1\" alt=\"\" class=\"wp-image-603047\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">5. Click on \u201c<strong>Advanced settings<\/strong>\u201d and write <code>API_KEY=\"your-api-key\"<\/code>, just like you did with the\u00a0<code>.env<\/code> file.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">6. Click \u201cDeploy\u201d.<\/p>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">All done! 
If you\u2019d like, check out my app <a href=\"https:\/\/chat-tutorial.streamlit.app\/\" rel=\"noreferrer noopener\" target=\"_blank\">here<\/a>! <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f389.png?ssl=1\" alt=\"\ud83c\udf89\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">15. Monitor API usage on Google\u00a0Console\u00a0<\/h2>\n<p class=\"wp-block-paragraph\">The last part of this post shows you how to monitor API usage on the Google Cloud Console. This is important if you deploy your app publicly, so you don\u2019t have any surprises.<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Access <a href=\"https:\/\/console.cloud.google.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Google Cloud Console<\/a>.<\/li>\n<li class=\"wp-block-list-item\">Go to \u201cAPIs and services\u201d.<\/li>\n<li class=\"wp-block-list-item\">Click on \u201cGenerative Language API\u201d.<\/li>\n<\/ol>\n<figure class=\"wp-block-image aligncenter\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/11NGmuGqRm_ATgrFTLAoWAA.png?ssl=1\" alt=\"\" class=\"wp-image-603050\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Requests:<\/strong> how many times your API was called. In our case, the API is called each time we run <strong><code>model.generate_content(context)<\/code><\/strong>.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Error (%): <\/strong>the percentage of requests that failed. 
<strong>4xx<\/strong> errors are usually the requester\u2019s fault: for instance, <strong>400<\/strong> means <strong>bad input<\/strong> and <strong>429<\/strong> means you\u2019re <strong>hitting the API too frequently<\/strong>. <strong>5xx<\/strong> errors are usually the server\u2019s fault and are less common, e.g. <strong>500<\/strong> for <strong>Internal Server Error<\/strong> and <strong>503<\/strong> for <strong>Service Unavailable<\/strong>. For these, Google typically retries internally or recommends retrying after a few seconds.\n<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Latency, median (ms)<\/strong>: This shows how long (in milliseconds) it takes for your service to respond, at the 50th percentile\u200a\u2014\u200ameaning half the requests are faster and half are slower. It\u2019s a good general measure of your service\u2019s speed, answering the question, \u201cHow fast is it normally?\u201d.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Latency, 95% (ms)<\/strong>: This shows the response time at the 95th percentile\u200a\u2014\u200ameaning 95% of requests are faster than this time, and only 5% are slower. 
It helps you see how your system behaves under heavy load or in its slowest cases, answering the question, \u201cHow bad is it getting for some users?\u201d.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\"><strong>A quick example of the difference between median latency and p95 latency:<\/strong><br \/>Imagine your service usually responds in <strong>200ms<\/strong>:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Median latency = 200ms (good!)<\/li>\n<li class=\"wp-block-list-item\">p95 latency = 220ms (also good)<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Now under heavy load:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Median latency = 220ms (still looks OK)<\/li>\n<li class=\"wp-block-list-item\">p95 latency = 1200ms (<strong>not<\/strong> <strong>good<\/strong>)<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">The p95 metric shows that <strong>5% of your users are waiting more than 1.2 seconds<\/strong>\u200a\u2014\u200aa much worse experience. If we had looked just at the median, we\u2019d assume everything was fine, but p95 shows hidden problems.<\/p>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">Further down the \u201cMetrics\u201d page, you\u2019ll find graphs and, at the bottom, the methods called by the API. 
Also, in \u201cQuotas &amp; System Limits\u201d, you can monitor the API usage compared to the free tier limit.<\/p>\n<figure class=\"wp-block-image aligncenter\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1h37F2TswrxS-bLdb6_HCZw.png?ssl=1\" alt=\"\" class=\"wp-image-603055\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">Click \u201cShow usage chart\u201d to compare usage day by day.<\/p>\n<figure class=\"wp-block-image aligncenter is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/1NWBtBIvkVCnB3HgAHSCRCw.png?ssl=1\" alt=\"\" class=\"wp-image-603042\" style=\"width:395px;height:auto\"><figcaption class=\"wp-element-caption\">Image by the\u00a0author<\/figcaption><\/figure>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<p class=\"wp-block-paragraph\">I hope you enjoyed this tutorial.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">You can find all the code for this project on my <a href=\"https:\/\/github.com\/alessandraalpino\/chat-streamlit-tutorial\" rel=\"noreferrer noopener\" target=\"_blank\">GitHub<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">I\u2019d love to hear your thoughts! 
Let me know in the comments what you think.<\/p>\n<p class=\"wp-block-paragraph\">Follow me on:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/www.linkedin.com\/in\/alessandraalpino\/\" target=\"_blank\" rel=\"noreferrer noopener\">LinkedIn<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/github.com\/alessandraalpino\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/www.youtube.com\/@data_match\" target=\"_blank\" rel=\"noreferrer noopener\">YouTube<\/a><\/li>\n<\/ul>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/step-by-step-guide-to-build-and-deploy-an-llm-powered-chat-with-memory-in-streamlit\/\">Step-by-Step Guide to Build and Deploy an LLM-Powered Chat with Memory in Streamlit<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    ALESSANDRA COSTA<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/step-by-step-guide-to-build-and-deploy-an-llm-powered-chat-with-memory-in-streamlit\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Step-by-Step Guide to Build and Deploy an LLM-Powered Chat with Memory in Streamlit In this post, I\u2019ll show you step by step how to build and deploy a chat powered with LLM\u200a\u2014\u200aGemini\u200a\u2014\u200ain Streamlit and monitor the API usage on Google Cloud Console. 
Streamlit is a Python framework that makes it super easy to turn your [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,69,67,366,1664,71,1930],"tags":[2531,299,163],"class_list":["post-3509","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-artificial-intelligence","category-deep-dives","category-gemini","category-generative-ai","category-large-language-models","category-llm-applications","tag-chat","tag-streamlit","tag-your"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3509"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=3509"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3509\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=3509"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=3509"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=3509"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}