{"id":3718,"date":"2025-05-10T07:02:44","date_gmt":"2025-05-10T07:02:44","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/05\/10\/how-not-to-write-an-mcp-server\/"},"modified":"2025-05-10T07:02:44","modified_gmt":"2025-05-10T07:02:44","slug":"how-not-to-write-an-mcp-server","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/05\/10\/how-not-to-write-an-mcp-server\/","title":{"rendered":"How Not to Write an MCP Server"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\">I recently had the chance to create an MCP server for an observability application in order to provide the AI agent with dynamic code analysis capabilities. Because of its potential to transform applications, MCP is a technology I\u2019m even more ecstatic about than I originally was about genAI in general. I wrote more about that, along with a general introduction to MCP, in a previous <a href=\"https:\/\/towardsdatascience.com\/a-farewell-to-apms-the-future-of-observability-is-mcp-tools\/\" target=\"_blank\" rel=\"noreferrer noopener\">post<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">While initial POCs demonstrated <strong>immense<\/strong> potential for this to be a force multiplier for our product\u2019s value, it took several iterations and several stumbles to deliver on that promise. 
In this post, I\u2019ll try to capture some of the lessons learned, as I think they can benefit other MCP server developers.<\/p>\n<h3 class=\"wp-block-heading\">My Stack<\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">I alternated between <a href=\"https:\/\/www.cursor.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Cursor<\/a> and <a href=\"https:\/\/code.visualstudio.com\/docs\/copilot\/chat\/mcp-servers\" target=\"_blank\" rel=\"noreferrer noopener\">VS Code<\/a> as the main MCP clients<\/li>\n<li class=\"wp-block-list-item\">To develop the MCP server itself, I used the <a href=\"https:\/\/github.com\/modelcontextprotocol\/csharp-sdk\" target=\"_blank\" rel=\"noreferrer noopener\">.NET MCP SDK<\/a>, as I decided to host the server on another service written in\u00a0.NET<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\">Lesson 1: Don\u2019t dump all of your data on the\u00a0agent<\/h3>\n<p class=\"wp-block-paragraph\">In my application, one tool returns aggregated information on errors and exceptions. The API is very detailed, as it serves a complex UI view, and spews out large amounts of deeply linked data:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Error frames<\/li>\n<li class=\"wp-block-list-item\">Affected endpoints<\/li>\n<li class=\"wp-block-list-item\">Stack traces\u00a0<\/li>\n<li class=\"wp-block-list-item\">Priority and trends\u00a0<\/li>\n<li class=\"wp-block-list-item\">Histograms<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">My first hunch was to simply expose the API <strong>as is<\/strong> as an MCP tool. After all, the agent should be able to make more sense of it than any UI view, and catch on to interesting details or connections between events. I had several scenarios in mind for how this data could be useful. 
The agent could automatically offer fixes for recent exceptions recorded in production or in the testing environment, let me know about errors that stand out, or help me address some systematic problems that are the underlying root cause of the issues.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The basic premise was therefore to allow the agent to work its \u2018magic\u2019, with more data potentially meaning more hooks for the agent to latch on in its investigation efforts. I quickly coded a wrapper around our API on the MCP endpoint and decided to start with a basic prompt to see whether everything is working:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/session_1_1.png?ssl=1\" alt=\"\" class=\"wp-image-603734\"><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">We can see the agent was smart enough to know that it needed to call another tool to grab the environment ID for that \u2018<strong>test<\/strong>\u2019 environment I mentioned. With that at hand, after discovering that there was actually no recent exception in the last 24 hours, it then took the liberty to scan a more extended time period, and this is when things got a little weird:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/session_1_2.png?ssl=1\" alt=\"\" class=\"wp-image-603735\"><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">What a strange response. The agent queries for exceptions from the last seven days, gets back some tangible results this time, and yet proceeds to ramble on as if ignoring the data altogether. 
It keeps trying the tool in different ways and with different parameter combinations, obviously fumbling, until I notice it flat out stating that the data is completely invisible to it. While errors are being sent back in the response, the agent actually claims there are <strong>no errors<\/strong>. What is going on?<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/session_1_3.png?ssl=1\" alt=\"\" class=\"wp-image-603736\"><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">After some investigation, it turned out that we had simply hit a cap in the agent\u2019s capacity to process large amounts of data in a single response.<\/p>\n<p class=\"wp-block-paragraph\">I used an existing API that was extremely verbose, which I initially even considered to be an advantage. The end result, however, was that I somehow managed to overwhelm the model. Overall, there were around 360k characters and 16k words in the response JSON, including call stacks, error frames, and references. Going by the context window limit of the model I was using, this <strong>should<\/strong> have been supported (Claude 3.7 Sonnet should support up to 200k tokens), but the large data dump nevertheless left the agent thoroughly stumped.<\/p>\n<p class=\"wp-block-paragraph\">One strategy would be to change the model to one that supports an even bigger context window. I switched over to the <strong>Gemini 2.5 Pro<\/strong> model just to test that theory out, as it boasts an outrageous limit of one million tokens. 
Sure enough, the same query now yielded a much more intelligent response:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/session_1_4.png?ssl=1\" alt=\"\" class=\"wp-image-603737\"><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">This is great! The agent was able to parse the errors and find the systematic cause of many of them with some basic reasoning. However, we can\u2019t rely on the user using a specific model, and to complicate things, this was output from a relatively low-bandwidth testing environment. What if the dataset were even larger?\u00a0<br \/>To solve this issue, I made some fundamental changes to how the API was structured:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Nested data hierarchy: <\/strong>Keep the initial response focused on high-level details and aggregations. Create a separate API to retrieve the call stacks of specific frames as needed.\u00a0<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Enhance queryability: <\/strong>All of the queries made so far by the agent used a very small page size (10). If we want the agent to be able to access more relevant subsets of the data within the limits of its context, we need to provide more APIs to query errors along different dimensions, for example: affected methods, error type, priority, and impact.\u00a0<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">With the new changes, the tool now consistently analyzes important new exceptions and comes up with fix suggestions. However, I had glossed over another minor detail that I needed to sort out before I could really use it reliably. 
<\/p>\n<h3 class=\"wp-block-heading\">Lesson 2: What\u2019s the\u00a0time?<\/h3>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"574\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/doppleware_ai_robot_checking_the_time_on_its_wrist_watch_-ar_f373bba7-8994-4517-9ce9-ddfbfc881ee7_2-1024x574.png?resize=1024%2C574&#038;ssl=1\" alt=\"\" class=\"wp-image-603738\"><figcaption class=\"wp-element-caption\">Image generated by the author with Midjourney<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The keen-eyed reader may have noticed that in the previous example, to retrieve the errors in a specific time range, the agent uses the <strong>ISO 8601 time duration<\/strong> format instead of the actual dates and times. So instead of including standard \u2018<strong>From<\/strong>\u2019 and \u2018<strong>To<\/strong>\u2019 parameters with datetime values, the AI sent a duration value, for example, seven days or <strong>P7D, <\/strong>to indicate it wants to check for errors in the past week.<\/p>\n<p class=\"wp-block-paragraph\">The reason for this is somewhat strange\u200a\u2014\u200a<strong>the agent might not know the current date and time!<\/strong> You can verify that yourself by asking the agent that simple question. The below would have made sense were it not for the fact that I typed that prompt in at around noon on May 4th\u2026<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/session_2_1.png?ssl=1\" alt=\"\" class=\"wp-image-603740\"><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Using time <strong>duration<\/strong> values turned out to be a great solution that the agent handled quite well. 
Don\u2019t forget to document the expected value and example syntax in the tool parameter description, though!<\/p>\n<h3 class=\"wp-block-heading\">Lesson 3: When the agent makes a mistake, show it how to do better<\/h3>\n<p class=\"wp-block-paragraph\">In the first example, I was actually taken aback by how the agent was able to decipher the dependencies between the different tool calls in order to provide the right environment identifier. By studying the MCP contract, it figured out that it first had to call another tool to get the list of environment IDs.<\/p>\n<p class=\"wp-block-paragraph\">However, when responding to other requests, the agent would sometimes take the environment names mentioned in the prompt verbatim. For example, I noticed this in response to the following question: <strong>compare slow traces for this method between the test and prod environments, are there any significant differences? <\/strong>Depending on the context, the agent would sometimes send the strings \u201ctest\u201d and \u201cprod\u201d as the environment ID.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">In my original implementation, my MCP server would silently fail in this scenario, returning an empty response. The agent, upon receiving no data or a generic error, would simply give up and try to solve the request using another strategy. 
To offset that behavior, I quickly changed my implementation so that if an incorrect value was provided, the JSON response would describe exactly what went wrong, and even provide a list of valid values to save the agent another tool call.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/session_3_1.png?ssl=1\" alt=\"\" class=\"wp-image-603741\"><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">This was enough for the agent. Learning from its mistake, it repeated the call with the correct value and somehow also avoided making the same error later on.<\/p>\n<h3 class=\"wp-block-heading\">Lesson 4: Focus on user intent and not functionality<\/h3>\n<p class=\"wp-block-paragraph\">While it is tempting to simply describe what the API is doing, generic terms sometimes don\u2019t allow the agent to recognize the kinds of requests this functionality is best suited for.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Let\u2019s take a simple example: My MCP server has a tool that, for each method, endpoint, or code location, can indicate how it\u2019s being used at runtime. Specifically, it uses the tracing data to indicate which application flows reach the specific function or method. 
<\/p>\n<p class=\"wp-block-paragraph\">The original documentation simply described this functionality:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-csharp\">[McpServerTool,\nDescription(\n@\"For this method, see which runtime flows in the application\n(including other microservices and code not in this project)\nuse this function or method.\nThis data is based on analyzing distributed tracing.\")]\npublic static async Task&lt;string&gt; GetUsagesForMethod(IMcpService client,\n[Description(\"The environment id to check for usages\")]\nstring environmentId,\n[Description(\"The name of the class. Provide only the class name without the namespace prefix.\")]\nstring codeClass,\n[Description(\"The name of the method to check, must specify a specific method to check\")]\nstring codeMethod)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The above is a functionally accurate description of what this tool does, but it doesn\u2019t necessarily make it clear what types of activities it might be relevant for. After seeing that the agent wasn\u2019t picking this tool up for various prompts where I thought it would be fairly useful, I decided to rewrite the tool description, this time emphasizing the use cases:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-csharp\">[McpServerTool,\nDescription(\n@\"Find out how a specific code location is being used and by\nwhich other services\/code.\nUseful in order to detect possible breaking changes, to check whether\nthe generated code will fit the current usages,\nto generate tests based on the runtime usage of this method,\nor to check for related issues on the endpoints triggering this code\nafter any change to ensure it didn't impact them.\")]<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Updating the text helped the agent realize <strong>why<\/strong> the information was useful. 
For example, before making this change, the agent would not even trigger the tool in response to a prompt similar to the one below. Now, it has become completely seamless, without the user having to directly mention that this tool should be used:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/session_4_1.png?ssl=1\" alt=\"\" class=\"wp-image-603742\"><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\">Lesson 5: Document your JSON responses<\/h3>\n<p class=\"wp-block-paragraph\">The JSON standard, at least officially, does not support comments. That means that if the JSON is all the agent has to go on, it might be missing some clues about the context of the data you\u2019re returning. For example, in my aggregated error response, I returned the following <strong>score <\/strong>object:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-json\">\"Score\": {\"Score\":21,\n\"ScoreParams\":{ \"Occurrences\":1,\n\"Trend\":0,\n\"Recent\":20,\n\"Unhandled\":0,\n\"Unexpected\":0}}<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Without proper documentation, any non-clairvoyant agent would be hard pressed to make sense of what these numbers mean. 
Thankfully, it is easy to add a comment element at the beginning of the JSON payload with additional information about the data provided:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-json\">\"_comment\": \"Each error contains a link to the error trace,\nwhich can be retrieved using the GetTrace tool,\ninformation about the affected endpoints, the code, and the\nrelevant stack trace.\nEach error in the list represents numerous instances\nof the same error and is given a score after it's been\nprioritized.\nThe score reflects the criticality of the error.\nThe number is between 0 and 100 and is composed of several\nparameters. Each can contribute to the error criticality,\nand all are normalized in relation to the system\nand the other methods.\nEach score parameter's value represents its contribution to the\noverall score. They include:\n\n1. 'Occurrences', representing the number of instances of this error\ncompared to others.\n2. 'Trend', whether this error is escalating in\nfrequency.\n3. 'Unhandled', whether this error is caught\ninternally or propagates all the way\nout of the endpoint scope.\n4. 'Unexpected', errors that are with high probability\nbugs, for example NullPointerException or\nKeyNotFound\",\n\"EnvironmentErrors\":[]<\/code><\/pre>\n<p class=\"wp-block-paragraph\">This enables the agent not only to explain to the user what the score means if they ask, but also to feed this explanation into its own reasoning and recommendations.<\/p>\n<h3 class=\"wp-block-heading\">Choosing the right architecture: SSE vs\u00a0STDIO<\/h3>\n<p class=\"wp-block-paragraph\">There are two architectures you can use in developing an MCP server. The more common and widely supported implementation is making your server available as a <strong>command<\/strong> triggered by the MCP client. 
This could be any CLI-triggered command; <strong>npx<\/strong>, <strong>docker<\/strong>, and <strong>python<\/strong> are some common examples. In this configuration, all communication is done via the process\u2019s <strong>STDIO<\/strong>, and the process itself runs on the client machine. The client is responsible for instantiating the MCP server and managing its lifecycle.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"353\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/arch1-1024x353.png?resize=1024%2C353&#038;ssl=1\" alt=\"\" class=\"wp-image-603743\"><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">This client-side architecture has one major drawback from my perspective: Since the MCP server implementation is run by the client on the local machine, it is much harder to roll out updates or new capabilities. Even if that problem is somehow solved, the tight coupling between the MCP server and the backend APIs it depends on in our applications would further complicate this model in terms of versioning and forward\/backward compatibility.<\/p>\n<p class=\"wp-block-paragraph\">For these reasons, I chose the second type of MCP server: an <a href=\"https:\/\/en.wikipedia.org\/wiki\/Server-sent_events\" target=\"_blank\" rel=\"noreferrer noopener\">SSE<\/a> server hosted as part of our application services. This removes the friction of running CLI commands on the client machine and allows me to update and version the MCP server code along with the application code that it consumes. In this scenario, the client is provided with the URL of the SSE endpoint with which it interacts. 
While not all clients currently support this option, there is a brilliant command-based MCP server called <a href=\"https:\/\/github.com\/supercorp-ai\/supergateway\" target=\"_blank\" rel=\"noreferrer noopener\">supergateway<\/a> that can be used as a proxy to the SSE server implementation. That means users can add the more widely supported STDIO variant and still consume the functionality hosted on your SSE backend.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"353\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/arch_2-1024x353.png?resize=1024%2C353&#038;ssl=1\" alt=\"\" class=\"wp-image-603744\"><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\">MCPs are still new<\/h3>\n<p class=\"wp-block-paragraph\">There are many more lessons and nuances to using this deceptively simple technology. I have found that there is a big gap between implementing a workable MCP and one that can actually integrate with user needs and usage scenarios, even beyond those you have anticipated. 
Hopefully, as the technology matures, we\u2019ll see more posts on <a href=\"https:\/\/towardsdatascience.com\/tag\/best-practices\/\" title=\"Best Practices\">Best Practices<\/a>.\u00a0<\/p>\n<p class=\"wp-block-paragraph\"><strong>Want to Connect?\u00a0<\/strong>You can reach me on Twitter at\u00a0<mark>@doppleware<\/mark>\u00a0or via\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/ronidover\/\">LinkedIn<\/a>.<br \/>Follow my\u00a0<strong><a href=\"https:\/\/towardsdatascience.com\/tag\/mcp\/\" title=\"mcp\">mcp<\/a><\/strong>\u00a0for dynamic code analysis using observability at\u00a0<a href=\"https:\/\/github.com\/digma-ai\/digma-mcp-server\">https:\/\/github.com\/digma-ai\/digma-mcp-server<\/a><\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/how-not-to-write-an-mcp-server\/\">How Not to Write an MCP Server<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Roni Dover<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/how-not-to-write-an-mcp-server\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How Not to Write an MCP Server I recently had the chance to create an MCP server for an observability application in order to provide the AI agent with dynamic code analysis capabilities. 
Because of its potential to transform applications, MCP is a technology I\u2019m even more ecstatic about than I originally was about genAI [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,2630,616,67,71,2533,2631],"tags":[448,2118,2632],"class_list":["post-3718","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-articial-intelligence","category-best-practices","category-deep-dives","category-large-language-models","category-mcp","category-observability","tag-agent","tag-mcp","tag-server"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3718"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=3718"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3718\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=3718"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=3718"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=3718"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}