{"id":1822,"date":"2025-02-13T07:02:23","date_gmt":"2025-02-13T07:02:23","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/02\/13\/method-of-moments-estimation-with-python-code\/"},"modified":"2025-02-13T07:02:23","modified_gmt":"2025-02-13T07:02:23","slug":"method-of-moments-estimation-with-python-code","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/02\/13\/method-of-moments-estimation-with-python-code\/","title":{"rendered":"Method of Moments Estimation with Python Code"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\" id=\"80a8\">Let\u2019s say you work in a customer care center and would like to know the probability distribution of the number of calls per minute. In other words, you want to answer the question: what is the probability of receiving zero, one, two, or more calls per minute? You need this distribution to predict how likely different call volumes are, so that you can plan how many employees are needed, whether an expansion is required, and so on.<\/p>\n<p class=\"wp-block-paragraph\" id=\"3227\">To make our decision \u2018data informed\u2019, we start by collecting data from which we try to infer this distribution. In other words, we want to generalize from the sample data to the unseen data, which is known as the population in statistical terms. This is the essence of statistical inference.<\/p>\n<p class=\"wp-block-paragraph\" id=\"cf95\">From the collected data we can compute the relative frequency of each value of calls per minute. For example, the collected data over time may look something like this: 2, 2, 3, 5, 4, 5, 5, 3, 6, 3, 4, \u2026 This data is obtained by counting the number of calls received every minute. 
In order to compute the relative frequency of each value you can count the number of occurrences of each value divided by the total number of occurrences. This way you will end up with something like the grey curve in the below figure, which is equivalent to the histogram of the data in this example.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"e1e5ea\" data-has-transparency=\"false\" style=\"--dominant-color: #e1e5ea;\" fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__7NfdxE1i4NwKpoW3fLT6w-1024x768.png?resize=1024%2C768&#038;ssl=1\" alt=\"\" class=\"wp-image-597802 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__7NfdxE1i4NwKpoW3fLT6w-1024x768.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__7NfdxE1i4NwKpoW3fLT6w-300x225.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__7NfdxE1i4NwKpoW3fLT6w-768x576.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__7NfdxE1i4NwKpoW3fLT6w.png 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image generated by the Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Another option is to assume that each data point from our data is a realization of a random variable (X) that follows a certain probability distribution. This probability distribution represents all the possible values that are generated if we were to collect this data long into the future, or in other words, we can say that it represents the population from which our sample data was collected. Furthermore, we can assume that all the data points come from the same probability distribution, i.e., the data points are identically distributed. 
Moreover, we assume that the data points are independent, i.e., the value of one data point in the sample is not affected by the values of the other data points. The independence and identical distribution (iid) assumption of the sample data points allows us to proceed mathematically with our statistical inference problem in a systematic and straightforward way. In more formal terms, we assume that a generative probabilistic model is responsible for generating the iid data as shown below.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"6f123f\" data-has-transparency=\"true\" style=\"--dominant-color: #6f123f;\" decoding=\"async\" width=\"1024\" height=\"266\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ePEU9L52q8IJ_fBq9EOLeg-1024x266.png?resize=1024%2C266&#038;ssl=1\" alt=\"\" class=\"wp-image-597803 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ePEU9L52q8IJ_fBq9EOLeg-1024x266.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ePEU9L52q8IJ_fBq9EOLeg-300x78.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ePEU9L52q8IJ_fBq9EOLeg-768x199.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ePEU9L52q8IJ_fBq9EOLeg.png 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image generated by the Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">In this particular example, a Poisson distribution with mean value \u03bb = 5 is assumed to have generated the data as shown in the blue curve in the below figure. 
In other words, we assume here that we know the true value of \u03bb which is generally not known and needs to be estimated from the data.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"e1e5ea\" data-has-transparency=\"false\" style=\"--dominant-color: #e1e5ea;\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__7NfdxE1i4NwKpoW3fLT6w-1-1024x768.png?resize=1024%2C768&#038;ssl=1\" alt=\"\" class=\"wp-image-597804 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__7NfdxE1i4NwKpoW3fLT6w-1-1024x768.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__7NfdxE1i4NwKpoW3fLT6w-1-300x225.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__7NfdxE1i4NwKpoW3fLT6w-1-768x576.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__7NfdxE1i4NwKpoW3fLT6w-1.png 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image generated by the Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"0044\">As opposed to the previous method in which we had to compute the relative frequency of each value of calls per minute (e.g., 12 values to be estimated in this example as shown in the grey figure above), now we only have one parameter that we aim at finding which is \u03bb. Another advantage of this generative model approach is that it is better in terms of generalization from sample to population. 
The assumed probability distribution summarizes the data in an elegant way that follows Occam\u2019s razor.<\/p>\n<p class=\"wp-block-paragraph\" id=\"6a89\">Before looking at how to find the parameter \u03bb, let\u2019s first show the <a href=\"https:\/\/towardsdatascience.com\/tag\/python\/\" title=\"Python\">Python<\/a> code that was used to generate the above figure.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Import the Python libraries that we will need in this article\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport seaborn as sns\nimport math\nfrom scipy import stats\n\n# Plot colors used in all figures. The original palette is not shown in the\n# article, so these hex values are placeholders.\nBLUE2 = \"#1f77b4\"\nGRAY9 = \"#bfbfbf\"\nORANGE1 = \"#ff7f0e\"\n\n# Poisson distribution example\nlambda_ = 5\nsample_size = 1000\ndata_poisson = stats.poisson.rvs(lambda_, size=sample_size)  # generate data\n\n# Plot the data histogram vs the PMF\nx1 = np.arange(data_poisson.min(), data_poisson.max() + 1)  # include the largest observed value\nfig1, ax = plt.subplots()\nax.bar(x1, stats.poisson.pmf(x1, lambda_),\n       label=\"Poisson distribution (PMF)\", color=BLUE2, linewidth=3.0, width=0.3, zorder=2)\n# One unit-wide bin centered on each integer value\nbin_edges = np.arange(data_poisson.min(), data_poisson.max() + 2) - 0.5\nax.hist(data_poisson, bins=bin_edges, density=True, label=\"Data histogram\", color=GRAY9, zorder=1)\n\nax.set_title(\"Data histogram vs. Poisson true distribution\", fontsize=14, loc='left')\nax.set_xlabel('Data value')\nax.set_ylabel('Probability')\nax.legend()\nplt.savefig(\"Poisson_hist_PMF.png\", format=\"png\", dpi=800)<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"431a\">Our problem now is to estimate the value of the unknown parameter \u03bb using the data we collected. This is where we use the\u00a0<em>method of moments (MoM)\u00a0<\/em>approach that appears in the title of this article.<\/p>\n<p class=\"wp-block-paragraph\" id=\"6a89\">First, we need to define what is meant by the moment of a random variable. 
Mathematically, the kth moment of a discrete random variable (X) is defined as follows:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f3f3f3\" data-has-transparency=\"false\" style=\"--dominant-color: #f3f3f3;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"292\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_nSLs6SVB_qPuZ3zYIXuPlg-1024x292.png?resize=1024%2C292&#038;ssl=1\" alt=\"\" class=\"wp-image-597805 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_nSLs6SVB_qPuZ3zYIXuPlg-1024x292.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_nSLs6SVB_qPuZ3zYIXuPlg-300x86.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_nSLs6SVB_qPuZ3zYIXuPlg-768x219.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_nSLs6SVB_qPuZ3zYIXuPlg.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><\/figure>\n<p class=\"wp-block-paragraph\">Take the first moment E(X) as an example, which is also the mean \u03bc of the random variable. Assume that we collect our data, which is modeled as N iid realizations of the random variable X. 
A reasonable estimate of \u03bc is the sample mean which is defined as follows:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f8f8f8\" data-has-transparency=\"false\" style=\"--dominant-color: #f8f8f8;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"266\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_alRnr4BVh23vJ4RwVLvWpQ-1024x266.png?resize=1024%2C266&#038;ssl=1\" alt=\"\" class=\"wp-image-597806 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_alRnr4BVh23vJ4RwVLvWpQ-1024x266.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_alRnr4BVh23vJ4RwVLvWpQ-300x78.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_alRnr4BVh23vJ4RwVLvWpQ-768x199.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_alRnr4BVh23vJ4RwVLvWpQ.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"c33b\">Thus, in order to obtain a MoM estimate of a model parameter that parametrizes the probability distribution of the random variable X, we first write the unknown parameter as a function of one or more of the kth moments of the random variable, then we replace the kth moment with its sample estimate. 
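<\/p>\n<p class=\"wp-block-paragraph\">The recipe just described can be sketched in a few lines. This is a minimal illustration rather than part of the original code: the helper name <code>sample_moment<\/code> and the variable names are assumptions, and the data are the first few call counts quoted at the start of the article.<\/p>\n

```python
import numpy as np


def sample_moment(data, k):
    """Return the k-th raw sample moment, (1/N) * sum(x_i ** k)."""
    data = np.asarray(data, dtype=float)
    return np.mean(data ** k)


# First few call counts quoted at the start of the article (illustration only)
calls = np.array([2, 2, 3, 5, 4, 5, 5, 3, 6, 3, 4])

m1 = sample_moment(calls, 1)  # sample estimate of E(X)
m2 = sample_moment(calls, 2)  # sample estimate of E(X^2)

# Plug-in estimates: write each quantity in terms of moments,
# then replace every moment with its sample counterpart
mean_hat = m1
var_hat = m2 - m1 ** 2

print(f"mean_hat = {mean_hat:.3f}, var_hat = {var_hat:.3f}")
```

<p class=\"wp-block-paragraph\">In this sketch the mean needs only the first moment, while the variance needs the first two. 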
The more unknown parameters we have in our models, the more moments we need.<\/p>\n<p class=\"wp-block-paragraph\" id=\"f9f5\">In our Poisson model example, this is very simple as shown below.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"fafafa\" data-has-transparency=\"false\" style=\"--dominant-color: #fafafa;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"496\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_fi-eAH3Ec3LVvQOtbbgLDA-1024x496.png?resize=1024%2C496&#038;ssl=1\" alt=\"\" class=\"wp-image-597807 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_fi-eAH3Ec3LVvQOtbbgLDA-1024x496.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_fi-eAH3Ec3LVvQOtbbgLDA-300x145.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_fi-eAH3Ec3LVvQOtbbgLDA-768x372.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_fi-eAH3Ec3LVvQOtbbgLDA.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"4134\">In the next part, we test our MoM estimator on the simulated data we had earlier. 
The Python code for obtaining the estimator and plotting the corresponding probability distribution using the estimated parameter is shown below.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Method of moments estimator using the data (Poisson Dist)\nlambda_hat = sum(data_poisson) \/ len(data_poisson)  # sample mean\n\n# Plot the MoM estimated PMF vs the true PMF\nx1 = np.arange(data_poisson.min(), data_poisson.max() + 1)  # include the largest observed value\nfig2, ax = plt.subplots()\nax.bar(x1, stats.poisson.pmf(x1, lambda_hat),\n       label=\"Estimated PMF\", color=ORANGE1, linewidth=3.0, width=0.3)\nax.bar(x1 + 0.3, stats.poisson.pmf(x1, lambda_),\n       label=\"True PMF\", color=BLUE2, linewidth=3.0, width=0.3)\n\nax.set_title(\"Estimated Poisson distribution vs. true distribution\", fontsize=14, loc='left')\nax.set_xlabel('Data value')\nax.set_ylabel('Probability')\nax.legend()\n#ax.grid()\nplt.savefig(\"Poisson_true_vs_est.png\", format=\"png\", dpi=800)<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"942e\">The figure below shows the estimated distribution versus the true distribution. The distributions are quite close, indicating that the MoM estimator is a reasonable estimator for our problem. 
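<\/p>\n<p class=\"wp-block-paragraph\">One way to probe this beyond a single plot is to watch how the estimate behaves as the sample size N grows. The sketch below is an illustration under assumptions that are not in the original code: the fixed seed, the variable names, and the particular sample sizes.<\/p>\n

```python
import numpy as np
from scipy import stats

lambda_true = 5  # same value as in the article's Poisson example
rng = np.random.default_rng(0)  # fixed seed (an assumption) so runs are repeatable

# Draw one large sample, then compute the MoM estimate on growing prefixes of it
draws = stats.poisson.rvs(lambda_true, size=100_000, random_state=rng)

for n in (10, 100, 1_000, 10_000, 100_000):
    lambda_hat_n = draws[:n].mean()  # MoM estimate from the first n points
    abs_error = abs(lambda_hat_n - lambda_true)
    print(f"N = {n:>6}: lambda_hat = {lambda_hat_n:.3f}, abs error = {abs_error:.3f}")
```

<p class=\"wp-block-paragraph\">The absolute error should generally shrink as N grows, although not necessarily at every step. 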
In fact, replacing expectations with averages in the MoM estimator implies that the estimator is a consistent estimator by the law of large numbers, which is a good justification for using such estimator.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"ece8e7\" data-has-transparency=\"false\" style=\"--dominant-color: #ece8e7;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_-nvE3J7ioBIoqAhUTFqRdw-1024x768.png?resize=1024%2C768&#038;ssl=1\" alt=\"\" class=\"wp-image-597808 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_-nvE3J7ioBIoqAhUTFqRdw-1024x768.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_-nvE3J7ioBIoqAhUTFqRdw-300x225.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_-nvE3J7ioBIoqAhUTFqRdw-768x576.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_-nvE3J7ioBIoqAhUTFqRdw-1536x1152.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_-nvE3J7ioBIoqAhUTFqRdw-2048x1536.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image generated by the Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Another MoM estimation example is shown below assuming the iid data is generated by a normal distribution with mean \u03bc and variance \u03c3\u00b2 as shown below.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"243947\" data-has-transparency=\"true\" style=\"--dominant-color: #243947;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"257\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_t5-PRjeL3kDnWr4oI-UUvw-1024x257.png?resize=1024%2C257&#038;ssl=1\" alt=\"\" class=\"wp-image-597809 has-transparency\" 
srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_t5-PRjeL3kDnWr4oI-UUvw-1024x257.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_t5-PRjeL3kDnWr4oI-UUvw-300x75.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_t5-PRjeL3kDnWr4oI-UUvw-768x193.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_t5-PRjeL3kDnWr4oI-UUvw.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image generated by the Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">In this particular example, a Gaussian (normal) distribution with mean value \u03bc = 10 and \u03c3 = 2 is assumed to have generated the data. The histogram of the generated data sample (sample size = 1000) is shown in grey in the below figure, while the true distribution is shown in the blue curve.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"ededee\" data-has-transparency=\"false\" style=\"--dominant-color: #ededee;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_uHkQ__nlas0wDZX90MnOBw-1024x768.png?resize=1024%2C768&#038;ssl=1\" alt=\"\" class=\"wp-image-597810 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_uHkQ__nlas0wDZX90MnOBw-1024x768.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_uHkQ__nlas0wDZX90MnOBw-300x225.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_uHkQ__nlas0wDZX90MnOBw-768x576.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_uHkQ__nlas0wDZX90MnOBw-1536x1152.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_uHkQ__nlas0wDZX90MnOBw-2048x1536.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption 
class=\"wp-element-caption\">Image generated by the Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"409c\">The Python code that was used to generate the above figure is shown below.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Normal distribution example\nmu = 10\nsigma = 2\nsample_size = 1000\ndata_normal = stats.norm.rvs(loc=mu, scale=sigma ,size= sample_size) # generate data\n\n# Plot the data histogram vs the PDF\nx2 = np.linspace(data_normal.min(), data_normal.max(), sample_size)\nfig3, ax = plt.subplots()\nax.hist(data_normal, bins=50, density=True, label=\"Data histogram\",color = GRAY9)\nax.plot(x2, stats.norm(loc=mu, scale=sigma).pdf(x2),\n        label=\"Normal distribution (PDF)\",color = BLUE2,linewidth=3.0)\n\nax.set_title(\"Data histogram vs. true distribution\", fontsize=14, loc='left')\nax.set_xlabel('Data value')\nax.set_ylabel('Probability')\nax.legend()\nax.grid()\n\nplt.savefig(\"Normal_hist_PMF.png\", format=\"png\", dpi=800)<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"fb26\">Now, we would like to use the MoM estimator to find an estimate of the model parameters, i.e., \u03bc and \u03c3\u00b2 as shown below.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f9f9f9\" data-has-transparency=\"false\" style=\"--dominant-color: #f9f9f9;\" loading=\"lazy\" decoding=\"async\" width=\"927\" height=\"1024\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_A-6CEvekBWEnKEIoZ20iQ-927x1024.png?resize=927%2C1024&#038;ssl=1\" alt=\"\" class=\"wp-image-597811 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_A-6CEvekBWEnKEIoZ20iQ-927x1024.png 927w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_A-6CEvekBWEnKEIoZ20iQ-271x300.png 271w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_A-6CEvekBWEnKEIoZ20iQ-768x849.png 768w, 
https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_A-6CEvekBWEnKEIoZ20iQ-1390x1536.png 1390w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_A-6CEvekBWEnKEIoZ20iQ.png 1400w\" sizes=\"auto, (max-width: 927px) 100vw, 927px\"><\/figure>\n<p class=\"wp-block-paragraph\">In order to test this estimator using our sample data, we plot the distribution with the estimated parameters (orange) in the below figure, versus the true distribution (blue). Again, it can be shown that the distributions are quite close. Of course, in order to quantify this estimator, we need to test it on multiple realizations of the data and observe properties such as bias, variance, etc. Such important aspects <a href=\"https:\/\/medium.com\/@mahmoudabdelaziz_67006\/bias-variance-tradeoff-in-parameter-estimation-with-python-code-74e531092c6e\">have been discussed in an earlier article<\/a>.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f5f5f6\" data-has-transparency=\"false\" style=\"--dominant-color: #f5f5f6;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_g_1eDQhggqi4WpKyX11qcA-1024x768.png?resize=1024%2C768&#038;ssl=1\" alt=\"\" class=\"wp-image-597812 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_g_1eDQhggqi4WpKyX11qcA-1024x768.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_g_1eDQhggqi4WpKyX11qcA-300x225.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_g_1eDQhggqi4WpKyX11qcA-768x576.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_g_1eDQhggqi4WpKyX11qcA.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image generated by the Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"7fcf\">The Python code that 
was used to estimate the model parameters using MoM, and to plot the above figure is shown below.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Method of moments estimator using the data (Normal Dist)\nmu_hat = sum(data_normal) \/ len(data_normal)  # MoM mean estimator\nvar_hat = sum(pow(x-mu_hat,2) for x in data_normal) \/ len(data_normal)  # MoM variance estimator (divides by N, not N - 1)\nsigma_hat = math.sqrt(var_hat)  # MoM standard deviation estimator\n\n# Plot the MoM estimated PDF vs the true PDF\nx2 = np.linspace(data_normal.min(), data_normal.max(), sample_size)\nfig4, ax = plt.subplots()\nax.plot(x2, stats.norm(loc=mu_hat, scale=sigma_hat).pdf(x2),\n        label=\"Estimated PDF\",color = ORANGE1,linewidth=3.0)\nax.plot(x2, stats.norm(loc=mu, scale=sigma).pdf(x2),\n        label=\"True PDF\",color = BLUE2,linewidth=3.0)\n\nax.set_title(\"Estimated Normal distribution vs. true distribution\", fontsize=14, loc='left')\nax.set_xlabel('Data value')\nax.set_ylabel('Probability density')\nax.legend()\nax.grid()\nplt.savefig(\"Normal_true_vs_est.png\", format=\"png\", dpi=800)<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"bbfe\">Another useful probability distribution is the Gamma distribution. An example of a real-life application of this distribution was discussed in a previous\u00a0<a href=\"https:\/\/medium.com\/python-in-plain-english\/univariate-statistical-modeling-fundamentals-0b178fbe8686\">article<\/a>. 
However, in this article, we derive the MoM estimator of the Gamma distribution parameters \u03b1 and \u03b2 as shown below, assuming the data is iid.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"0f0ffc\" data-has-transparency=\"true\" style=\"--dominant-color: #0f0ffc;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"257\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_6TBcG5odyJCg6rBtdeDS-Q-1024x257.png?resize=1024%2C257&#038;ssl=1\" alt=\"\" class=\"wp-image-597813 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_6TBcG5odyJCg6rBtdeDS-Q-1024x257.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_6TBcG5odyJCg6rBtdeDS-Q-300x75.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_6TBcG5odyJCg6rBtdeDS-Q-768x193.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_6TBcG5odyJCg6rBtdeDS-Q.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image generated by the Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">In this particular example, a Gamma distribution with \u03b1 = 6 and \u03b2 = 0.5 is assumed to have generated the data. 
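<\/p>\n<p class=\"wp-block-paragraph\">Note that SciPy parametrizes the Gamma distribution by a shape and a scale, where the scale is 1\/\u03b2. The short check below is a sketch with assumed variable names; it builds the distribution from the values above and prints its theoretical mean \u03b1\/\u03b2 and variance \u03b1\/\u03b2\u00b2.<\/p>\n

```python
from scipy import stats

alpha_true = 6    # shape parameter from the article's example
beta_true = 0.5   # rate parameter from the article's example

# scipy.stats.gamma takes a scale argument, which is the reciprocal of the rate
gamma_dist = stats.gamma(alpha_true, loc=0, scale=1 / beta_true)

print(gamma_dist.mean())  # theoretical mean, alpha / beta
print(gamma_dist.var())   # theoretical variance, alpha / beta**2
```

<p class=\"wp-block-paragraph\">This is why the simulation code below uses a scale of 2. 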
The histogram of the generated data sample (sample size = 1000) is shown in grey in the below figure, while the true distribution is shown in the blue curve.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"ededee\" data-has-transparency=\"false\" style=\"--dominant-color: #ededee;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KgSGZI-x1V-ra6ZXwXDbwQ-1024x768.png?resize=1024%2C768&#038;ssl=1\" alt=\"\" class=\"wp-image-597814 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KgSGZI-x1V-ra6ZXwXDbwQ-1024x768.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KgSGZI-x1V-ra6ZXwXDbwQ-300x225.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KgSGZI-x1V-ra6ZXwXDbwQ-768x576.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KgSGZI-x1V-ra6ZXwXDbwQ-1536x1152.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KgSGZI-x1V-ra6ZXwXDbwQ-2048x1536.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image generated by the Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"293b\">The Python code that was used to generate the above figure is shown below.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Gamma distribution example\nalpha_ = 6 # shape parameter\nscale_ = 2 # scale paramter (lamda) = 1\/beta in gamma dist.\nsample_size = 1000\ndata_gamma = stats.gamma.rvs(alpha_,loc=0, scale=scale_ ,size= sample_size) # generate data\n\n# Plot the data histogram vs the PDF\nx3 = np.linspace(data_gamma.min(), data_gamma.max(), sample_size)\nfig5, ax = plt.subplots()\nax.hist(data_gamma, bins=50, density=True, label=\"Data histogram\",color = GRAY9)\nax.plot(x3, stats.gamma(alpha_,loc=0, 
scale=scale_).pdf(x3),\n        label=\"Gamma distribution (PDF)\",color = BLUE2,linewidth=3.0)\n\nax.set_title(\"Data histogram vs. true distribution\", fontsize=14, loc='left')\nax.set_xlabel('Data value')\nax.set_ylabel('Probability')\nax.legend()\nax.grid()\nplt.savefig(\"Gamma_hist_PMF.png\", format=\"png\", dpi=800)<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"b9ed\">Now, we would like to use the MoM estimator to find an estimate of the model parameters, i.e., \u03b1 and \u03b2, as shown below.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f7f7f7\" data-has-transparency=\"false\" style=\"--dominant-color: #f7f7f7;\" loading=\"lazy\" decoding=\"async\" width=\"713\" height=\"1024\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_CWQEGPhZKPILCbrUQ4Xa-A-713x1024.png?resize=713%2C1024&#038;ssl=1\" alt=\"\" class=\"wp-image-597815 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_CWQEGPhZKPILCbrUQ4Xa-A-713x1024.png 713w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_CWQEGPhZKPILCbrUQ4Xa-A-209x300.png 209w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_CWQEGPhZKPILCbrUQ4Xa-A-768x1104.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_CWQEGPhZKPILCbrUQ4Xa-A-1069x1536.png 1069w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_CWQEGPhZKPILCbrUQ4Xa-A.png 1400w\" sizes=\"auto, (max-width: 713px) 100vw, 713px\"><\/figure>\n<p class=\"wp-block-paragraph\">In order to test this estimator using our sample data, we plot the distribution with the estimated parameters (orange) in the below figure, versus the true distribution (blue). 
Again, it can be shown that the distributions are quite close.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f6f7f7\" data-has-transparency=\"false\" style=\"--dominant-color: #f6f7f7;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Nbvfm4eDcJF-hEj07xcVng-1024x768.png?resize=1024%2C768&#038;ssl=1\" alt=\"\" class=\"wp-image-597816 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Nbvfm4eDcJF-hEj07xcVng-1024x768.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Nbvfm4eDcJF-hEj07xcVng-300x225.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Nbvfm4eDcJF-hEj07xcVng-768x576.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Nbvfm4eDcJF-hEj07xcVng-1536x1152.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Nbvfm4eDcJF-hEj07xcVng-2048x1536.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image generated by the Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"48af\">The Python code that was used to estimate the model parameters using MoM, and to plot the above figure is shown below.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Method of moments estimator using the data (Gamma Dist)\nsample_mean = data_gamma.mean()\nsample_var = data_gamma.var()\nscale_hat = sample_var\/sample_mean #scale is equal to 1\/beta in gamma dist.\nalpha_hat = sample_mean**2\/sample_var\n\n# Plot the MoM estimated PDF vs the true PDF\nx4 = np.linspace(data_gamma.min(), data_gamma.max(), sample_size)\nfig6, ax = plt.subplots()\n\nax.plot(x4, stats.gamma(alpha_hat,loc=0, scale=scale_hat).pdf(x4),\n        label=\"Estimated PDF\",color = ORANGE1,linewidth=3.0)\nax.plot(x4, 
stats.gamma(alpha_,loc=0, scale=scale_).pdf(x4),\n        label=\"True PDF\",color = BLUE2,linewidth=3.0)\n\nax.set_title(\"Estimated Gamma distribution vs. true distribution\", fontsize=14, loc='left')\nax.set_xlabel('Data value')\nax.set_ylabel('Probability')\nax.legend()\nax.grid()\nplt.savefig(\"Gamma_true_vs_est.png\", format=\"png\", dpi=800)<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"5155\">Note that we used the following equivalent ways of writing the variance when deriving the estimators in the cases of Gaussian and Gamma distributions.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f8f8f8\" data-has-transparency=\"false\" style=\"--dominant-color: #f8f8f8;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"283\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ocv2YeKXOP6PHuzx7nmp-Q-1024x283.png?resize=1024%2C283&#038;ssl=1\" alt=\"\" class=\"wp-image-597817 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ocv2YeKXOP6PHuzx7nmp-Q-1024x283.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ocv2YeKXOP6PHuzx7nmp-Q-300x83.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ocv2YeKXOP6PHuzx7nmp-Q-768x213.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ocv2YeKXOP6PHuzx7nmp-Q-1536x425.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ocv2YeKXOP6PHuzx7nmp-Q-2048x567.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><\/figure>\n<h2 class=\"wp-block-heading\" id=\"70ea\">Conclusion<\/h2>\n<p class=\"wp-block-paragraph\" id=\"0049\">In this article, we explored various examples of the method of moments estimator and its applications in different problems in data science. 
Moreover, detailed Python code that was used to implement the estimators from scratch as well as to plot the different figures is also shown. I hope that you will find this article helpful.<\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/method-of-moments-estimation-with-python-code\/\">Method of Moments Estimation with Python Code<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p>Mahmoud Abdelaziz<br \/>\n<a href=\"https:\/\/towardsdatascience.com\/method-of-moments-estimation-with-python-code\/\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Method of Moments Estimation with Python Code Let\u2019s say you are in a customer care center, and you would like to know the probability distribution of the number of calls per minute, or in other words, you want to answer the question: what is the probability of receiving zero, one, two, \u2026 etc., calls per 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,83,1731,1183,157,1182,238],"tags":[84,582,921],"class_list":["post-1822","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-data-science","category-estimations","category-probability-theory","category-python","category-statistical-inference","category-statistics","tag-data","tag-distribution","tag-probability"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1822"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1822"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1822\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1822"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1822"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1822"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}