{"id":1063,"date":"2025-01-09T07:04:02","date_gmt":"2025-01-09T07:04:02","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/01\/09\/method-of-moments-estimation-with-python-code-f19102ce5897\/"},"modified":"2025-01-09T07:04:02","modified_gmt":"2025-01-09T07:04:02","slug":"method-of-moments-estimation-with-python-code-f19102ce5897","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/01\/09\/method-of-moments-estimation-with-python-code-f19102ce5897\/","title":{"rendered":"Method of Moments Estimation with Python Code"},"content":{"rendered":"<p>    Method of Moments Estimation with Python Code<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<h4>How to understand and implement the estimator from\u00a0scratch<\/h4>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/0*yHsw5ciEqO3mJqLX\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@machec?utm_source=medium&amp;utm_medium=referral\">Petr Mach\u00e1\u010dek<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<p>Let\u2019s say you are in a customer care center, and you would like to know the probability distribution of the number of calls per minute, or in other words, you want to answer the question: what is the probability of receiving zero, one, two,\u00a0&#8230; etc., calls per minute? You need this distribution in order to predict the probability of receiving different number of calls based on which you can plan how many employees are needed, whether or not an expansion is required, etc.<\/p>\n<p>In order to let our decision \u2018data informed\u2019 we start by collecting data from which we try to infer this distribution, or in other words, we want to generalize from the sample data to the unseen data which is also known as the population in statistical terms. This is the essence of statistical inference.<\/p>\n<p>From the collected data we can compute the relative frequency of each value of calls per minute. For example, if the collected data over time looks something like this: 2, 2, 3, 5, 4, 5, 5, 3, 6, 3, 4,\u00a0\u2026 etc. This data is obtained by counting the number of calls received every minute. In order to compute the relative frequency of each value you can count the number of occurrences of each value divided by the total number of occurrences. This way you will end up with something like the grey curve in the below figure, which is equivalent to the histogram of the data in this\u00a0example.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A_7NfdxE1i4NwKpoW3fLT6w.png?ssl=1\"><figcaption>Image generated by the\u00a0Author<\/figcaption><\/figure>\n<p>Another option is to assume that each data point from our data is a realization of a random variable (X) that follows a certain probability distribution. This probability distribution represents all the possible values that are generated if we were to collect this data long into the future, or in other words, we can say that it represents the population from which our sample data was collected. Furthermore, we can assume that all the data points come from the same probability distribution, i.e., the data points are identically distributed. Moreover, we assume that the data points are independent, i.e., the value of one data point in the sample is not affected by the values of the other data points. The independence and identical distribution (iid) assumption of the sample data points allows us to proceed mathematically with our statistical inference problem in a systematic and straightforward way. In more formal terms, we assume that a generative probabilistic model is responsible for generating the iid data as shown\u00a0below.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AePEU9L52q8IJ_fBq9EOLeg.png?ssl=1\"><figcaption>Image generated by the\u00a0Author<\/figcaption><\/figure>\n<p>In this particular example, a Poisson distribution with mean value \u03bb = 5 is assumed to have generated the data as shown in the blue curve in the below figure. In other words, we assume here that we know the true value of \u03bb which is generally not known and needs to be estimated from the\u00a0data.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A_7NfdxE1i4NwKpoW3fLT6w.png?ssl=1\"><figcaption>Image generated by the\u00a0Author<\/figcaption><\/figure>\n<p>As opposed to the previous method in which we had to compute the relative frequency of each value of calls per minute (e.g., 12 values to be estimated in this example as shown in the grey figure above), now we only have one parameter that we aim at finding which is \u03bb. Another advantage of this generative model approach is that it is better in terms of generalization from sample to population. The assumed probability distribution can be said to have summarized the data in an elegant way that follows the Occam\u2019s razor principle.<\/p>\n<p>Before proceeding further into how we aim at finding this parameter \u03bb, let\u2019s show some Python code first that was used to generate the above\u00a0figure.<\/p>\n<pre># Import the Python libraries that we will need in this article<br>import pandas as pd<br>import matplotlib.pyplot as plt<br>import numpy as np<br>import seaborn as sns<br>import math<br>from scipy import stats<br><br># Poisson distribution example<br>lambda_ = 5<br>sample_size = 1000<br>data_poisson = stats.poisson.rvs(lambda_,size= sample_size) # generate data<br><br># Plot the data histogram vs the PMF<br>x1 = np.arange(data_poisson.min(), data_poisson.max(), 1)<br>fig1, ax = plt.subplots()<br>plt.bar(x1, stats.poisson.pmf(x1,lambda_),<br>        label=\"Possion distribution (PMF)\",color = BLUE2,linewidth=3.0,width=0.3,zorder=2)<br>ax.hist(data_poisson, bins=x1.size, density=True, label=\"Data histogram\",color = GRAY9, width=1,zorder=1,align='left')<br><br>ax.set_title(\"Data histogram vs. Poisson true distribution\", fontsize=14, loc='left')<br>ax.set_xlabel('Data value')<br>ax.set_ylabel('Probability')<br>ax.legend()<br>plt.savefig(\"Possion_hist_PMF.png\", format=\"png\", dpi=800)<\/pre>\n<p>Our problem now is about estimating the value of the unknown parameter \u03bb using the data we collected. This is where we will use the <em>method of moments (MoM) <\/em>approach that appears in the title of this\u00a0article.<\/p>\n<p>First, we need to define what is meant by the moment of a random variable. Mathematically, the kth moment of a discrete random variable (X) is defined as\u00a0follows<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AnSLs6SVB_qPuZ3zYIXuPlg.png?ssl=1\"><\/figure>\n<p>Take the first moment E(X) as an example, which is also the mean \u03bc of the random variable, and assuming that we collect our data which is modeled as N iid realizations of the random variable X. A reasonable estimate of \u03bc is the sample mean which is defined as\u00a0follows<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AalRnr4BVh23vJ4RwVLvWpQ.png?ssl=1\"><\/figure>\n<p>Thus, in order to obtain a MoM estimate of a model parameter that parametrizes the probability distribution of the random variable X, we first write the unknown parameter as a function of one or more of the kth moments of the random variable, then we replace the kth moment with its sample estimate. The more unknown parameters we have in our models, the more moments we\u00a0need.<\/p>\n<p>In our Poisson model example, this is very simple as shown\u00a0below<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Afi-eAH3Ec3LVvQOtbbgLDA.png?ssl=1\"><\/figure>\n<p>In the next part, we test our MoM estimator on the simulated data we had earlier. The Python code for obtaining the estimator and plotting the corresponding probability distribution using the estimated parameter is shown\u00a0below.<\/p>\n<pre># Method of moments estimator using the data (Poisson Dist)<br>lambda_hat = sum(data_poisson) \/ len(data_poisson)<br><br># Plot the MoM estimated PMF vs the true PMF<br>x1 = np.arange(data_poisson.min(), data_poisson.max(), 1)<br>fig2, ax = plt.subplots()<br>plt.bar(x1, stats.poisson.pmf(x1,lambda_hat),<br>        label=\"Estimated PMF\",color = ORANGE1,linewidth=3.0,width=0.3)<br>plt.bar(x1+0.3, stats.poisson.pmf(x1,lambda_),<br>        label=\"True PMF\",color = BLUE2,linewidth=3.0,width=0.3)<br><br>ax.set_title(\"Estimated Poisson distribution vs. true distribution\", fontsize=14, loc='left')<br>ax.set_xlabel('Data value')<br>ax.set_ylabel('Probability')<br>ax.legend()<br>#ax.grid()<br>plt.savefig(\"Possion_true_vs_est.png\", format=\"png\", dpi=800)<\/pre>\n<p>The below figure shows the estimated distribution versus the true distribution. The distributions are quite close indicating that the MoM estimator is a reasonable estimator for our problem. In fact, replacing expectations with averages in the MoM estimator implies that the estimator is a consistent estimator by the law of large numbers, which is a good justification for using such estimator.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A-nvE3J7ioBIoqAhUTFqRdw.png?ssl=1\"><figcaption>Image generated by the\u00a0Author<\/figcaption><\/figure>\n<p>Another MoM estimation example is shown below assuming the iid data is generated by a normal distribution with mean \u03bc and variance \u03c3\u00b2 as shown\u00a0below.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2At5-PRjeL3kDnWr4oI-UUvw.png?ssl=1\"><figcaption>Image generated by the\u00a0Author<\/figcaption><\/figure>\n<p>In this particular example, a Gaussian (normal) distribution with mean value \u03bc = 10 and \u03c3 = 2 is assumed to have generated the data. The histogram of the generated data sample (sample size = 1000) is shown in grey in the below figure, while the true distribution is shown in the blue\u00a0curve.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AuHkQ__nlas0wDZX90MnOBw.png?ssl=1\"><figcaption>Image generated by the\u00a0Author<\/figcaption><\/figure>\n<p>The Python code that was used to generate the above figure is shown\u00a0below.<\/p>\n<pre># Normal distribution example<br>mu = 10<br>sigma = 2<br>sample_size = 1000<br>data_normal = stats.norm.rvs(loc=mu, scale=sigma ,size= sample_size) # generate data<br><br># Plot the data histogram vs the PDF<br>x2 = np.linspace(data_normal.min(), data_normal.max(), sample_size)<br>fig3, ax = plt.subplots()<br>ax.hist(data_normal, bins=50, density=True, label=\"Data histogram\",color = GRAY9)<br>ax.plot(x2, stats.norm(loc=mu, scale=sigma).pdf(x2),<br>        label=\"Normal distribution (PDF)\",color = BLUE2,linewidth=3.0)<br><br>ax.set_title(\"Data histogram vs. true distribution\", fontsize=14, loc='left')<br>ax.set_xlabel('Data value')<br>ax.set_ylabel('Probability')<br>ax.legend()<br>ax.grid()<br><br>plt.savefig(\"Normal_hist_PMF.png\", format=\"png\", dpi=800)<\/pre>\n<p>Now, we would like to use the MoM estimator to find an estimate of the model parameters, i.e., \u03bc and \u03c3\u00b2 as shown\u00a0below.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AA--6CEvekBWEnKEIoZ20iQ.png?ssl=1\"><\/figure>\n<p>In order to test this estimator using our sample data, we plot the distribution with the estimated parameters (orange) in the below figure, versus the true distribution (blue). Again, it can be shown that the distributions are quite close. Of course, in order to quantify this estimator, we need to test it on multiple realizations of the data and observe properties such as bias, variance, etc. Such important aspects have been discussed in an earlier article <a href=\"https:\/\/medium.com\/@mahmoudabdelaziz_67006\/bias-variance-tradeoff-in-parameter-estimation-with-python-code-74e531092c6e\">Bias Variance Tradeoff in Parameter Estimation with Python Code | by Mahmoud Abdelaziz, PhD |\u00a0Medium<\/a><\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Ag_1eDQhggqi4WpKyX11qcA.png?ssl=1\"><figcaption>Image generated by the\u00a0Author<\/figcaption><\/figure>\n<p>The Python code that was used to estimate the model parameters using MoM, and to plot the above figure is shown\u00a0below.<\/p>\n<pre># Method of moments estimator using the data (Normal Dist)<br>mu_hat = sum(data_normal) \/ len(data_normal) # MoM mean estimator<br>var_hat = sum(pow(x-mu_hat,2) for x in data_normal) \/ len(data_normal) # variance<br>sigma_hat = math.sqrt(var_hat)  # MoM standard deviation estimator<br><br># Plot the MoM estimated PDF vs the true PDF<br>x2 = np.linspace(data_normal.min(), data_normal.max(), sample_size)<br>fig4, ax = plt.subplots()<br>ax.plot(x2, stats.norm(loc=mu_hat, scale=sigma_hat).pdf(x2),<br>        label=\"Estimated PDF\",color = ORANGE1,linewidth=3.0)<br>ax.plot(x2, stats.norm(loc=mu, scale=sigma).pdf(x2),<br>        label=\"True PDF\",color = BLUE2,linewidth=3.0)<br><br>ax.set_title(\"Estimated Normal distribution vs. true distribution\", fontsize=14, loc='left')<br>ax.set_xlabel('Data value')<br>ax.set_ylabel('Probability')<br>ax.legend()<br>ax.grid()<br>plt.savefig(\"Normal_true_vs_est.png\", format=\"png\", dpi=800)<\/pre>\n<p>Another useful probability distribution is the Gamma distribution. An example for the application of this distribution in real life was discussed in a previous <a href=\"https:\/\/medium.com\/python-in-plain-english\/univariate-statistical-modeling-fundamentals-0b178fbe8686\">article<\/a>. However, in this article, we derive the MoM estimator of the Gamma distribution parameters \u03b1 and \u03b2 as shown below, assuming the data is\u00a0iid.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A6TBcG5odyJCg6rBtdeDS-Q.png?ssl=1\"><figcaption>Image generated by the\u00a0Author<\/figcaption><\/figure>\n<p>In this particular example, a Gamma distribution with \u03b1 = 6 and \u03b2 = 0.5 is assumed to have generated the data. The histogram of the generated data sample (sample size = 1000) is shown in grey in the below figure, while the true distribution is shown in the blue\u00a0curve.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AKgSGZI-x1V-ra6ZXwXDbwQ.png?ssl=1\"><figcaption>Image generated by the\u00a0Author<\/figcaption><\/figure>\n<p>The Python code that was used to generate the above figure is shown\u00a0below.<\/p>\n<pre># Gamma distribution example<br>alpha_ = 6 # shape parameter<br>scale_ = 2 # scale paramter (lamda) = 1\/beta in gamma dist.<br>sample_size = 1000<br>data_gamma = stats.gamma.rvs(alpha_,loc=0, scale=scale_ ,size= sample_size) # generate data<br><br># Plot the data histogram vs the PDF<br>x3 = np.linspace(data_gamma.min(), data_gamma.max(), sample_size)<br>fig5, ax = plt.subplots()<br>ax.hist(data_gamma, bins=50, density=True, label=\"Data histogram\",color = GRAY9)<br>ax.plot(x3, stats.gamma(alpha_,loc=0, scale=scale_).pdf(x3),<br>        label=\"Gamma distribution (PDF)\",color = BLUE2,linewidth=3.0)<br><br>ax.set_title(\"Data histogram vs. true distribution\", fontsize=14, loc='left')<br>ax.set_xlabel('Data value')<br>ax.set_ylabel('Probability')<br>ax.legend()<br>ax.grid()<br>plt.savefig(\"Gamma_hist_PMF.png\", format=\"png\", dpi=800)<\/pre>\n<p>Now, we would like to use the MoM estimator to find an estimate of the model parameters, i.e., \u03b1 and \u03b2, as shown\u00a0below.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2ACWQEGPhZKPILCbrUQ4Xa-A.png?ssl=1\"><\/figure>\n<p>In order to test this estimator using our sample data, we plot the distribution with the estimated parameters (orange) in the below figure, versus the true distribution (blue). Again, it can be shown that the distributions are quite\u00a0close.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2ANbvfm4eDcJF-hEj07xcVng.png?ssl=1\"><figcaption>Image generated by the\u00a0Author<\/figcaption><\/figure>\n<p>The Python code that was used to estimate the model parameters using MoM, and to plot the above figure is shown\u00a0below.<\/p>\n<pre># Method of moments estimator using the data (Gamma Dist)<br>sample_mean = data_gamma.mean()<br>sample_var = data_gamma.var()<br>scale_hat = sample_var\/sample_mean #scale is equal to 1\/beta in gamma dist.<br>alpha_hat = sample_mean**2\/sample_var<br><br># Plot the MoM estimated PDF vs the true PDF<br>x4 = np.linspace(data_gamma.min(), data_gamma.max(), sample_size)<br>fig6, ax = plt.subplots()<br><br>ax.plot(x4, stats.gamma(alpha_hat,loc=0, scale=scale_hat).pdf(x4),<br>        label=\"Estimated PDF\",color = ORANGE1,linewidth=3.0)<br>ax.plot(x4, stats.gamma(alpha_,loc=0, scale=scale_).pdf(x4),<br>        label=\"True PDF\",color = BLUE2,linewidth=3.0)<br><br>ax.set_title(\"Estimated Gamma distribution vs. true distribution\", fontsize=14, loc='left')<br>ax.set_xlabel('Data value')<br>ax.set_ylabel('Probability')<br>ax.legend()<br>ax.grid()<br>plt.savefig(\"Gamma_true_vs_est.png\", format=\"png\", dpi=800)<\/pre>\n<p>Note that we used the following equivalent ways of writing the variance when deriving the estimators in the cases of Gaussian and Gamma distributions.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Aocv2YeKXOP6PHuzx7nmp-Q.png?ssl=1\"><\/figure>\n<h3>Conclusion<\/h3>\n<p>In this article, we explored various examples of the method of moments estimator and its applications in different problems in data science. Moreover, detailed Python code that was used to implement the estimators from scratch as well as to plot the different figures is also shown. I hope that you will find this article\u00a0helpful.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/medium.com\/_\/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=f19102ce5897\" width=\"1\" height=\"1\" alt=\"\"><\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/method-of-moments-estimation-with-python-code-f19102ce5897\">Method of Moments Estimation with Python Code<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Mahmoud Abdelaziz, PhD<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fmethod-of-moments-estimation-with-python-code-f19102ce5897\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Method of Moments Estimation with Python Code How to understand and implement the estimator from\u00a0scratch Photo by Petr Mach\u00e1\u010dek on\u00a0Unsplash Let\u2019s say you are in a customer care center, and you would like to know the probability distribution of the number of calls per minute, or in other words, you want to answer the question: [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,83,1181,1183,1182,238],"tags":[84,582,921],"class_list":["post-1063","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-data-science","category-estimation-theory","category-probability-theory","category-statistical-inference","category-statistics","tag-data","tag-distribution","tag-probability"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1063"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1063"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1063\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1063"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1063"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1063"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}