{"id":1737,"date":"2025-02-08T07:02:38","date_gmt":"2025-02-08T07:02:38","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/02\/08\/the-gamma-hurdle-distribution\/"},"modified":"2025-02-08T07:02:38","modified_gmt":"2025-02-08T07:02:38","slug":"the-gamma-hurdle-distribution","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/02\/08\/the-gamma-hurdle-distribution\/","title":{"rendered":"The Gamma Hurdle Distribution"},"content":{"rendered":"<p>    The Gamma Hurdle Distribution<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<p class=\"wp-block-paragraph\" id=\"94ec\"><strong>Which Outcome Matters?<\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"fcdb\">Here is a common scenario : An A\/B test was conducted, where a random sample of units (e.g. customers) were selected for a campaign and they received Treatment A. Another sample was selected to receive Treatment B. \u201cA\u201d could be a communication or offer and \u201cB\u201d could be no communication or no offer. \u201cA\u201d could be 10% off and \u201cB\u201d could be 20% off. Two groups, two different treatments, where A and B are two discrete treatments, but without loss of generality to greater than 2 treatments and continuous treatments.<\/p>\n<p class=\"wp-block-paragraph\" id=\"2d3e\">So, the campaign runs and results are made available. With our backend system, we can track which of these units took the action of interest (e.g. made a purchase) and which did not. Further, for those that did, we log the intensity of that action. A common scenario is that we can track purchase amounts for those that purchased. This is often called an average order amount or revenue per buyer metric. Or a hundred different names that all mean the same thing \u2014 for those that purchased, how much did they spend, on average?<\/p>\n<p class=\"wp-block-paragraph\" id=\"a437\">For some use-cases, the marketer is interested in the former metric \u2014 the purchase rate. For example, did we drive more (potentially first time) buyers in our acquisition campaign with Treatment A or B? Sometimes, we are interested in driving the revenue per buyer higher so we put emphasis on the latter.<\/p>\n<p class=\"wp-block-paragraph\" id=\"8fd4\">More often though, we are interested in driving revenue in a cost effective manner and what we really care about is the revenue that the campaign produced\u00a0<em>overall<\/em>. Did treatment A or B drive more revenue? We don\u2019t always have balanced sample sizes (perhaps due to cost or risk avoidance) and so we divide the measured revenue by the number of candidates that were treated in each group (call these counts N_A and N_B). We want to compare this measure between the two groups, so the standard contrast is simply:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"ededed\" data-has-transparency=\"false\" style=\"--dominant-color: #ededed;\" decoding=\"async\" width=\"374\" height=\"68\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_NmHjbi_qPoTaRbMODRe30w.webp?resize=374%2C68&#038;ssl=1\" alt=\"\" class=\"wp-image-597597 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_NmHjbi_qPoTaRbMODRe30w.webp 374w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_NmHjbi_qPoTaRbMODRe30w-300x55.webp 300w\" sizes=\"(max-width: 374px) 100vw, 374px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"b0e3\">This is just the mean revenue for Treatment A minus mean revenue for Treatment B, where that mean is taken over the entire set of targeted units, irrespective if they responded or not. Its interpretation is likewise straightforward \u2014 what is the average revenue per promoted unit increase going from Treatment A versus Treatment B?<\/p>\n<p class=\"wp-block-paragraph\" id=\"6342\">Of course, this last measure accounts for both of the prior: the response rate multiplied by the mean revenue per responder.<\/p>\n<p class=\"wp-block-paragraph\" id=\"bb21\"><strong>Uncertainty?<\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"f798\">How much a buyer spends is highly variable and a couple large purchases in one treatment group or the other can skew the mean significantly. Likewise, sample variation can be significant. So, we want to understand how confident we are in this comparison of means and quantify the \u201csignificance\u201d of the observed difference.<\/p>\n<p class=\"wp-block-paragraph\" id=\"4f46\">So, you throw the data in a t-test and stare at the p-value. But wait! Unfortunately for the marketer, the vast majority of the time, the purchase rate is relatively low (sometimes VERY low) and hence there are a lot of zero revenue values \u2014 often the vast majority. The t-test assumptions may be badly violated. Very large sample sizes may come to the rescue, but there is a more principled way to analyze this data that is useful in multiple ways, that will be explained.<\/p>\n<p class=\"wp-block-paragraph\" id=\"5990\"><strong>Example Dataset<\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"24c5\">Lets start with the sample dataset to makes things practical. One of my favorite direct marketing datasets is from the KDD Cup 98.<\/p>\n<pre class=\"wp-block-code\"><code>url = 'https:\/\/kdd.ics.uci.edu\/databases\/kddcup98\/epsilon_mirror\/cup98lrn.zip'\nfilename = 'cup98LRN.txt'\n\nr = requests.get(url)\nz = zipfile.ZipFile(io.BytesIO(r.content))\nz.extractall()\n\n\npdf_data = pd.read_csv(filename, sep=',')\npdf_data = pdf_data.query('TARGET_D &gt;=0')\npdf_data['TREATMENT'] =  np.where(pdf_data.RFA_2F &gt;1,'A','B')\npdf_data['TREATED'] =  np.where(pdf_data.RFA_2F &gt;1,1,0)\npdf_data['GT_0'] = np.where(pdf_data.TARGET_D &gt;0,1,0)\npdf_data = pdf_data[['TREATMENT', 'TREATED', 'GT_0', 'TARGET_D']]\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"a1a6\">In the code snippet above we are downloading a zip file (the learning dataset specifically), extracting it and reading it into a Pandas data frame. The nature of this dataset is campaign history from a non-profit organization that was seeking donations through direct mailings. There is no treatment variants within this dataset, so we are pretending instead and segmenting the dataset based on the frequency of past donations. We call this indicator\u00a0<em>TREATMENT<\/em>\u00a0(as the categorical and create\u00a0<em>TREATED<\/em>\u00a0as the binary indicator for \u2018A\u2019 ). Consider this the results of a randomized control trial where a portion of the sample population was treated with an offer and the remainder were not. We track each individual and accumulate the amount of their donation.<\/p>\n<p class=\"wp-block-paragraph\" id=\"2727\">So, if we examine this dataset, we see that there are about 95,000 promoted individuals, generally distributed equally across the two treatments:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"a8c9e0\" data-has-transparency=\"false\" style=\"--dominant-color: #a8c9e0;\" fetchpriority=\"high\" decoding=\"async\" width=\"596\" height=\"434\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_XhKq9V5wrHOoNdv-337O6g.webp?resize=596%2C434&#038;ssl=1\" alt=\"\" class=\"wp-image-597599 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_XhKq9V5wrHOoNdv-337O6g.webp 596w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_XhKq9V5wrHOoNdv-337O6g-300x218.webp 300w\" sizes=\"(max-width: 596px) 100vw, 596px\"><\/figure>\n<p class=\"wp-block-paragraph\">Treatment A has a larger response rate but overall the response rate in the dataset is only around 5%. So, we have 95% zeros.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"b6d1e4\" data-has-transparency=\"false\" style=\"--dominant-color: #b6d1e4;\" decoding=\"async\" width=\"577\" height=\"429\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_d4Rey_jyrOFQFdQCRqTLgw.webp?resize=577%2C429&#038;ssl=1\" alt=\"\" class=\"wp-image-597600 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_d4Rey_jyrOFQFdQCRqTLgw.webp 577w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_d4Rey_jyrOFQFdQCRqTLgw-300x223.webp 300w\" sizes=\"(max-width: 577px) 100vw, 577px\"><\/figure>\n<p class=\"wp-block-paragraph\">For those that donated, Treatment A appears to be associated with a lower average donation amount.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"b5d1e4\" data-has-transparency=\"false\" style=\"--dominant-color: #b5d1e4;\" loading=\"lazy\" decoding=\"async\" width=\"581\" height=\"425\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_oV16Oan-_R3-ZBrk5h-R2g.webp?resize=581%2C425&#038;ssl=1\" alt=\"\" class=\"wp-image-597601 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_oV16Oan-_R3-ZBrk5h-R2g.webp 581w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_oV16Oan-_R3-ZBrk5h-R2g-300x219.webp 300w\" sizes=\"auto, (max-width: 581px) 100vw, 581px\"><\/figure>\n<p class=\"wp-block-paragraph\">Combining together everyone that was targeted, Treatment A appears to be associated with a higher average donation amount \u2014 the higher response rate outweighs the lower donation amount for responders\u2014 but not by much.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"a5c7df\" data-has-transparency=\"false\" style=\"--dominant-color: #a5c7df;\" loading=\"lazy\" decoding=\"async\" width=\"571\" height=\"427\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_uPBb_wAAR2TLlHlraK-WwQ.webp?resize=571%2C427&#038;ssl=1\" alt=\"\" class=\"wp-image-597602 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_uPBb_wAAR2TLlHlraK-WwQ.webp 571w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_uPBb_wAAR2TLlHlraK-WwQ-300x224.webp 300w\" sizes=\"auto, (max-width: 571px) 100vw, 571px\"><\/figure>\n<p class=\"wp-block-paragraph\">Finally, the histogram of the donation amount is shown here, pooled over both treatments, which illustrates the mass at zero and a right skew.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"ecf1f5\" data-has-transparency=\"false\" style=\"--dominant-color: #ecf1f5;\" loading=\"lazy\" decoding=\"async\" width=\"570\" height=\"434\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_3o4B59kt_EOoY7ESCPGLBQ.webp?resize=570%2C434&#038;ssl=1\" alt=\"\" class=\"wp-image-597603 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_3o4B59kt_EOoY7ESCPGLBQ.webp 570w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_3o4B59kt_EOoY7ESCPGLBQ-300x228.webp 300w\" sizes=\"auto, (max-width: 570px) 100vw, 570px\"><\/figure>\n<p class=\"wp-block-paragraph\">A numerical summary of the two treatment groups quantifies the phenomenon observed above \u2014 while Treatment A appears to have driven significantly higher response, those that were treated with A donated less on average when they responded. The net of these two measures, the one we are ultimately after \u2014 the overall mean donation per targeted unit \u2013 appears to still be higher for Treatment A. How confident we are in that finding is the subject of this analysis.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"ececec\" data-has-transparency=\"false\" style=\"--dominant-color: #ececec;\" loading=\"lazy\" decoding=\"async\" width=\"653\" height=\"166\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_5ubGZIXHhdFAekgvoK4IUA.webp?resize=653%2C166&#038;ssl=1\" alt=\"\" class=\"wp-image-597604 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_5ubGZIXHhdFAekgvoK4IUA.webp 653w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_5ubGZIXHhdFAekgvoK4IUA-300x76.webp 300w\" sizes=\"auto, (max-width: 653px) 100vw, 653px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"3504\"><strong>Gamma Hurdle<\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"4c7f\">One way to model this data and answer our research question in terms of the difference between the two treatments in generating the average donation per targeted unit is with the Gamma Hurdle distribution. Similar to the more well known Zero Inflated Poisson (ZIP) or NB (ZINB) distribution, this is a mixture distribution where one part pertains to the mass at zero and the other, in the cases where the random variable is positive, the gamma density function.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f3f3f3\" data-has-transparency=\"false\" style=\"--dominant-color: #f3f3f3;\" loading=\"lazy\" decoding=\"async\" width=\"326\" height=\"99\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_h0sDO1A0w0QgnmTzKIp5uQ.webp?resize=326%2C99&#038;ssl=1\" alt=\"\" class=\"wp-image-597605 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_h0sDO1A0w0QgnmTzKIp5uQ.webp 326w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_h0sDO1A0w0QgnmTzKIp5uQ-300x91.webp 300w\" sizes=\"auto, (max-width: 326px) 100vw, 326px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"caed\">Here \u03c0 represents the probability that the random variable y is &gt; 0. In other words its the probability of the gamma process. Likewise, (1- \u03c0) is the probability that the random variable is zero. In terms of our problem, this pertains to the probability that a donation is made and if so, it\u2019s value.<\/p>\n<p class=\"wp-block-paragraph\" id=\"d25c\">Lets start with the component parts of using this distribution in a regression \u2013 logistic and gamma regression.<\/p>\n<p class=\"wp-block-paragraph\" id=\"c9ef\"><strong>Logistic Regression<\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"aed9\">The logit function is the link function here, relating the log odds to the linear combination of our predictor variables, which with a single variable such as our binary treatment indicator, looks like:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f2f3f3\" data-has-transparency=\"false\" style=\"--dominant-color: #f2f3f3;\" loading=\"lazy\" decoding=\"async\" width=\"288\" height=\"62\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_qkP9KSclpft2fxW438VfcA.webp?resize=288%2C62&#038;ssl=1\" alt=\"\" class=\"wp-image-597606 not-transparent\"><\/figure>\n<p class=\"wp-block-paragraph\">Where \u03c0 represents the probability that the outcome is a \u201cpositive\u201d (denoted as 1) event such as a purchase and (1-\u03c0) represents the probability that the outcome is a \u201cnegative\u201d (denoted as 0) event. Further, \u03c0 which is the qty of interest above, is defined by the inverse logit function:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f4f4f4\" data-has-transparency=\"false\" style=\"--dominant-color: #f4f4f4;\" loading=\"lazy\" decoding=\"async\" width=\"242\" height=\"89\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_kLFG7jGpUEZOHqbp-DKFJA.webp?resize=242%2C89&#038;ssl=1\" alt=\"\" class=\"wp-image-597607 not-transparent\"><\/figure>\n<p class=\"wp-block-paragraph\">Fitting this model is very simple, we need to find the values of the two betas that maximize the likelihood of the data (the outcome y)\u2014 which assuming N iid observations is:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f2f2f2\" data-has-transparency=\"false\" style=\"--dominant-color: #f2f2f2;\" loading=\"lazy\" decoding=\"async\" width=\"362\" height=\"94\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_iQBYBYICkhBKmuC0OCcilg.webp?resize=362%2C94&#038;ssl=1\" alt=\"\" class=\"wp-image-597608 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_iQBYBYICkhBKmuC0OCcilg.webp 362w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_iQBYBYICkhBKmuC0OCcilg-300x78.webp 300w\" sizes=\"auto, (max-width: 362px) 100vw, 362px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"8884\">We could use any of multiple libraries to quickly fit this model but will demonstrate PYMC as the means to build a simple Bayesian logistic regression.<\/p>\n<p class=\"wp-block-paragraph\" id=\"aacc\">Without any of the normal steps of the Bayesian workflow, we fit this simple model using MCMC.<\/p>\n<pre class=\"wp-block-code\"><code>import pymc as pm\nimport arviz as az\nfrom scipy.special import expit\n\n\nwith pm.Model() as logistic_model:\n\n    # noninformative priors\n    intercept = pm.Normal('intercept', 0, sigma=10)\n    beta_treat = pm.Normal('beta_treat', 0, sigma=10)\n\n    # linear combination of the treated variable \n    # through the inverse logit to squish the linear predictor between 0 and 1\n    p =  pm.invlogit(intercept + beta_treat * pdf_data.TREATED)\n\n    # Individual level binary variable (respond or not)\n    pm.Bernoulli(name='logit', p=p, observed=pdf_data.GT_0)\n\n    idata = pm.sample(nuts_sampler=\"numpyro\")\n<\/code><\/pre>\n<pre class=\"wp-block-code\"><code>az.summary(idata, var_names=['intercept', 'beta_treat'])<\/code><\/pre>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f0f0f0\" data-has-transparency=\"false\" style=\"--dominant-color: #f0f0f0;\" loading=\"lazy\" decoding=\"async\" width=\"763\" height=\"125\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_PCma1nanJ8DhgDOzwG-zjA.webp?resize=763%2C125&#038;ssl=1\" alt=\"\" class=\"wp-image-597609 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_PCma1nanJ8DhgDOzwG-zjA.webp 763w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_PCma1nanJ8DhgDOzwG-zjA-300x49.webp 300w\" sizes=\"auto, (max-width: 763px) 100vw, 763px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"2b48\">If we construct a contrast of the two treatment mean response rates, we find that as expected, the mean response rate lift for Treatment A is 0.026 larger than Treatment B with a 94% credible interval of (0.024 , 0.029).<\/p>\n<pre class=\"wp-block-code\"><code># create a new column in the posterior which contrasts Treatment A - B\nidata.posterior['TREATMENT A - TREATMENT B'] = expit(idata.posterior.intercept + idata.posterior.beta_treat) -  expit(idata.posterior.intercept)\n\naz.plot_posterior(\n    idata,\n    var_names=['TREATMENT A - TREATMENT B']\n)\n<\/code><\/pre>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f7f8f8\" data-has-transparency=\"false\" style=\"--dominant-color: #f7f8f8;\" loading=\"lazy\" decoding=\"async\" width=\"651\" height=\"556\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_w8gMLIEOtSjShj8Z39vVtg.webp?resize=651%2C556&#038;ssl=1\" alt=\"\" class=\"wp-image-597610 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_w8gMLIEOtSjShj8Z39vVtg.webp 651w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_w8gMLIEOtSjShj8Z39vVtg-300x256.webp 300w\" sizes=\"auto, (max-width: 651px) 100vw, 651px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"8368\"><strong>Gamma Regression<\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"6cbf\">The next component is the gamma distribution with one of it\u2019s parametrizations of it\u2019s probability density function, as shown above:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f3f3f3\" data-has-transparency=\"false\" style=\"--dominant-color: #f3f3f3;\" loading=\"lazy\" decoding=\"async\" width=\"226\" height=\"82\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_VQCD_xGfUOsWpygyORrDtw.webp?resize=226%2C82&#038;ssl=1\" alt=\"\" class=\"wp-image-597611 not-transparent\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"df84\">This distribution is defined for strictly positive random variables and if used in business for values such as costs, customer demand spending and insurance claim amounts.<\/p>\n<p class=\"wp-block-paragraph\" id=\"5b0d\">Since the mean and variance of gamma are defined in terms of \u03b1 and \u03b2 according to the formulas:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f5f5f5\" data-has-transparency=\"false\" style=\"--dominant-color: #f5f5f5;\" loading=\"lazy\" decoding=\"async\" width=\"220\" height=\"70\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_MzdTGteHRdwkAiVJcWDsbg.webp?resize=220%2C70&#038;ssl=1\" alt=\"\" class=\"wp-image-597612 not-transparent\"><\/figure>\n<p class=\"wp-block-paragraph\">for gamma regression, we can parameterize by \u03b1 and \u03b2 or by \u03bc and \u03c3. If we make \u03bc defined as a linear combination of predictor variables, then we can define gamma in terms of \u03b1 and \u03b2 using \u03bc:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f2f2f2\" data-has-transparency=\"false\" style=\"--dominant-color: #f2f2f2;\" loading=\"lazy\" decoding=\"async\" width=\"92\" height=\"66\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_AN3LxtD93LctbhEdNrXHVA.webp?resize=92%2C66&#038;ssl=1\" alt=\"\" class=\"wp-image-597613 not-transparent\"><\/figure>\n<p class=\"wp-block-paragraph\">The gamma regression model assumes (in this case, the inverse link is another common option) the log link which is intended to \u201clinearize\u201d the relationship between predictor and outcome:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"ededed\" data-has-transparency=\"false\" style=\"--dominant-color: #ededed;\" loading=\"lazy\" decoding=\"async\" width=\"258\" height=\"49\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_zHxn8BgnahZoguvgfN7KUA.webp?resize=258%2C49&#038;ssl=1\" alt=\"\" class=\"wp-image-597614 not-transparent\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"3aaf\">Following almost exactly the same methodology as for the response rate, we limit the dataset to only responders and fit the gamma regression using PYMC.<\/p>\n<pre class=\"wp-block-code\"><code>with pm.Model() as gamma_model:\n\n    # noninformative priors\n    intercept = pm.Normal('intercept', 0, sigma=10)\n    beta_treat = pm.Normal('beta_treat', 0, sigma=10)\n\n    shape = pm.HalfNormal('shape', 5)\n\n    # linear combination of the treated variable \n    # through the exp to ensure the linear predictor is positive\n    mu =  pm.Deterministic('mu',pm.math.exp(intercept + beta_treat * pdf_responders.TREATED))\n\n    # Individual level binary variable (respond or not)\n    pm.Gamma(name='gamma', alpha = shape, beta = shape\/mu,  observed=pdf_responders.TARGET_D)\n\n    idata = pm.sample(nuts_sampler=\"numpyro\")\n<\/code><\/pre>\n<pre class=\"wp-block-code\"><code>az.summary(idata, var_names=['intercept', 'beta_treat'])<\/code><\/pre>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f0f0f0\" data-has-transparency=\"false\" style=\"--dominant-color: #f0f0f0;\" loading=\"lazy\" decoding=\"async\" width=\"754\" height=\"117\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_vSZpBr3JIRb2nA3902nSxA.webp?resize=754%2C117&#038;ssl=1\" alt=\"\" class=\"wp-image-597615 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_vSZpBr3JIRb2nA3902nSxA.webp 754w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_vSZpBr3JIRb2nA3902nSxA-300x47.webp 300w\" sizes=\"auto, (max-width: 754px) 100vw, 754px\"><\/figure>\n<pre class=\"wp-block-code\"><code># create a new column in the posterior which contrasts Treatment A - B\nidata.posterior['TREATMENT A - TREATMENT B'] = np.exp(idata.posterior.intercept + idata.posterior.beta_treat) -  np.exp(idata.posterior.intercept)\n\naz.plot_posterior(\n    idata,\n    var_names=['TREATMENT A - TREATMENT B']\n)\n<\/code><\/pre>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f8f9f9\" data-has-transparency=\"false\" style=\"--dominant-color: #f8f9f9;\" loading=\"lazy\" decoding=\"async\" width=\"645\" height=\"554\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_UDfCK0GRXrj8cDpo4gubOg.webp?resize=645%2C554&#038;ssl=1\" alt=\"\" class=\"wp-image-597616 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_UDfCK0GRXrj8cDpo4gubOg.webp 645w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_UDfCK0GRXrj8cDpo4gubOg-300x258.webp 300w\" sizes=\"auto, (max-width: 645px) 100vw, 645px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"0083\">Again, as expected, we see the mean lift for Treatment A to have an expected value equal to the sample value of -7.8. The 94% credible interval is (-8.3, -7.3).<\/p>\n<p class=\"wp-block-paragraph\" id=\"77d6\">The components, response rate and average amount per responder shown above are about as simple as we can get. But, its a straight forward extension to add additional predictors in order to 1) estimate the Conditional Average Treatment Effects (CATE) when we expect the treatment effect to differ by segment or 2) reduce the variance of the average treatment effect estimate by conditioning on pre-treatment variables.<\/p>\n<p class=\"wp-block-paragraph\" id=\"cfb7\"><strong>Hurdle Model (Gamma) Regression<\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"f63f\">At this point, it should be pretty straightforward to see where we are progressing. For the hurdle model, we have a conditional likelihood, depending on if the specific observation is 0 or greater than zero, as shown above for the gamma hurdle distribution. We can fit the two component models (logistic and gamma regression) simultaneously. We get for free, their product, which in our example is an estimate of the donation amount per targeted unit.<\/p>\n<p class=\"wp-block-paragraph\" id=\"8c74\">It would not be difficult to fit this model with using a likelihood function with a switch statement depending on the value of the outcome variable, but PYMC has this distribution already encoded for us.<\/p>\n<pre class=\"wp-block-code\"><code>import pymc as pm\nimport arviz as az\n\nwith pm.Model() as hurdle_model:\n\n    ## noninformative priors ##\n    # logistic\n    intercept_lr = pm.Normal('intercept_lr', 0, sigma=5)\n    beta_treat_lr = pm.Normal('beta_treat_lr', 0, sigma=1)\n\n    # gamma\n    intercept_gr = pm.Normal('intercept_gr', 0, sigma=5)\n    beta_treat_gr = pm.Normal('beta_treat_gr', 0, sigma=1)\n\n    # alpha\n    shape = pm.HalfNormal('shape', 1)\n\n    ## mean functions of predictors ##\n    p =  pm.Deterministic('p', pm.invlogit(intercept_lr + beta_treat_lr * pdf_data.TREATED))\n    mu =  pm.Deterministic('mu',pm.math.exp(intercept_gr + beta_treat_gr * pdf_data.TREATED))\n    \n    ## likliehood ##\n    # psi is pi\n    pm.HurdleGamma(name='hurdlegamma', psi=p, alpha = shape, beta = shape\/mu, observed=pdf_data.TARGET_D)\n\n    idata = pm.sample(cores = 10)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">If we examine the trace summary, we see that the results are exactly the same for the two component models.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"eeefee\" data-has-transparency=\"false\" style=\"--dominant-color: #eeefee;\" loading=\"lazy\" decoding=\"async\" width=\"618\" height=\"151\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_jirR0FKbDagR9LAY3l8OTw.webp?resize=618%2C151&#038;ssl=1\" alt=\"\" class=\"wp-image-597617 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_jirR0FKbDagR9LAY3l8OTw.webp 618w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_jirR0FKbDagR9LAY3l8OTw-300x73.webp 300w\" sizes=\"auto, (max-width: 618px) 100vw, 618px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"a39d\">As noted, the mean of the gamma hurdle distribution is \u03c0 * \u03bc so we can create a contrast:<\/p>\n<pre class=\"wp-block-code\"><code># create a new column in the posterior which contrasts Treatment A - B\nidata.posterior['TREATMENT A - TREATMENT B'] = ((expit(idata.posterior.intercept_lr + idata.posterior.beta_treat_lr))* np.exp(idata.posterior.intercept_gr + idata.posterior.beta_treat_gr)) - \n                                                    ((expit(idata.posterior.intercept_lr))* np.exp(idata.posterior.intercept_gr))\n\naz.plot_posterior(\n    idata,\n    var_names=['TREATMENT A - TREATMENT B']\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The mean expected value of this model is 0.043 with a 94% credible interval of (-0.0069, 0.092). We could interrogate the posterior to see what proportion of times the donation per buyer is predicted to be higher for Treatment A and any other decision functions that made sense for our case \u2014 including adding a fuller P&amp;L to the estimate (i.e. including margins and cost).<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"f7f8f8\" data-has-transparency=\"false\" style=\"--dominant-color: #f7f8f8;\" loading=\"lazy\" decoding=\"async\" width=\"503\" height=\"444\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ne_kjD6As2ja2eiXh4Pc2g.webp?resize=503%2C444&#038;ssl=1\" alt=\"\" class=\"wp-image-597618 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ne_kjD6As2ja2eiXh4Pc2g.webp 503w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_ne_kjD6As2ja2eiXh4Pc2g-300x265.webp 300w\" sizes=\"auto, (max-width: 503px) 100vw, 503px\"><\/figure>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"e248\">Notes: Some implementations parameterize the gamma hurdle model differently where the probability of zero is \u03c0 and hence the mean of the gamma hurdle involves (1-\u03c0) instead. Also note that at the time of this writing there appears to be an\u00a0<a href=\"https:\/\/github.com\/pymc-devs\/nutpie\/issues\/163\" rel=\"noreferrer noopener\" target=\"_blank\">issue<\/a>\u00a0with the nuts samplers in PYMC and we had to fall back on the default python implementation for running the above code.<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\" id=\"cb89\"><strong>Summary<\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"da2f\">With this approach, we get the same inference for both models separately and the extra benefit of the third metric. Fitting these models with PYMC allows us all the benefits of Bayesian analysis \u2014 including injection of prior domain knowledge and a full posterior to answer questions and quantify uncertainty!<\/p>\n<p class=\"wp-block-paragraph\" id=\"efb1\">Credits:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">All images are the authors, unless otherwise noted.<\/li>\n<li class=\"wp-block-list-item\">The dataset used is from the KDD 98 Cup sponsored by Epsilon.\u00a0<a href=\"https:\/\/kdd.ics.uci.edu\/databases\/kddcup98\/kddcup98.html\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/kdd.ics.uci.edu\/databases\/kddcup98\/kddcup98.html<\/a> (CC BY 4.0)<a href=\"https:\/\/medium.com\/tag\/statistics?source=post_page-----a499c318e2a3--------------------------------\"><\/a>\n<\/li>\n<\/ol>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/the-gamma-hurdle-distribution\/\">The Gamma Hurdle Distribution<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Jeff Allard<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/the-gamma-hurdle-distribution\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Gamma Hurdle Distribution Which Outcome Matters? Here is a common scenario : An A\/B test was conducted, where a random sample of units (e.g. customers) were selected for a campaign and they received Treatment A. Another sample was selected to receive Treatment B. \u201cA\u201d could be a communication or offer and \u201cB\u201d could be [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1690,62,83,67,1442,238,1691],"tags":[1694,1693,1692],"class_list":["post-1737","post","type-post","status-publish","format-standard","hentry","category-a-b-testing","category-aimldsaimlds","category-data-science","category-deep-dives","category-marketing-analytics","category-statistics","category-uncertainity","tag-campaign","tag-did","tag-revenue"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1737"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1737"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1737\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1737"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1737"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1737"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}