{"id":1010,"date":"2025-01-07T07:03:56","date_gmt":"2025-01-07T07:03:56","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/01\/07\/in-defense-of-statistical-significance-996e8c62f3e8\/"},"modified":"2025-01-07T07:03:56","modified_gmt":"2025-01-07T07:03:56","slug":"in-defense-of-statistical-significance-996e8c62f3e8","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/01\/07\/in-defense-of-statistical-significance-996e8c62f3e8\/","title":{"rendered":"In Defense of Statistical Significance"},"content":{"rendered":"<div>\n<h4>We have to draw the line somewhere<\/h4>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A7ZkV0yoRXok1o7McHSi-FA.jpeg?ssl=1\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@siora18?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash\">Siora Photography<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/photos\/a-black-and-white-photo-of-a-wave-in-the-sand-IEsj0BXQMCI?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash\">Unsplash<\/a><\/figcaption><\/figure>\n<p>It\u2019s become something of a meme that statistical significance is a bad standard. 
Several recent blog posts have made the rounds arguing that statistical significance is a \u201ccult\u201d or \u201carbitrary.\u201d If you\u2019d like a classic polemic (and who wouldn\u2019t?), check out: <a href=\"https:\/\/www.deirdremccloskey.com\/docs\/jsm.pdf\">https:\/\/www.deirdremccloskey.com\/docs\/jsm.pdf<\/a>.<\/p>\n<p>This little essay is a defense of the so-called Cult of Statistical Significance.<\/p>\n<p>Statistical significance is a good enough idea, and I\u2019ve yet to see an alternative that is fundamentally better and still practical enough to use in industry.<\/p>\n<p>I won\u2019t argue that statistical significance is the <em>perfect<\/em> way to make decisions, but it is\u00a0<em>fine<\/em>.<\/p>\n<h3>Statistical significance does not equal business significance, but so\u00a0what?<\/h3>\n<p>A common point made by those who would besmirch the Cult is that statistical significance is not the same as business significance. They are correct, but that is not an argument against using statistical significance to make decisions.<\/p>\n<p>Statistical significance says, for example, that if the estimated impact of some change is 1% with a standard error of 0.25%, it is statistically significant at the 5% level (z = 1% \/ 0.25% = 4, well above the two-sided cutoff of about 1.96), while if the estimated impact of another change is 10% with a standard error of 6%, it is statistically insignificant at the 5%\u00a0level (z \u2248 1.67).<\/p>\n<p>The argument goes that the 10% impact is more meaningful to the business, even if it is less\u00a0precise.<\/p>\n<p>Well, let\u2019s look at this from the perspective of <em>decision-making.<\/em><\/p>\n<p>There are two cases\u00a0here.<\/p>\n<h4>The two initiatives are separable.<\/h4>\n<p>If the two initiatives are separable, we should still launch the +1% change with its 0.25% standard error, right? It\u2019s a positive effect, so statistical significance does not lead us astray. 
We should launch the statistically significant positive\u00a0result.<\/p>\n<p>Okay, so let\u2019s turn to the larger effect size experiment.<\/p>\n<p>To make the point starker, first suppose the standard error were 20% instead of 6%: the estimate is still +10%, but the 95% confidence interval is now roughly [-30%, +50%]. In this case, we don\u2019t really think there\u2019s any evidence the effect is positive, right? Despite the larger effect size, the standard error is too large to draw any meaningful conclusion.<\/p>\n<p>With the actual standard error of 6%, the problem isn\u2019t statistical significance. The problem is that we think a standard error of 6% is small enough, in this case, to launch the new feature based on this evidence. This example doesn\u2019t show a problem with statistical significance as a framework. It shows that we are willing to tolerate more Type 1 error than alpha =\u00a05% allows.<\/p>\n<p>That\u2019s fine! We accept other alphas in our Cult, so long as they were selected before the experiment. Just use a larger alpha. For example, the +10% estimate with a 6% standard error (z \u2248 1.67) is statistically significant with alpha =\u00a010%, where the two-sided cutoff is about 1.645.<\/p>\n<p>The point is that there <em>is <\/em>a level of noise that we\u2019d find unacceptable. There\u2019s a level of noise where even if the estimated effect were +20%, we\u2019d say, \u201cWe don\u2019t really know what it\u00a0is.\u201d<\/p>\n<p>So, we have to say how much noise is too\u00a0much.<\/p>\n<p>Statistical inference, like art and morality, requires us to draw the line somewhere.<\/p>\n<h4>The initiatives are alternatives.<\/h4>\n<p>Now, suppose the two initiatives are alternatives. If we do one, we can\u2019t do the other. Which should we\u00a0choose?<\/p>\n<p>In this case, the problem with the above setup is that we\u2019re testing the wrong hypothesis. We don\u2019t just want to compare each initiative to control. We also want to compare them to each\u00a0other.<\/p>\n<p>But this is also not a problem with statistical significance. 
It\u2019s a problem with the hypothesis we\u2019re\u00a0testing.<\/p>\n<p>We want to test whether the 9% difference in effect sizes is statistically significant, using an alpha level chosen for the same reasons as in the previous case. If the two estimates are independent, the standard error of the difference is about \u221a(6\u00b2 + 0.25\u00b2) \u2248 6%, so the 9% gap has z \u2248 1.5. There\u2019s a level of noise at which the 9% is just spurious, and we have to set that\u00a0level.<\/p>\n<p>Again, we have to draw the line somewhere.<\/p>\n<p>Now, let\u2019s deal with some other common objections, and then I\u2019ll pass out a sign-up sheet to join the\u00a0Cult.<\/p>\n<h3>Statistical significance is arbitrary.<\/h3>\n<p>This objection to statistical significance is common but misses the\u00a0point.<\/p>\n<p>Our attitudes towards risk and ambiguity (in the Statistical Decision Theory sense) are \u201carbitrary\u201d because we choose them. But there is no way around that: preferences are a given in any decision-making problem.<\/p>\n<p>Statistical significance is no more \u201carbitrary\u201d than other decision-making rules, and it has an intuitive structure: it weighs the size of the effect against how much noise we\u2019ll allow. It has a single scalar parameter, alpha, that we can adjust to prefer more or less Type 1 error relative to Type 2 error. It\u2019s\u00a0lovely.<\/p>\n<h3>People misinterpret frequentist p-values as the probability that the effect is zero. To avoid this mistake, they should use Bayesian inference.<\/h3>\n<p>Sometimes, people argue that we should use Bayesian inference to make decisions because it is easier to interpret.<\/p>\n<p>I\u2019ll start by admitting that in its ideal setting, Bayesian inference has nice properties. 
We can treat the posterior exactly as our \u201cbeliefs\u201d and make decisions based on, say, the probability that the effect is positive, which is not possible with frequentist statistical significance.<\/p>\n<p>Bayesian inference in practice is another\u00a0animal.<\/p>\n<p>Bayesian inference only gets those nice \u201cbelief\u201d-like properties if the prior reflects the decision-maker\u2019s actual prior beliefs. This is extremely difficult to do in practice.<\/p>\n<p>If you think choosing an \u201calpha\u201d that draws the line on how much noise you\u2019ll accept is tricky, imagine having to choose a <em>density<\/em> that correctly captures your (or the decision-maker\u2019s) <em>beliefs\u2026 <\/em>before every experiment! This is a very difficult problem.<\/p>\n<p>So, in practice, Bayesian priors are usually chosen because they are \u201cconvenient,\u201d \u201cuninformative,\u201d etc. They have little to do with actual prior\u00a0beliefs.<\/p>\n<p>When we\u2019re not specifying our real prior beliefs, the posterior distribution is just some weighting of the likelihood function. Claiming that we can look at the quantiles of this so-called posterior distribution and say the parameter has a 10% chance of being less than 0 is statistical nonsense.<\/p>\n<p>So, if anything, it is easier to misinterpret what we\u2019re doing in Bayesian land than in frequentist land. It is hard for statisticians to translate their prior beliefs into a distribution. How much harder is it for whoever the actual decision-maker is on the\u00a0project?<\/p>\n<p>For these reasons, Bayesian inference doesn\u2019t scale well, which is why, I think, experimentation platforms across the industry generally don\u2019t use\u00a0it.<\/p>\n<h3>The Church of Statistical Significance<\/h3>\n<p>The arguments against the \u201cCult\u201d of Statistical Significance are, of course, a response to a real problem. 
There <em>is<\/em> a dangerous Cult within our\u00a0<em>Church<\/em>.<\/p>\n<p>The Church of Statistical Significance is quite accepting. We allow alphas other than 5%. We test null hypotheses other than zero,\u00a0etc.<\/p>\n<p>But sometimes, our good name is tarnished by a radical element within the Church that treats anything not significantly different from zero at the 5% level as \u201cnot\u00a0real.\u201d<\/p>\n<p>These heretics believe in a cargo-cult version of statistical analysis where the statistical significance procedure (at the 5% level) determines what is true, instead of just being a useful way to make decisions and weigh uncertainty.<\/p>\n<p>We disavow all association with this dangerous sect, of\u00a0course.<\/p>\n<p>Let me know if you\u2019d like to join the Church. I\u2019ll sign you up for the monthly\u00a0potluck.<\/p>\n<p>Thanks for\u00a0reading!<\/p>\n<p>Zach<\/p>\n<p>Connect at: <a href=\"https:\/\/linkedin.com\/in\/zlflynn\">https:\/\/linkedin.com\/in\/zlflynn<\/a><\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/in-defense-of-statistical-significance-996e8c62f3e8\">In Defense of Statistical Significance<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<\/div>\n<p>Zach Flynn<br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fin-defense-of-statistical-significance-996e8c62f3e8\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In 
Defense of Statistical Significance We have to draw the line somewhere Photo by Siora Photography on\u00a0Unsplash It\u2019s become something of a meme that statistical significance is a bad standard. Several recent blogs have made the rounds, making the case that statistical significance is a \u201ccult\u201d or \u201carbitrary.\u201d If you\u2019d like a classic polemic (and [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,692,83,312,313,238],"tags":[316,1069,315],"class_list":["post-1010","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-data","category-data-science","category-decision-making","category-statistical-significance","category-statistics","tag-significance","tag-standard","tag-statistical"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1010"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1010"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1010\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1010"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1010"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1010"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}