{"id":1429,"date":"2025-01-25T07:02:50","date_gmt":"2025-01-25T07:02:50","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/01\/25\/does-it-matter-that-online-experiments-interact-9c4012b75fbd\/"},"modified":"2025-01-25T07:02:50","modified_gmt":"2025-01-25T07:02:50","slug":"does-it-matter-that-online-experiments-interact-9c4012b75fbd","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/01\/25\/does-it-matter-that-online-experiments-interact-9c4012b75fbd\/","title":{"rendered":"Does It Matter That Online Experiments Interact?"},"content":{"rendered":"<div>\n<h4>What interactions do, why they are just like any other change in the environment post-experiment, and some reassurance<\/h4>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2ARnHoQz0cChzDgCOJvQIA-w.jpeg?ssl=1\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@soberanes?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash\">Uriel Soberanes<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/photos\/two-bisons-fighting-head-L1bAGEWYCtk?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash\">Unsplash<\/a><\/figcaption><\/figure>\n<p>Experiments do not run one at a time. At any moment, hundreds to thousands of experiments run on a mature website. The question comes up: what if these experiments interact with each other? Is that a problem? 
As with many interesting questions, the answer is \u201cyes and no.\u201d Read on to get even more definite, actionable, entirely clear, and confident takes like\u00a0that!<\/p>\n<p>Definitions: Experiments <strong>interact<\/strong> when the treatment effect for one experiment depends on which variant of another experiment the unit gets assigned\u00a0to.<\/p>\n<p>For example, suppose we have an experiment testing a new search model and another testing a new recommendation model, powering a \u201cpeople also bought\u201d module. Both experiments are ultimately about helping customers find what they want to buy. Units assigned to the better recommendation algorithm may have a smaller treatment effect in the search experiment because they are less likely to be influenced by the search algorithm: they made their purchase because of the better recommendation.<\/p>\n<p>Some empirical evidence suggests that typical interaction effects are <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/articles\/a-b-interactions-a-call-to-relax\/\">small<\/a>. Maybe you don\u2019t find this particularly comforting. I\u2019m not sure I do, either. After all, the size of interaction effects depends on the experiments we run. For your particular organization, experiments might interact more or less. It might be the case that interaction effects are larger in your context than at the companies typically profiled in these types of analyses.<\/p>\n<p>So, this blog post is not an empirical argument. It\u2019s theoretical. That means it includes math. So it goes. We will try to understand the issues with interactions with an explicit model without reference to a particular company\u2019s data. Even if interaction effects are relatively large, we\u2019ll find that they rarely matter for <em>decision-making<\/em>. Interaction effects must be massive and have a peculiar pattern to affect which experiment wins. 
The point of the blog is to bring you peace of\u00a0mind.<\/p>\n<h3>Interactions Aren\u2019t So Special, And They Aren\u2019t So\u00a0Bad<\/h3>\n<p>Suppose we have two A\/B experiments. Let Z = 1 indicate treatment in the first experiment and W = 1 indicate treatment in the second experiment. Y is the metric of interest.<\/p>\n<p>The treatment effect in experiment 1\u00a0is:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/279\/1%2AQpr-e0xtvXjA2jsuk3v2gQ.png?ssl=1\"><\/figure>\n<p>Let\u2019s decompose these terms to look at how interaction impacts the treatment effect.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/768\/1%2AWXNjJs_CgoCd_mRdQcderw.png?ssl=1\"><\/figure>\n<p>Bucketing for one randomized experiment is independent of bucketing in another randomized experiment, so:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/656\/1%2ATxMWSvpFdOSgr_Sa_WSKoQ.png?ssl=1\"><\/figure>\n<p>So, the treatment effect\u00a0is:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/509\/1%2A7Eu9H9shgn4P4e08aA2RcA.png?ssl=1\"><\/figure>\n<p>Or, more succinctly, the treatment effect is the weighted average of the treatment effect within the W=1 and W=0 populations:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/456\/1%2AMKxbFE9rOl_G6XrhGX1B9A.png?ssl=1\"><\/figure>\n<p>One of the great things about just writing the math down is that it makes our problem concrete. We can see exactly the form the bias from interaction will take and what will determine its\u00a0size.<\/p>\n<p>The problem is this: only W = 1 or W = 0 will launch after the second experiment ends. 
So, the environment during the first experiment will not be the same as the environment after it. This introduces a bias in the treatment effect.<\/p>\n<p>Suppose W = w launches. Then the post-experiment treatment effect for the first experiment, TE(W=w), is mismeasured by the experiment treatment effect, TE, leading to the\u00a0bias:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/735\/1%2AWEpaMKdHvoWVHR7Bk7w7dw.png?ssl=1\"><\/figure>\n<p>If there is an interaction between the second experiment and the first, then TE(W=1-w) - TE(W=w)\u00a0!= 0, so there is a\u00a0bias.<\/p>\n<p><strong>So, <em>yes<\/em>, interactions cause a bias.<\/strong> The bias is directly proportional to the size of the interaction effect, scaled by the share of units assigned to the variant that does not launch.<\/p>\n<p>But <strong>interactions are not special<\/strong>. <strong>Anything<\/strong> that differs between the experiment\u2019s environment and the future environment and affects the treatment effect leads to a bias with the same form. Does your product have seasonal demand? Was there a large supply shock? Did inflation rise sharply? What about the butterflies in Korea? Did they flap their\u00a0wings?<\/p>\n<p>Online Experiments are <strong>not<\/strong> Laboratory Experiments. We cannot control the environment. The economy is not under our control (sadly). We always face biases like\u00a0this.<\/p>\n<p>So, Online Experiments are not about estimating treatment effects that hold in perpetuity. They are about <strong>making decisions<\/strong>. Is A better than B? 
That answer is unlikely to change because of an interaction effect for the same reason that we don\u2019t usually worry about it flipping because we ran the experiment in March instead of some other month of the\u00a0year.<\/p>\n<p>For interactions to matter for decision-making, we need, say, TE \u2265 0 (so we would launch B in the first experiment) and TE(W=w) &lt; 0 (but we should have launched A given what happened in the second experiment).<\/p>\n<p>TE \u2265 0 if and only\u00a0if:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/628\/1%2ADsPXd-O8ckoGXshExQNwTw.png?ssl=1\"><\/figure>\n<p>Taking the typical allocation pr(W=w) = 0.50, this\u00a0means:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/404\/1%2AMrKN6O0lN1mq2btfLjJebw.png?ssl=1\"><\/figure>\n<p>Because TE(W=w) &lt; 0, this can only be true if TE(W=1-w) &gt; 0. Which makes sense. For interactions to be a problem for decision-making, the interaction effect has to be large enough that an experiment that is negative under one treatment is positive under the\u00a0other.<\/p>\n<p>The interaction effect has to be <em>extreme<\/em> at typical 50\u201350 allocations. If the treatment effect is -$2 per unit under the variant that launches, it must be at least +$2 per unit under the other variant for interactions to affect decision-making. To make the wrong decision from the standard treatment effect, we\u2019d have to be cursed with massive interaction effects that change the sign of the treatment effect <em>and<\/em> maintain at least the same magnitude!<\/p>\n<p>This is why we\u2019re not concerned about interactions and all those other factors (seasonality, etc.) that we can\u2019t keep the same during and after the experiment. The change in environment would have to radically alter the user\u2019s experience of the feature. 
It probably\u00a0doesn\u2019t.<\/p>\n<p>It\u2019s always a good sign when your final take includes \u201cprobably.\u201d<\/p>\n<p>Thanks for\u00a0reading!<\/p>\n<p>Zach<\/p>\n<p>Connect at: <a href=\"https:\/\/linkedin.com\/in\/zlflynn\">https:\/\/linkedin.com\/in\/zlflynn<\/a><\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/does-it-matter-that-online-experiments-interact-9c4012b75fbd\">Does It Matter That Online Experiments Interact?<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<\/div>\n<p>Zach Flynn<br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fdoes-it-matter-that-online-experiments-interact-9c4012b75fbd\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Does It Matter That Online Experiments Interact? What interactions do, why they are just like any other change in the environment post-experiment, and some reassurance Photo by Uriel Soberanes on\u00a0Unsplash Experiments do not run one at a time. At any moment, hundreds to thousands of experiments run on a mature website. 
The question comes up: [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,211,83,1059,238,1461],"tags":[1060,348,1462],"class_list":["post-1429","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-data-analysis","category-data-science","category-experiment","category-statistics","category-testing","tag-experiment","tag-experiments","tag-interact"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1429"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1429"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1429\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1429"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1429"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1429"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}