Experimentation helps product teams make better decisions based on causality instead of correlations. You are able to make statements like “this change caused conversion to increase by 5%.” Without experimentation, a more common approach is to make changes based on domain knowledge or select customer requests. Now, data-driven companies use experimentation to make decision-making more objective. A big component of causality is a statistical analysis of experimentation data.

At Amplitude, we have recently released a fixed horizon T-test in addition to sequential testing, which we have had since the beginning of Experiment. We envision several customers asking “How do I know what test to pick?” In this technical post, we will explain the pros and cons of the sequential test and the fixed horizon T-test. Note: Throughout this post, when we say T-test, we are referring to the fixed horizon T-test. There are pros and cons for each approach, and it is not a case where one method is always better than the other.

Sequential testing advantages

First, we will explore the advantages of sequential testing.

Peeking several times → end experiment earlier

One advantage of sequential testing is that you can peek several times. The specific version of sequential testing that we use at Amplitude, called the mixture Sequential Probability Ratio Test (mSPRT), allows you to peek as many times as you want. Also, you do not have to decide before the test starts how many times you are going to peek, as you do with a group sequential test. The consequence is that we can do what all product managers (PMs) want to do, which is “run a test until it is statistically significant and then stop.” It is similar to the “set it and forget it” approach with target-date funds. In the fixed horizon framework, this should not be done, because it increases the false positive rate. By peeking often, we can decrease the experiment duration if the effect size is much bigger than the minimum detectable effect (MDE).

Naturally, as humans, we want to keep peeking at the data and roll out features that help our customer base as quickly as possible. Often, a PM will ask a data scientist how an experiment is doing a couple of days after it has started. With fixed horizon testing, the data scientist cannot make any statistical statement (confidence intervals or p-values) about the experiment until it reaches the required sample size, and can only report the number of exposed users, the treatment mean, and the control mean. With sequential testing, the data scientist can give the PM valid confidence intervals and p-values at any time during the experiment.

In some experimentation dashboards, the statistical quantities (confidence intervals and p-values) are not hidden from users even for fixed horizon testing. Often, data scientists get asked why we cannot roll out the winning variant since the dashboard is “all green.” Then, the data scientist has to explain that the experiment has not reached the required sample size and that, if it is rolled out now, it could actually have a negative effect on users. This creates a lot of inconsistency, and people end up confused about why their experiments are not being rolled out. Then, the PM questions why a colleague rolled out an experiment before it reached the required sample size.
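
To make the peeking problem concrete, here is a minimal simulation sketch. It is not Amplitude's implementation. It runs A/A experiments (no true effect) and counts how often each decision rule declares a winner. Everything in it is an assumption chosen for illustration: normally distributed metrics with a known standard deviation `sigma`, a peek every 50 users per arm, a textbook known-variance form of the mSPRT likelihood ratio with a N(0, `tau`²) mixing prior, and the hypothetical function names `run_aa_experiment` and `estimate_false_positive_rates`.

```python
import math
import random

ALPHA = 0.05
Z_CRIT = 1.959963984540054  # two-sided 5% critical value of the standard normal


def run_aa_experiment(rng, n_per_arm=1000, peek_every=50, sigma=1.0, tau=0.5):
    """Simulate one A/A test and report which decision rules reject H0.

    Returns a tuple of booleans:
      fixed_rejects:   z-test evaluated once, at the planned sample size
      peeking_rejects: same z-test, but significant at any one of the peeks
      msprt_rejects:   mSPRT likelihood ratio crossed 1/ALPHA at any peek
    Assumes n_per_arm is a multiple of peek_every and sigma is known.
    """
    var_diff = 2.0 * sigma ** 2  # variance of one treatment-minus-control pair
    diff_sum = 0.0
    z = 0.0
    peeking_rejects = False
    msprt_rejects = False
    for n in range(1, n_per_arm + 1):
        # Both arms share the same mean, so any "win" is a false positive.
        diff_sum += rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)
        if n % peek_every != 0:
            continue
        mean_diff = diff_sum / n
        z = mean_diff / math.sqrt(var_diff / n)
        if abs(z) > Z_CRIT:
            peeking_rejects = True
        # Log of the mixture likelihood ratio for H0: difference = 0.
        denom = var_diff + n * tau ** 2
        log_lr = 0.5 * math.log(var_diff / denom) + (
            n ** 2 * tau ** 2 * mean_diff ** 2
        ) / (2.0 * var_diff * denom)
        if log_lr >= math.log(1.0 / ALPHA):
            msprt_rejects = True
    fixed_rejects = abs(z) > Z_CRIT  # z from the final look only
    return fixed_rejects, peeking_rejects, msprt_rejects


def estimate_false_positive_rates(n_experiments=1000, seed=0):
    """Estimate the false positive rate of each rule over many A/A tests."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n_experiments):
        for i, rejected in enumerate(run_aa_experiment(rng)):
            counts[i] += rejected
    names = (
        "fixed_horizon_single_look",
        "fixed_horizon_with_peeking",
        "msprt_with_peeking",
    )
    return {name: count / n_experiments for name, count in zip(names, counts)}


rates = estimate_false_positive_rates()
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

Under these assumptions, the fixed horizon test should sit near the nominal 5% only when it is evaluated once at the planned sample size. Stopping at the first significant peek pushes its false positive rate well above 5%, while the mSPRT rule stays at or below 5% no matter how many times you look. The price is that the mSPRT threshold is more conservative at any single look.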