There are two kinds of errors, Type I and Type II:
Usually the Type I error is the more important one! So with the Type II error we are settling for a compromise:
The probability of Type II error can be adjusted to the desired value by changing the size of the groups or by reducing the variance in the data.
Rule: the larger the group size and the lower the variance, the smaller the probability of a Type II error.
Formula:
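One commonly used form, sketched here under the assumption of a two-sided test for the difference of two means with equal group sizes. The symbols are notation chosen for this sketch: $\alpha$ and $\beta$ are the acceptable Type I and Type II error probabilities, $\sigma_A^2$ and $\sigma_B^2$ are the metric variances in the two groups, $\varepsilon$ is the smallest effect we want to detect, and $z_q$ is the $q$-quantile of the standard normal distribution.

```latex
n \ge \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \left(\sigma_A^2 + \sigma_B^2\right)}{\varepsilon^2}
```

This agrees with the rule: a larger $n$ or smaller variances allow a smaller $\beta$ for the same effect.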
Next, the concrete steps of checking for correctness:
Estimate the required group size (code as follows):
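A minimal sketch of such an estimate, assuming the standard normal-approximation formula for a two-sided test with equal variances and equal group sizes. The function name `required_group_size` and the values of `std` and `effect` are hypothetical, chosen only for illustration; `alpha=0.1` and `beta=0.2` match the error levels used in these notes.

```python
from math import ceil
from statistics import NormalDist


def required_group_size(alpha, beta, std, effect):
    """Approximate per-group size for a two-sided test of a difference in means.

    Assumes both groups share the same standard deviation `std` and the same
    size, and that `effect` is the smallest absolute difference to detect.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # quantile for the Type I error level
    z_beta = z.inv_cdf(1 - beta)        # quantile for the Type II error level
    return ceil(2 * (z_alpha + z_beta) ** 2 * std ** 2 / effect ** 2)


# Hypothetical metric: std = 10, and we want to detect a shift of 1.
sample_size = required_group_size(alpha=0.1, beta=0.2, std=10.0, effect=1.0)
print(sample_size)
```

Doubling `std` roughly quadruples the required size, which is why reducing variance matters as much as collecting more data.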
By conducting 1000 experiments and calculating the proportion of type II errors, we obtain a point estimate of the probability of type II error.
Then, using numerical synthetic A/A and A/B experiments, we will estimate error probabilities and construct confidence intervals.
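The procedure can be sketched as follows, assuming normally distributed synthetic data and a simple normal-approximation confidence interval for a proportion. The metric parameters (`mean`, `std`, `effect`), the group size `n`, and the helper names are hypothetical and only illustrate the idea; they are not the original code.

```python
import numpy as np
from scipy import stats

alpha, beta = 0.1, 0.2                 # target error probabilities
mean, std, effect = 100.0, 10.0, 1.0   # hypothetical metric parameters
n = 1237                               # hypothetical per-group size for these parameters
n_experiments = 1000

rng = np.random.default_rng(0)


def rejection_rate(effect_b):
    """Share of synthetic experiments in which the t-test rejects H0."""
    a = rng.normal(mean, std, size=(n_experiments, n))
    b = rng.normal(mean + effect_b, std, size=(n_experiments, n))
    pvalues = stats.ttest_ind(a, b, axis=1).pvalue
    return np.mean(pvalues < alpha)


def proportion_ci(p, n_trials, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = stats.norm.ppf(1 - (1 - level) / 2)
    half = z * np.sqrt(p * (1 - p) / n_trials)
    return p - half, p + half


# A/A: no real effect, so every rejection is a Type I error.
type1 = rejection_rate(0.0)
# A/B: a real effect is present, so every non-rejection is a Type II error.
type2 = 1 - rejection_rate(effect)

print("Type I :", type1, proportion_ci(type1, n_experiments))
print("Type II:", type2, proportion_ci(type2, n_experiments))
```

If the test is correct for the data, the intervals should cover the target values `alpha` and `beta`.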
According to the output: the estimates of the error probabilities are approximately 0.1 and 0.2, as they should be. Everything is correct; the Student’s t-test works correctly on this data.
Next we look at another indicator, the distribution of p-values, defined as follows:
For any significance level, the plot above and the following condition should hold:
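For a correct test, the p-values from A/A experiments should be roughly uniform on [0, 1], so the share of p-values below any level should be close to that level (the empirical CDF lies on the diagonal). A minimal numeric sketch of this check, assuming independent normal data; the sample sizes and the list of `levels` are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n = 2000, 200  # hypothetical simulation sizes

# Synthetic A/A experiments: both groups come from the same distribution.
a = rng.normal(0.0, 1.0, size=(n_experiments, n))
b = rng.normal(0.0, 1.0, size=(n_experiments, n))
pvalues = stats.ttest_ind(a, b, axis=1).pvalue

# Empirical CDF of the p-values at several significance levels.
levels = np.array([0.01, 0.05, 0.1, 0.2, 0.5])
ecdf = np.array([np.mean(pvalues < level) for level in levels])
for level, share in zip(levels, ecdf):
    print(f"alpha={level:.2f}  share of p-values below alpha={share:.3f}")

# Kolmogorov-Smirnov test against the uniform distribution on [0, 1].
ks = stats.kstest(pvalues, "uniform")
print("KS statistic:", ks.statistic)
```

A large gap between a level and the corresponding share is a sign that the test does not control the Type I error on this kind of data.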
Answer: NO NO NO NO NO!
You have to run this check every time; for example, on this data the result comes out wrong!
We obtained an estimate of the probability of type I error of about 0.25, which is much higher than the significance level of 0.1. The graph shows that the distribution of p-values for synthetic A/A tests is not uniform and deviates from the diagonal. In this example, the Student’s t-test is incorrect because the data are dependent (the costs of purchases by one person are dependent). If we had not immediately realized the dependence of the data, the estimation of error probabilities would have helped us understand that such a test is incorrect.
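The effect of dependent observations can be reproduced with a small simulation. This is a sketch under stated assumptions, not the data from the example above: each synthetic user has their own average purchase cost, all numeric parameters are hypothetical, and the exact inflated error rate depends on them. Averaging per user before testing is shown as one common remedy.

```python
import numpy as np
from scipy import stats

alpha = 0.1
n_experiments, n_users, n_purchases = 1000, 100, 10  # hypothetical sizes
rng = np.random.default_rng(2)


def simulate_group():
    """Purchase costs with shape (experiments, users, purchases per user).

    Purchases of the same user share that user's mean, so they are dependent.
    """
    user_means = rng.normal(100.0, 10.0, size=(n_experiments, n_users, 1))
    noise = rng.normal(0.0, 10.0, size=(n_experiments, n_users, n_purchases))
    return user_means + noise


# A/A setting: both groups are generated in exactly the same way.
a, b = simulate_group(), simulate_group()

# Naive test: treat every purchase as an independent observation.
naive_p = stats.ttest_ind(a.reshape(n_experiments, -1),
                          b.reshape(n_experiments, -1), axis=1).pvalue

# Per-user test: one independent observation (the average cost) per user.
user_p = stats.ttest_ind(a.mean(axis=2), b.mean(axis=2), axis=1).pvalue

naive_type1 = np.mean(naive_p < alpha)
user_type1 = np.mean(user_p < alpha)
print("Type I error, purchases as observations:", naive_type1)
print("Type I error, per-user averages:        ", user_type1)
```

The naive estimate comes out well above `alpha`, while the per-user estimate stays close to it, which is exactly the kind of discrepancy the error-probability check is meant to expose.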
Final summary (acceptable probability & p-value):