Confidence interval for the difference between two proportions in Python
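For example, in an A/B test, population A might have 1000 data points, 100 of which are successes, while B might have 2000 data points and 220 successes. This gives A a success proportion of 0.1 and B a success proportion of 0.11, a delta of 0.01. How can I calculate a confidence interval for this delta in Python?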
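Statsmodels can do this for a single sample, but there doesn't seem to be a package that handles the difference between two samples, which is what an A/B test needs. (http://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportion_confint.html)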
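For comparison, the single-sample case that `proportion_confint` covers with its default `method='normal'` is the normal-approximation (Wald) interval. Below is a minimal standard-library sketch of that one-sample interval, applied to group A; it is an illustration, not statsmodels' code, and the helper name `wald_confint_one` is hypothetical.

```python
import math
from statistics import NormalDist


def wald_confint_one(count, nobs, alpha=0.05):
    """Normal-approximation (Wald) interval for a single proportion."""
    p = count / nobs
    # Two-sided critical value, about 1.96 for alpha = 0.05
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p * (1 - p) / nobs)
    return p - half_width, p + half_width


# Group A from the question: 100 successes out of 1000
low_a, upp_a = wald_confint_one(100, 1000)
print(f"A: ({low_a:.4f}, {upp_a:.4f})")
```

This only describes one group; the question needs an interval for the difference between two groups.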
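The sample sizes don't need to be equal. The confidence interval for the difference between two proportions is

$$(p_1 - p_2) \pm z_{\alpha/2} \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

where p1 and p2 are the observed proportions, computed from the respective samples of size n1 and n2, and z_{α/2} is the standard normal critical value (about 1.96 for a 95% interval).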
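As a worked example, here is a minimal standard-library sketch that applies the unpooled normal-approximation interval, with standard error sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2), to the numbers in the question. The helper name `diff_confint` is hypothetical, and the argument order (A first, then B, returning B minus A) is chosen here to match the question's delta.

```python
import math
from statistics import NormalDist


def diff_confint(success_a, size_a, success_b, size_b, alpha=0.05):
    """Wald interval for p_b - p_a using the unpooled standard error."""
    p_a = success_a / size_a
    p_b = success_b / size_b
    se = math.sqrt(p_a * (1 - p_a) / size_a + p_b * (1 - p_b) / size_b)
    # Two-sided standard normal critical value
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_b - p_a
    return diff, (diff - z * se, diff + z * se)


# Numbers from the question: A = 100/1000, B = 220/2000
diff, (low, upp) = diff_confint(100, 1000, 220, 2000)
print(f"delta = {diff:.3f}, 95% CI = ({low:.4f}, {upp:.4f})")
```

With these numbers the interval is roughly (-0.013, 0.033), so it includes 0.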
For more details, see this white paper.
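I couldn't find this functionality in Statsmodels. However, this website walks through the math for generating the confidence interval and is the source of the following function: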
```python
import numpy as np
from scipy import stats


def two_proprotions_confint(success_a, size_a, success_b, size_b, significance=0.05):
    """
    A/B test for two proportions;
    given the number of successes and the trial size of groups A and B,
    compute the confidence interval of the difference in proportions;
    the resulting confidence interval matches R's prop.test function
    with correct = FALSE (no continuity correction)

    Parameters
    ----------
    success_a, success_b : int
        Number of successes in each group

    size_a, size_b : int
        Size, or number of observations in each group

    significance : float, default 0.05
        Often denoted as alpha. Governs the chance of a false positive.
        A significance level of 0.05 means that there is a 5% chance of
        a false positive. In other words, our confidence level is
        1 - 0.05 = 0.95

    Returns
    -------
    prop_diff : float
        Difference between the two proportions (B minus A)

    confint : 1d ndarray
        Confidence interval of the two proportion test
    """
    prop_a = success_a / size_a
    prop_b = success_b / size_b
    # unpooled variance of the difference in proportions
    var = prop_a * (1 - prop_a) / size_a + prop_b * (1 - prop_b) / size_b
    se = np.sqrt(var)

    # z critical value for a two-sided interval, i.e. ppf(1 - alpha / 2)
    confidence = 1 - significance
    z = stats.norm(loc=0, scale=1).ppf(confidence + significance / 2)

    # standard formula for the confidence interval
    # point-estimate +- z * standard-error
    prop_diff = prop_b - prop_a
    confint = prop_diff + np.array([-1, 1]) * z * se
    return prop_diff, confint
```
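R's `prop.test` applies a continuity correction by default (`correct = TRUE`), so the uncorrected interval above corresponds to `prop.test(..., correct = FALSE)`. Below is a sketch of the corrected variant, written from my reading of R's `prop.test` source rather than from the linked website: it widens the interval by 0.5 * (1/n_a + 1/n_b), capped at |diff|, and clips the bounds to [-1, 1]. Treat the exact correction term as an assumption and check it against R before relying on it; `diff_confint_yates` is a hypothetical name.

```python
import math
from statistics import NormalDist


def diff_confint_yates(success_a, size_a, success_b, size_b, alpha=0.05):
    """Wald interval for p_b - p_a with a continuity correction.

    Assumption: mirrors how R's prop.test widens the two-sample interval
    when correct = TRUE; not part of the original answer.
    """
    p_a = success_a / size_a
    p_b = success_b / size_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / size_a + p_b * (1 - p_b) / size_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    inv_n = 1 / size_a + 1 / size_b
    # Correction factor is 0.5, reduced so the extra width never exceeds |diff|
    yates = min(0.5, abs(diff) / inv_n)
    width = z * se + yates * inv_n
    # Clip to the feasible range of a difference of proportions
    return diff, (max(diff - width, -1.0), min(diff + width, 1.0))


diff, (low, upp) = diff_confint_yates(100, 1000, 220, 2000)
print(f"delta = {diff:.3f}, corrected 95% CI = ({low:.4f}, {upp:.4f})")
```

For the question's data the correction adds 0.5 * (1/1000 + 1/2000) = 0.00075 to each side of the uncorrected interval.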
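The statsmodels package now has confint_proportions_2indep, which computes confidence intervals for comparing two independent proportions. You can find the details in the documentation: https://www.statsmodels.org/stable/generated/statsmodels.stats.proportion.confint_proportions_2indep.html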
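`confint_proportions_2indep(count1, nobs1, count2, nobs2, ...)` returns the lower and upper bounds for count1/nobs1 - count2/nobs2 when `compare='diff'`, so for the question's delta B would be passed first, e.g. `confint_proportions_2indep(220, 2000, 100, 1000, compare='diff')`. As far as I know, its default `method` for differences is Newcombe's hybrid score interval (`'newcomb'`), with `'wald'`, `'agresti-caffo'` and `'score'` as alternatives. To illustrate what that default computes, here is a standard-library sketch of Newcombe's method built from two Wilson score intervals; it is my own illustration, not statsmodels' implementation, so small numerical differences are possible. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(count, nobs, z):
    """Wilson score interval for one proportion (no continuity correction)."""
    p = count / nobs
    denom = 1 + z**2 / nobs
    center = (p + z**2 / (2 * nobs)) / denom
    half = z * math.sqrt(p * (1 - p) / nobs + z**2 / (4 * nobs**2)) / denom
    return center - half, center + half


def newcombe_diff_confint(count1, nobs1, count2, nobs2, alpha=0.05):
    """Newcombe hybrid score interval for p1 - p2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1, p2 = count1 / nobs1, count2 / nobs2
    l1, u1 = wilson_interval(count1, nobs1, z)
    l2, u2 = wilson_interval(count2, nobs2, z)
    diff = p1 - p2
    # Combine the distances from each estimate to its Wilson bounds
    lower = diff - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


# B first, so the interval is for p_B - p_A as in the question
low, upp = newcombe_diff_confint(220, 2000, 100, 1000)
print(f"95% CI for p_B - p_A: ({low:.4f}, {upp:.4f})")
```

For these numbers it gives roughly (-0.014, 0.032); unlike the Wald interval, it is not symmetric around the point estimate of 0.01.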