Difference between binom.cdf and binom_test in Python

Question

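I am running a binomial test and cannot understand why these two methods give different results. The p-value from the second differs from the one computed by the first. When we calculate the two-tailed p-value, should we just double one of the tails?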

from scipy.stats import binom

n, p = 50, 0.4
prob = binom.cdf(2, n, p)
first = 2*prob

from scipy import stats

second = stats.binom_test(2, n, p, alternative='two-sided')

Answer 1

When we calculate the two-tailed p-value should we just double one of the tails?

No, because the binomial distribution is generally not symmetric. The one case where your calculation would work is p = 0.5.
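To make the asymmetry concrete, here is a minimal sketch that compares the lower tail P(X <= k) with its mirror image P(X >= n - k) for p = 0.4 and p = 0.5. The helper name `tails` is only illustrative, and n = 14 is used to match the demonstration further down. Doubling one tail is only equivalent to adding both tails when the two are equal, which should happen here only for p = 0.5.

```python
import math
from scipy.stats import binom

n, k = 14, 2  # same n as the demonstration in this answer

def tails(p):
    """Return (lower tail P(X <= k), mirrored upper tail P(X >= n - k))."""
    lower = binom.cdf(k, n, p)
    upper = binom.sf(n - k - 1, n, p)  # sf(x) = P(X > x), so this is P(X >= n - k)
    return lower, upper

lower_04, upper_04 = tails(0.4)
lower_05, upper_05 = tails(0.5)

# For p = 0.4 the two tails differ; for p = 0.5 they coincide.
print(f"p=0.4: lower={lower_04:.6f}, mirrored upper={upper_04:.6f}")
print(f"p=0.5: lower={lower_05:.6f}, mirrored upper={upper_05:.6f}")
```

Note that even the mirrored tail is not what the two-sided test uses when p is not 0.5; the test picks the outcomes on the other side by comparing their probabilities, as described below.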

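Here is a visualization of the two-sided binomial test. For this demonstration, I'll use n=14 instead of n=50 to make the plot clearer.

The dashed line is drawn at the height of binom.pmf(2, n, p). The outcomes that contribute to the two-sided binomial test binom_test(2, n, p, alternative='two-sided') are those whose probability is less than or equal to this value. In this example, we can see that the values of k where this holds are [0, 1, 2] (the left tail) and [10, 11, 12, 13, 14] (the right tail). The p-value of the two-sided binomial test should be the sum of these probabilities. And in fact, that is what we find: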

In [20]: binom.pmf([0, 1, 2, 10, 11, 12, 13, 14], n, p).sum()
Out[20]: 0.05730112258048004

In [21]: binom_test(2, n, p)
Out[21]: 0.05730112258047999

Note that scipy.stats.binom_test is deprecated. Users of SciPy 1.7.0 or later should use scipy.stats.binomtest instead:

In [36]: from scipy.stats import binomtest

In [37]: result = binomtest(2, n, p)

In [38]: result.pvalue
Out[38]: 0.05730112258047999
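The same rule can be applied to the numbers from the question (n = 50, k = 2). The following is a rough sketch, not SciPy's actual implementation: the function name `two_sided_pvalue` and its `rel_tol` parameter are assumptions made for illustration, with the small tolerance only there to guard against floating-point ties between outcomes of equal probability. It assumes SciPy 1.7.0 or later for `binomtest`.

```python
import math
import numpy as np
from scipy.stats import binom, binomtest

def two_sided_pvalue(k, n, p, rel_tol=1e-7):
    """Sum pmf over all outcomes that are no more likely than the observed k."""
    ks = np.arange(n + 1)
    pmf = binom.pmf(ks, n, p)
    # Hypothetical tolerance so that exact ties are not lost to rounding.
    threshold = binom.pmf(k, n, p) * (1 + rel_tol)
    return pmf[pmf <= threshold].sum()

n, p = 50, 0.4
manual = two_sided_pvalue(2, n, p)
library = binomtest(2, n, p, alternative='two-sided').pvalue
doubled = 2 * binom.cdf(2, n, p)  # the calculation from the question

# manual and library should agree; doubled should not.
print(manual, library, doubled)
```

If the first two printed values agree while the third does not, that accounts for the discrepancy in the question: the right tail selected by the test carries less probability than the left tail, so doubling the left tail overstates the p-value.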

Here is the script that generated the plot:

import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt


n = 14
p = 0.4

k = np.arange(0, n+1)

plt.plot(k, binom.pmf(k, n, p), 'o')
plt.xlabel('k')
plt.ylabel('pmf(k, n, p)')
plt.title(f'Binomial distribution with {n=}, {p=}')
ax = plt.gca()
ax.set_xticks(k[::2])

pmf2 = binom.pmf(2, n, p)
plt.axhline(pmf2, linestyle='--', color='k', alpha=0.5)

plt.grid(True)
plt.show()
