Python: chi-squared test produces wrong results (chi2_contingency)
I am trying to compute a chi-squared value in Python from a contingency table. Here is an example.
+--------+------+------+
| | Cat1 | Cat2 |
+--------+------+------+
| Group1 | 80 | 120 |
| Group2 | 420 | 380 |
+--------+------+------+
The expected values are:
+--------+------+------+
| | Cat1 | Cat2 |
+--------+------+------+
| Group1 | 100 | 100 |
| Group2 | 400 | 400 |
+--------+------+------+
If I compute the chi-squared value by hand, I get 10. With Python, however, I get 9.506.
I use the following code:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Some fake data, used only to obtain a 2x2 crosstab to overwrite.
n = 5  # Number of samples.
d = 3  # Dimensionality.
c = 2  # Number of categories.
data = np.random.randint(c, size=(n, d))
data = pd.DataFrame(data, columns=['CAT1', 'CAT2', 'CAT3'])
# Contingency table (2x2 only if both categories occur in CAT1 and CAT2).
contingency = pd.crosstab(data['CAT1'], data['CAT2'])
# Overwrite the counts; .iloc[row, col] avoids chained assignment.
contingency.iloc[0, 0] = 80
contingency.iloc[0, 1] = 120
contingency.iloc[1, 0] = 420
contingency.iloc[1, 1] = 380
# Chi-square test of independence.
chi, p, dof, expected = chi2_contingency(contingency)
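The odd thing is that the function returns the correct expected values, but the chi-squared statistic and the p-value are off. What am I doing wrong here?
Thanks
P.S.
I know the way I build the initial table in pandas is pretty clumsy, but I am no expert at creating these nested tables.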
From the documentation:
correction : bool, optional
If True, and the degrees of freedom is 1, apply Yates’ correction for continuity.
The effect of the correction is to adjust each observed value by 0.5 towards
the corresponding expected value.
A 2x2 table has 1 degree of freedom, so the correction is applied by default. If you set correction to False, you get 10.
>>> chi2_contingency(contingency, correction=False)
(10.0, 0.001565402258002549, 1, array([[ 100., 100.],
       [ 400., 400.]]))
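To see where both numbers come from, here is a minimal sketch that rebuilds the statistic by hand, once without and once with Yates' correction, and compares each to chi2_contingency. The variable names (observed, chi2_plain, chi2_yates) are my own, and the table is built directly as a NumPy array instead of going through pd.crosstab, which is an assumption about what you actually need.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the question (rows: Group1, Group2; cols: Cat1, Cat2).
observed = np.array([[80, 120],
                     [420, 380]], dtype=float)

# Expected counts under independence: row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Plain Pearson statistic: sum of (O - E)^2 / E.
diff = np.abs(observed - expected)
chi2_plain = (diff ** 2 / expected).sum()

# Yates' correction: move each observed count 0.5 towards its expected
# value (never past it) before squaring.
chi2_yates = ((diff - np.minimum(0.5, diff)) ** 2 / expected).sum()

# SciPy with and without the correction; the result unpacks as a 4-tuple.
stat_plain, p_plain, dof, _ = chi2_contingency(observed, correction=False)
stat_yates, p_yates, _, _ = chi2_contingency(observed)  # correction=True by default

print(round(chi2_plain, 3), round(stat_plain, 3))
print(round(chi2_yates, 3), round(stat_yates, 3))
```

The first line printed should show 10 for both the manual and the SciPy value, and the second roughly 9.506 for both, so the function is not wrong; it just applies the continuity correction by default for tables with one degree of freedom.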