Table of differences between observed and expected counts
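I have data in which I'm modeling a binary dependent variable. There are 5 other categorical predictor variables, and I ran a chi-square test of independence between each of them and the dependent variable. All of them gave very low p-values.
Now I want to create a chart that shows all the differences between the observed and expected counts. It seems like this should be part of SciPy's chi2_contingency function, but I can't figure it out.
The only idea I have is that chi2_contingency outputs an array of expected counts, so I think I need to figure out how to convert the counts in my observed cross-tabulation table into an array and then subtract one from the other.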
## Gender & Income: cross-tabulation table and chi-square
ct_sex_income=pd.crosstab(adult_df.sex, adult_df.income, margins=True)
ct_sex_income
## Run Chi-Square test
scipy.stats.chi2_contingency(ct_sex_income)
## try to subtract them
ct_sex_income.observed - chi2_contingency(ct_sex_income)[4]
The error I get is "AttributeError: 'DataFrame' object has no attribute 'observed'".
I just want an array showing the differences.
Thanks in advance for any help.
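I don't know your data, or how your observed attribute was supposed to be defined, and I'm not entirely sure what you're trying to do; my guess is something like predicting people's income from their marital status.
Here is one possible solution to your problem.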
import pandas as pd
import scipy.stats as stats
# some bogus data: (marital status, income bracket) pairs
data = [['single','30k-35k'],['divorced','40k-45k'],['married','25k-30k'],
['single','25k-30k'],['married','40k-45k'],['divorced','40k-35k'],
['single','30k-35k'],['married','30k-35k'],['divorced','30k-35k'],
['single','30k-35k'],['married','40k-45k'],['divorced','25k-30k'],
['single','40k-45k'],['married','30k-35k'],['divorced','30k-35k'],
]
adult_df = pd.DataFrame(data, columns=['marital', 'income'])
X = adult_df['marital']  # predictor variable
Y = adult_df['income']   # variable to predict
# observed counts: income brackets as rows, marital status as columns (no margins)
dfObserved = pd.crosstab(Y, X)
# chi-square statistic, p-value, degrees of freedom and the expected frequencies
chi2, pv, free, efreq = stats.chi2_contingency(dfObserved.values)
# give the expected-frequency array the same row/column labels as the observed table
dfExpected = pd.DataFrame(efreq, columns=dfObserved.columns, index=dfObserved.index)
print(dfExpected)
"""
marital divorced married single
income
25k-30k 1.000000 1.000000 1.000000
30k-35k 2.333333 2.333333 2.333333
40k-35k 0.333333 0.333333 0.333333
40k-45k 1.333333 1.333333 1.333333
"""
print(dfObserved)
"""
marital divorced married single
income
25k-30k 1 1 1
30k-35k 2 2 3
40k-35k 1 0 0
40k-45k 1 2 1
"""
difference = dfObserved - dfExpected
print(difference)
"""
marital divorced married single
income
25k-30k 0.000000 0.000000 0.000000
30k-35k -0.333333 -0.333333 0.666667
40k-35k 0.666667 -0.333333 -0.333333
40k-45k -0.333333 0.666667 -0.333333
"""
Hope this helps.
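Besides the missing observed attribute, the question's own code has two further problems. First, margins=True adds an "All" row and column of totals, which chi2_contingency would then treat as ordinary cells. Second, the expected frequencies are element 3 of the returned result (statistic, p-value, degrees of freedom, expected frequencies), so index 4 does not exist. A minimal sketch that handles both, assuming the margins were created with pd.crosstab's default margins_name="All"; the helper name observed_minus_expected and the toy sex/income data are hypothetical stand-ins for the question's adult_df.

```python
import pandas as pd
import scipy.stats as stats

def observed_minus_expected(observed: pd.DataFrame, margins_name: str = "All") -> pd.DataFrame:
    """Hypothetical helper: observed minus expected counts for a crosstab, with its labels.

    Any totals row/column named `margins_name` (added by pd.crosstab(..., margins=True))
    is dropped first, because the chi-square test must only see the cell counts.
    """
    table = observed.drop(index=margins_name, columns=margins_name, errors="ignore")
    # chi2_contingency returns (statistic, pvalue, dof, expected_freq); expected is index 3
    expected = stats.chi2_contingency(table.values)[3]
    return table - expected

# hypothetical toy data standing in for the question's adult_df
adult_df = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Female", "Male", "Female", "Male", "Male"],
    "income": ["<=50K", ">50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K", ">50K"],
})
ct_sex_income = pd.crosstab(adult_df.sex, adult_df.income, margins=True)
print(observed_minus_expected(ct_sex_income))
```

Dropping the margins inside the helper means the same table can still be displayed with its totals, while the test itself only ever sees the cell counts.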