I am trying to calculate the p-value using statsmodels
Here is the code I use to get the required numbers:
import statsmodels.api as sm
from statsmodels.stats.proportion import proportions_ztest
convert_old = len(df2[df2['group'] == 'control']['converted'] == 1)
convert_new = len(df2[df2['group'] == 'treatment']['converted'] == 1)
n_old = len(df2[df2['group'] == 'control'])
n_new = len(df2[df2['group'] == 'treatment'])
The actual test call is:
stat, pval = proportions_ztest([convert_new ,convert_old], [n_new, n_old])
I got this result:
p-value is: nan
I also get these warnings:
/opt/conda/lib/python3.6/site-packages/statsmodels/stats/weightstats.py:670:
RuntimeWarning: invalid value encountered in double_scalars
zstat = value / std_diff
/opt/conda/lib/python3.6/site-packages/statsmodels/stats/weightstats.py:672:
RuntimeWarning: invalid value encountered in absolute
pvalue = stats.norm.sf(np.abs(zstat))*2
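I think the problem is in how you get the numbers for convert_old and convert_new. Comparing with ['converted'] == 1 gives you a Series of True/False values, one per row, so len() is unaffected by the comparison and always returns the size of the whole group. Each count then equals its group size, both proportions are 1, the pooled standard error is 0, and the z statistic becomes 0/0, which is the nan you see. To get the proper counts, you can try: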
convert_old = len(df2[(df2['group'] == 'control') & (df2['converted'] == 1)])
convert_new = len(df2[(df2['group'] == 'treatment') & (df2['converted'] == 1)])
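To see the difference on something small, here is a minimal sketch. The toy DataFrame is made up for illustration and only stands in for your df2. It uses sum() on the boolean mask as an alternative to filtering and calling len(); both approaches should give the same counts.

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical toy data standing in for df2 (not the asker's real data).
df2 = pd.DataFrame({
    "group": ["control"] * 10 + ["treatment"] * 10,
    "converted": [1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
                  1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
})

control = df2[df2["group"] == "control"]
treatment = df2[df2["group"] == "treatment"]

# len() of a boolean Series counts rows, not True values,
# so this is just the size of the control group.
buggy_old = len(control["converted"] == 1)

# Summing the boolean mask counts only the True entries.
convert_old = int((control["converted"] == 1).sum())
convert_new = int((treatment["converted"] == 1).sum())
n_old = len(control)
n_new = len(treatment)

stat, pval = proportions_ztest([convert_new, convert_old], [n_new, n_old])
print("buggy count:", buggy_old, "correct count:", convert_old)
print("z statistic:", stat, "p-value:", pval)
```

With the corrected counts the proportions are no longer both equal to 1, so the standard error is non-zero and the p-value should come out as a regular number instead of nan.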