scipy t-test returns NaN even though no NaN values are present

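I am writing code that runs a t-test on heights by sex (1 and 0 are used to distinguish the sexes in the Excel sheet). I also need to use different ranges of heights: the first 10, then the first 20, and finally the first 30 heights. However, it always returns "nan" instead of a number, even when I pass nan_policy='omit'. I know another user here ran into the same problem, but he was using pandas and I am not. I'm using Spyder 4 with the latest version of Anaconda, Python 3.8.3, and SciPy 1.5.0. Here is the code: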

import numpy as np
import scipy.stats

array = np.loadtxt(r'C:\filepath\Body-Data.csv', skiprows = 1, delimiter=',' )

slice10 = slice(0,10)
slice20 = slice(0,20)
slice30 = slice(0,30)

men_height = []
women_height = []

for i in range(8239):
    if array[i,0] == 0:
        women_height.append(array[i,2])
    elif array[i,0] == 1:
        men_height.append(array[i,2])
        
w_height10 = women_height[slice10]
w_height20 = women_height[slice20]
w_height30 = women_height[slice30]

m_height10 = men_height[slice10]
m_height20 = men_height[slice20]
m_height30 = men_height[slice30]

w_mean10 = np.mean(w_height10) 
w_mean20 = np.mean(w_height20) 
w_mean30 = np.mean(w_height30) 
    
m_mean10 = np.mean(m_height10)
m_mean20 = np.mean(m_height20)
m_mean30 = np.mean(m_height30)

t_statistic1, p_value1 = scipy.stats.ttest_ind(m_mean10, w_mean10, nan_policy='omit')
print("this is the t-statistic for the first 10 heights of women and men: \n", t_statistic1)
print("this is the p-value for the first 10 heights of women and men: \n", p_value1)


t_statistic2, p_value2 = scipy.stats.ttest_ind(m_mean20, w_mean20, nan_policy='omit')
print("this is the t-statistic for the first 20 heights of women and men: \n", t_statistic2)
print("this is the p-value for the first 20 heights of women and men: \n", p_value2)


t_statistic3, p_value3 = scipy.stats.ttest_ind(m_mean30, w_mean30, nan_policy='omit')
print("this is the t-statistic for the first 30 heights of women and men: \n", t_statistic3)
print("this is the p-value for the first 30 heights of women and men: \n", p_value3)

My output is:

this is the t-statistic for the first 10 heights of women and men: 
 nan

this is the p-value for the first 10 heights of women and men: 
 nan

this is the t-statistic for the first 20 heights of women and men: 
 nan

this is the p-value for the first 20 heights of women and men: 
 nan

this is the t-statistic for the first 30 heights of women and men: 
 nan

this is the p-value for the first 30 heights of women and men: 
 nan

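The first two arguments of scipy.stats.ttest_ind must be the data sets you want to compare, not their means. A mean is a single number, so each "sample" contains only one observation; the sample variance is then undefined, and the t-statistic and p-value come out as nan. nan_policy only controls how nan values in the input are handled, so it cannot help here. Change this line

t_statistic1, p_value1 = scipy.stats.ttest_ind(m_mean10, w_mean10, nan_policy='omit')

to

t_statistic1, p_value1 = scipy.stats.ttest_ind(m_height10, w_height10, nan_policy='omit')

and make the same change in the calls for the first 20 and first 30 heights.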
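To see the difference, here is a minimal sketch with a few made-up heights (the values are hypothetical, not from Body-Data.csv). Passing the two means gives nan, while passing the raw samples gives a finite t-statistic and p-value:

```python
import warnings

import numpy as np
import scipy.stats

# Hypothetical sample heights in cm, made up for illustration only.
men = np.array([178.0, 182.5, 175.2, 180.1, 169.8])
women = np.array([162.3, 158.9, 165.0, 170.2, 160.4])

# Passing the means: each "sample" is a single value, so the sample
# variance (ddof=1) is undefined and the t-statistic and p-value are nan.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the small-sample / degrees-of-freedom warning
    t_bad, p_bad = scipy.stats.ttest_ind(np.mean(men), np.mean(women))

# Passing the raw samples gives a finite statistic and p-value.
t_good, p_good = scipy.stats.ttest_ind(men, women)

print("means only:  ", t_bad, p_bad)
print("raw samples: ", t_good, p_good)
```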

(If the input contains no nan values, you can drop the nan_policy='omit' argument.)
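For comparison, this is a sketch of the case nan_policy='omit' is actually meant for: a nan in the input data itself. The heights are again hypothetical. With the default policy ('propagate') the result is nan; with 'omit' the missing value is dropped, which should match removing it by hand:

```python
import numpy as np
import scipy.stats

# Hypothetical data; the second sample contains one missing value.
men = np.array([178.0, 182.5, 175.2, 180.1, 169.8])
women = np.array([162.3, np.nan, 165.0, 170.2, 160.4])

# Default nan_policy='propagate': a nan in the input makes the result nan.
t_prop, p_prop = scipy.stats.ttest_ind(men, women)

# nan_policy='omit': the nan is ignored when computing the test.
t_omit, p_omit = scipy.stats.ttest_ind(men, women, nan_policy="omit")

# Equivalent manual approach: filter out the nan before calling ttest_ind.
t_manual, p_manual = scipy.stats.ttest_ind(men, women[~np.isnan(women)])

print("propagate:", t_prop, p_prop)
print("omit:     ", t_omit, p_omit)
print("manual:   ", t_manual, p_manual)
```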

For other variants of computing the t-statistic, see the question "Perform 2 sample t-test".
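Putting it together, here is a sketch of the whole script with the fix applied. It assumes the column layout described in the question (column 0 = sex with 1 for men and 0 for women, column 2 = height); since the CSV is not available, a small synthetic array with hypothetical values stands in for it, and column 1 is just a placeholder. Boolean masks replace the hard-coded range(8239) loop, and as one of the other variants the same data are also run through Welch's t-test via equal_var=False, which does not assume equal variances:

```python
import numpy as np
import scipy.stats

# In the question the data come from:
#   array = np.loadtxt(r'C:\filepath\Body-Data.csv', skiprows=1, delimiter=',')
# Here a synthetic array with hypothetical values stands in for it, using the
# assumed layout: column 0 = sex (1 = man, 0 = woman), column 1 = placeholder,
# column 2 = height in cm.
rng = np.random.default_rng(0)
n_rows = 100
sex = np.tile([1, 0], n_rows // 2)  # alternating rows: 50 men, 50 women
height = np.where(sex == 1,
                  rng.normal(177.0, 7.0, size=n_rows),
                  rng.normal(164.0, 6.5, size=n_rows))
array = np.column_stack([sex, np.zeros(n_rows), height])

# Boolean masks select each group's heights without a hard-coded row count.
men_height = array[array[:, 0] == 1, 2]
women_height = array[array[:, 0] == 0, 2]

results = {}
for n in (10, 20, 30):
    # Pass the first n raw heights of each group, not their means.
    t_s, p_s = scipy.stats.ttest_ind(men_height[:n], women_height[:n])
    # Welch's variant: equal_var=False drops the equal-variance assumption.
    t_w, p_w = scipy.stats.ttest_ind(men_height[:n], women_height[:n], equal_var=False)
    results[n] = ((t_s, p_s), (t_w, p_w))
    print(f"first {n} heights: t = {t_s:.3f}, p = {p_s:.3g} "
          f"(Welch: t = {t_w:.3f}, p = {p_w:.3g})")
```

With real data, replace the synthetic block with the np.loadtxt call; the rest of the script does not depend on the number of rows.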