TypeError: '<' not supported between instances of 'float' and 'str' when using shapiro test with scipy

Question

I am trying to run the Shapiro test on every column of a pandas DataFrame, grouped by the column 'code'.

This is what my df looks like:

>>>name  code   2020-10-22   2020-10-23   2020-10-24 ...
0  a      1      0.05423      0.1254      0.1432
1  b      1      0.57289      0.0092      0.2314
2  c      2      0.1205       0.0072      0.12
3  d      3      0.3234       0.231       0.231
...

I have 80 rows, with 6 different codes (1,2,3,4,5,6).

I want to run the Shapiro test on each column, for each code. For example, take the column 2020-10-22, keep only the rows with code 1, and run the Shapiro test on them.

I tried to do it with the following loop:

shapiros=[]

for variable in df.columns[2:]:
    tmp=df[['code',variable]]
    tmp=tmp[tmp[variable].notnull()]
    
    for i in tmp.code.unique().tolist():
        shapiro_test = stats.shapiro(tmp[tmp['code'] == i])
        shapiros.append(shapiro_test)

But I get the error:

---> 13         shapiro_test = stats.shapiro(tmp[tmp['code'] == i])

TypeError: '<' not supported between instances of 'float' and 'str'

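I have seen that this error can be caused by null values, but I already got rid of those with notnull(). I checked that notnull works by printing the length of "tmp" on every iteration, and it does change.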

Also, both seem to have the same type (object):

for variable in df.columns[2:]:
    tmp=df[['code',variable]]
    print(tmp.dtypes)
    tmp=tmp[tmp[variable].notnull()]
    
    for i in tmp.code.unique().tolist():
        print(type(i))


>>>code           object
2020-10-22    float64
dtype: object
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
...

(The same thing is printed for every day.)

What could be the problem? How can I calculate Shapiro for each column and each code?

Answer 1

You have to convert the column code to float/int before it can be compared; according to your output, it is currently str. Try:

df['code'] = df['code'].astype(float)
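
For what it's worth, the loop in the question passes the whole two-column frame (code plus the date column) to stats.shapiro, so the string codes most likely end up mixed with the floats, which would explain the comparison error. Below is a minimal sketch of one way to run the test per column and per code, passing only the values of the date column. The sample data, the results list and the minimum-size check are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data shaped like the frame in the question; 'code' is stored as strings.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "name": [f"n{i}" for i in range(12)],
    "code": ["1"] * 6 + ["2"] * 6,
    "2020-10-22": rng.normal(size=12),
    "2020-10-23": rng.normal(size=12),
})
df.loc[3, "2020-10-23"] = np.nan  # one missing value, to exercise the null handling

# Convert the codes to integers, as suggested above.
df["code"] = df["code"].astype(int)

results = []
for variable in df.columns[2:]:
    # Keep only the rows where this date column has a value.
    tmp = df[["code", variable]].dropna(subset=[variable])
    # Group by code and hand only the numeric column to the test.
    for code, values in tmp.groupby("code")[variable]:
        if len(values) < 3:
            # shapiro needs at least 3 observations; skip smaller groups.
            continue
        statistic, p_value = stats.shapiro(values)
        results.append(
            {"column": variable, "code": code, "W": statistic, "p": p_value}
        )

# One row per (column, code) pair.
shapiros = pd.DataFrame(results)
print(shapiros)
```

Collecting the results in a DataFrame keeps the column name and the code next to each statistic, which a plain list of test results would lose.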

python

for-loop

scipy

pandas

scipy.stats