Python 3.+, SciPy stats mode function gives TypeError: unorderable types: str() > float()

I am trying to solve the Kaggle Titanic disaster problem, specifically imputing missing values using the mode, mean, or median. Here is a peek at my dataset:

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  

I am trying to get the mode of the 'Embarked' column, which has dtype 'object'. I am using Python 3. Here is the code snippet:

modeEmbarked = mode(df.Embarked)

Here is the error traceback:

<ipython-input-39-1b4237d65022> in clean(df)
     18 
     19     # Cleaning Embarked column
---> 20     modeEmbarked = mode(df.Embarked)
     21 #     print(mode(df.Embarked))
     22 #     le_embarked = preprocessing.LabelEncoder()

/home/singhaniya/anaconda3/lib/python3.5/site-packages/scipy/stats/stats.py in mode(a, axis)
    635     return np.array([]), np.array([])
    636 
--> 637     scores = np.unique(np.ravel(a))       # get ALL unique values
    638     testshape = list(a.shape)
    639     testshape[axis] = 1

/home/singhaniya/anaconda3/lib/python3.5/site-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts)
    196         aux = ar[perm]
    197     else:
--> 198         ar.sort()
    199         aux = ar
    200     flag = np.concatenate(([True], aux[1:] != aux[:-1]))

TypeError: unorderable types: str() > float()

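This happens because df.Embarked contains mixed types: the port codes are str, while the missing values are stored as NaN, which is a float. np.unique has to sort the values, and Python 3 refuses to compare a str with a float. Make sure all items are of the same type (or of types that can be compared with each other).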

Alternatively, use Series.mode(), which can handle mixed types.
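A minimal sketch of the Series.mode() route, again on a made-up stand-in for df.Embarked. It assumes the goal is to fill the missing entries with the most frequent value; the names mode_embarked and filled are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.Embarked.
embarked = pd.Series(["S", "C", "S", np.nan, "S"], name="Embarked")

# Series.mode() ignores NaN by default and returns a Series,
# because several values can tie for most frequent; take the first.
mode_embarked = embarked.mode()[0]

# Impute the missing entries with that value.
filled = embarked.fillna(mode_embarked)
print(mode_embarked)
print(filled.tolist())
```

Taking element 0 is a design choice: when several values tie, Series.mode() returns all of them in sorted order, so this picks the smallest one.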

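modeEmbarked = mode(df.Embarked.dropna())

Using this instead of

modeEmbarked = mode(df.Embarked)

solved the problem: dropping the NaN values first leaves only strings, so there is nothing of a different type left to compare.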