IndexError: index 0 is out of bounds for axis 0 with size 0 when trying to find the mode (most frequent value)

Question

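I concatenated 500 XLSX files into a single DataFrame of shape (672006, 12). Every process has a unique number, and I want to groupby() on that number to extract summary information. For temperature I want to select the first value, and for height I want the most frequent value in each group.

Test data: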

import pandas as pd

df_test = pd.DataFrame({"number": [1,1,1,1,2,2,2,2,3,3,3,3],
'temperature': [2,3,4,5,4,3,4,5,5, 3, 4, 4],
'height': [100, 100, 0, 100, 100, 90, 90, 100, 100, 90, 80, 80]})

df_test.groupby('number')['temperature'].first()

df_test.groupby('number')['height'].agg(lambda x: x.value_counts().index[0])

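On my real data, I get the following error when trying to get the most frequent height for each number: IndexError: index 0 is out of bounds for axis 0 with size 0

Strangely, mean() / first() / max() etc. all work. On the second part of the dataset, which I concatenated separately, the aggregation worked.

Can anyone suggest how to handle this error? Thanks!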

Answer 1

I think your problem is that one or more of your groups contains only NaN heights:

See this example, where I added a number 4 whose heights are all np.nan.

import numpy as np
import pandas as pd

df_test = pd.DataFrame({"number": [1,1,1,1,2,2,2,2,3,3,3,3,4,4], 
'temperature': [2,3,4,5,4,3,4,5,5, 3, 4, 4, 5, 5], 
'height': [100, 100, 0, 100, 100, 90, 90, 100, 100, 90, 80, 80, np.nan, np.nan]})

df_test.groupby('number')['temperature'].first()

df_test.groupby('number')['height'].agg(lambda x: x.value_counts().index[0])

Output:

IndexError: index 0 is out of bounds for axis 0 with size 0
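The reason, as far as I can tell, is that value_counts() drops NaN by default, so an all-NaN group yields an empty result and .index[0] has nothing to return. mean(), first() and max() simply return NaN for such a group, which is why they don't fail. Here is a minimal sketch of that behaviour, plus one way to find the affected groups with count() (which only counts non-null values). The small frame and the variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

# A group whose heights are all missing, like number 4 in the example
heights = pd.Series([np.nan, np.nan], name="height")

counts = heights.value_counts()  # dropna=True by default
print(len(counts))               # empty result, so .index[0] raises IndexError

counts_with_nan = heights.value_counts(dropna=False)
print(len(counts_with_nan))      # NaN is now counted as a value of its own

# Locate the groups that have no non-null height at all
df = pd.DataFrame({"number": [1, 1, 2, 2],
                   "height": [100, 90, np.nan, np.nan]})
non_null = df.groupby("number")["height"].count()
empty_groups = non_null[non_null == 0].index.tolist()
print(empty_groups)
```

Running the count() check on the full concatenated data should show which process numbers come from files with missing heights.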

Let's fill these NaNs with zero and rerun.

df_test = pd.DataFrame({"number": [1,1,1,1,2,2,2,2,3,3,3,3,4,4], 
'temperature': [2,3,4,5,4,3,4,5,5, 3, 4, 4, 5, 5], 
'height': [100, 100, 0, 100, 100, 90, 90, 100, 100, 90, 80, 80, np.nan, np.nan]})

df_test = df_test.fillna(0)  # Add this line: replace NaN with 0
df_test.groupby('number')['temperature'].first()

df_test.groupby('number')['height'].agg(lambda x: x.value_counts().index[0])

Output:

number
1    100.0
2     90.0
3     80.0
4      0.0
Name: height, dtype: float64
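One caveat with filling zeros: 0 is also a real height in this data (group 1 contains one), so a result of 0.0 no longer tells you whether the height was missing. If that matters, a possible alternative is to leave the NaNs in place and return NaN for empty groups. This is only a sketch; most_frequent is a helper name I made up, and it relies on Series.mode(), which skips NaN by default and returns its values sorted, so ties go to the smallest value:

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({
    "number": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    "height": [100, 100, 0, 100, 100, 90, 90, 100,
               100, 90, 80, 80, np.nan, np.nan],
})

def most_frequent(values: pd.Series):
    """Return the most frequent non-null value, or NaN if there is none."""
    modes = values.mode()  # empty Series when every value is NaN
    return modes.iloc[0] if not modes.empty else np.nan

result = df_test.groupby("number")["height"].agg(most_frequent)
print(result)
```

Group 4 then comes back as NaN instead of raising, and you can decide afterwards whether to fill or drop it.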


python

pandas

index-error