Pandas pivot table gives "FutureWarning: Sorting because non-concatenation axis is not aligned"

Question

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'category':['A', 'B', 'C', 'C'],
                   'bar':[2, 5, float('nan'), float('nan')]})

Then I have a single line of code that applies two aggregation functions to one column of my DataFrame, grouped by the values in another column:

df.pivot_table('bar', 'category', aggfunc=['median', 'count'])

For some reason, it gives me the following warning:

FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'.

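As I understand it, this warning relates to the "concat()" or "append()" methods. I called neither of them, so I can only assume that one of them is used implicitly somewhere inside "pivot_table()". I would happily pass the "sort" argument to silence the warning, but I see no way to do that when the method is called implicitly.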

I ran some tests on this example, and the warning seems to appear only when all three of the following conditions are met:

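1) At least one group of the aggregated values consists entirely of missing values;

2) There are at least two aggregation functions;

3) One of the aggregation functions is "count()".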

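My current working theory is that the two aggregation functions cannot agree on how many rows the resulting pivot table should have. "count()" puts a zero in every group that consists entirely of missing values. The other functions ignore those groups completely, so when "count()" is absent the corresponding rows are simply missing from the pivot table. When "count()" is present, however, it forces the other functions not to ignore those groups, and NaN values are created in the respective cells.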
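The theory above can be checked by hand with a plain groupby. This is only a sketch: it assumes that pivot_table() with the default dropna=True drops all-NaN rows from each aggregated piece separately, and the names median_kept and count_kept are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'category': ['A', 'B', 'C', 'C'],
                   'bar': [2, 5, float('nan'), float('nan')]})

grouped = df.groupby('category')['bar']

# median() yields NaN for group C because every value in it is missing
median = grouped.median()
# count() yields 0 for group C, which is a real value, not a missing one
count = grouped.count()

# Assumption: each aggregated piece loses its all-NaN rows independently
median_kept = median.dropna()
count_kept = count.dropna()

print(list(median_kept.index))
print(list(count_kept.index))
```

If the assumption holds, the median piece ends up with two rows and the count piece with three, which is exactly the disagreement described above.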

This result is fine for me and I can work with it, but I don't like leaving warnings unattended. Any ideas on what can be done?

Answer 1

I was able to reproduce the issue on pandas 0.25.1. The warning is related to pandas.core.reshape.pivot.py, which includes the following statement:

# line 56
return concat(pieces, keys=keys, axis=1)

The concat call causes the warning. pieces is a list of DataFrames, one for each function in the aggfunc argument. Here is what happens:

pieces[0]
#           bar
# category     
# A         2.0
# B         5.0

pieces[1]
#           bar
# category     
# A           1
# B           1
# C           0

Because pieces[0] and pieces[1] have different indexes, pandas has to sort the DataFrames in order to align the values.
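To see the same thing outside of pivot_table(), the two pieces can be rebuilt by hand and concatenated with an explicit sort argument. This is an illustrative sketch rather than the actual pandas source; median_piece and count_piece are hypothetical stand-ins for pieces[0] and pieces[1].

```python
import pandas as pd

# Hand-built stand-ins for pieces[0] and pieces[1] shown above
median_piece = pd.DataFrame(
    {'bar': [2.0, 5.0]},
    index=pd.Index(['A', 'B'], name='category'))
count_piece = pd.DataFrame(
    {'bar': [1, 1, 0]},
    index=pd.Index(['A', 'B', 'C'], name='category'))

# Passing sort explicitly makes the intent clear when the row indexes differ
combined = pd.concat([median_piece, count_piece],
                     keys=['median', 'count'], axis=1, sort=True)

print(combined)
```

The row for C exists only in the count piece, so the median column gets NaN there after alignment.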

This issue does not occur in 1.0.1. If you don't want the warning to appear, add the argument dropna=False, so that the all-NaN rows are kept for every aggregation function.

df.pivot_table('bar', 'category', aggfunc=['median', 'count'], dropna=False)

Be careful: some functions are not suited to NaN values. NumPy includes many functions that handle NaN, such as np.nanmedian and np.nanmax, so consider checking those.
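As a quick illustration of that caveat, here is a small comparison of a plain NumPy reduction with its NaN-aware counterpart. The sample values are invented for the example.

```python
import numpy as np
import pandas as pd

values = [2.0, float('nan'), 5.0]

# The plain NumPy median propagates NaN
plain = np.median(values)
# The NaN-aware variant ignores missing values
nan_aware = np.nanmedian(values)
# The pandas Series median skips NaN by default (skipna=True)
series_median = pd.Series(values).median()

print(plain, nan_aware, series_median)
```

So passing a raw NumPy function as aggfunc may behave differently from the string name 'median' when NaN values are present.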

Answer 2

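jcaliz's solution is correct, but I ended up simply filtering the DataFrame before applying "pivot_table()". In my case I also wanted the group sizes, so filtering out the NaN values directly was not what I wanted. I ended up doing this: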

not_na = df.groupby('category')['bar'].count()
not_na = not_na[not_na > 0]
pivot = df[df['category'].isin(not_na.index)].pivot_table('bar', 'category', aggfunc=['median', 'size'])
pivot['count'] = not_na
