Function any is not consistent when applied to columns or to the whole DataFrame in Python

Question

I have a DataFrame that may contain NaN values.

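import numpy as np
import pandas as pd

# Build a 4x5 frame filled with 10, then put a single NaN at row 1, column 3.
array = np.empty((4,5))
array[:] = 10
df = pd.DataFrame(array)
df.iloc[1,3] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works everywhere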

df.isna().apply(lambda x: any(x), axis = 0)

Output:

0    False
1    False
2    False
3     True
4    False
dtype: bool

When I run:

any(df.isna())

it returns:

True

If there is no NaN:

array = np.empty((4,5))
array[:] = 10
df = pd.DataFrame(array)
#df.iloc[1,3] = np.nan

df.isna().apply(lambda x: any(x), axis = 0)

0    False
1    False
2    False
3    False
4    False
dtype: bool

However, when I run:

any(df.isna())

it still returns:

True

Why is this the case? Do I have some misunderstanding of the any() function?

Answer 1

Why is this the case? Do I have some misunderstanding of the any() function?

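When you iterate over a DataFrame, you are actually iterating over its column labels, not over its rows or values as you might expect. More precisely, a for loop calls DataFrame.__iter__, which returns an iterator over the DataFrame's column labels. For example, in the following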

df = pd.DataFrame(columns=['a', 'b', 'c'])
for x in df:
    print(x)

# Output:
#
# a
# b
# c

x takes the name of each column of df in turn. You can see the same thing by looking at the output of list(df).

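This means that when you call any(df.isna()), any is actually iterating over the column labels of df and checking their truthiness, not the boolean values inside the frame. If at least one label is truthy, it returns True.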

In both of your examples the column labels are the integers list(df.isna()) == list(df.columns) == [0, 1, 2, 3, 4], of which only 0 is falsy. So in both cases any(df.isna()) is True.
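To make this concrete, here is a small sketch that rebuilds the NaN-free frame from the question and prints what the built-in any actually sees. The variable names (mask, labels, single) are only illustrative, and np.full is used as a shorthand for the np.empty plus fill in the question.

```python
import numpy as np
import pandas as pd

# Same shape as in the question: 4 rows, 5 columns, no NaN anywhere.
df = pd.DataFrame(np.full((4, 5), 10.0))
mask = df.isna()

# Iterating a DataFrame yields its column labels, not its values.
labels = list(mask)
truthiness = [bool(label) for label in labels]

print(labels)      # the integer column labels
print(truthiness)  # only the label 0 is falsy
print(any(mask))   # True, although the frame contains no NaN

# The opposite failure: a frame whose only column is labelled 0
# does contain a NaN, yet the built-in any reports False.
single = pd.DataFrame({0: [np.nan]})
print(any(single.isna()))
```

So the result of the built-in any depends only on how the columns happen to be named, which is why it looked inconsistent with the per-column apply.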

Solution

The solution is to use DataFrame.any with axis=None instead of the built-in any function.

df.isna().any(axis=None)
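As a quick check, the sketch below runs that call on the frame from the question, once without and once with the NaN. The variable names are mine, and the to_numpy() variant at the end is just an equivalent alternative I would expect to behave the same way, not something the approach above requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.full((4, 5), 10.0))

# axis=None reduces over both axes and gives one scalar for the whole frame.
clean_has_nan = bool(df.isna().any(axis=None))

df.iloc[1, 3] = np.nan
dirty_has_nan = bool(df.isna().any(axis=None))

# The default (axis=0) reduces each column separately, which matches
# the apply(lambda x: any(x), axis=0) result from the question.
per_column = df.isna().any()

# Alternative: drop to the underlying NumPy array and reduce there.
alt_has_nan = bool(df.isna().to_numpy().any())

print(clean_has_nan, dirty_has_nan, alt_has_nan)
print(per_column)
```

If you only need the per-column answer, df.isna().any() is also shorter than going through apply.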
