Category condition on multiple columns

Question

I have the following dataset:

ID  A1    A2
0   A123  1234
1   1234  5568
2   5568  NaN
3   Zabc  NaN
4   3456  3456
5   3456  3456
6   NaN   NaN
7   NaN   NaN

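The goal is to go through both columns (A1 and A2), find the rows where both are blank (such as rows 6 and 7), and create a new column that labels those rows "Both A1 and A2 are blank".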

I used the code below:

df['Z_Tax No Not Mapped'] = np.NaN

df['Z_Tax No Not Mapped'] = np.where((df['A1'] == np.NaN) & (df['A2'] == np.NaN), 1, 0)

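However, the output is 0 for every row of the new column 'Z_Tax No Not Mapped', even though the data contains rows where both columns are blank. I'm not sure where my filter for this case goes wrong.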

Note: the values in columns A1 and A2 are sometimes alphanumeric and sometimes purely numeric.

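The idea is to put a category such as "ID not updated" or "ID updated" in a separate column, so that a simple filter on "ID not updated" identifies the cases where both columns are blank.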

Answer 1

df.loc[df.isna().all(axis=1), "Z_Tax No Not Mapped"] = "Both A1 and A2 are blank"
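One caveat worth illustrating: df.isna().all(axis=1) looks at every column of the frame, so the line above only matches rows 6 and 7 if ID is the index rather than a regular column. The sketch below rebuilds the sample data under the assumption that ID is a regular column (the question does not say) and restricts the check to A1 and A2.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; treating ID as a regular column
# is an assumption made for this illustration.
df = pd.DataFrame({
    "ID": range(8),
    "A1": ["A123", "1234", "5568", "Zabc", "3456", "3456", np.nan, np.nan],
    "A2": ["1234", "5568", np.nan, np.nan, "3456", "3456", np.nan, np.nan],
})

# Checking all columns finds nothing here, because ID is never missing.
all_columns_blank = df.isna().all(axis=1)

# Restrict the test to the two columns of interest.
both_blank = df[["A1", "A2"]].isna().all(axis=1)

# Label only the matching rows; the other rows stay NaN in the new column.
df.loc[both_blank, "Z_Tax No Not Mapped"] = "Both A1 and A2 are blank"

print(df)
```

Rows that do not match are left as NaN in the new column, which may or may not be what you want for later filtering.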

Answer 2

Use DataFrame.isna with DataFrame.all to test whether all values in the selected columns are True, i.e. missing:

df['Z_Tax No Not Mapped'] = np.where(df[['A1','A2']].isna().all(axis=1),
                                     'Both A1 and A2 are blank',
                                     '')
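For context on why the original attempt returned 0 everywhere: NaN never compares equal to anything, itself included, so a test like df['A1'] == np.NaN is False on every row. The sketch below rebuilds the sample data (with ID as the index, an assumption), shows the failing comparison next to the isna() version, and uses the "ID not updated" / "ID updated" labels mentioned in the question. It spells the constant np.nan because the np.NaN alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the default integer index
# stands in for the ID column (an assumption).
df = pd.DataFrame({
    "A1": ["A123", "1234", "5568", "Zabc", "3456", "3456", np.nan, np.nan],
    "A2": ["1234", "5568", np.nan, np.nan, "3456", "3456", np.nan, np.nan],
})

# Equality against NaN is always False, so this mask never fires.
naive_mask = (df["A1"] == np.nan) & (df["A2"] == np.nan)

# isna() is the reliable way to detect missing values.
mask = df[["A1", "A2"]].isna().all(axis=1)

# Two-way category, so every row gets a label.
df["Z_Tax No Not Mapped"] = np.where(mask, "ID not updated", "ID updated")

# A simple filter on the label recovers the rows that are blank in both columns.
not_updated = df[df["Z_Tax No Not Mapped"] == "ID not updated"]
print(not_updated)
```

Because np.where fills both branches, the new column has no missing values and can be filtered directly.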

Tags: python, numpy, categories, dataframe, pandas