How to delete rows in a dataframe based on 2 variables in either column

Question

I have a dataset with 3 columns (Postcode, Borough and Neighbourhood), set up as follows:

    import pandas as pd

    df = pd.DataFrame({'Postcode': ['M1', 'M2', 'M3', 'M4', 'M5'],
                       'Borough': ['Ottawa', 'Not assigned', 'Montreal', 'Toronto', 'Kent'],
                       'Neighbourhood': ['Ottawa', 'Toronto', 'Montreal', 'Barrhaven', 'Not assigned']})

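It looks like this:

      Postcode       Borough Neighbourhood
    0       M1        Ottawa        Ottawa
    1       M2  Not assigned       Toronto
    2       M3      Montreal      Montreal
    3       M4       Toronto     Barrhaven
    4       M5          Kent  Not assigned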

Values in the Borough and Neighbourhood columns can be either "Not assigned" or valid text; "Not assigned" can appear in both cells of a row, or in just one or the other.

What I want to do is delete every row in the dataset that has "Not assigned" in either column.

I'm new to Python... I thought I would try creating an extra column that gives True or False based on the value of one of the cells, so I tried this...

    df['Outcome'] = ["True" if x =='Not assigned' else "False" for x in df['Borough']]

...which successfully added an extra column.

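Then I thought I would use the drop() function to delete the rows marked True, and repeat the process for the Neighbourhood column. But this seems like a messy way to do it: I would end up with 20 lines of code, and I'm sure it can be done more efficiently.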

Can someone tell me the simplest way to delete these rows?

Answer 1

You can use the bitwise "or" operator |:

df_filtered = df[~((df['Borough'] == 'Not assigned') | 
                   (df['Neighbourhood'] == 'Not assigned'))]

The result for your example dataset is:

  Postcode   Borough Neighbourhood
0       M1    Ottawa        Ottawa
2       M3  Montreal      Montreal
3       M4   Toronto     Barrhaven
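
To make the one-liner above easier to follow, here is a rough step-by-step sketch of the same idea. The intermediate names (borough_missing, neighbourhood_missing, drop_mask) are only illustrative and are not part of the original answer. Note that the parentheses in the one-liner matter, because | binds more tightly than == in Python.

```python
import pandas as pd

df = pd.DataFrame({
    'Postcode': ['M1', 'M2', 'M3', 'M4', 'M5'],
    'Borough': ['Ottawa', 'Not assigned', 'Montreal', 'Toronto', 'Kent'],
    'Neighbourhood': ['Ottawa', 'Toronto', 'Montreal', 'Barrhaven', 'Not assigned'],
})

# One boolean Series per column: True where the cell is "Not assigned".
# (Illustrative names, not from the original answer.)
borough_missing = df['Borough'] == 'Not assigned'
neighbourhood_missing = df['Neighbourhood'] == 'Not assigned'

# Element-wise "or": True where either column is "Not assigned".
drop_mask = borough_missing | neighbourhood_missing

# "~" inverts the mask, so only rows without "Not assigned" are kept.
df_filtered = df[~drop_mask]
print(df_filtered)
```

Unlike the "True"/"False" strings in the question, these masks hold real booleans, which is what lets them be used directly for indexing.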

Answer 2

Try:

df = df[~(df['Borough'].eq('Not assigned') | df['Neighbourhood'].eq('Not assigned'))]

  Postcode   Borough Neighbourhood
0       M1    Ottawa        Ottawa
2       M3  Montreal      Montreal
3       M4   Toronto     Barrhaven
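
As the output shows, the surviving rows keep their original index labels (0, 2, 3). If a fresh 0..n-1 index is wanted, one option is to chain reset_index(drop=True) onto the filter. This is a small sketch of that, not something the answer itself does:

```python
import pandas as pd

df = pd.DataFrame({
    'Postcode': ['M1', 'M2', 'M3', 'M4', 'M5'],
    'Borough': ['Ottawa', 'Not assigned', 'Montreal', 'Toronto', 'Kent'],
    'Neighbourhood': ['Ottawa', 'Toronto', 'Montreal', 'Barrhaven', 'Not assigned'],
})

# Series.eq(x) is the method form of "== x".
unassigned = df['Borough'].eq('Not assigned') | df['Neighbourhood'].eq('Not assigned')

# Filter, then renumber the rows; drop=True discards the old index
# instead of keeping it as a new column.
df = df[~unassigned].reset_index(drop=True)
print(df)
```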

Answer 3

We can use DataFrame.ne + DataFrame.all with axis=1 to perform boolean indexing:

df_filtered = df[df[['Borough','Neighbourhood']].ne('Not assigned').all(axis=1)]
print(df_filtered)

Output

  Postcode   Borough Neighbourhood
0       M1    Ottawa        Ottawa
2       M3  Montreal      Montreal
3       M4   Toronto     Barrhaven
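
If more columns need the same check later, the pattern can be wrapped in a small helper. The sketch below uses a hypothetical function name, drop_rows_with_value, which is not a pandas API. It also shows an equivalent formulation with DataFrame.isin and any(axis=1), which may read more naturally as "drop rows where any column matches".

```python
import pandas as pd

def drop_rows_with_value(frame, columns, value):
    """Return the rows of `frame` where none of `columns` equals `value`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # ne() gives True where a cell differs from `value`;
    # all(axis=1) requires that to hold across every listed column.
    keep = frame[columns].ne(value).all(axis=1)
    return frame[keep]

df = pd.DataFrame({
    'Postcode': ['M1', 'M2', 'M3', 'M4', 'M5'],
    'Borough': ['Ottawa', 'Not assigned', 'Montreal', 'Toronto', 'Kent'],
    'Neighbourhood': ['Ottawa', 'Toronto', 'Montreal', 'Barrhaven', 'Not assigned'],
})
cols = ['Borough', 'Neighbourhood']

result = drop_rows_with_value(df, cols, 'Not assigned')

# Equivalent: flag rows where any listed column matches, then invert.
result_isin = df[~df[cols].isin(['Not assigned']).any(axis=1)]
print(result)
```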
