Filter out rows with common field where at least one fulfills a condition

I have data like this:

Task   ID   Status
Task1  123  Open
Task2  123  Closed
Task3  211  Closed
Task4  211  Closed
Task5  564  Closed
Task6  994  Open

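I want to remove every row that shares its ID with a row whose status is 'Open'. In other words, I want to drop every ID that has at least one 'Open' status.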

The final result should look like this:

Task   ID   Status
Task3  211  Closed
Task4  211  Closed
Task5  564  Closed

Data:

import pandas as pd

df = pd.DataFrame({'Task': ['Task1', 'Task2', 'Task3', 'Task4', 'Task5', 'Task6'],
                   'ID': [123, 123, 211, 211, 564, 994],
                   'Status': ['Open', 'Closed', 'Closed', 'Closed', 'Closed', 'Open']})
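Taken literally, "drop every ID that has an 'Open' status" is a set-membership test. A minimal sketch of that reading, assuming pandas is installed; the helper name `drop_ids_with_open` is hypothetical, and this `isin` variant is an alternative technique, not the groupby approach used in the answer:

```python
import pandas as pd


def drop_ids_with_open(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop every row whose ID has at least one 'Open' status."""
    # Collect the IDs that appear with status 'Open' at least once
    open_ids = df.loc[df['Status'].eq('Open'), 'ID'].unique()
    # Keep only the rows whose ID is not among them
    return df[~df['ID'].isin(open_ids)]


df = pd.DataFrame({'Task': ['Task1', 'Task2', 'Task3', 'Task4', 'Task5', 'Task6'],
                   'ID': [123, 123, 211, 211, 564, 994],
                   'Status': ['Open', 'Closed', 'Closed', 'Closed', 'Closed', 'Open']})

out = drop_ids_with_open(df)
print(out)
```

On the sample data this should keep only Task3, Task4 and Task5, with their original index labels 2, 3 and 4.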

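We can build a boolean filter from the 'Open' status with groupby + transform('any').

The idea is that if any row of an ID has status 'Open', we mark every row with that ID as True, then filter out all such rows:

out = df[~df['Status'].eq('Open').groupby(df['ID']).transform('any')]

transform('any') broadcasts the per-ID result to every row of the group. A cumulative cummax would only mark rows at or after the first 'Open' row in each ID, so it only works when the 'Open' row comes first.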

Output:

    Task   ID  Status
2  Task3  211  Closed
3  Task4  211  Closed
4  Task5  564  Closed
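The order dependence of a cumulative mask is easy to check. A small sketch on hypothetical data (not from the question) in which ID 123's 'Closed' row comes before its 'Open' row, comparing `groupby(...).cummax()` with `groupby(...).transform('any')`:

```python
import pandas as pd

# Hypothetical variant of the sample data: for ID 123 the 'Closed' row comes first
df = pd.DataFrame({'Task': ['Task1', 'Task2', 'Task3'],
                   'ID': [123, 123, 211],
                   'Status': ['Closed', 'Open', 'Closed']})

is_open = df['Status'].eq('Open')

# cummax only turns True at or after the first 'Open' row in each group,
# so Task1 (Closed, before the Open row) is kept even though ID 123 has an Open row.
# astype(bool) makes sure the mask is boolean before inverting it with ~.
cummax_mask = is_open.groupby(df['ID']).cummax().astype(bool)
cummax_out = df[~cummax_mask]

# transform('any') gives every row of a group the group-wide result
any_mask = is_open.groupby(df['ID']).transform('any')
any_out = df[~any_mask]

print(cummax_out['Task'].tolist())
print(any_out['Task'].tolist())
```

Here the cummax version keeps Task1 and Task3, while the transform('any') version keeps only Task3, which is the behaviour the question asks for.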