Keep rows that aren't in a list

Question

I have a dataframe containing sales and offers.

df  offer                       sales
0   £10 off appple               10
1   £10 off apple and samsung    20

I have a list of offers I want to exclude; in this example it contains only one offer.

remove_these_offers_list = ["£10 off appple"]

When I try df.loc[~(df.offer.isin(remove_these_offers_list))] I get an empty df, because the string is technically contained in both rows.

Expected output

df  offer                        sales
1   £10 off apple and samsung     20

Answer 1

Try stripping leading and trailing whitespace with str.strip() before the comparison:

df = df.loc[~df['offer'].str.strip().isin(remove_these_offers_list)]
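A minimal sketch of why stripping can matter, assuming the offer column picked up trailing spaces when the table was loaded. The padded value below is hypothetical and is not taken from the question's real data.

```python
import pandas as pd

# Hypothetical data: the first offer carries trailing spaces, as can happen
# when a table is copied from fixed-width text.
df = pd.DataFrame({
    "offer": ["£10 off appple   ", "£10 off apple and samsung"],
    "sales": [10, 20],
})
remove_these_offers_list = ["£10 off appple"]

# Without stripping, the padded string is not an exact member of the list,
# so no row is removed.
unstripped = df.loc[~df["offer"].isin(remove_these_offers_list)]

# After stripping, the first row matches exactly and is dropped.
stripped = df.loc[~df["offer"].str.strip().isin(remove_these_offers_list)]

print(len(unstripped), len(stripped))
```

If the strings in the column are already clean, str.strip() changes nothing and both filters return the same rows.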

Or

Since the approach you mentioned should already work, here is another way to do it with str.fullmatch():

df = df.loc[~df['offer'].str.fullmatch('|'.join(remove_these_offers_list))]

Output of df:

    df  offer                       sales
1   1   £10 off apple and samsung   20
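One caveat with the str.fullmatch() approach: the joined string is treated as a regular expression, so an offer containing characters such as parentheses or a plus sign would not be matched literally. The sketch below escapes each offer with re.escape first. The third offer, "2 for 1 (apple)", is a made-up example added only to show the difference.

```python
import re

import pandas as pd

# Hypothetical data; the last offer contains regex metacharacters.
df = pd.DataFrame({
    "offer": ["£10 off appple", "£10 off apple and samsung", "2 for 1 (apple)"],
    "sales": [10, 20, 30],
})
remove_these_offers_list = ["£10 off appple", "2 for 1 (apple)"]

# Unescaped: "(apple)" is read as a capture group, so the literal
# parentheses in the data never match and that row is kept by mistake.
raw_pattern = "|".join(remove_these_offers_list)
kept_raw = df.loc[~df["offer"].str.fullmatch(raw_pattern)]

# Escaped: every offer is matched as a literal string.
safe_pattern = "|".join(re.escape(offer) for offer in remove_these_offers_list)
kept_safe = df.loc[~df["offer"].str.fullmatch(safe_pattern)]

print(kept_raw["sales"].tolist(), kept_safe["sales"].tolist())
```

For plain literal strings like the one in the question, the escaped and unescaped patterns behave the same, and isin is usually the simpler choice.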

Answer 2

You can do it like this:

df[~df['offer'].isin(remove_these_offers_list)]

isin checks each value for membership in the list rather than for substring containment, so only exact matches count.
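To make the difference concrete, here is a small sketch comparing isin with a substring test via str.contains. The data is hypothetical and uses the spelling "apple" in both rows so that the first offer really is a substring of the second.

```python
import pandas as pd

# Hypothetical data: the first offer is a prefix of the second.
df = pd.DataFrame({
    "offer": ["£10 off apple", "£10 off apple and samsung"],
    "sales": [10, 20],
})
remove_these_offers_list = ["£10 off apple"]

# Exact membership: only the first row equals an entry in the list.
exact = df["offer"].isin(remove_these_offers_list)

# Substring test: both rows contain the text, so both would be removed.
substring = df["offer"].str.contains("£10 off apple", regex=False)

print(df[~exact])      # keeps only the "apple and samsung" row
print(df[~substring])  # empty frame
```

An empty result is therefore what a substring filter would produce, not what isin does on clean data.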
