How do you output booleans if a column containing lists has elements from another, larger list?

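I have a column where each row contains a list of strings of varying length. I need to create a new column containing a list of booleans (the same length as the original list) indicating whether each element is found in another (larger) list.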

This is what I'm doing, which, well, obviously doesn't work. I based it on this question: How to return list of booleans to see if elements of one list in another list

import numpy as np
import pandas as pd

data = [
    [1, ["cat", "cat", "mouse"]],
    [2, ["dog", "horse"]],
    [3, ["cat"]],
    [
        4,
        np.nan,
    ],
]

df = pd.DataFrame(data, columns=["ID", "list"])
df

main_list = ["cat", "dog", "mouse", "pig", "cow"]

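# Fails: `b` is undefined (the lambda's parameter is `x`),
# and the NaN row is a float, which is not iterable.
df["contains_item_from_list"] = df["list"].apply(
    (lambda x: [x in main_list for x in b])
)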

Expected output:

ID     list          contains_item_from_list
1  [cat,cat,mouse]      [True, True, True]
2  [dog,horse]          [True, False]
3  [cat]                [True]
4   NaN                 [False]

You can explode and then use isin:

df['new'] = df['list'].explode().isin(main_list).groupby(level=0).any()
df
Out[130]: 
   ID               list    new
0   1  [cat, cat, mouse]   True
1   2       [dog, horse]   True
2   3              [cat]   True
3   4                NaN  False

Update

df['new'] = df['list'].explode().isin(main_list).groupby(level=0).agg(list)
df
Out[132]: 
   ID               list                 new
0   1  [cat, cat, mouse]  [True, True, True]
1   2       [dog, horse]       [True, False]
2   3              [cat]              [True]
3   4                NaN             [False]

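explode flattens all the lists in the Series, but items from the same list all share the index of the list they came from. So after using isin to check which items of the Series are in main_list, you can use groupby(level=0) to group by level 0 (the first level) of the index and then convert the groups back to lists: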

df['contains_item_from_list'] = df['list'].explode().isin(main_list).groupby(level=0).apply(list)

Output:

>>> df['list'].explode().isin(main_list).groupby(level=0).apply(list)
0    [True, True, True]
1         [True, False]
2                [True]
3               [False]
Name: list, dtype: object
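
To make the role of the shared index more concrete, here is a minimal sketch that splits the one-liner into its intermediate steps. The variable names (exploded, flags, regrouped) are only illustrative and are not part of the answer above; the data is rebuilt from the question so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "list": [["cat", "cat", "mouse"], ["dog", "horse"], ["cat"], np.nan],
    }
)
main_list = ["cat", "dog", "mouse", "pig", "cow"]

# Step 1: one row per list element; each element keeps the index label
# of the row it came from. The NaN row stays as a single NaN entry.
exploded = df["list"].explode()
print(exploded.index.tolist())

# Step 2: element-wise membership test; NaN is not in main_list.
flags = exploded.isin(main_list)

# Step 3: group on the repeated index labels and collect each group
# back into a list, giving one list of booleans per original row.
regrouped = flags.groupby(level=0).agg(list)
print(regrouped.tolist())
```

Because the repeated index labels are the only thing tying elements back to their rows, this approach presumably relies on the original index being unique.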

You can also apply a function that iterates over each list in the list column. This should be faster than exploding the column:

main_set = set(main_list)
df["contains_item_from_list"] = df['list'].apply(lambda x: [w in main_set for w in x] if isinstance(x, list) else [x in main_set])

Output:

   ID               list contains_item_from_list
0   1  [cat, cat, mouse]      [True, True, True]
1   2       [dog, horse]           [True, False]
2   3              [cat]                  [True]
3   4                NaN                 [False]
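
If you want to check the speed claim on your own data, a rough timing sketch like the one below may help. The synthetic Series, the helper names via_explode and via_apply, and the repeat counts are all assumptions made for illustration; actual timings depend on the pandas version and on how long the lists are, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

main_list = ["cat", "dog", "mouse", "pig", "cow"]
main_set = set(main_list)

# Synthetic data: the four rows from the question repeated many times.
rows = [["cat", "cat", "mouse"], ["dog", "horse"], ["cat"], np.nan] * 2500
s = pd.Series(rows, name="list")


def via_explode(series):
    # explode -> isin -> regroup on the original index
    return series.explode().isin(main_list).groupby(level=0).agg(list)


def via_apply(series):
    # plain Python loop per row, with a set for fast membership tests
    return series.apply(
        lambda x: [w in main_set for w in x] if isinstance(x, list) else [x in main_set]
    )


# Both approaches should produce the same lists of booleans.
same_result = via_explode(s).tolist() == via_apply(s).tolist()
print("same result:", same_result)

t_explode = timeit.timeit(lambda: via_explode(s), number=3)
t_apply = timeit.timeit(lambda: via_apply(s), number=3)
print(f"explode: {t_explode:.3f}s  apply: {t_apply:.3f}s")
```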

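Use a list comprehension, simple and quick. The NaN placeholder must be a single character, because iterating over a string yields one element per character:

df["contains_item_from_list"] = df['list'].fillna('x').apply(lambda x: [val in main_list for val in x])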

   ID               list contains_item_from_list
0   1  [cat, cat, mouse]      [True, True, True]
1   2       [dog, horse]           [True, False]
2   3              [cat]                  [True]
3   4                NaN                 [False]