How do you output booleans if a column containing lists has elements from another, larger list?
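I have a column where each row contains a list of strings of varying length. I need to create a new column containing a list of booleans (the same length as the original list) indicating whether each element is found in another (larger) list.
This is what I am doing, which, well, obviously doesn't work. I based it on this question: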
How to return list of booleans to see if elements of one list in another list
import numpy as np
import pandas as pd

data = [
    [1, ["cat", "cat", "mouse"]],
    [2, ["dog", "horse"]],
    [3, ["cat"]],
    [4, np.nan],
]
df = pd.DataFrame(data, columns=["ID", "list"])
df
main_list = ["cat", "dog", "mouse", "pig", "cow"]
# Failing attempt: `b` is undefined (NameError), and the NaN row is not iterable
df["contains_item_from_list"] = df["list"].apply(
    (lambda x: [x in main_list for x in b])
)
Expected output:
ID list contains_item_from_list
1 [cat,cat,mouse] [True, True, True]
2 [dog,horse] [True, False]
3 [cat] [True]
4 NaN [False]
You can use explode and then isin:
df['new'] = df['list'].explode().isin(main_list).groupby(level=0).any()
df
Out[130]:
ID list new
0 1 [cat, cat, mouse] True
1 2 [dog, horse] True
2 3 [cat] True
3 4 NaN False
Update:
df['new'] = df['list'].explode().isin(main_list).groupby(level=0).agg(list)
df
Out[132]:
ID list new
0 1 [cat, cat, mouse] [True, True, True]
1 2 [dog, horse] [True, False]
2 3 [cat] [True]
3 4 NaN [False]
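explode flattens all the lists in the Series, but items from the same list all share the index of the list they came from. So after you use isin to check which items are in main_list, you can use groupby with level=0 to group by the 0th (first) level of the index and then convert the groups back into lists: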
df['contains_item_from_list'] = df['list'].explode().isin(main_list).groupby(level=0).apply(list)
Output:
>>> df['list'].explode().isin(main_list).groupby(level=0).apply(list)
0 [True, True, True]
1 [True, False]
2 [True]
3 [False]
Name: list, dtype: object
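To make the explode / isin / groupby chain above easier to follow, here is a minimal sketch that prints each intermediate step. The variable names exploded, flags and regrouped are introduced here for illustration only, and the frame is rebuilt from the question's data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "list": [["cat", "cat", "mouse"], ["dog", "horse"], ["cat"], np.nan],
    }
)
main_list = ["cat", "dog", "mouse", "pig", "cow"]

# Step 1: one row per list element; each element keeps its source row's index.
exploded = df["list"].explode()
print(exploded)

# Step 2: element-wise membership test; the NaN row simply becomes False.
flags = exploded.isin(main_list)
print(flags)

# Step 3: group on the repeated index (level 0) and collect each group as a list.
regrouped = flags.groupby(level=0).agg(list)
print(regrouped)
```

Because the index values repeat after explode, grouping on level=0 is what ties every boolean back to its original row.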
You can also apply a function that iterates over each list in the list column. This should be faster than exploding the column:
main_set = set(main_list)
df["contains_item_from_list"] = df['list'].apply(lambda x: [w in main_set for w in x] if isinstance(x, list) else [x in main_set])
Output:
ID list contains_item_from_list
0 1 [cat, cat, mouse] [True, True, True]
1 2 [dog, horse] [True, False]
2 3 [cat] [True]
3 4 NaN [False]
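If the lambda gets hard to read, the same idea can be written as a small named function. This is only a sketch: membership_flags is a hypothetical helper name, not something from pandas, and it assumes that any cell that is not a list (such as NaN) should produce [False], matching the expected output in the question.

```python
import numpy as np
import pandas as pd


def membership_flags(cell, allowed):
    """Return one boolean per element of `cell` (hypothetical helper).

    Assumption: a cell that is not a list (for example NaN) yields [False].
    """
    if isinstance(cell, list):
        return [item in allowed for item in cell]
    return [False]


df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "list": [["cat", "cat", "mouse"], ["dog", "horse"], ["cat"], np.nan],
    }
)
# A set gives constant-time lookups on average, unlike scanning a list.
main_set = {"cat", "dog", "mouse", "pig", "cow"}

df["contains_item_from_list"] = df["list"].apply(membership_flags, allowed=main_set)
print(df)
```

Series.apply forwards the extra keyword argument allowed to the function, so no lambda or global is needed.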
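Use a list comprehension, simple and fast:
# Fill NaN with a single-character placeholder so that row yields exactly one False
# (iterating over a two-character string such as 'xx' would give [False, False])
df["contains_item_from_list"] = df['list'].fillna('_').apply(lambda x: [val in main_list for val in x])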
ID list contains_item_from_list
0 1 [cat, cat, mouse] [True, True, True]
1 2 [dog, horse] [True, False]
2 3 [cat] [True]
3 4 NaN [False]