How do I generate a list of items not shared between two DataFrames?
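Basically, I have a list containing a set of unique items categorized by color (Items). I did some processing and generated a DataFrame containing selected combinations of those unique items (Combinations). My goal is to list the items from the original list that do not appear in the Combinations DataFrame. Ideally I want to check all four color columns, but for my initial test I only selected the 'Red' column.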
import pandas as pd
Items = pd.DataFrame({'Id': ["6917529336306454104","6917529268375577150","6917529175831101427","6917529351156928903","6917529249201580539","6917529246740186376","6917529286870790429","6917529212665335174","6917529206310658443","6917529207434353786","6917529309798817021","6917529352287607192","6917529268327711171","6917529316674574229"
],'Type': ['Red','Blue','Green','Cyan','Red','Blue','Blue','Blue','Blue','Green','Green','Green','Cyan','Cyan']})
Items = Items.set_index('Id', drop=True)
#Do stuff
Combinations = pd.DataFrame({
'Red': ["6917529336306454104","6917529336306454104","6917529336306454104","6917529336306454104"],
'Blue': ["6917529268375577150","6917529286870790429","6917529206310658443","6917529206310658443"],
'Green': ["6917529175831101427","6917529207434353786","6917529309798817021","6917529309798817021"],
'Cyan': ["6917529351156928903","6917529268327711171","6917529351156928903","6917529268327711171"],
'Other': [12,15,18,32]
})
My first attempt was the line below, but it raises the execution error "KeyError: 'Id'". A forum post suggested that drop=True in set_index might fix it, but that does not seem to work in my case.
UnusedItems = ~Items[Items['Id'].isin(list(Combinations['Red']))]
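I tried to work around it with the line below. When executed, it produces an empty DataFrame. By inspection, item 6917529249201580539 should be returned when only the 'Red' column is considered. Considering all of the Combinations columns, items 6917529249201580539, 6917529246740186376, 6917529212665335174 and 6917529316674574229 should be returned as unused.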
UnusedItems = ~Items[Items.iloc[:,0].isin(list(Combinations['Red']))]
I would appreciate any ideas or guidance. Thanks.
One option is to use iloc to get the first 4 columns from Combinations and reformat to long form with stack:
(Combinations.iloc[:, :4].stack()
.droplevel(0).rename_axis(index='Type').reset_index(name='Id'))
Type Id
0 Red 6917529336306454104
1 Blue 6917529268375577150
2 Green 6917529175831101427
3 Cyan 6917529351156928903
4 Red 6917529336306454104
5 Blue 6917529286870790429
6 Green 6917529207434353786
7 Cyan 6917529268327711171
8 Red 6917529336306454104
9 Blue 6917529206310658443
10 Green 6917529309798817021
11 Cyan 6917529351156928903
12 Red 6917529336306454104
13 Blue 6917529206310658443
14 Green 6917529309798817021
15 Cyan 6917529268327711171
Then take Items, reset_index to get the 'Id' column back from the index, merge with indicator, and query to filter out values that are present in both frames, then drop the indicator column to complete the anti-join:
UnusedItems = Items.reset_index().merge(
Combinations.iloc[:, :4].stack()
.droplevel(0).rename_axis(index='Type').reset_index(name='Id'),
how='outer',
indicator='I').query('I != "both"').drop(columns='I')
UnusedItems:
Id Type
8 6917529249201580539 Red
9 6917529246740186376 Blue
11 6917529212665335174 Blue
17 6917529352287607192 Green
20 6917529316674574229 Cyan
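A possible variant, sketched here on the assumption that only the unused rows of Items are wanted: a left merge with indicator=True keeps just the left_only rows, so rows that exist only in the reshaped Combinations can never leak into the result. The names long_form, merged and unused_items are illustrative and not part of the answer above, and melt is swapped in for stack purely for brevity.

```python
import pandas as pd

# Same sample data as in the question.
Items = pd.DataFrame({
    'Id': ["6917529336306454104", "6917529268375577150", "6917529175831101427",
           "6917529351156928903", "6917529249201580539", "6917529246740186376",
           "6917529286870790429", "6917529212665335174", "6917529206310658443",
           "6917529207434353786", "6917529309798817021", "6917529352287607192",
           "6917529268327711171", "6917529316674574229"],
    'Type': ['Red', 'Blue', 'Green', 'Cyan', 'Red', 'Blue', 'Blue', 'Blue',
             'Blue', 'Green', 'Green', 'Green', 'Cyan', 'Cyan'],
}).set_index('Id')

Combinations = pd.DataFrame({
    'Red': ["6917529336306454104"] * 4,
    'Blue': ["6917529268375577150", "6917529286870790429",
             "6917529206310658443", "6917529206310658443"],
    'Green': ["6917529175831101427", "6917529207434353786",
              "6917529309798817021", "6917529309798817021"],
    'Cyan': ["6917529351156928903", "6917529268327711171",
             "6917529351156928903", "6917529268327711171"],
    'Other': [12, 15, 18, 32],
})

# Long form of the four color columns; duplicates are dropped so the
# merge cannot multiply rows of Items.
long_form = (
    Combinations[['Red', 'Blue', 'Green', 'Cyan']]
    .melt(var_name='Type', value_name='Id')
    .drop_duplicates()
)

# Left merge on both shared columns; '_merge' records where each row was found.
merged = Items.reset_index().merge(
    long_form, on=['Id', 'Type'], how='left', indicator=True
)

# Keep only the rows of Items that have no match in Combinations.
unused_items = (
    merged[merged['_merge'] == 'left_only']
    .drop(columns='_merge')
    .reset_index(drop=True)
)
print(unused_items)
```

Because the merge is a left join, the row order of Items is preserved and the result index can simply be reset.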
Use .melt() on Combinations, then turn both into sets and subtract:
set(Items.index) - set(Combinations.melt().value)
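The original isin attempts can probably also be repaired directly; what follows is a sketch on the sample data, not the only way to do it. After set_index('Id') the Ids live in the index, so Items['Id'] raises KeyError, and Items.iloc[:, 0] is the Type column, which never matches an Id. The ~ also has to negate the boolean mask rather than the already filtered frame. The names color_columns, used_ids, unused_items and unused_red are illustrative.

```python
import pandas as pd

# Same sample data as in the question.
Items = pd.DataFrame({
    'Id': ["6917529336306454104", "6917529268375577150", "6917529175831101427",
           "6917529351156928903", "6917529249201580539", "6917529246740186376",
           "6917529286870790429", "6917529212665335174", "6917529206310658443",
           "6917529207434353786", "6917529309798817021", "6917529352287607192",
           "6917529268327711171", "6917529316674574229"],
    'Type': ['Red', 'Blue', 'Green', 'Cyan', 'Red', 'Blue', 'Blue', 'Blue',
             'Blue', 'Green', 'Green', 'Green', 'Cyan', 'Cyan'],
}).set_index('Id')

Combinations = pd.DataFrame({
    'Red': ["6917529336306454104"] * 4,
    'Blue': ["6917529268375577150", "6917529286870790429",
             "6917529206310658443", "6917529206310658443"],
    'Green': ["6917529175831101427", "6917529207434353786",
              "6917529309798817021", "6917529309798817021"],
    'Cyan': ["6917529351156928903", "6917529268327711171",
             "6917529351156928903", "6917529268327711171"],
    'Other': [12, 15, 18, 32],
})

color_columns = ['Red', 'Blue', 'Green', 'Cyan']

# Flatten every Id used in any color column into one 1-D array.
used_ids = Combinations[color_columns].to_numpy().ravel()

# Test the index (where the Ids now live) and negate the mask, not the frame.
unused_items = Items[~Items.index.isin(used_ids)]

# Red-only check: restrict to Red items before testing against the Red column.
unused_red = Items[(Items['Type'] == 'Red')
                   & ~Items.index.isin(Combinations['Red'])]

print(unused_items)
print(unused_red)
```

On the sample data this returns five Ids, including 6917529352287607192, which the expected list in the question leaves out.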