Frozenset union of two columns
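I have a dataset with two columns of frozensets. I want to take the union of the two frozensets in each row. I can do this with a for loop, but my dataset has more than 27 million rows, so I am looking for a way to avoid the for loop. Does anyone have any ideas?
Data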
import pandas as pd
import numpy as np
d = {'ID1': [frozenset(['a', 'b']), frozenset(['a','c']), frozenset(['c','d'])],
'ID2': [frozenset(['c', 'g']), frozenset(['i','f']), frozenset(['t','l'])]}
df = pd.DataFrame(data=d)
For loop code
from functools import reduce
df['frozenset'] = 0
for i in range(len(df)):
    df['frozenset'].iloc[i] = reduce(frozenset.union, [df['ID1'][i], df['ID2'][i]])
Expected output
ID1 ID2 frozenset
0 (a, b) (c, g) (a, c, g, b)
1 (a, c) (f, i) (a, c, f, i)
2 (c, d) (t, l) (c, d, t, l)
You can try:
import pandas as pd
import numpy as np
d = {'ID1': [frozenset(['a', 'b']), frozenset(['a','c']), frozenset(['c','d'])],
'ID2': [frozenset(['c', 'g']), frozenset(['i','f']), frozenset(['t','l'])]}
df = pd.DataFrame(data=d)
from functools import reduce
df['frozenset'] = 0
add = []
for i in range(len(df)):
    # Union of the two frozensets in row i
    df['frozenset'].iloc[i] = reduce(frozenset.union, [df['ID1'][i], df['ID2'][i]])
add.append(df)
print(add)
It looks like you don't need functools.reduce here. Taking the union of each pair of frozensets directly is enough.
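To illustrate the point above, here is a minimal sketch in plain Python (the variable names are only for illustration and are not from the question). It shows that, for exactly two frozensets, reduce, the | operator, and the .union() method give the same result:

```python
from functools import reduce

a = frozenset(["a", "b"])
b = frozenset(["c", "g"])

via_reduce = reduce(frozenset.union, [a, b])  # what the loop in the question does
via_operator = a | b                          # direct union with the | operator
via_method = a.union(b)                       # equivalent method call

print(via_reduce == via_operator == via_method)  # True
```

With only two operands, reduce just adds a function call per row, so the operator form is the simpler choice.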
If you want the fastest option for this kind of operation, I suggest using a list comprehension:
df['union'] = [x | y for x, y in zip(df['ID1'], df['ID2'])]
df
ID1 ID2 union
0 (a, b) (c, g) (c, a, b, g)
1 (c, a) (f, i) (c, a, i, f)
2 (c, d) (l, t) (c, l, d, t)
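If you want to check this on your own data, the following is a rough sketch rather than a benchmark. It assumes a synthetic frame of integer frozensets (the size n and the helper names union_listcomp and union_loop are made up for illustration), confirms that the list comprehension matches a row-by-row loop, and times both with timeit. Actual timings depend on your machine and pandas version, so none are shown here.

```python
import timeit

import pandas as pd

# Synthetic data: two columns of small frozensets (sizes are arbitrary).
n = 10_000
df = pd.DataFrame({
    "ID1": [frozenset({i, i + 1}) for i in range(n)],
    "ID2": [frozenset({i + 1, i + 2}) for i in range(n)],
})


def union_listcomp(frame):
    # Walk both columns in lockstep and union each pair.
    return [x | y for x, y in zip(frame["ID1"], frame["ID2"])]


def union_loop(frame):
    # Row-by-row loop with a positional lookup per cell, similar in spirit
    # to the loop in the question.
    out = []
    for i in range(len(frame)):
        out.append(frame["ID1"].iloc[i] | frame["ID2"].iloc[i])
    return out


# Both approaches should produce identical results.
assert union_listcomp(df) == union_loop(df)

t_listcomp = timeit.timeit(lambda: union_listcomp(df), number=3)
t_loop = timeit.timeit(lambda: union_loop(df), number=3)
print(f"list comprehension: {t_listcomp:.3f}s, loop: {t_loop:.3f}s")
```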
If you want to generalize this to more than two columns, you can use frozenset.union() to merge them all.
df['union2'] = [frozenset.union(*X) for X in df[['ID1', 'ID2']].values]
df
ID1 ID2 union union2
0 (a, b) (c, g) (c, a, b, g) (c, a, b, g)
1 (c, a) (f, i) (c, a, i, f) (c, a, i, f)
2 (c, d) (l, t) (c, l, d, t) (c, l, d, t)
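As a sketch of how the multi-column version might be wrapped up for reuse: the helper union_columns and the third column ID3 below are hypothetical and not part of the question. It zips the selected columns instead of going through .values, and it starts from an empty frozenset so the call also works when only one column is passed.

```python
import pandas as pd


def union_columns(frame, columns):
    """Return a list holding the row-wise union of the frozensets in `columns`."""
    # zip(*...) yields one tuple per row, with one frozenset per selected column.
    rows = zip(*(frame[c] for c in columns))
    return [frozenset().union(*row) for row in rows]


df = pd.DataFrame({
    "ID1": [frozenset(["a", "b"]), frozenset(["a", "c"]), frozenset(["c", "d"])],
    "ID2": [frozenset(["c", "g"]), frozenset(["i", "f"]), frozenset(["t", "l"])],
    # Hypothetical extra column, added only to show the general case.
    "ID3": [frozenset(["z"]), frozenset(), frozenset(["c"])],
})

df["union_all"] = union_columns(df, ["ID1", "ID2", "ID3"])
print(df["union_all"].tolist())
```

The element order inside each printed frozenset is arbitrary, so compare results with == rather than by their printed form.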