Pair-wise set intersection between column values in a data frame
I have a data frame with a single column. Each value in this column is a list. For example,
A
0 [1, 3, 4]
1 [43, 1, 42]
2 [50, 3]
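I want to perform a set intersection between every pair of lists to find their common elements, and produce a data frame like the one below.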
0 1 2
0 [1, 3, 4] [1] [3]
1 [1] [43, 1, 42] []
2 [3] [] [50, 3]
Is there an elegant way to do this without looping?
We can use apply with set to convert every value in A to a set, then broadcast the set intersection:
import pandas as pd
df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})
# Convert to set
a = df['A'].apply(set).values
# Broadcast set intersection
new_df = pd.DataFrame(a[:, None] & a)
new_df:
0 1 2
0 {1, 3, 4} {1} {3}
1 {1} {1, 42, 43} {}
2 {3} {} {50, 3}
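To make the broadcasting step more concrete, here is a minimal sketch that compares the broadcast result with an explicit double loop over the same sets. It assumes the example data from above; the names broadcast, looped and same are illustrative only, and .to_numpy() is used in place of .values, which gives the same object array here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})

# Object array of sets, one per row of column A
a = df['A'].apply(set).to_numpy()

# a[:, None] has shape (3, 1) and a has shape (3,), so `&` broadcasts
# to shape (3, 3) and applies set intersection to every (row, column) pair.
broadcast = a[:, None] & a

# The same table written as an explicit double loop, for comparison only
looped = [[x & y for y in a] for x in a]

same = all(
    broadcast[i, j] == looped[i][j]
    for i in range(len(a))
    for j in range(len(a))
)
print(broadcast.shape)  # (3, 3)
print(same)             # True
```

The broadcast form hides the loop rather than removing it: the intersection is still evaluated once per pair of elements, so the gain is mostly in brevity.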
Alternatively, np.vectorize can convert the results back to lists if needed (it can also do the conversion to set in place of apply):
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})
# Convert to set (using vectorize instead of apply):
a = np.vectorize(set, otypes=['O'])(df['A'])
# Broadcast set intersection and convert back to list
new_df = pd.DataFrame(
np.vectorize(list, otypes=['O'])(a[:, None] & a)
)
new_df:
0 1 2
0 [1, 3, 4] [1] [3]
1 [1] [1, 42, 43] []
2 [3] [] [50, 3]
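Because sets are unordered, the element order inside each list produced by list is not guaranteed to match the original lists. If a stable order matters, one possible variation (an assumption on my part, not something the question asked for) is to vectorize sorted instead of list, since sorted also returns a list. The name to_sorted_list is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})

# Object array of sets, one per row of column A
a = df['A'].apply(set).to_numpy()

# sorted() returns a list, so it stands in for list() and fixes the order
to_sorted_list = np.vectorize(sorted, otypes=['O'])

# Broadcast set intersection, then turn every cell into a sorted list
new_df = pd.DataFrame(to_sorted_list(a[:, None] & a))
print(new_df)
```

This only works when the elements are mutually comparable, as the integers in this example are.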