Create a column from all possible combinations of two columns in a dataframe based on groupby in Python

Question

I have a dataframe like the one below:

id	group	log
10	UU1Q	23
10	UU1Q	12
10	UU2Q	15
11	UU2Q	17
11	UU3Q	35.6
11	UU1Q	29.8
11	UU1Q	33
11	UU1Q	44
13	UU2Q	17.77
13	UU2Q	19.90
13	UU2Q	55
14	UU3Q	33
15	UU3Q	22

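For each id and group, I want a new column containing all possible combinations of the log values in that group (plus the first value paired with itself). Expected output: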

id	group	log	new_col
10	UU1Q	23	(23,23)
10	UU1Q	12	(23,12)
10	UU2Q	15	(15,15)
11	UU2Q	17	(17,17)
11	UU3Q	35.6	(35.6,35.6)
11	UU1Q	29.8	(29.8, 29.8)
11	UU1Q	33	(29.8,33)
11	UU1Q	44	(29.8,44)
11	UU1Q		(33,44)
13	UU2Q	17.77	(17.77,17.77)
13	UU2Q	19.90	(17.77,19.90)
13	UU2Q	55	(17.77,55)
13	UU2Q		(19.90,55)
14	UU3Q	33	(33,33)
15	UU3Q	22	(22,22)
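The expected output follows a simple counting rule: each (id, group) contributes one self-pair built from its first value plus every unordered pair of its values, so a group with n values yields 1 + n(n-1)/2 rows. A minimal sketch that checks this count against the sample data; the helper name `expected_rows` is hypothetical:

```python
from math import comb

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id":    [10, 10, 10, 11, 11, 11, 11, 11, 13, 13, 13, 14, 15],
    "group": ["UU1Q", "UU1Q", "UU2Q", "UU2Q", "UU3Q", "UU1Q", "UU1Q", "UU1Q",
              "UU2Q", "UU2Q", "UU2Q", "UU3Q", "UU3Q"],
    "log":   [23, 12, 15, 17, 35.6, 29.8, 33, 44, 17.77, 19.90, 55, 33, 22],
})

def expected_rows(n: int) -> int:
    """Hypothetical helper: one self-pair plus all unordered pairs of n values."""
    return 1 + comb(n, 2)

# Number of rows per (id, group), mapped to the number of output rows
sizes = df.groupby(["id", "group"]).size()
total = int(sizes.map(expected_rows).sum())
print(total)  # 15, the number of rows in the expected output
```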

I used the shift function, but it only pairs each value with the next matching cell. I want all possible combinations within each group.
dummy['new'] = dummy.groupby(['id', 'group'])['log'].shift()
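To see why shift falls short: it moves each value down one row within its group, so a row can only be paired with the row directly above it, and the first row of each group gets NaN. A minimal sketch on a single group from the sample data (id 11, UU1Q) compares that with itertools.combinations, which yields every unordered pair:

```python
from itertools import combinations

import pandas as pd

# One (id, group) slice from the question: id 11, group UU1Q
df = pd.DataFrame({"id": [11, 11, 11],
                   "group": ["UU1Q"] * 3,
                   "log": [29.8, 33.0, 44.0]})

# shift() only looks one row back within each group
prev = df.groupby(["id", "group"])["log"].shift()
shift_pairs = list(zip(prev.tolist(), df["log"].tolist()))
print(shift_pairs)  # [(nan, 29.8), (29.8, 33.0), (33.0, 44.0)]

# combinations() yields every unordered pair, including (29.8, 44.0)
all_pairs = list(combinations(df["log"].tolist(), 2))
print(all_pairs)  # [(29.8, 33.0), (29.8, 44.0), (33.0, 44.0)]
```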

Answer 1

This collects the log values of each id and group into a list:

df.groupby(['id','group'], as_index=False).agg({'log':lambda x: list(x)})


Output:

    id  group   log
0   10  UU1Q    [23.0, 12.0]
1   10  UU2Q    [15.0]
2   11  UU1Q    [29.8, 33.0, 44.0]
3   11  UU2Q    [17.0]
4   11  UU3Q    [35.6]
5   13  UU2Q    [17.77, 19.9, 55.0]
6   14  UU3Q    [33.0]
7   15  UU3Q    [22.0]
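The lists above still have to be turned into pairs. A minimal sketch, rebuilding the grouped frame so the block runs on its own, that maps itertools.combinations over each list and falls back to a same-value tuple for single-element groups; the helper name `pairs` is hypothetical:

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    "id":    [10, 10, 10, 11, 11, 11, 11, 11, 13, 13, 13, 14, 15],
    "group": ["UU1Q", "UU1Q", "UU2Q", "UU2Q", "UU3Q", "UU1Q", "UU1Q", "UU1Q",
              "UU2Q", "UU2Q", "UU2Q", "UU3Q", "UU3Q"],
    "log":   [23, 12, 15, 17, 35.6, 29.8, 33, 44, 17.77, 19.90, 55, 33, 22],
})

# Same grouping as the answer: one list of log values per (id, group)
grouped = df.groupby(["id", "group"], as_index=False).agg({"log": list})

def pairs(values):
    # Hypothetical helper: all unordered pairs; a single value is paired with itself
    if len(values) > 1:
        return list(combinations(values, 2))
    return [(values[0], values[0])]

grouped["comb"] = grouped["log"].map(pairs)
# One row per pair
out = grouped.explode("comb", ignore_index=True)[["id", "group", "comb"]]
print(out)
```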

Answer 2

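This is close to what is needed: it adds all combinations, and for a group with a single element it creates a tuple with that value repeated: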

from itertools import combinations

# All pairs per (id, group); single-value groups get a (value, value) tuple
df = (df.groupby(['id','group'])['log']
        .apply(lambda x: list(combinations(x, 2)) if len(x) > 1 else [(*x, *x)])
        .explode()
        .reset_index(name='comb'))
print(df)
    id group           comb
0   10  UU1Q   (23.0, 12.0)
1   10  UU2Q   (15.0, 15.0)
2   11  UU1Q   (29.8, 33.0)
3   11  UU1Q   (29.8, 44.0)
4   11  UU1Q   (33.0, 44.0)
5   11  UU2Q   (17.0, 17.0)
6   11  UU3Q   (35.6, 35.6)
7   13  UU2Q  (17.77, 19.9)
8   13  UU2Q  (17.77, 55.0)
9   13  UU2Q   (19.9, 55.0)
10  14  UU3Q   (33.0, 33.0)
11  15  UU3Q   (22.0, 22.0)
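Another option, swapped in here and not part of either answer, avoids the Python-level apply with a self-merge: number the rows inside each (id, group) with cumcount, merge the frame with itself on id and group, and keep each pair once by requiring the left position to be smaller than the right one, plus the self-pair for single-row groups. A sketch under those assumptions; the helper columns pos and n are illustrative, and like combinations the merge grows quadratically with group size:

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [10, 10, 10, 11, 11, 11, 11, 11, 13, 13, 13, 14, 15],
    "group": ["UU1Q", "UU1Q", "UU2Q", "UU2Q", "UU3Q", "UU1Q", "UU1Q", "UU1Q",
              "UU2Q", "UU2Q", "UU2Q", "UU3Q", "UU3Q"],
    "log":   [23, 12, 15, 17, 35.6, 29.8, 33, 44, 17.77, 19.90, 55, 33, 22],
})
keys = ["id", "group"]

# Position of each row inside its (id, group) and the size of that group
df["pos"] = df.groupby(keys).cumcount()
df["n"] = df.groupby(keys)["log"].transform("size")

# Self-merge: every row is paired with every row of the same (id, group)
m = df.merge(df, on=keys, suffixes=("_x", "_y"))

# Keep each unordered pair once; single-row groups keep their self-pair
keep = (m["pos_x"] < m["pos_y"]) | (m["n_x"] == 1)
res = (m.loc[keep, keys + ["pos_x", "pos_y", "log_x", "log_y"]]
         .sort_values(keys + ["pos_x", "pos_y"], ignore_index=True)
         .drop(columns=["pos_x", "pos_y"]))
print(res)
```

The pair stays in two numeric columns here; a tuple column can be built with list(zip(res["log_x"], res["log_y"])) if needed.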

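Alternatively, you can create a same-value tuple from the first row of each ['id','group'] and concatenate it with DataFrame df1, which holds the combinations: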

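import pandas as pd
from itertools import combinations

# All pairs per (id, group); single-value groups give [] and are dropped by dropna
df1 = (df.groupby(['id','group'])['log']
        .apply(lambda x: list(combinations(x, 2)))
        .explode()
        .dropna()
        .reset_index(name='comb'))

# One (value, value) tuple from the first row of each (id, group)
df2 = df.groupby(['id','group']).head(1).copy()
df2['comb'] = df2.pop('log').map(lambda x: (x,x))

# Stack the self-pairs above the combinations and sort by id and group
df = pd.concat([df2, df1]).sort_values(['id','group'], ignore_index=True)
print(df)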
    id group            comb
0   10  UU1Q    (23.0, 23.0)
1   10  UU1Q    (23.0, 12.0)
2   10  UU2Q    (15.0, 15.0)
3   11  UU1Q    (29.8, 29.8)
4   11  UU1Q    (29.8, 33.0)
5   11  UU1Q    (29.8, 44.0)
6   11  UU1Q    (33.0, 44.0)
7   11  UU2Q    (17.0, 17.0)
8   11  UU3Q    (35.6, 35.6)
9   13  UU2Q  (17.77, 17.77)
10  13  UU2Q   (17.77, 19.9)
11  13  UU2Q   (17.77, 55.0)
12  13  UU2Q    (19.9, 55.0)
13  14  UU3Q    (33.0, 33.0)
14  15  UU3Q    (22.0, 22.0)
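To lay the result out like the expected output in the question (the original log next to new_col, empty where a group has more pairs than rows), one option, assumed here rather than taken from the answer, is to number the rows of both frames within each (id, group) with cumcount and left-merge the original values onto the combinations. In this sketch the groups come out in sorted (id, group) order and the missing log values are NaN rather than blank:

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    "id":    [10, 10, 10, 11, 11, 11, 11, 11, 13, 13, 13, 14, 15],
    "group": ["UU1Q", "UU1Q", "UU2Q", "UU2Q", "UU3Q", "UU1Q", "UU1Q", "UU1Q",
              "UU2Q", "UU2Q", "UU2Q", "UU3Q", "UU3Q"],
    "log":   [23, 12, 15, 17, 35.6, 29.8, 33, 44, 17.77, 19.90, 55, 33, 22],
})
keys = ["id", "group"]

# Rebuild the combinations: first value paired with itself, then all pairs
rows = []
for (i, g), vals in df.groupby(keys)["log"]:
    vals = vals.tolist()
    pairs = [(vals[0], vals[0])] + list(combinations(vals, 2))
    rows.extend((i, g, p) for p in pairs)
combos = pd.DataFrame(rows, columns=["id", "group", "new_col"])

# Number rows inside each group in both frames and align them on that number
combos["pos"] = combos.groupby(keys).cumcount()
orig = df.assign(pos=df.groupby(keys).cumcount())
out = (combos.merge(orig, on=keys + ["pos"], how="left")
             [["id", "group", "log", "new_col"]])
print(out)
```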
