Pandas rearrange groupby objects (many-to-many)

Question

I have a many-to-many dataframe that looks like this, where an id can have multiple lands and a land can have multiple ids:

import pandas as pd

t = pd.DataFrame({
    'id': ['a', 'a', 'b', 'c', 'c', 'c'],
    'land': ['A', 'B', 'A', 'A', 'B', 'C'],
    'area': [123, 234, 123, 23, 342, 12],
    'info': ['Im', 'never', 'gonna', 'give', 'you', 'up']
})

Eventually I will join this to GIS data, so I need to rearrange it into a one-to-one format grouped by land.

My expected output might look like this. (I also had trouble getting the groupby object to print. I want the full data rather than count() or size(), but it only returns a pandas groupby object.)

		area	info
land	id
A	a	123	Im
	b	123	gonna
	c	23	give
B	a	234	never
	c	342	you
C	c	12	up

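...and then merge the data so that it becomes one-to-one. (This is just a sample output format. Any readable format of your own is fine, as long as each land shows all of its ids and the correct info.)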

land	id	area	info
A	1.a ; 2.b ; 3.c	a:123 ; b:123 ; c:23	a:'Im' ; b:'gonna' ; c:'give'
B	1.a ; 2.c	a:234 ; c:342	a:'never' ; c:'you'
C	c	12	'up'

Thanks a lot.

Answer 1

Option 1 is simple. Just set_index + sort_index:

option1 = t.set_index(['land','id']).sort_index()

Output:

         area   info
land id             
A    a    123     Im
     b    123  gonna
     c     23   give
B    a    234  never
     c    342    you
C    c     12     up
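
On the side question about printing: a GroupBy object is lazy, so printing it only shows the object itself and not its rows. One way to see the full data is to iterate over the groups or to pull out a single group. A minimal sketch, assuming the t from the question (the names groups and land_b are only illustrative):

```python
import pandas as pd

t = pd.DataFrame({
    'id': ['a', 'a', 'b', 'c', 'c', 'c'],
    'land': ['A', 'B', 'A', 'A', 'B', 'C'],
    'area': [123, 234, 123, 23, 342, 12],
    'info': ['Im', 'never', 'gonna', 'give', 'you', 'up']
})

# Iterating a GroupBy yields (key, sub-DataFrame) pairs, one per land.
groups = {land: group for land, group in t.groupby('land')}
for land, group in groups.items():
    print(f'land = {land}')
    print(group, end='\n\n')

# get_group returns the rows of a single group as a DataFrame.
land_b = t.groupby('land').get_group('B')
print(land_b)
```

This is only for inspection. For the actual rearranged table, the set_index result above already holds every row.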

Option 2 is a bit trickier. One way is to use groupby.apply with a custom function that combines the ids with area and info for each land:

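def combine(x):
    # x is a sequence of (key, value) pairs. With more than one pair, join them
    # as 'key:value'; with a single pair, return only the value.
    return '; '.join(f'{i}:{j}' for i,j in x) if len(x) > 1 else f'{x[0][1]}'

# For each land, build three combined strings: numbered ids, id:area and id:info.
tmp = t.groupby('land').apply(lambda x: [combine(list(enumerate(x['id'], 1))),
                                         combine(x[['id','area']].to_numpy()),
                                         combine(x[['id','info']].to_numpy())])

# Expand the per-land lists into a DataFrame with one row per land.
option2 = pd.DataFrame(tmp.tolist(), index=tmp.index, columns=['id','area','info'])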

Output:

                 id                area                   info
land                                                          
A     1:a; 2:b; 3:c  a:123; b:123; c:23  a:Im; b:gonna; c:give
B          1:a; 2:c        a:234; c:342         a:never; c:you
C                 c                  12                     up
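
If the exact format is flexible, a variant without apply is to build the 'id:value' labels column-wise first and then join them per land with groupby.agg. This is a sketch of an alternative, not the approach above, and its output differs slightly: ids are not numbered, and a land with a single id keeps its prefix (for example 'c:12' instead of '12'). The names labelled and alt are only illustrative:

```python
import pandas as pd

t = pd.DataFrame({
    'id': ['a', 'a', 'b', 'c', 'c', 'c'],
    'land': ['A', 'B', 'A', 'A', 'B', 'C'],
    'area': [123, 234, 123, 23, 342, 12],
    'info': ['Im', 'never', 'gonna', 'give', 'you', 'up']
})

# Prefix every value with its id so the pairing survives the join.
labelled = pd.DataFrame({
    'land': t['land'],
    'id': t['id'],
    'area': t['id'] + ':' + t['area'].astype(str),
    'info': t['id'] + ':' + t['info'],
})

# Join the strings of each column within every land, one row per land.
alt = labelled.groupby('land').agg('; '.join)
print(alt)
```

Because every column is already a string before grouping, the aggregation itself is a plain string join, which may be easier to adapt when more columns need to be carried along.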
