How to use groupby in Python to merge text while keeping the other rows fixed?

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-02-01', '2022-02-01',
                            '2022-03-01', '2022-03-01', '2022-03-01'],
                   'Type': ['R', 'R', 'R', 'P', 'P', 'G', 'G', 'G'],
                   'Class': [1, 1, 1, 0, 0, 2, 2, 2],
                   'Text': ['Hello-', 'I would like.', 'to be merged.', 'with all other.',
                            'sentences that.', 'belong to my same.', 'group.', 'thanks a lot.']})

# Repeated index values mark the rows that belong to the same group
df.index = [1, 1, 1, 2, 2, 3, 3, 3]

What I want to do is group by the index and join the Text column, while keeping only the first row's value for the other columns.

I tried the following two approaches, and neither worked. Maybe I should combine them, but I don't know how.

# Approach 1
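# Every column that is not float64 goes through ' '.join, so Date and Type are
# concatenated instead of keeping their first value, and the int64 Class values are not strings
df.groupby([df.index], as_index=False).agg(lambda x: x.sum() if x.dtype == 'float64' else ' '.join(x))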

# Approach 2
# 'join' is not a recognized aggregation name in pandas, so this call raises an error
df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Text': 'join'})

The result should be:


Date        Type  Class  Text
2022-01-01  R     1      Hello- I would like. to be merged.
2022-02-01  P     0      with all other. sentences that.
2022-03-01  G     2      belong to my same. group. thanks a lot.

Can anyone help me with this?

Thanks!

My idea is to take the second approach, aggregate Text into a list, and then simply join the strings in each list like this:

new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Text': list})
# Join the collected strings of each group, separated by a space
new_df['Text'] = new_df['Text'].str.join(' ')
print(new_df)

Output:

   Date        Type  Class  Text
0  2022-01-01  R     1      Hello- I would like. to be merged.
1  2022-02-01  P     0      with all other. sentences that.
2  2022-03-01  G     2      belong to my same. group. thanks a lot.

It turns out you can also do it in a single statement (same approach), passing ' '.join directly as the aggregation for Text:

new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Text': ' '.join})
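A slightly more general sketch, assuming the DataFrame from the question and a space as the separator: the aggregation dictionary is built from the column list, so any number of "other" columns keep their first value without being listed by hand. It uses groupby(level=0), which groups by the index just like [df.index] but keeps 1, 2, 3 as the result index. The names text_col, agg_spec, merged and merged_alt are illustrative, not part of the original answer. The second variant shows how the question's Approach 1 could be repaired by branching on the column name instead of the dtype.

```python
import pandas as pd

# Same data as in the question: repeated index values mark rows of one group
df = pd.DataFrame({'Date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-02-01',
                            '2022-02-01', '2022-03-01', '2022-03-01', '2022-03-01'],
                   'Type': ['R', 'R', 'R', 'P', 'P', 'G', 'G', 'G'],
                   'Class': [1, 1, 1, 0, 0, 2, 2, 2],
                   'Text': ['Hello-', 'I would like.', 'to be merged.', 'with all other.',
                            'sentences that.', 'belong to my same.', 'group.', 'thanks a lot.']})
df.index = [1, 1, 1, 2, 2, 3, 3, 3]

text_col = 'Text'  # illustrative: the column whose strings get concatenated

# Every column except Text keeps the first value of its group;
# Text is joined with a single space.
agg_spec = {col: 'first' for col in df.columns if col != text_col}
agg_spec[text_col] = ' '.join

# level=0 groups by the index itself; the group keys become the result index.
merged = df.groupby(level=0).agg(agg_spec)

# Variant of Approach 1: decide per column by its name (s.name), not its dtype.
merged_alt = df.groupby(level=0).agg(
    lambda s: ' '.join(s) if s.name == text_col else s.iloc[0]
)

print(merged)
```

If you prefer the 0, 1, 2 index produced by as_index=False, append .reset_index(drop=True) to either result.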