Conditional aggregation on dataframe columns, combining 'n' rows into 1 row

Question

I have an input dataframe that contains the following:

NAME    TEXT                                            START   END
Tim     Tim Wagner is a teacher.                        10      20.5
Tim     He is from Cleveland, Ohio.                     20.5    40
Frank   Frank is a musician.                            40      50
Tim     He like to travel with his family               50      62
Frank   He is a performing artist who plays the cello.  62      70
Frank   He performed at the Carnegie Hall last year.    70      85
Frank   It was fantastic listening to him.              85      90
Frank   I really enjoyed                                90      93

The desired output dataframe is as follows:

NAME    TEXT                                                                                       START       END
Tim     Tim Wagner is a teacher.  He is from Cleveland, Ohio.                                         10          40  
Frank   Frank is a musician                                                                           40          50
Tim     He like to travel with his family                                                             50          62
Frank   He is a performing artist who plays the cello. He performed at the Carnegie Hall last year.   62          85
Frank   It was fantastic listening to him. I really enjoyed                                           85          93

My current code:

grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
df.groupby(['NAME', grp], sort=False)[['TEXT','START','END']]\
  .agg({'TEXT':lambda x: ' '.join(x), 'START': 'min', 'END':'max'})\
  .reset_index().drop('group', axis=1)

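This merges the last 4 rows into one. Instead, I want to combine only 2 rows at a time (or, more generally, any n rows), even when 'NAME' keeps the same value.

I'd appreciate any help with this.

Thanks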

Answer 1

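You can group by the blocks of consecutive identical NAME values and, within each block, by the row's position integer-divided by 2, so that each group holds at most 2 rows: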

blocks = df.NAME.ne(df.NAME.shift()).cumsum()

(df.groupby([blocks, df.groupby(blocks).cumcount()//2])
   .agg({'NAME':'first', 'TEXT':' '.join,
         'START':'min', 'END':'max'})
)

Output:

         NAME                                               TEXT  START   END
NAME                                                                         
1    0    Tim  Tim Wagner is a teacher. He is from Cleveland,...   10.0  40.0
2    0  Frank                               Frank is a musician.   40.0  50.0
3    0    Tim                  He like to travel with his family   50.0  62.0
4    0  Frank  He is a performing artist who plays the cello....   62.0  85.0
     1  Frank  It was fantastic listening to him. I really en...   85.0  93.0
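To combine any n rows rather than exactly 2, the same idea can be wrapped in a small function. The following is only a sketch: the function name `merge_consecutive`, its `n` parameter, and the index names `block` and `chunk` are illustrative assumptions, not part of the question or of pandas. It also drops the two-level group index so the result has the flat shape shown in the question.

```python
import pandas as pd


def merge_consecutive(df, n=2):
    """Merge up to n consecutive rows that share the same NAME (hypothetical helper)."""
    # A new block id starts every time NAME differs from the previous row.
    blocks = df["NAME"].ne(df["NAME"].shift()).cumsum()
    # Position of each row inside its block, bucketed into chunks of n rows.
    chunks = df.groupby(blocks).cumcount() // n
    return (
        df.groupby([blocks.rename("block"), chunks.rename("chunk")], sort=False)
          .agg({"NAME": "first", "TEXT": " ".join, "START": "min", "END": "max"})
          .reset_index(drop=True)  # discard the (block, chunk) index levels
    )


# Sample data copied from the question.
df = pd.DataFrame({
    "NAME": ["Tim", "Tim", "Frank", "Tim", "Frank", "Frank", "Frank", "Frank"],
    "TEXT": [
        "Tim Wagner is a teacher.",
        "He is from Cleveland, Ohio.",
        "Frank is a musician.",
        "He like to travel with his family",
        "He is a performing artist who plays the cello.",
        "He performed at the Carnegie Hall last year.",
        "It was fantastic listening to him.",
        "I really enjoyed",
    ],
    "START": [10, 20.5, 40, 50, 62, 70, 85, 90],
    "END": [20.5, 40, 50, 62, 70, 85, 90, 93],
})

result = merge_consecutive(df, n=2)
print(result)
```

With n=2 this should yield the five rows from the desired output. A larger n merges longer runs, and a run shorter than n is simply kept as one smaller group.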
