Pandas rolling sum on string column
I'm using Python 3 with pandas version 0.19.2.
I have a pandas df as follows:
chat_id line
1 'Hi.'
1 'Hi, how are you?.'
1 'I'm well, thanks.'
2 'Is it going to rain?.'
2 'No, I don't think so.'
I want to group by 'chat_id' and then do something like a rolling sum on 'line' to get the following result:
chat_id line conversation
1 'Hi.' 'Hi.'
1 'Hi, how are you?.' 'Hi. Hi, how are you?.'
1 'I'm well, thanks.' 'Hi. Hi, how are you?. I'm well, thanks.'
2 'Is it going to rain?.' 'Is it going to rain?.'
2 'No, I don't think so.' 'Is it going to rain?. No, I don't think so.'
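I believe df.groupby('chat_id')['line'].cumsum() only works on numeric columns.
I also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list) to get a list of all the lines in each full conversation, but then I don't know how to unpack that list to build the 'rolling sum'-style conversation column.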
For me it works with apply and Series.cumsum; if you need a separator, add a space:
df['new'] = df.groupby('chat_id')['line'].apply(lambda x: (x + ' ').cumsum().str.strip())
print(df)
chat_id line new
0 1 Hi. Hi.
1 1 Hi, how are you?. Hi. Hi, how are you?.
2 1 I'm well, thanks. Hi. Hi, how are you?. I'm well, thanks.
3 2 Is it going to rain?. Is it going to rain?.
4 2 No, I don't think so. Is it going to rain?. No, I don't think so.
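A self-contained sketch of the same idea, assuming a newer pandas release in which groupby(...).apply may prepend the group keys to the result's index (which would make the direct column assignment fail). It swaps apply for transform, which returns a result aligned to the original row index. The sample data is rebuilt from the question with the quotes already stripped, and casting 'line' to object dtype is an assumption made so that cumsum concatenates plain Python strings.

```python
import pandas as pd

# Sample data from the question (surrounding quotes already stripped).
df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})

# Assumption: force object dtype so cumsum concatenates plain Python strings.
df["line"] = df["line"].astype(object)

# transform keeps the original row index, so the result can be assigned directly.
# Each line gets a trailing space, the running concatenation is taken per group,
# and the leftover trailing space is stripped.
df["new"] = df.groupby("chat_id")["line"].transform(
    lambda s: (s + " ").cumsum().str.strip()
)

print(df)
```

The logic inside the lambda is unchanged from the answer; only the groupby method differs.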
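If the single quotes are part of the string values, strip them first and then wrap the result in quotes:
df['line'] = df['line'].str.strip("'")
df['new'] = df.groupby('chat_id')['line'].apply(lambda x: "'" + (x + ' ').cumsum().str.strip() + "'")
print(df)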
chat_id line \
0 1 Hi.
1 1 Hi, how are you?.
2 1 I'm well, thanks.
3 2 Is it going to rain?.
4 2 No, I don't think so.
new
0 'Hi.'
1 'Hi. Hi, how are you?.'
2 'Hi. Hi, how are you?. I'm well, thanks.'
3 'Is it going to rain?.'
4 'Is it going to rain?. No, I don't think so.'
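The list-based attempt from the question can also be finished without cumsum. This is a sketch that swaps in the standard library's itertools.accumulate (not part of the original answer) to build each group's running conversation from its list of lines, joined with a single space. The helper name running_conversation is hypothetical. This variant avoids the trailing-space/strip step and does not depend on cumsum supporting the column's string dtype.

```python
from itertools import accumulate

import pandas as pd

# Sample data from the question (surrounding quotes already stripped).
df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})


def running_conversation(lines: pd.Series) -> pd.Series:
    """Hypothetical helper: space-joined running concatenation of a group's lines."""
    # accumulate yields the first line, then first + " " + second, and so on.
    joined = accumulate(lines.tolist(), lambda acc, nxt: acc + " " + nxt)
    # Reuse the group's original index so transform can realign the rows.
    return pd.Series(list(joined), index=lines.index)


df["conversation"] = df.groupby("chat_id")["line"].transform(running_conversation)
print(df)
```

Because the separator is inserted only between lines, no strip step is needed at the end.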