How to pair rows of a data frame with respect to a group?
I have a large data set with one text column and one id column.
column        id
hello world   1
dinner        1
father        1
hi            1
work/related  2
summer        2
I want to pair each word with the word that follows it, as long as both have the same id.
Output:
new column
hello world,dinner
dinner,father
father,hi
work/related,summer
Use str.cat to join every two consecutive rows within a group.
df = df.assign(newcolumn=df.groupby('id', group_keys=False)['column'].apply(lambda x: x.str.cat(x.shift(-1), sep=','))).dropna()
   column        id  newcolumn
0  hello world   1   hello world,dinner
1  dinner        1   dinner,father
2  father        1   father,hi
4  work/related  2   work/related,summer
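For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question, and the names next_in_group and pairs are only illustrative. It swaps the apply/lambda for a grouped shift, which should give the same pairs without depending on how apply indexes its result:

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about the real frame).
df = pd.DataFrame({
    "column": ["hello world", "dinner", "father", "hi", "work/related", "summer"],
    "id": [1, 1, 1, 1, 2, 2],
})

# Text of the following row within the same id; missing on the last row of each group.
next_in_group = df.groupby("id")["column"].shift(-1)

# str.cat returns a missing value wherever the other side is missing,
# so dropna() removes the last row of every group.
pairs = df.assign(newcolumn=df["column"].str.cat(next_in_group, sep=",")).dropna()
print(pairs)
```

Rows 3 and 5 (the last row of each id) have no following row, so they are the ones that dropna() removes. Note that dropna() also drops any row that already had a missing value in another column, so on real data it may be safer to pass subset=['newcolumn'].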