Split words in a DataFrame column into rows by space while duplicating the other columns (Python, pandas)

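I have a df with 5 columns, one of which is a comments column where people leave comments.

What I want to do is split that comments column on spaces into multiple rows, while duplicating the other columns: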

df:

r_id start comments
1 2021-01-01 i am the text that needs splitting by space to rows
2 2021-01-02 hello hello

Desired result:

r_id start comments
1 2021-01-01 i
1 2021-01-01 am
1 2021-01-01 the
1 2021-01-01 text
2 2021-01-02 hello
2 2021-01-02 hello

I have tried everything from str.split() to regular expressions, with no result.

The code is:

df = df.apply(lambda x: x.str.lower() if x.dtype == "object" else x) 
(df
 .assign(comments=df['comments'].str.split())
 .explode('comments')
)
print(df)

df['comments'] = df['comments'].str.replace('ă','a')
df['comments'] = df['comments'].str.replace('â','a')
df['comments'] = df['comments'].str.replace('î','i')
df['comments'] = df['comments'].str.replace('ș','s')
df['comments'] = df['comments'].str.replace('ț','t')
df.replace('[^a-zA-Z0-9]', ' ',regex=True)
df.dropna(inplace=True)
print(df)

But it does not split the comments.

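You need to split and explode, and you need to assign the result to a variable. Neither assign nor explode modifies df in place; both return a new DataFrame, so a chain whose result is discarded leaves df unchanged: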

df2 = (df
 .assign(comments=df['comments'].str.split())
 .explode('comments')
)

Output:

   r_id        start    comments
0      1  2021-01-01           i
0      1  2021-01-01          am
0      1  2021-01-01         the
0      1  2021-01-01        text
0      1  2021-01-01        that
0      1  2021-01-01       needs
0      1  2021-01-01   splitting
0      1  2021-01-01          by
0      1  2021-01-01       space
0      1  2021-01-01          to
0      1  2021-01-01        rows
1      2  2021-01-02       hello
1      2  2021-01-02       hello
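
For reference, here is a minimal self-contained sketch that combines the clean-up steps from the question with the split and explode above. It assumes a frame with only the three columns shown in the sample (the real df has five), and the helper name clean_comments is hypothetical, introduced only for illustration. Each step's result is assigned, because str.replace and DataFrame.replace also return new objects. The final reset_index(drop=True) is optional; it replaces the repeated index labels (0, 0, ..., 1, 1) with a fresh 0..n-1 index.

```python
import pandas as pd

# Sample data mirroring the question (assumption: only the three columns shown).
df = pd.DataFrame({
    "r_id": [1, 2],
    "start": ["2021-01-01", "2021-01-02"],
    "comments": [
        "i am the text that needs splitting by space to rows",
        "hello hello",
    ],
})


def clean_comments(s: pd.Series) -> pd.Series:
    """Hypothetical helper: lowercase, strip diacritics, blank out punctuation."""
    s = s.str.lower()
    # Same character substitutions as in the question's code.
    for src, dst in {"ă": "a", "â": "a", "î": "i", "ș": "s", "ț": "t"}.items():
        s = s.str.replace(src, dst, regex=False)
    # Replace anything that is not a lowercase letter or digit with a space.
    return s.str.replace(r"[^a-z0-9]", " ", regex=True)


df2 = (
    df.assign(comments=clean_comments(df["comments"]).str.split())  # str -> list of words
      .explode("comments")            # one row per word, other columns repeated
      .dropna(subset=["comments"])    # drop rows whose comment had no words
      .reset_index(drop=True)         # optional: unique 0..n-1 index
)
print(df2)
```

Cleaning before splitting means punctuation turned into spaces is absorbed by str.split(), which splits on any run of whitespace.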