Split words in a DataFrame column by space into rows while duplicating the other columns (Python, pandas)

Question

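I have a df with 5 columns, one of which is a comments column where people leave their comments.

What I want to do is split that comments column on spaces into multiple rows, while duplicating the values in the other columns: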

df:

r_id	start	comments
1	2021-01-01	i am the text that needs splitting by space to rows
2	2021-01-02	hello hello

Desired result:

r_id	start	comments
1	2021-01-01	i
1	2021-01-01	am
1	2021-01-01	the
1	2021-01-01	text
2	2021-01-02	hello
2	2021-01-02	hello

I have tried everything from str.split() to regular expressions, with no result.

-- The code is:

df = df.apply(lambda x: x.str.lower() if x.dtype == "object" else x) 
(df
 .assign(comments=df['comments'].str.split())
 .explode('comments')
)
print(df)

df['comments'] = df['comments'].str.replace('ă','a')
df['comments'] = df['comments'].str.replace('â','a')
df['comments'] = df['comments'].str.replace('î','i')
df['comments'] = df['comments'].str.replace('ș','s')
df['comments'] = df['comments'].str.replace('ț','t')
df.replace('[^a-zA-Z0-9]', ' ',regex=True)
df.dropna(inplace=True)
print(df)

But it does not split the comments.

Answer 1

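You need to split and explode, and keep the result. assign() and explode() return a new DataFrame rather than modifying df in place, so the chained expression in the question is computed and then discarded: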

df2 = (df
 .assign(comments=df['comments'].str.split())
 .explode('comments')
)

Output:

   r_id        start    comments
0      1  2021-01-01           i
0      1  2021-01-01          am
0      1  2021-01-01         the
0      1  2021-01-01        text
0      1  2021-01-01        that
0      1  2021-01-01       needs
0      1  2021-01-01   splitting
0      1  2021-01-01          by
0      1  2021-01-01       space
0      1  2021-01-01          to
0      1  2021-01-01        rows
1      2  2021-01-02       hello
1      2  2021-01-02       hello
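
For completeness, here is a minimal self-contained sketch of the same approach. It assumes a sample frame rebuilt from the three columns shown in the question (the other two columns are not given, so they are omitted), and the reset_index(drop=True) step is an optional addition, not part of the answer above:

```python
import pandas as pd

# Sample data mirroring the question; only the three columns shown there.
df = pd.DataFrame({
    "r_id": [1, 2],
    "start": ["2021-01-01", "2021-01-02"],
    "comments": [
        "i am the text that needs splitting by space to rows",
        "hello hello",
    ],
})

# str.split() with no arguments splits on any run of whitespace and
# yields a list per row; explode() then emits one row per list element,
# repeating the values of the other columns.
# Both calls return new objects, so the result must be bound to a name.
df2 = (
    df
    .assign(comments=df["comments"].str.split())
    .explode("comments")
    .reset_index(drop=True)  # optional: replace the repeated 0/1 index with 0..n-1
)

print(df2)
```

Without reset_index, the exploded rows keep the index label of the row they came from, which is why the output above shows 0 repeated for every word of the first comment.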
