Split a sentence into words in pandas and keep the tags
I have a pandas DataFrame such as:
Text                label  value
board members       A1     NaN
a really long sent  A2     B2
Result: I want to unnest the sentences and keep each label for every word that is split out, like this:
Sentence  Text     label  value
1         board    A1     NaN
1         members  A1     NaN
2         a        A2     B2
2         really   A2     B2
2         long     A2     B2
2         sent     A2     B2
Extra: if possible, I would also like to extract the POS (part-of-speech) tag of each word into a new column:
Sentence  Text     label  value  POS
1         board    A1     NaN    Something
1         members  A1     NaN    Something
2         a        A2     B2     Something
2         really   A2     B2     etc
2         long     A2     B2
2         sent     A2     B2
You can convert Text to a list of words and then explode it:
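# Split each sentence on whitespace into a list of words
df['Text'] = df['Text'].str.split()
# Give every list element its own row; label and value are repeated per word
df = df.explode("Text")
print(df)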
      Text label value
0    board    A1   NaN
0  members    A1   NaN
1        a    A2    B2
1   really    A2    B2
1     long    A2    B2
1     sent    A2    B2
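The output above keeps the original row index (0, 0, 1, ...) rather than the 1-based Sentence column asked for in the question. A minimal sketch of one way to get that column follows; the sample frame is rebuilt from the question, and the name `out` and the choice to number sentences by row position are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed to match the asker's frame).
df = pd.DataFrame({
    "Text": ["board members", "a really long sent"],
    "label": ["A1", "A2"],
    "value": [np.nan, "B2"],
})

# Number the sentences from 1 before splitting, so each word remembers its source row.
df.insert(0, "Sentence", np.arange(1, len(df) + 1))

# Split each sentence into a list of words, then emit one row per word.
out = (
    df.assign(Text=df["Text"].str.split())
      .explode("Text")
      .reset_index(drop=True)
)
print(out)
```

For the POS column, a tagger such as NLTK's `nltk.pos_tag` could probably be applied to each word list before exploding. That step is left out here because it needs a separately downloaded tagger model.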