Split sentences in a pandas DataFrame into rows of words and number each sentence
I'm new to Python, and I have a pandas DataFrame that looks like this:
df =
sn sent ent
0 ['an', 'apple', 'is', 'an', 'example', 'of', 'what?'] ['O', 'F', '0', '0', '0', 'O', 'O']
1 ['a', 'potato', 'is', 'an', 'example', 'of', 'what?'] ['O', 'V', '0', '0', '0', 'O', 'O']
I want to create another pandas DataFrame that looks like this:
newdf=
sn sent ent
0 an O
apple F
is O
an O
example O
of O
what? O
1 a O
potato V
is O
an O
example O
of O
what? O
I tried the following code, and its result is shown below:
(df.set_index('sn')
   .stack()
   .str.split(expand=True)
   .stack()
   .unstack(level=1)
   .reset_index(level=0, drop=False))
It's close to what I want, but I can't seem to figure out the rest:
sn sent ent
0 ['an', ['O',
0 'apple', 'F',
0 'is', 'O',
0 'an', 'O',
0 'example', 'O',
0 'of', 'O',
0 'what?', 'O',
1 'a', 'O',
1 'potato', 'V',
1 'is', 'O',
1 'an', 'O',
1 'example', 'O',
1 'of', 'O',
1 'what?'] 'O']
Any pointers would be greatly appreciated.
import pandas as pd

df = pd.DataFrame({'sn': [0, 1],
                   'sent': [['an', 'apple', 'is', 'an', 'example', 'of', 'what?'], ['a', 'potato', 'is', 'an', 'example', 'of', 'what?']],
                   'ent': [['O', 'F', '0', '0', '0', 'O', 'O'], ['O', 'V', '0', '0', '0', 'O', 'O']]})

# Explode each list column into one row per element, then use sn as the index
df.apply(pd.Series.explode).set_index('sn')
Result:
sent ent
sn
0 an O
0 apple F
0 is 0
0 an 0
0 example 0
0 of O
0 what? O
1 a O
1 potato V
1 is 0
1 an 0
1 example 0
1 of O
1 what? O
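One caveat: the output in the question still has brackets and quotes attached to the words, which suggests the `sent` and `ent` columns might actually hold strings that look like lists (for example after a round trip through CSV) rather than real Python lists. That is only a guess from the output shown. If it is the case, a minimal sketch would be to parse each cell with `ast.literal_eval` first and explode afterwards. The names `raw`, `parsed` and `list_cols` and the shortened sample data are made up for illustration.

```python
import ast

import pandas as pd

# Hypothetical, shortened sample: list-like *strings*, not real lists
raw = pd.DataFrame({
    'sn': [0, 1],
    'sent': ["['an', 'apple']", "['a', 'potato']"],
    'ent': ["['O', 'F']", "['O', 'V']"],
})

list_cols = ['sent', 'ent']

# Turn each string such as "['an', 'apple']" into a real Python list
parsed = raw.copy()
for col in list_cols:
    parsed[col] = parsed[col].apply(ast.literal_eval)

# With sn as the index, explode both list columns so each word gets its own row
newdf = parsed.set_index('sn').apply(pd.Series.explode)
print(newdf)
```

If `sn` should stay a regular column instead of the index, `newdf.reset_index()` puts it back.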