
pandas: repeat a row if a column contains a certain value


import pandas as pd

df = pd.DataFrame({'text': ['I go to school', 'open the green door', 'go out and play'],
                   'pos': [['PRON', 'VERB', 'ADP', 'NOUN'], ['VERB', 'DET', 'ADJ', 'NOUN'], ['VERB', 'ADP', 'CCONJ', 'VERB']],
                   'info': ['school', 'door', 'play']})

If the corresponding 'pos' is 'VERB', I want to repeat the verb in the text column. This is what I have done so far:

df['text'] = df['text'].str.split()
df_new = df.apply(pd.Series.explode)



But it doesn't return anything. The output I want is:

      text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12     out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play


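# split the text into words, then explode 'text' and 'pos' together
# (exploding several columns at once needs pandas 1.3 or newer)
df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True)
      )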

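# append a copy of the VERB rows, then stable-sort on the index
# so that each copy ends up right after its original row
df_new = (pd.concat([df2, df2[df2['pos'].eq('VERB')]])
            .sort_index(kind='stable', ignore_index=True)
         )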
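
For illustration, here is a minimal sketch of why sorting on the index puts a concatenated copy back next to its source row. The three-row frame `tokens` is a made-up example, not the data from the question; the point is only that `pd.concat` keeps the original index labels on the copied rows.

```python
import pandas as pd

# Hypothetical frame that is already in long form (one token per row)
tokens = pd.DataFrame({'text': ['I', 'go', 'to'],
                       'pos': ['PRON', 'VERB', 'ADP']})

# Rows to duplicate; they keep their original index label (1 here)
verbs = tokens[tokens['pos'].eq('VERB')]

# After concat the copy sits at the end, so the index reads 0, 1, 2, 1
stacked = pd.concat([tokens, verbs])

# Sorting on the index moves the copy next to its original row;
# ignore_index=True then renumbers the result 0..n-1
ordered = stacked.sort_index(kind='stable', ignore_index=True)
print(ordered)
```

This relies on the index being unique before the concat, which is what `ignore_index=True` in the explode step provides.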

Alternative using repeat (with the same df2 as above):

# repeat each index label once, or twice where 'pos' is 'VERB'
df_new = (df2.loc[df2.index.repeat(df2['pos'].eq('VERB').add(1))]
             .reset_index(drop=True)
         )


      text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12     out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play
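
For reference, the repeat-based approach could be wrapped into a small end-to-end script along these lines. The helper name `repeat_where` and its `times` parameter are hypothetical, introduced only for this sketch. It assumes pandas 1.3 or newer (for multi-column `explode`) and a unique index on the frame being repeated.

```python
import pandas as pd


def repeat_where(frame, column, value, times=2):
    """Return a copy of `frame` where rows with frame[column] == value
    appear `times` times in a row (hypothetical helper for illustration)."""
    # 1 repetition for ordinary rows, `times` repetitions for matching rows
    counts = frame[column].eq(value).mul(times - 1).add(1)
    # .loc with repeated labels duplicates rows; assumes a unique index
    return frame.loc[frame.index.repeat(counts)].reset_index(drop=True)


df = pd.DataFrame({
    'text': ['I go to school', 'open the green door', 'go out and play'],
    'pos': [['PRON', 'VERB', 'ADP', 'NOUN'],
            ['VERB', 'DET', 'ADJ', 'NOUN'],
            ['VERB', 'ADP', 'CCONJ', 'VERB']],
    'info': ['school', 'door', 'play'],
})

# One row per word, with 'text' and 'pos' exploded in parallel
df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True))

df_new = repeat_where(df2, 'pos', 'VERB')
print(df_new)
```

Passing a different `times` value would repeat the matching rows more often, for example `repeat_where(df2, 'pos', 'VERB', times=3)`.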