pandas: repeat a row if a column contains a certain value

I have the following DataFrame:

import pandas as pd
df = pd.DataFrame({'text': ['I go to school', 'open the green door', 'go out and play'],
                   'pos': [['PRON', 'VERB', 'ADP', 'NOUN'], ['VERB', 'DET', 'ADJ', 'NOUN'], ['VERB', 'ADP', 'CCONJ', 'VERB']],
                   'info': ['school', 'door', 'play']})

I want to repeat each word in the 'text' column whose corresponding 'pos' is 'VERB'. So far I have done the following:

df['text'] = df['text'].str.split()
df_new = df.apply(pd.Series.explode)

Then I tried to repeat those specific rows like this:

print(df_new.loc[df_new.index.repeat(df_new['pos']=='VERB')].reset_index(drop=True))

But it doesn't return anything. My desired output is:

    new_df 
       text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12      out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play

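If the index doesn't matter, you can use the following (exploding several columns at once requires pandas 1.3 or later):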

df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True)
      )

df_new = (pd.concat([df2, df2[df2['pos'].eq('VERB')]])
            .sort_index().reset_index(drop=True)
          )
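
To see why the concat approach keeps each duplicate next to its original, it may help to look at the index labels. The sketch below uses only the first sentence from the question; the names `verbs`, `stacked` and `result` are introduced here purely for illustration.

```python
import pandas as pd

# Only the first sentence from the question, to keep the output small.
df = pd.DataFrame({
    'text': ['I go to school'],
    'pos': [['PRON', 'VERB', 'ADP', 'NOUN']],
    'info': ['school'],
})

# One row per token; ignore_index=True gives fresh labels 0..3.
df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True))

# The filtered rows keep their original labels (here: 1).
verbs = df2[df2['pos'].eq('VERB')]

# Appending them produces a duplicated label at the end: 0, 1, 2, 3, 1.
stacked = pd.concat([df2, verbs])
print(stacked.index.tolist())

# Sorting by label moves each copy next to its original row.
result = stacked.sort_index().reset_index(drop=True)
print(result['text'].tolist())
```

Because the two rows that share a label are identical, it does not matter which of them the sort places first.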

An alternative using repeat (with the same df2 as above):

df_new = (df2.loc[df2.index.repeat(df2['pos'].eq('VERB').add(1))]
             .reset_index(drop=True)
          )
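
The value passed to `repeat` is the number of copies of each row, which is probably why a plain boolean mask does not give the desired result: read as counts it is 0 or 1, so non-verb rows are dropped rather than kept once. A minimal sketch on a three-row frame (the names `mask`, `counts`, `only_verbs` and `doubled` are illustrative, not from the original post):

```python
import pandas as pd

df2 = pd.DataFrame({'text': ['I', 'go', 'to'],
                    'pos': ['PRON', 'VERB', 'ADP']})

mask = df2['pos'].eq('VERB')

# Counts of 0/1: every non-verb row is repeated zero times, i.e. removed.
only_verbs = df2.loc[df2.index.repeat(mask.astype(int))]
print(only_verbs['text'].tolist())

# Counts of 1/2: ordinary rows appear once, verb rows twice.
counts = mask.astype(int) + 1
doubled = df2.loc[df2.index.repeat(counts)].reset_index(drop=True)
print(counts.tolist())
print(doubled['text'].tolist())
```

Here `mask.astype(int) + 1` is written out explicitly; it yields the same counts as `.eq('VERB').add(1)` in the answer.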

Output:

      text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12     out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play
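
As a quick sanity check, the following sketch runs both approaches on the full data from the question and compares them. The names `via_concat` and `via_repeat` are made up for this comparison; it assumes pandas 1.3 or later for the multi-column `explode`.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['I go to school', 'open the green door', 'go out and play'],
    'pos': [['PRON', 'VERB', 'ADP', 'NOUN'],
            ['VERB', 'DET', 'ADJ', 'NOUN'],
            ['VERB', 'ADP', 'CCONJ', 'VERB']],
    'info': ['school', 'door', 'play'],
})

# One row per token, with a fresh 0..11 index.
df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True))

is_verb = df2['pos'].eq('VERB')

# Approach 1: append the verb rows, then sort by the original labels.
via_concat = (pd.concat([df2, df2[is_verb]])
                .sort_index().reset_index(drop=True))

# Approach 2: repeat verb rows twice and all other rows once.
via_repeat = (df2.loc[df2.index.repeat(is_verb.add(1))]
                 .reset_index(drop=True))

print(len(via_concat), len(via_repeat))
print(via_concat.to_dict('list') == via_repeat.to_dict('list'))
```

The 12 tokens contain 4 verbs, so both results should have 16 rows.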