pandas: repeat a row if a column contains a certain value

Question

I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'text':['I go to school','open the green door', 'go out and play'],
               'pos':[['PRON','VERB','ADP','NOUN'],['VERB','DET','ADJ','NOUN'],['VERB','ADP','CCONJ','VERB']], 'info':['school','door','play']})

I want to repeat each word in the 'text' column whose corresponding 'pos' is 'VERB'. So far I have done the following:

df['text'] = df['text'].str.split()
df_new = df.apply(pd.Series.explode)

Then I tried to repeat the specific rows this way:

print(df_new.loc[df_new.index.repeat(df_new['pos']=='VERB')].reset_index(drop=True))

But it does not return anything. The output I want is:

    new_df 
       text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12      out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play

Answer 1

If the index is not important, you can use:

df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True)
      )

df_new = (pd.concat([df2, df2[df2['pos'].eq('VERB')]])
            .sort_index().reset_index(drop=True)
          )
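
To see why the concat approach puts each copy right next to its original, here is a minimal sketch of the intermediate steps on the question's data. The names `verbs` and `stacked` are only illustrative and are not part of the answer above. It also assumes a pandas version where `explode` accepts a list of columns (1.3 or later, as far as I know).

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['I go to school', 'open the green door', 'go out and play'],
    'pos': [['PRON', 'VERB', 'ADP', 'NOUN'],
            ['VERB', 'DET', 'ADJ', 'NOUN'],
            ['VERB', 'ADP', 'CCONJ', 'VERB']],
    'info': ['school', 'door', 'play'],
})

# One row per word, with a fresh 0..n-1 index.
df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True))

# The rows to duplicate keep their original index labels.
verbs = df2[df2['pos'].eq('VERB')]

# After concat, the copies sit at the end but share labels with the originals.
stacked = pd.concat([df2, verbs])

# Sorting by label moves each copy next to its original; the rows that share
# a label are identical, so their relative order does not matter.
df_new = stacked.sort_index().reset_index(drop=True)
```

This relies on `ignore_index=True` giving every word a unique label; with the duplicated index produced by a plain explode, the sort would no longer restore word order reliably.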

An alternative using repeat (with df2 from above):

df_new = (df2.loc[df2.index.repeat(df2['pos'].eq('VERB').add(1))]
             .reset_index(drop=True)
          )

Output:

      text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12     out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play
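
The `.add(1)` in the repeat variant matters because `Index.repeat` takes a count per element, not a mask. A small sketch on a toy Series (the names `pos`, `is_verb` and `counts` are just for illustration) shows the difference. Reading the boolean mask as counts gives 0 or 1, which can only drop or keep rows. That is one plausible reason the attempt in the question could not duplicate anything, though I have not traced that exact code.

```python
import pandas as pd

pos = pd.Series(['PRON', 'VERB', 'ADP', 'NOUN'])
is_verb = pos.eq('VERB')

# Mask read as counts: non-verbs get 0 copies, verbs get 1 copy.
kept = pos.index.repeat(is_verb.astype(int))

# Adding 1 turns the mask into "1 copy for non-verbs, 2 copies for verbs".
counts = is_verb.add(1)
repeated = pos.index.repeat(counts)

# .loc with the repeated labels materialises the duplicated rows.
result = pos.loc[repeated].reset_index(drop=True)
```

The same idea generalises: multiplying the mask instead (for example `is_verb.mul(2).add(1)`) would give three copies of each verb.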
