pandas: repeat a row if a column contains a certain value
I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'text': ['I go to school', 'open the green door', 'go out and play'],
                   'pos': [['PRON', 'VERB', 'ADP', 'NOUN'], ['VERB', 'DET', 'ADJ', 'NOUN'], ['VERB', 'ADP', 'CCONJ', 'VERB']],
                   'info': ['school', 'door', 'play']})
I want to repeat each word in the text column whose corresponding 'pos' is 'VERB'. So far I have done the following:
df['text'] = df['text'].str.split()
df_new = df.apply(pd.Series.explode)
Then I tried to repeat the specific rows like this:
print(df_new.loc[df_new.index.repeat(df_new['pos']=='VERB')].reset_index(drop=True))
But it does not return anything. My desired output is:
new_df
      text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12     out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play
If the index does not matter, you can use:
df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True)
      )
df_new = (pd.concat([df2, df2[df2['pos'].eq('VERB')]])
            .sort_index().reset_index(drop=True)
         )
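The concat approach works because the appended VERB rows keep their original index labels, so sorting by index puts each copy right next to its source row. Here is a minimal sketch on a hand-built exploded frame (the names `verbs` and `stacked` are introduced only for illustration). `kind='mergesort'` is an optional addition, not part of the code above; it just makes the order of tied labels explicit, which hardly matters here since the duplicated rows are identical.

```python
import pandas as pd

# Hand-built stand-in for the exploded frame (third sentence only).
df2 = pd.DataFrame({
    'text': ['go', 'out', 'and', 'play'],
    'pos': ['VERB', 'ADP', 'CCONJ', 'VERB'],
    'info': ['play'] * 4,
})

verbs = df2[df2['pos'].eq('VERB')]   # VERB rows, still labelled 0 and 3
stacked = pd.concat([df2, verbs])    # index labels: 0, 1, 2, 3, 0, 3

# Sorting by label brings each copy next to its original row;
# reset_index then renumbers the result 0..n-1.
df_new = stacked.sort_index(kind='mergesort').reset_index(drop=True)
print(df_new)
```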
An alternative using repeat (with the same df2 as above):
df_new = (df2.loc[df2.index.repeat(df2['pos'].eq('VERB').add(1))]
             .reset_index(drop=True)
         )
Output:
      text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12     out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play
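To see why the repeat variant adds 1 to the mask, it may help to look at the repeat counts themselves. The sketch below is self-contained and uses a hand-built exploded frame for the first sentence only; the names `mask`, `counts` and `only_verbs` are introduced for illustration. Counts taken straight from the boolean mask are 0 and 1, which selects only the VERB rows instead of duplicating them, while adding 1 yields 1 (keep once) and 2 (keep twice).

```python
import pandas as pd

# Hand-built stand-in for the exploded frame (first sentence only).
df2 = pd.DataFrame({
    'text': ['I', 'go', 'to', 'school'],
    'pos': ['PRON', 'VERB', 'ADP', 'NOUN'],
    'info': ['school'] * 4,
})

mask = df2['pos'].eq('VERB')   # True only for the VERB row
counts = mask.add(1)           # False -> 1 (keep once), True -> 2 (keep twice)

# Index.repeat repeats each label by its count; .loc then pulls those rows.
df_new = df2.loc[df2.index.repeat(counts)].reset_index(drop=True)

# With counts of 0 and 1, non-verbs are dropped and verbs appear once.
only_verbs = df2.loc[df2.index.repeat(mask.astype(int))].reset_index(drop=True)

print(df_new)
print(only_verbs)
```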