pandas: replace values in a column based on a condition in another dataframe if that value is in the second dataframe
I have the following two dataframes:
import pandas as pd
df = pd.DataFrame({'text':['I go to school','open the green door', 'go out and play'],
'pos':[['PRON','VERB','ADP','NOUN'],['VERB','DET','ADJ','NOUN'],['VERB','ADP','CCONJ','VERB']]})
df2 = pd.DataFrame({'verbs':['go','open','close','share','divide'],
'new_verbs':['went','opened','closed','shared','divided']})
I want to replace the verbs in df.text with their past forms from df2.new_verbs, whenever the verb is found in df2.verbs. So far I have done the following:
df['text'] = df['text'].str.split()
new_df = df.apply(pd.Series.explode)
new_df = new_df.assign(new=lambda d: d['pos'].mask(d['pos'] == 'VERB', d['text']))
new_df.text[new_df.new.isin(df2.verbs)] = df2.new_verbs
But when I print the result, not all of the verbs have been replaced correctly. My desired output is:
   text    pos    new
0  I       PRON   PRON
0  went    VERB   go
0  to      ADP    ADP
0  school  NOUN   NOUN
1  opened  VERB   open
1  the     DET    DET
1  green   ADJ    ADJ
1  door    NOUN   NOUN
2  went    VERB   go
2  out     ADP    ADP
2  and     CCONJ  CCONJ
2  play    VERB   play
You can use a regex for this:
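import re
# Alternation of all (escaped) verbs to search for
regex = '|'.join(map(re.escape, df2['verbs']))
# Lookup Series: verb -> past form
s = df2.set_index('verbs')['new_verbs']
# Replace each match with its past form; fall back to the matched text itself
df['text'] = df['text'].str.replace(regex, lambda m: s.get(m.group(), m.group()),
                                    regex=True)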
Output (shown here as a separate column text2 for clarity):
   text                 pos                       text2
0  I go to school       [PRON, VERB, ADP, NOUN]   I went to school
1  open the green door  [VERB, DET, ADJ, NOUN]    opened the green door
2  go out and play      [VERB, ADP, CCONJ, VERB]  went out and play
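One caveat with a plain alternation is that it also matches inside longer words. The sketch below is a variant, not part of the answer above, that wraps the pattern in \b word boundaries. The two demo sentences and the helper name to_past are hypothetical and chosen only to expose the substring matches.

```python
import re
import pandas as pd

df2 = pd.DataFrame({'verbs': ['go', 'open', 'close', 'share', 'divide'],
                    'new_verbs': ['went', 'opened', 'closed', 'shared', 'divided']})

# Hypothetical sentences (not from the question) that contain 'go' and 'open'
# as parts of longer words.
demo = pd.Series(['I go to school', 'a good opener'])

s = df2.set_index('verbs')['new_verbs']
alternation = '|'.join(map(re.escape, df2['verbs']))

def to_past(m):
    # Look up the matched verb; keep the matched text if it is not in the table.
    return s.get(m.group(), m.group())

# Plain alternation: also rewrites 'go' inside 'good' and 'open' inside 'opener'.
plain = demo.str.replace(alternation, to_past, regex=True)

# Word-boundary variant: only whole words are replaced.
bounded = demo.str.replace(rf'\b(?:{alternation})\b', to_past, regex=True)

print(plain.tolist())
print(bounded.tolist())
```

Note that neither version looks at the pos column, so a word such as "open" used as an adjective would still be rewritten.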
For smaller lists, you can use pandas replace with a dictionary, like this:
verbs_map = dict(zip(df2.verbs, df2.new_verbs))
# replace returns a new Series, so assign it back to keep the result
new_df['text'] = new_df['text'].replace(verbs_map)
Basically, dict(zip(df2.verbs, df2.new_verbs)) creates a new dictionary that maps the old verbs to their new (past tense) forms, e.g. {'go': 'went', 'close': 'closed', ...}.
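For reference, here is a minimal end-to-end sketch that builds the exploded frame from the question and applies the dictionary only to tokens tagged VERB. It assumes pandas 1.3 or later (for multi-column explode); the names tokens and is_verb are illustrative. It also sidesteps what is probably the problem in the question's last line: assigning df2.new_verbs to the masked rows lines values up by index label (0, 1, 2), not by verb.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ['I go to school', 'open the green door', 'go out and play'],
                   'pos': [['PRON', 'VERB', 'ADP', 'NOUN'],
                           ['VERB', 'DET', 'ADJ', 'NOUN'],
                           ['VERB', 'ADP', 'CCONJ', 'VERB']]})
df2 = pd.DataFrame({'verbs': ['go', 'open', 'close', 'share', 'divide'],
                    'new_verbs': ['went', 'opened', 'closed', 'shared', 'divided']})

verbs_map = dict(zip(df2['verbs'], df2['new_verbs']))

# One row per token; multi-column explode needs pandas >= 1.3 (assumption).
tokens = df.assign(text=df['text'].str.split()).explode(['text', 'pos'])

is_verb = tokens['pos'] == 'VERB'

# 'new' keeps the original verb for VERB rows and the POS tag otherwise.
tokens['new'] = np.where(is_verb, tokens['text'], tokens['pos'])

# Look up each token by value; tokens without an entry stay unchanged.
tokens['text'] = np.where(is_verb,
                          tokens['text'].map(lambda w: verbs_map.get(w, w)),
                          tokens['text'])

print(tokens)
```

np.where returns plain arrays, so the assignments go by position and the repeated index labels (0, 0, 0, 0, 1, ...) left behind by explode do not get in the way.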