pandas: replace column values using the keys and values of a dictionary of lists
I have a dataframe and a dictionary like the following (but much bigger):
import pandas as pd
df = pd.DataFrame({'text': ['can you open the door?','shall you write the address?']})
dic = {'Should': ['can','could'], 'Could': ['shall'], 'Would': ['will']}
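I want to replace words in the text column if they can be found in the value lists of dic. I tried the following, which works for a list with a single value but not for the other lists: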
for key, val in dic.items():
    if df['text'].str.lower().str.split().map(lambda x: x[0]).str.contains('|'.join(val)).any():
        df['text'] = df['text'].str.replace('|'.join(val), key, regex=False)
print(df)
My desired output is:
text
0 Should you open the door?
1 Could you write the address?
You can flatten the dictionary into d, with lowercase keys and values, then replace the values using word boundaries, and finally apply Series.str.capitalize:
d = {x.lower(): k.lower() for k, v in dic.items() for x in v}
regex = '|'.join(r"\b{}\b".format(x) for x in d.keys())
df['text'] = (df['text'].str.lower()
                        .str.replace(regex, lambda x: d[x.group()], regex=True)
                        .str.capitalize())
print(df)
text
0 Should you open the door?
1 Could you write the address?
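To make the flattening and word-boundary steps above concrete, here is a minimal sketch using only the standard `re` module on a plain string. The `sample` sentence is made up for illustration, and `re.escape` is an extra precaution that the snippet above does not use.

```python
import re

dic = {'Should': ['can', 'could'], 'Could': ['shall'], 'Would': ['will']}

# Flatten: every word in a value list becomes a key pointing at its replacement.
d = {x.lower(): k.lower() for k, v in dic.items() for x in v}
print(d)
# {'can': 'should', 'could': 'should', 'shall': 'could', 'will': 'would'}

# One alternative per word, each wrapped in word boundaries.
# re.escape is an added safeguard in case a word contains regex metacharacters.
regex = '|'.join(r"\b{}\b".format(re.escape(x)) for x in d)
print(regex)
# \bcan\b|\bcould\b|\bshall\b|\bwill\b

# Hypothetical sample text: only the standalone word "can" is replaced.
sample = "can you scan the canned goods?"
result = re.sub(regex, lambda m: d[m.group()], sample)
print(result)
# should you scan the canned goods?
```

Note that this approach lowercases the whole string before matching, so any other capitalisation in the text is lost before Series.str.capitalize restores only the first letter.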
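It would be better to change the logic and minimize the number of pandas steps.
You can build a dictionary that directly contains your desired output: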
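dic2 = {v:k for k,l in dic.items() for v in l}
# {'can': 'Should', 'could': 'Should', 'shall': 'Could', 'will': 'Would'}
# or if not yet formatted:
# dic2 = {v.lower():k.capitalize() for k,l in dic.items() for v in l}
import re
regex = '|'.join(map(re.escape, dic2))
# raw f-string, so that \b is a regex word boundary and not a backspace character
df['text'] = df['text'].str.replace(rf'\b({regex})\b',
                                    # lowercase the match so the lookup still works with case=False
                                    # (assumes the keys of dic2 are lowercase)
                                    lambda m: dic2.get(m.group().lower(), m.group()),
                                    case=False, # only if case doesn't matter
                                    regex=True)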
Output (shown as a text2 column for clarity):
text text2
0 can you open the door? Should you open the door?
1 shall you write the address? Could you write the address?
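The same idea can be wrapped in a small helper. This is only a sketch: the function name `replace_from_dict`, the empty-mapping guard, and the extra sample rows are assumptions made for illustration, not part of the answer above. It lowercases each match before the lookup so that a case-insensitive match such as "Can" still finds its replacement.

```python
import re
import pandas as pd

def replace_from_dict(series, mapping):
    """Replace whole words found in the value lists of `mapping` with their key.

    Hypothetical helper: matching is case-insensitive and the replacement is
    inserted exactly as the key is spelled in `mapping`.
    """
    # Invert {replacement: [words]} into {word: replacement} with lowercase keys.
    lookup = {word.lower(): key for key, words in mapping.items() for word in words}
    if not lookup:
        # Nothing to replace; an empty alternation would match empty strings.
        return series.copy()
    alternatives = '|'.join(map(re.escape, lookup))
    pattern = rf'\b({alternatives})\b'  # raw string keeps \b as a word boundary
    return series.str.replace(pattern,
                              lambda m: lookup[m.group().lower()],
                              case=False,
                              regex=True)

dic = {'Should': ['can', 'could'], 'Could': ['shall'], 'Would': ['will']}
df = pd.DataFrame({'text': ['Can you open the door?',
                            'shall you write the address?',
                            'I will scan it.']})
df['text2'] = replace_from_dict(df['text'], dic)
print(df['text2'].tolist())
# ['Should you open the door?', 'Could you write the address?', 'I Would scan it.']
```

Because the replacement is inserted verbatim, a match in the middle of a sentence becomes "Would" rather than "would"; add a casing step if that matters for your data.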