Multiple matches and no-space variants (Python lookup that returns another column after a match)
Previously, I matched values against another list (this thread):
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name':['a cat dog - multiple', 'grey puppy - narrow term', 'a cat puppy', 'reddog - single no spaces', 'acatdog - multiple no spaces']})
df2 = pd.DataFrame({'BroadTerm':['cat', 'cat', 'dog', 'dog'], 'NarrowTerm':['cat', 'kitten', 'puppy', 'dog']})
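There are a couple of issues:
- matching values when a cell contains one or more of them (e.g. row 1 of the DataFrame)
- matching values that are not separated by any spaces (e.g. rows 4 and 5 of df)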
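The no-space rows can match at all because a regex alternation built from the NarrowTerm values has no word boundaries, so it finds 'cat' and 'dog' even inside 'acatdog'. A minimal standard-library sketch of that behaviour (the names narrow_terms, pattern and bounded are illustrative, not from the question):

```python
import re

# NarrowTerm values from df2 in the question
narrow_terms = ['cat', 'kitten', 'puppy', 'dog']

# Plain alternation with no \b anchors: matches substrings anywhere in the text
pattern = '|'.join(map(re.escape, narrow_terms))

for text in ['a cat dog - multiple', 'reddog - single no spaces',
             'acatdog - multiple no spaces']:
    print(text, '->', re.findall(pattern, text))
# a cat dog - multiple -> ['cat', 'dog']
# reddog - single no spaces -> ['dog']
# acatdog - multiple no spaces -> ['cat', 'dog']

# Adding word boundaries would break the no-space rows
bounded = r'\b(?:' + pattern + r')\b'
print(re.findall(bounded, 'acatdog - multiple no spaces'))  # []
```

So for these rows the unanchored pattern is what you want; word boundaries only make sense if substring hits are undesirable.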
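The base code is:
# extract keeps only the first match in each cell; map converts NarrowTerm -> BroadTerm
df['Animal'] = df['Name'].str.extract(pat = f"({'|'.join(df2.NarrowTerm)})")[0].map(dict(df2.iloc[:,::-1].values))
But this only works for cells with a single hit (it returns only the first hit).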
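A quick check on the question's own data shows where the hits are lost (lookup and first_hit are illustrative names). Reversing the columns of df2 turns each row into a (NarrowTerm, BroadTerm) pair, so dict() builds a NarrowTerm -> BroadTerm lookup, while str.extract keeps only the first match of the capture group in each cell:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['a cat dog - multiple', 'grey puppy - narrow term',
                            'a cat puppy', 'reddog - single no spaces',
                            'acatdog - multiple no spaces']})
df2 = pd.DataFrame({'BroadTerm': ['cat', 'cat', 'dog', 'dog'],
                    'NarrowTerm': ['cat', 'kitten', 'puppy', 'dog']})

# Reversed columns give (NarrowTerm, BroadTerm) rows -> NarrowTerm: BroadTerm dict
lookup = dict(df2.iloc[:, ::-1].values)
print(lookup)  # {'cat': 'cat', 'kitten': 'cat', 'puppy': 'dog', 'dog': 'dog'}

# str.extract returns only the FIRST match of the group per cell
first_hit = df['Name'].str.extract(f"({'|'.join(df2.NarrowTerm)})")[0]
print(first_hit.tolist())  # ['cat', 'puppy', 'cat', 'dog', 'cat']
```

Rows 1, 3 and 5 each contain a second term ('dog' or 'puppy') that extract never reports.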
How can I modify the code to handle both cases?
We can try findall and then explode:
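# findall returns every NarrowTerm hit in each cell as a list
df['step1'] = df['Name'].str.findall(pat = f"({'|'.join(df2.NarrowTerm)})")
# explode gives one hit per row, map converts NarrowTerm -> BroadTerm,
# and groupby(level=0).agg(list) gathers the results back per original row
df['animal'] = df['step1'].explode().map(dict(df2.iloc[:,::-1].values)).groupby(level=0).agg(list)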
df
Out[63]:
Name step1 animal
0 a cat dog - multiple [cat, dog] [cat, dog]
1 grey puppy - narrow term [puppy] [dog]
2 a cat puppy [cat, puppy] [cat, dog]
3 reddog - single no spaces [dog] [dog]
4 acatdog - multiple no spaces [cat, dog] [cat, dog]
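A possible single-step variant, sketched under a few assumptions of my own rather than taken from the answer above: terms are escaped with re.escape in case they ever contain regex metacharacters, longer terms are tried first so a term that is a prefix of another does not shadow it, and the mapping happens per row so a row with no hit gets an empty list. The 'parrot - no match' row and the names lookup, terms and pattern are hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame({'Name': ['a cat dog - multiple', 'grey puppy - narrow term',
                            'a cat puppy', 'reddog - single no spaces',
                            'acatdog - multiple no spaces',
                            'parrot - no match']})  # hypothetical extra row
df2 = pd.DataFrame({'BroadTerm': ['cat', 'cat', 'dog', 'dog'],
                    'NarrowTerm': ['cat', 'kitten', 'puppy', 'dog']})

# NarrowTerm -> BroadTerm lookup (same mapping as dict(df2.iloc[:, ::-1].values))
lookup = dict(zip(df2['NarrowTerm'], df2['BroadTerm']))

# Longest terms first, escaped, so e.g. a hypothetical 'catfish' would beat 'cat'
terms = sorted(lookup, key=len, reverse=True)
pattern = '|'.join(map(re.escape, terms))

# findall -> all hits per row; map each hit to its broad term inside the row
df['animal'] = df['Name'].str.findall(pattern).map(
    lambda hits: [lookup[h] for h in hits])
print(df)
```

With the explode/groupby version, an empty findall result becomes NaN after explode, so such a row would come back as [nan] rather than [].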