Multiple matches and no-space variants (Python lookup to return another column after a match)

Previously, I matched values against another list (this thread). The data now looks like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Name':['a cat dog - multiple', 'grey puppy - narrow term', 'a cat puppy', 'reddog - single no spaces', 'acatdog - multiple no spaces']})
df2 = pd.DataFrame({'BroadTerm':['cat', 'cat', 'dog', 'dog'], 'NarrowTerm':['cat', 'kitten', 'puppy', 'dog']})

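There are a couple of issues:

  1. Cells that contain one or more matching values (e.g. row 1 of the dataframe)
  2. Matching values that are not separated by spaces (e.g. rows 4 and 5 of df)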

The basic code is

df['Animal'] = df['Name'].str.extract(pat = f"({'|'.join(df2.NarrowTerm)})")[0].map(dict(df2.iloc[:,::-1].values))

But this only works for cells with a single hit (for cells with several hits it returns only the first one).
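To make the limitation concrete, here is a small sketch on two of the sample rows. The names narrow_to_broad, first_hit and animal are introduced here purely for illustration; the calls themselves are the ones from the basic code above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['a cat dog - multiple', 'reddog - single no spaces']})
df2 = pd.DataFrame({'BroadTerm': ['cat', 'cat', 'dog', 'dog'],
                    'NarrowTerm': ['cat', 'kitten', 'puppy', 'dog']})

# Reversing the column order yields (NarrowTerm, BroadTerm) pairs,
# so dict() builds a narrow -> broad lookup table.
narrow_to_broad = dict(df2.iloc[:, ::-1].values)

# One capture group containing every narrow term as an alternative.
pattern = f"({'|'.join(df2.NarrowTerm)})"

# str.extract keeps only the first match per cell,
# so the 'dog' in the first row is never seen.
first_hit = df['Name'].str.extract(pat=pattern)[0]
animal = first_hit.map(narrow_to_broad)
print(animal.tolist())
```

Because the pattern is not anchored to word boundaries, the no-space case ('reddog') already matches; only the multiple-hit case is lost.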

How can I modify the code to handle both cases?

We can try findall and then explode:

df['step1'] = df['Name'].str.findall(pat = f"({'|'.join(df2.NarrowTerm)})")
df['animal'] = df['step1'].explode().map(dict(df2.iloc[:,::-1].values)).groupby(level=0).agg(list)
df
Out[63]: 
                           Name         step1      animal
0          a cat dog - multiple    [cat, dog]  [cat, dog]
1      grey puppy - narrow term       [puppy]       [dog]
2                   a cat puppy  [cat, puppy]  [cat, dog]
3     reddog - single no spaces         [dog]       [dog]
4  acatdog - multiple no spaces    [cat, dog]  [cat, dog]
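A possible variation, offered only as a sketch under some assumptions that go beyond the sample data: terms might contain regex metacharacters, one term might be a prefix of another, a cell might contain two narrow terms that map to the same broad term, and some cells might match nothing at all. The helper broad_terms and the two extra sample rows are hypothetical additions for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({'Name': ['a cat dog - multiple', 'grey puppy - narrow term',
                            'a cat puppy', 'reddog - single no spaces',
                            'acatdog - multiple no spaces',
                            'a cat kitten',       # hypothetical row: two hits, same broad term
                            'no match here']})    # hypothetical row: no hit at all
df2 = pd.DataFrame({'BroadTerm': ['cat', 'cat', 'dog', 'dog'],
                    'NarrowTerm': ['cat', 'kitten', 'puppy', 'dog']})

narrow_to_broad = dict(zip(df2['NarrowTerm'], df2['BroadTerm']))

# Escape each term and try longer terms first, so that a short term
# cannot shadow a longer one that starts with it.
terms = sorted(narrow_to_broad, key=len, reverse=True)
pattern = '|'.join(re.escape(t) for t in terms)

def broad_terms(name):
    """Return the distinct broad terms found in name, in order of appearance."""
    hits = re.findall(pattern, name)
    return list(dict.fromkeys(narrow_to_broad[h] for h in hits))

df['animal'] = df['Name'].map(broad_terms)
print(df)
```

With this version a cell without any hit gets an empty list, whereas the explode and groupby route would produce a list holding a single NaN for such a row. Whether duplicates should be collapsed depends on what the result is used for.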