Vlookup function / Pandas merge without an exact match

I have a DataFrame df1:

Column1      Column2    Column3    Value
000_abc111   Def _ 1    xyz876     Box1
Def _ 1      11111ghi   Def _ 1    Box2
23uvw-00-11  Def _ 1    Def _ 1    Box3

And another one, df2:

To_Check
abc
xyza
ghi
xyz
uvw

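I want to search for the df2 values in Column1, Column2 and Column3 (the real data has almost 20 such columns) and return the matching entry from the Value column.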

Expected result df:

To_Check    Value
abc         Box1
xyza    
ghi         Box2
xyz         Box1
uvw         Box3

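The merge, map and isin functions in pandas work for exact matches, but because the data contains digits, special characters and wide spaces inside the columns, I can't figure out how to do a partial match (the file is a CSV).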

Thanks.

Use DataFrame.set_index with DataFrame.stack to get a Series, then get all matched values with Series.str.extractall, and finally use DataFrame.merge for a left join:

s = df1.set_index('Value').stack()
df3 = s.str.extractall(f'({"|".join(df2["To_Check"])})')[0].reset_index(name='To_Check')

df = df2.merge(df3[['To_Check','Value']], how='left', on='To_Check')
print (df)
  To_Check Value
0      abc  Box1
1     xyza   NaN
2      ghi  Box2
3      xyz  Box1
4      uvw  Box3
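
To run the steps above end to end, here is a minimal sketch that rebuilds the sample frames from the question and applies the same set_index / stack / extractall / merge chain. Wrapping each key in re.escape is an addition on my part, assuming the To_Check values are meant as literal text rather than regex patterns; the names `pattern` and `result` are only illustrative.

```python
import re
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Column1": ["000_abc111", "Def _ 1", "23uvw-00-11"],
    "Column2": ["Def _ 1", "11111ghi", "Def _ 1"],
    "Column3": ["xyz876", "Def _ 1", "Def _ 1"],
    "Value": ["Box1", "Box2", "Box3"],
})
df2 = pd.DataFrame({"To_Check": ["abc", "xyza", "ghi", "xyz", "uvw"]})

# Assumption: keys are literal strings, so escape regex metacharacters
# such as "." or "+" before joining them into one alternation group.
pattern = "(" + "|".join(re.escape(k) for k in df2["To_Check"]) + ")"

# One long Series of all cell texts, indexed by (Value, column name).
s = df1.set_index("Value").stack()

# Every key found in any cell, with the Value it came from.
df3 = s.str.extractall(pattern)[0].reset_index(name="To_Check")

# Left join keeps keys with no match (their Value becomes NaN).
result = df2.merge(df3[["To_Check", "Value"]], how="left", on="To_Check")
print(result)
```

Because the search runs over the stacked Series, the same code should work unchanged for 20 text columns, as long as every column other than Value is one you want searched.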

If multiple values match:

print (df1)

       Column1   Column2     Column3 Value
0   000_abc111   Def _ 1      xyz876  Box1
1      Def _ 1  11111ghi  Def _abc 1  Box2 <- added abc
2  23uvw-00-11   Def _ 1     Def _ 1  Box3


s = df1.set_index('Value').stack()
df3 = s.str.extractall(f'({"|".join(df2["To_Check"])})')[0].reset_index(name='To_Check')

df = df2.merge(df3[['To_Check','Value']], how='left', on='To_Check')
print (df)
  To_Check Value
0      abc  Box1
1      abc  Box2 <- 2 rows for abc
2     xyza   NaN
3      ghi  Box2
4      xyz  Box1
5      uvw  Box3

Or join the multiple values per key with groupby and ','.join:

s = df1.set_index('Value').stack()
df3 = (s.str.extractall(f'({"|".join(df2["To_Check"])})')[0]
       .reset_index(name='To_Check')
       .groupby('To_Check')['Value'].agg(','.join))

df = df2.join(df3, on='To_Check')
print (df)
  To_Check      Value
0      abc  Box1,Box2
1     xyza        NaN
2      ghi       Box2
3      xyz       Box1
4      uvw       Box3
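
One caveat with the alternation pattern used above: the regex engine tries the alternatives left to right and its matches never overlap, so a cell containing "xyza" would be credited to only one of the keys "xyz" and "xyza". If that matters for your data, a different technique is to test each key separately as a plain substring. The sketch below is an alternative, not the method from the answer; the helper `lookup` and the name `out` are hypothetical, and it will be slower for a large number of keys because it scans the Series once per key.

```python
import pandas as pd

# Sample data from the multi-match example ("abc" also appears in Box2).
df1 = pd.DataFrame({
    "Column1": ["000_abc111", "Def _ 1", "23uvw-00-11"],
    "Column2": ["Def _ 1", "11111ghi", "Def _ 1"],
    "Column3": ["xyz876", "Def _abc 1", "Def _ 1"],
    "Value": ["Box1", "Box2", "Box3"],
})
df2 = pd.DataFrame({"To_Check": ["abc", "xyza", "ghi", "xyz", "uvw"]})

# All cell texts in one Series, indexed by (Value, column name).
s = df1.set_index("Value").stack()

def lookup(key):
    """Return the comma-joined Values whose row contains `key`, or None."""
    # regex=False makes this a literal substring test, so no escaping needed.
    hits = s[s.str.contains(key, regex=False)]
    boxes = hits.index.get_level_values("Value").unique()
    return ",".join(boxes) if len(boxes) else None

out = df2.assign(Value=df2["To_Check"].map(lookup))
print(out)
```

Each key is checked independently here, so overlapping keys cannot hide one another, and keys containing regex metacharacters are handled without escaping.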