VLOOKUP function / pandas merge without an exact match
I have a DataFrame df1:
Column1      Column2   Column3  Value
000_abc111   Def _ 1   xyz876   Box1
Def _ 1      11111ghi  Def _ 1  Box2
23uvw-00-11  Def _ 1   Def _ 1  Box3
and another, df2:
To_Check
abc
xyza
ghi
xyz
uvw
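I want to search for the values of df2 in Column1, Column2 and Column3 (there are nearly 20 such columns) and return the matching value from the Value column.
Resulting df: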
To_Check  Value
abc       Box1
xyza
ghi       Box2
xyz       Box1
uvw       Box3
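The merge, map and isin functions in pandas work for exact matches, but I can't figure out how to do this because the data contains digits, special characters and wide spaces inside the columns (the file is a CSV).
Thanks.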
Use DataFrame.set_index with DataFrame.stack to get a Series, then get all matched values with Series.str.extractall, and finally use DataFrame.merge with a left join:
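# reshape df1 into one long Series of cell texts, indexed by Value
s = df1.set_index('Value').stack()
# extract every occurrence of any To_Check term, one row per match
df3 = s.str.extractall(f'({"|".join(df2["To_Check"])})')[0].reset_index(name='To_Check')
# the left join keeps every To_Check row; terms with no match get NaN
df = df2.merge(df3[['To_Check','Value']], how='left', on='To_Check')
print (df)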
To_Check Value
0 abc Box1
1 xyza NaN
2 ghi Box2
3 xyz Box1
4 uvw Box3
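For reference, the steps above can be run end to end as a minimal sketch. The sample frames are rebuilt by hand from the question. Two precautions are added that are not part of the original answer: re.escape guards against regex metacharacters in To_Check, and the terms are sorted longest first so that a longer term such as xyza is tried before its prefix xyz.

```python
import re
import pandas as pd

# Sample data rebuilt by hand from the question.
df1 = pd.DataFrame({
    "Column1": ["000_abc111", "Def _ 1", "23uvw-00-11"],
    "Column2": ["Def _ 1", "11111ghi", "Def _ 1"],
    "Column3": ["xyz876", "Def _ 1", "Def _ 1"],
    "Value": ["Box1", "Box2", "Box3"],
})
df2 = pd.DataFrame({"To_Check": ["abc", "xyza", "ghi", "xyz", "uvw"]})

# Assumption: terms should be matched literally, longest term first.
terms = sorted(df2["To_Check"], key=len, reverse=True)
pattern = "(" + "|".join(re.escape(t) for t in terms) + ")"

# One long Series of cell texts, indexed by (Value, column name).
s = df1.set_index("Value").stack()
# One row per regex match; column 0 holds the matched term.
df3 = s.str.extractall(pattern)[0].reset_index(name="To_Check")
# Left join so unmatched terms stay in the result with NaN.
df = df2.merge(df3[["To_Check", "Value"]], how="left", on="To_Check")
print(df)
```

Because everything except Value is stacked, the same code should work unchanged with 20 text columns.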
If multiple values match:
print (df1)
       Column1   Column2     Column3 Value
0   000_abc111   Def _ 1      xyz876  Box1
1      Def _ 1  11111ghi  Def _abc 1  Box2  <- added abc
2  23uvw-00-11   Def _ 1     Def _ 1  Box3
s = df1.set_index('Value').stack()
df3 = s.str.extractall(f'({"|".join(df2["To_Check"])})')[0].reset_index(name='To_Check')
df = df2.merge(df3[['To_Check','Value']], how='left', on='To_Check')
print (df)
To_Check Value
0 abc Box1
1 abc Box2 <- 2 rows for abc
2 xyza NaN
3 ghi Box2
4 xyz Box1
5 uvw Box3
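One case the example above does not cover: if the same term occurs in more than one column of the same row, the merge repeats that To_Check/Value pair. A minimal sketch, assuming such repeats are unwanted, removes them with DataFrame.drop_duplicates before merging. The sample data here is hypothetical and not from the question.

```python
import re
import pandas as pd

# Hypothetical data: "abc" appears in two columns of the Box1 row.
df1 = pd.DataFrame({
    "Column1": ["000_abc111", "Def _ 1"],
    "Column2": ["abc _ 1", "11111ghi"],
    "Value": ["Box1", "Box2"],
})
df2 = pd.DataFrame({"To_Check": ["abc", "ghi"]})

pattern = "(" + "|".join(map(re.escape, df2["To_Check"])) + ")"
s = df1.set_index("Value").stack()
df3 = s.str.extractall(pattern)[0].reset_index(name="To_Check")

# Keep each (To_Check, Value) pair once, however many cells matched.
pairs = df3[["To_Check", "Value"]].drop_duplicates()
df = df2.merge(pairs, how="left", on="To_Check")
print(df)
```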
Or concatenate the multiple matched values with groupby and join:
s = df1.set_index('Value').stack()
df3 = (s.str.extractall(f'({"|".join(df2["To_Check"])})')[0]
        .reset_index(name='To_Check')
        .groupby('To_Check')['Value'].agg(','.join))
df = df2.join(df3, on='To_Check')
print (df)
To_Check Value
0 abc Box1,Box2
1 xyza NaN
2 ghi Box2
3 xyz Box1
4 uvw Box3
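The groupby variant can also be wrapped in a small helper. This is only a sketch: the function name lookup_partial and its parameters are hypothetical, it assumes all searched columns hold strings, and the escaping, longest-first ordering and de-duplication are additions that the original answer does not include.

```python
import re
import pandas as pd

def lookup_partial(df1, df2, key="To_Check", value="Value", sep=","):
    """Hypothetical helper: for each df2[key], collect df1[value] from every
    row where the key occurs as a substring in any other column of df1."""
    terms = sorted(df2[key], key=len, reverse=True)  # longest term first
    pattern = "(" + "|".join(map(re.escape, terms)) + ")"
    s = df1.set_index(value).stack()
    hits = (s.str.extractall(pattern)[0]
             .reset_index(name=key)
             .drop_duplicates([key, value])   # one hit per (key, value) pair
             .groupby(key)[value].agg(sep.join))
    return df2.join(hits, on=key)

# Sample data from the answer, with "abc" added to the Box2 row.
df1 = pd.DataFrame({
    "Column1": ["000_abc111", "Def _ 1", "23uvw-00-11"],
    "Column2": ["Def _ 1", "11111ghi", "Def _ 1"],
    "Column3": ["xyz876", "Def _abc 1", "Def _ 1"],
    "Value": ["Box1", "Box2", "Box3"],
})
df2 = pd.DataFrame({"To_Check": ["abc", "xyza", "ghi", "xyz", "uvw"]})

result = lookup_partial(df1, df2)
print(result)
```

Returning one row per To_Check keeps the result the same length as df2, which is closer to how VLOOKUP behaves than the multi-row merge.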