Check if a substring is in a string in a different DF; if it is, return the value from the matching row

I want to check whether a substring from DF1 occurs in DF2. If it does, I want to return a value from the corresponding row.

DF1

Name    ID      Region
John    AAA     A
John    AAA     B
Pat     CCC     C
Sandra  CCC     D
Paul    DD      E
Sandra  R9D     F
Mia     dfg4    G
Kim     asfdh5  H
Louise  45gh    I

DF2

Name     ID         Company
John     AAAxx1     Microsoft
John     AAAxxREG1  Microsoft
Michael  BBBER4     Microsoft
Pat      CCCERG     Dell
Pat      CCCERGG    Dell
Paul     DFHDHF     Facebook

Desired output

If an ID from DF1 is contained in the ID column of DF2, I want to create a new column in DF1 holding the matching Company.

Name    ID      Region  Company
John    AAA     A       Microsoft
John    AAA     B       Microsoft
Pat     CCC     C       Dell
Sandra  CCC     D
Paul    DD      E
Sandra  R9D     F
Mia     dfg4    G
Kim     asfdh5  H
Louise  45gh    I

I have the code below to determine whether an ID from DF1 is in DF2, but I am not sure how to bring in the company name.

DF1['Get company'] = np.in1d(DF1['ID'], DF2['ID'])

Try to find each ID string from df1 inside the IDs of df2, then merge on the extracted key:

key = df2['ID'].str.extract(fr"({'|'.join(df1['ID'].values)})", expand=False)
df1 = df1.merge(df2['Company'], left_on='ID', right_on=key, how='left').fillna('')
print(df1)

# Output:
    Name    ID    Company
0   John   AAA           
1  Peter   BAB  Microsoft
2   Paul  CCHF     Google
3  Rosie   R9D           

Details: build a regex from df1['ID'] and use it to extract the partial string from df2['ID']:

# Regex pattern: try to extract the following pattern
>>> fr"({'|'.join(df1['ID'].values)})"
'(AAA|BAB|CCHF|R9D)'

# After extraction
>>> pd.concat([df2['ID'], key], axis=1)
        ID    ID
0    AEDSV   NaN  # Nothing was found
1   123BAB   BAB  # Found partial string BAB
2  CCHF-RB  CCHF  # Found partial string CCHF
3     YYYY   NaN  # Nothing was found
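One caveat with building the pattern this way: the IDs are joined in unescaped, and a regex alternation takes the first alternative that matches, so a short ID can win over a longer one that starts with it. The following is a minimal sketch of one way to guard against both, assuming IDs may contain regex metacharacters. The two frames are made-up sample data, not the data from the question.

```python
import re
import pandas as pd

# Hypothetical sample data, chosen to show the two edge cases.
df1 = pd.DataFrame({'ID': ['A+B', 'CC', 'CCHF']})
df2 = pd.DataFrame({'ID': ['xxA+Byy', 'CCHF-RB', 'YYYY'],
                    'Company': ['Acme', 'Globex', 'Initech']})

# Escape each ID so characters such as '+' are matched literally, and
# put longer IDs first so 'CCHF' is tried before its prefix 'CC'.
ids = sorted(df1['ID'].unique(), key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(i) for i in ids) + ')'

# Same extraction step as above, with the safer pattern.
key = df2['ID'].str.extract(pattern, expand=False)
print(key.tolist())
```

Without `re.escape`, the alternative `A+B` would mean "one or more A followed by B" and would not match the literal text `A+B` at all.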

Update:

To solve this, I wonder whether it is possible to merge on two columns, e.g. merge on Name and ID?

key = df2['ID'].str.extract(fr"({'|'.join(df1['ID'].values)})", expand=False)
df1 = pd.merge(df1, df2[['Name', 'Company']], left_on=['Name', 'ID'], 
               right_on=['Name', key], how='left').drop_duplicates().fillna('')
print(df1)

# Output:
      Name      ID Region    Company
0     John     AAA      A  Microsoft
2     John     AAA      B  Microsoft
4      Pat     CCC      C       Dell
6   Sandra     CCC      D           
7     Paul      DD      E           
8   Sandra     R9D      F           
9      Mia    dfg4      G           
10     Kim  asfdh5      H           
11  Louise    45gh      I
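For completeness, here is a self-contained variant of the same idea that can be pasted and run. It is only a sketch: the frames are rebuilt from the tables in the question, and `lookup` and `out` are illustrative names. Instead of dropping duplicates after the merge, it deduplicates the lookup table first, so the result keeps one row per row of df1 with a clean 0..8 index.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['John', 'John', 'Pat', 'Sandra', 'Paul', 'Sandra', 'Mia', 'Kim', 'Louise'],
    'ID': ['AAA', 'AAA', 'CCC', 'CCC', 'DD', 'R9D', 'dfg4', 'asfdh5', '45gh'],
    'Region': list('ABCDEFGHI'),
})
df2 = pd.DataFrame({
    'Name': ['John', 'John', 'Michael', 'Pat', 'Pat', 'Paul'],
    'ID': ['AAAxx1', 'AAAxxREG1', 'BBBER4', 'CCCERG', 'CCCERGG', 'DFHDHF'],
    'Company': ['Microsoft', 'Microsoft', 'Microsoft', 'Dell', 'Dell', 'Facebook'],
})

# One alternative per distinct df1 ID, escaped so it is matched literally.
pattern = '(' + '|'.join(re.escape(i) for i in df1['ID'].unique()) + ')'

# Replace each df2 ID by the df1 ID found inside it, drop rows where
# nothing was found, and keep one row per (Name, ID) pair.
lookup = (df2.assign(ID=df2['ID'].str.extract(pattern, expand=False))
             .dropna(subset=['ID'])
             .drop_duplicates(['Name', 'ID'])[['Name', 'ID', 'Company']])

# Left merge on both columns; unmatched rows get an empty Company.
out = df1.merge(lookup, on=['Name', 'ID'], how='left').fillna({'Company': ''})
print(out)
```

Note that `drop_duplicates(['Name', 'ID'])` keeps the first Company seen for each pair, so this assumes a given Name and ID prefix never map to two different companies.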