Python/Pandas: merge columns from Dataframe2 into Dataframe1 only if a specific column contains at least one of the words of another column

Question

Consider the following dataframes:

Employees:

Employee    City

Ernest      Tel Aviv
Merry       New York
Mason       Cairo

Clients:

Client  Words

Ernest  New vacuum Tel
Mason   Tel Aviv is so pretty
Merry   Halo! I live in the city York

I'm trying to merge, in Pandas, the columns of Dataframe2 (Clients) into Dataframe1 (Employees), but only when at least one of the words in the City column (Employees) is contained in the Words column of Clients.

The desired result should look like this:

Employee    City        Words

Ernest      Tel Aviv    New vacuum Tel
Merry       New York    Halo! I live in the city York
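Stated as a per-row check, the condition is: pair each employee with the client of the same name, then keep the pair only if City and Words share at least one whitespace-separated word. A minimal sketch of that check, assuming rows are paired by name (Employee = Client) and words are split on whitespace; `shares_a_word` is a hypothetical helper, not something from the question:

```python
import pandas as pd

# Sample frames rebuilt from the question's tables
employees = pd.DataFrame({
    "Employee": ["Ernest", "Merry", "Mason"],
    "City": ["Tel Aviv", "New York", "Cairo"],
})
clients = pd.DataFrame({
    "Client": ["Ernest", "Mason", "Merry"],
    "Words": ["New vacuum Tel", "Tel Aviv is so pretty",
              "Halo! I live in the city York"],
})

# Assumption: Employee and Client names identify the same person
paired = employees.merge(clients, left_on="Employee", right_on="Client")


def shares_a_word(city: str, words: str) -> bool:
    """Hypothetical helper: True if any whitespace-separated word of city occurs in words."""
    return bool(set(city.split()) & set(words.split()))


# Boolean list, one entry per paired row; .loc keeps all columns even if it is empty
mask = [shares_a_word(c, w) for c, w in zip(paired["City"], paired["Words"])]
result = paired.loc[mask].drop(columns="Client").reset_index(drop=True)
print(result)
```

Mason is dropped because "Cairo" does not appear in his Words, which matches the desired output above.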

I tried something like this:

import pandas as pd

data1 = pd.read_csv('..........csv')
data2 = pd.read_csv('..........csv')

output = pd.merge(data1, data2, left_on=  ['City', 'column1'],
                   right_on= ['Words', 'column1'], 
                   how = 'inner')

but it didn't really get me anywhere.

Any ideas?

Answer 1

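Split the City and Words columns into lists, then explode() them so that each word gets its own row.
You can then merge() on the name plus the word to get the desired output.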

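import pandas as pd
import io

# Raw strings (r"...") avoid the invalid "\s" escape warning; the separator is
# two or more whitespace characters, so "Tel Aviv" stays in a single column
data1 = pd.read_csv(
    io.StringIO("""Employee    City
Ernest      Tel Aviv
Merry       New York
Mason       Cairo"""),sep=r"\s\s+",engine="python",)

data2 = pd.read_csv(io.StringIO("""Client  Words
Ernest  New vacuum Tel
Mason   Tel Aviv is so pretty
Merry   Halo! I live in the city York"""),sep=r"\s\s+",engine="python",)

# One row per word on each side, then an inner merge on (name, word);
# drop_duplicates() collapses pairs that matched on more than one word
data1.assign(tokens=data1["City"].str.split(" ")).explode("tokens").merge(
    data2.assign(tokens=data2["Words"].str.split(" ")).explode("tokens"),
    left_on=["Employee", "tokens"],
    right_on=["Client", "tokens"],
).drop(columns="tokens").drop_duplicates()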

	Employee	City	Client	Words
0	Ernest	Tel Aviv	Ernest	New vacuum Tel
1	Merry	New York	Merry	Halo! I live in the city York
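To see what the explode() step feeds into merge(), here is the left side on its own. A small sketch that builds the Employees data with pd.DataFrame instead of read_csv (an assumption made only for brevity):

```python
import pandas as pd

data1 = pd.DataFrame({
    "Employee": ["Ernest", "Merry", "Mason"],
    "City": ["Tel Aviv", "New York", "Cairo"],
})

# Step 1: turn each City string into a list of words
with_tokens = data1.assign(tokens=data1["City"].str.split(" "))

# Step 2: explode() repeats the row once per list element and keeps the
# original index, so both of Ernest's rows still carry index 0
exploded = with_tokens.explode("tokens")
print(exploded)
```

The repeated index is harmless here: a merge on columns (rather than on the index) builds a fresh default index, which is why the output above is numbered 0 and 1.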

Answer 2

It's a complex join:

#Extract the last word of each client's Words

Clients['joinword']=Clients['Words'].str.extract(r"(\w+$)")

#Join those words with | to make an OR search pattern

s='|'.join(Clients['joinword'].to_list())

#Find the pattern in each employee's City (keep the first hit)

Employees['joinword']=Employees['City'].str.findall(s).str[0]

#Now merge as follows

pd.merge(Employees,Clients, right_on=['Client','joinword'],left_on=['Employee','joinword'], how='inner')

Employee      City joinword  Client                          Words
0   Ernest  Tel Aviv      Tel  Ernest                 New vacuum Tel
1    Merry  New York     York   Merry  Halo! I live in the city York
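Answer 2 only uses the last word of each Words string as a join key, so it works on the sample data but would miss a pair whose shared word sits elsewhere. The sketch below illustrates this on hypothetical data (Mason's City changed to "Tel Aviv", which is not in the question) and compares it with the explode-based merge from Answer 1, where every word is a candidate key:

```python
import pandas as pd

# Hypothetical data: Mason's City is now "Tel Aviv", so his Words share
# "Tel" and "Aviv" with it, just not as the last word
Employees = pd.DataFrame({
    "Employee": ["Ernest", "Merry", "Mason"],
    "City": ["Tel Aviv", "New York", "Tel Aviv"],
})
Clients = pd.DataFrame({
    "Client": ["Ernest", "Mason", "Merry"],
    "Words": ["New vacuum Tel", "Tel Aviv is so pretty",
              "Halo! I live in the city York"],
})

# Answer 2's approach: only the last word of Words can become a join key
clients_lw = Clients.assign(
    joinword=Clients["Words"].str.extract(r"(\w+$)", expand=False))
pattern = "|".join(clients_lw["joinword"].to_list())   # "Tel|pretty|York"
employees_lw = Employees.assign(
    joinword=Employees["City"].str.findall(pattern).str[0])
last_word = pd.merge(employees_lw, clients_lw,
                     left_on=["Employee", "joinword"],
                     right_on=["Client", "joinword"], how="inner")
print(last_word["Employee"].tolist())   # Mason is missing

# Answer 1's approach: explode every word on both sides, then merge
left = Employees.assign(tokens=Employees["City"].str.split()).explode("tokens")
right = Clients.assign(tokens=Clients["Words"].str.split()).explode("tokens")
all_words = (
    left.merge(right, left_on=["Employee", "tokens"],
               right_on=["Client", "tokens"])
    .drop(columns="tokens")
    .drop_duplicates()   # Mason matched on both "Tel" and "Aviv"
)
print(all_words["Employee"].tolist())
```

Which behaviour is wanted depends on the real data; the last-word version is simpler, while the explode version checks every word at the cost of a larger intermediate frame.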


Tags: python, merge, dataframe, pandas, data-munging