Merge columns from Dataframe2 into Dataframe1 in Python/Pandas only if a specific column contains at least one of the words of the other column



Employee    City

Ernest      Tel Aviv
Merry       New York
Mason       Cairo


Client  Words

Ernest  New vacuum Tel
Mason   Tel Aviv is so pretty
Merry   Halo! I live in the city York

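I am trying to merge, in Pandas, the columns of Dataframe2 (Clients) into Dataframe1 (Employees), but only where at least one of the words in the City column (Employees) is contained in the Words column of Clients. The expected result is: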


Employee    City        Words

Ernest      Tel Aviv    New vacuum Tel
Merry       New York    Halo! I live in the city York


import pandas as pd

data1 = pd.read_csv('..........csv')
data2 = pd.read_csv('..........csv')

output = pd.merge(data1, data2, left_on=  ['City', 'column1'],
                   right_on= ['Words', 'column1'], 
                   how = 'inner')



  • Split the City and Words columns into lists, then explode() them to generate one row per word
  • You can now merge() to get the desired output
import io

import pandas as pd

# Rebuild the sample frames; columns are separated by two or more spaces
data1 = pd.read_csv(
    io.StringIO("""Employee    City
Ernest      Tel Aviv
Merry       New York
Mason       Cairo"""),
    sep=r"\s\s+",
    engine="python",
)

data2 = pd.read_csv(
    io.StringIO("""Client  Words
Ernest  New vacuum Tel
Mason   Tel Aviv is so pretty
Merry   Halo! I live in the city York"""),
    sep=r"\s\s+",
    engine="python",
)

# Split each text column into words, explode to one row per word,
# then merge on name + word and drop the helper column
data1.assign(tokens=data1["City"].str.split(" ")).explode("tokens").merge(
    data2.assign(tokens=data2["Words"].str.split(" ")).explode("tokens"),
    left_on=["Employee", "tokens"],
    right_on=["Client", "tokens"],
).drop(columns="tokens")

  Employee      City  Client                          Words
0   Ernest  Tel Aviv  Ernest                 New vacuum Tel
1    Merry  New York   Merry  Halo! I live in the city York
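
One caveat with the explode-then-merge approach: if a city shares more than one word with the text for the same person, the merge returns one row per matching word. The sketch below is only an illustration under assumptions: the sample rows are hypothetical (not the data from the question) and the helper name merge_on_shared_word is made up for this example. It drops the helper column and then removes the repeated rows.

```python
import pandas as pd

# Hypothetical sample data: "Tel Aviv" shares two words with Ernest's text
employees = pd.DataFrame(
    {"Employee": ["Ernest", "Mason"], "City": ["Tel Aviv", "Cairo"]}
)
clients = pd.DataFrame(
    {"Client": ["Ernest", "Mason"], "Words": ["Tel Aviv is so pretty", "I like Cairo"]}
)


def merge_on_shared_word(left, right):
    """Keep one row per person whose City shares at least one word with Words."""
    # One row per word on each side
    left_tokens = left.assign(tokens=left["City"].str.split()).explode("tokens")
    right_tokens = right.assign(tokens=right["Words"].str.split()).explode("tokens")
    merged = left_tokens.merge(
        right_tokens,
        left_on=["Employee", "tokens"],
        right_on=["Client", "tokens"],
    )
    # Without the helper column, rows that matched on several words are identical
    return merged.drop(columns="tokens").drop_duplicates().reset_index(drop=True)


result = merge_on_shared_word(employees, clients)
print(result)
```

Note that the comparison is exact and case-sensitive, so a token such as "York," or "york" would not match "York" unless the text is normalized first.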









pd.merge(Employees, Clients, right_on=['Client', 'joinword'], left_on=['Employee', 'joinword'], how='inner')

Employee      City joinword  Client                          Words
0   Ernest  Tel Aviv      Tel  Ernest                 New vacuum Tel
1    Merry  New York     York   Merry  Halo! I live in the city York
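
The merge above relies on a joinword column being present in both frames, and the step that creates it is not shown. A minimal sketch of one way it could be built, assuming whitespace splitting followed by explode() (this preparation step is an assumption, not taken from the answer), using the sample data from the question:

```python
import pandas as pd

Employees = pd.DataFrame(
    {
        "Employee": ["Ernest", "Merry", "Mason"],
        "City": ["Tel Aviv", "New York", "Cairo"],
    }
)
Clients = pd.DataFrame(
    {
        "Client": ["Ernest", "Mason", "Merry"],
        "Words": [
            "New vacuum Tel",
            "Tel Aviv is so pretty",
            "Halo! I live in the city York",
        ],
    }
)

# Assumed preparation: one row per word, stored in a shared "joinword" column
Employees = Employees.assign(joinword=Employees["City"].str.split()).explode("joinword")
Clients = Clients.assign(joinword=Clients["Words"].str.split()).explode("joinword")

# Same merge as in the answer: match on the person's name and on the word
result = pd.merge(
    Employees,
    Clients,
    right_on=["Client", "joinword"],
    left_on=["Employee", "joinword"],
    how="inner",
)
print(result)
```

Because joinword has the same name on both sides, it appears only once in the result, which is consistent with the column layout shown above.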