Merge columns from Dataframe2 into Dataframe1 in Python/Pandas only if a specific column contains at least one of the words of the other column
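Consider these dataframes:

Employees:

| Employee | City |
|---|---|
| Ernest | Tel Aviv |
| Merry | New York |
| Mason | Cairo |

Clients:

| Client | Words |
|---|---|
| Ernest | New vacuum Tel |
| Mason | Tel Aviv is so pretty |
| Merry | Halo! I live in the city York |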
I am trying to merge the columns of Dataframe2 (Clients) into Dataframe1 (Employees) in Pandas, but only when at least one of the words in the City column (Employees) also appears in the Words column of Clients.
The desired result should look like this:

| Employee | City | Words |
|---|---|---|
| Ernest | Tel Aviv | New vacuum Tel |
| Merry | New York | Halo! I live in the city York |
I tried something like this:

import pandas as pd
data1 = pd.read_csv('..........csv')
data2 = pd.read_csv('..........csv')
output = pd.merge(data1, data2, left_on=['City', 'column1'],
                  right_on=['Words', 'column1'],
                  how='inner')

But it didn't really lead anywhere.
Any ideas?
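- Split the City and Words columns into lists, then explode() them so that each word gets its own row
- You can now merge() on the name and the word to get the desired output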
import io
import pandas as pd

# Columns are separated by two or more spaces, so single spaces inside
# values such as "Tel Aviv" are preserved.
data1 = pd.read_csv(
    io.StringIO("""Employee  City
Ernest  Tel Aviv
Merry  New York
Mason  Cairo"""), sep=r"\s\s+", engine="python")
data2 = pd.read_csv(
    io.StringIO("""Client  Words
Ernest  New vacuum Tel
Mason  Tel Aviv is so pretty
Merry  Halo! I live in the city York"""), sep=r"\s\s+", engine="python")

# Explode both frames to one row per word, then join on name and word.
data1.assign(tokens=data1["City"].str.split(" ")).explode("tokens").merge(
    data2.assign(tokens=data2["Words"].str.split(" ")).explode("tokens"),
    left_on=["Employee", "tokens"],
    right_on=["Client", "tokens"],
).drop(columns="tokens").drop_duplicates()
| | Employee | City | Client | Words |
|---|---|---|---|---|
| 0 | Ernest | Tel Aviv | Ernest | New vacuum Tel |
| 1 | Merry | New York | Merry | Halo! I live in the city York |
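To see why the merge above works, it can help to look at the exploded frames before they are joined. The following is a minimal sketch that rebuilds the sample data inline; the names `employees`, `clients`, `emp_tokens` and `cli_tokens` are illustrative and not part of the answer. It also drops the redundant Client column so the result matches the layout asked for in the question.

```python
import pandas as pd

# Sample data from the question, built inline (illustrative names).
employees = pd.DataFrame({
    "Employee": ["Ernest", "Merry", "Mason"],
    "City": ["Tel Aviv", "New York", "Cairo"],
})
clients = pd.DataFrame({
    "Client": ["Ernest", "Mason", "Merry"],
    "Words": ["New vacuum Tel", "Tel Aviv is so pretty",
              "Halo! I live in the city York"],
})

# One row per (name, word) pair on each side.
emp_tokens = employees.assign(tokens=employees["City"].str.split()).explode("tokens")
cli_tokens = clients.assign(tokens=clients["Words"].str.split()).explode("tokens")
print(emp_tokens)

# Join on name and word, then tidy up the helper columns.
result = (
    emp_tokens.merge(cli_tokens,
                     left_on=["Employee", "tokens"],
                     right_on=["Client", "tokens"])
    .drop(columns=["tokens", "Client"])
    .drop_duplicates()
    .reset_index(drop=True)
)
print(result)
```

Note that the split is on whitespace only, so a token such as "Halo!" keeps its punctuation and matching is case-sensitive.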
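A more complex join:

# Extract the last word of each client's Words
Clients['joinword'] = Clients['Words'].str.extract(r"(\w+$)")
# Turn those words into a search pattern of alternatives separated by |
s = '|'.join(Clients['joinword'].to_list())
# Find the first of those words that occurs in each employee's City
Employees['joinword'] = Employees['City'].str.findall(s).str[0]
# Now merge as follows
pd.merge(Employees, Clients, right_on=['Client', 'joinword'], left_on=['Employee', 'joinword'], how='inner')

| | Employee | City | joinword | Client | Words |
|---|---|---|---|---|---|
| 0 | Ernest | Tel Aviv | Tel | Ernest | New vacuum Tel |
| 1 | Merry | New York | York | Merry | Halo! I live in the city York |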
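The approach above only looks at the last word of Words, which happens to be the matching word in this sample. If the shared word could appear anywhere in Words, one alternative is to merge on the name first and then filter on word overlap. This is a different technique from the one in the answer, sketched here under the assumption that a plain whitespace split is good enough; `shares_word` is a hypothetical helper.

```python
import pandas as pd

# Sample data from the question, built inline.
Employees = pd.DataFrame({
    "Employee": ["Ernest", "Merry", "Mason"],
    "City": ["Tel Aviv", "New York", "Cairo"],
})
Clients = pd.DataFrame({
    "Client": ["Ernest", "Mason", "Merry"],
    "Words": ["New vacuum Tel", "Tel Aviv is so pretty",
              "Halo! I live in the city York"],
})

# Pair each employee with the client of the same name.
merged = Employees.merge(Clients, left_on="Employee", right_on="Client")

def shares_word(row):
    """Hypothetical helper: True if City and Words have a word in common."""
    return bool(set(row["City"].split()) & set(row["Words"].split()))

# Keep only the pairs whose City and Words overlap in at least one word.
mask = merged.apply(shares_word, axis=1)
result = merged.loc[mask, ["Employee", "City", "Words"]].reset_index(drop=True)
print(result)
```

Because the comparison uses sets of whole words, "Cairo" does not match any word in Mason's text and that row is dropped.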