Pandas merge: stop at the first match like VLOOKUP instead of duplicating rows

Question

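I have two tables: PO data and commodity code data. Some genius decided that certain material group codes should be identical, because they are distinguished at a lower level by GL account. As a result, I can't merge on material group, because I get duplicate rows.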

Suppose the following:

import pandas as pd

d1 = {'PO':[123456,654321,971358], 'matgrp': ["1001",'803A',"803B"]}
d2 = {'matgrp':["1001", "1001", "803A", "803B"], 'commodity':['foo - 10001', 'bar - 10002', 'spam - 100003','eggs - 10003']}

pos = pd.DataFrame(data=d1)
mat_grp = pd.DataFrame(data=d2)

merged = pd.merge(pos, mat_grp, how='left', on='matgrp')
merged.head()
      PO    matgrp  commodity
0   123456  1001    foo - 10001
1   123456  1001    bar - 10002
2   654321  803A    spam - 100003
3   971358  803B    eggs - 10003

As you can see, PO 123456 appears twice, because material group 1001 has multiple rows in the material groups table.
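A minimal sketch (not part of the original question) of how the duplication can be caught up front: `pd.merge` accepts `validate="many_to_one"`, which checks that the merge keys are unique in the right-hand frame and raises `pandas.errors.MergeError` otherwise. The `duplicated(..., keep=False)` call then lists the offending keys. The variable names `duplicated_keys_detected` and `dupes` are illustrative only.

```python
import pandas as pd
from pandas.errors import MergeError

pos = pd.DataFrame({'PO': [123456, 654321, 971358], 'matgrp': ["1001", "803A", "803B"]})
mat_grp = pd.DataFrame({'matgrp': ["1001", "1001", "803A", "803B"],
                        'commodity': ['foo - 10001', 'bar - 10002', 'spam - 100003', 'eggs - 10003']})

# validate="many_to_one" requires the right-hand keys to be unique;
# pandas raises MergeError instead of silently duplicating left rows.
try:
    pd.merge(pos, mat_grp, how='left', on='matgrp', validate='many_to_one')
    duplicated_keys_detected = False
except MergeError:
    duplicated_keys_detected = True

# keep=False marks every row whose key occurs more than once, so we can list them.
dupes = mat_grp.loc[mat_grp.duplicated('matgrp', keep=False), 'matgrp'].unique().tolist()
print(duplicated_keys_detected)  # True
print(dupes)  # ['1001']
```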

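The desired behavior is for the merge to match only once: find the first entry for the material group, add it, and nothing else, just like VLOOKUP works. In some cases the long commodity code may then be wrong (it always shows the first one), but that is an acceptable error.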

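P.S.: Suggestions that solve the problem outside this scope are welcome (e.g. merging on GL account, which isn't feasible for other reasons), but assume the following: the available data is a PO list from SAP ME81N and an Excel file containing the list of material groups/commodity codes.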

Answer 1

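pandas' merge behaves (mostly) like an SQL join and returns every combination of matching keys. If you only want the first item, remove the other rows from the data you pass to merge.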
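To illustrate the "every combination" point, here is a small sketch on hypothetical toy frames (`left`, `right` and key `k` are made up for this example): a key that appears m times on the left and n times on the right yields m × n rows in the result.

```python
import pandas as pd

# Hypothetical toy frames: key "a" appears 2 times on the left and 3 times on the right.
left = pd.DataFrame({"k": ["a", "a", "b"], "x": [1, 2, 3]})
right = pd.DataFrame({"k": ["a", "a", "a", "b"], "y": [10, 20, 30, 40]})

merged = pd.merge(left, right, how="left", on="k")

# Every left "a" row pairs with every right "a" row: 2 * 3 = 6 rows, plus 1 row for "b".
print(len(merged))  # 7
print(merged.groupby("k").size().to_dict())  # {'a': 6, 'b': 1}
```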

Use drop_duplicates on mat_grp:

merged = pd.merge(pos, mat_grp.drop_duplicates('matgrp'), how='left', on='matgrp')

Output:

       PO matgrp      commodity
0  123456   1001    foo - 10001
1  654321   803A  spam - 100003
2  971358   803B   eggs - 10003
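An alternative sketch, not taken from the original answer, that is closer to how VLOOKUP reads: build a lookup Series indexed by `matgrp` and use `Series.map`. `map` needs a unique index, so `drop_duplicates` (whose default is `keep='first'`) is still what enforces "first match wins"; `map` just guarantees exactly one output value per PO row, with NaN where no material group matches. The names `lookup` and `result` are illustrative.

```python
import pandas as pd

pos = pd.DataFrame({'PO': [123456, 654321, 971358], 'matgrp': ["1001", "803A", "803B"]})
mat_grp = pd.DataFrame({'matgrp': ["1001", "1001", "803A", "803B"],
                        'commodity': ['foo - 10001', 'bar - 10002', 'spam - 100003', 'eggs - 10003']})

# Keep only the first row per material group, then index by the key column.
lookup = mat_grp.drop_duplicates('matgrp', keep='first').set_index('matgrp')['commodity']

# Series.map returns one value per PO row (NaN when the key is missing), like VLOOKUP.
result = pos.assign(commodity=pos['matgrp'].map(lookup))
print(result)
```

Because `map` never adds rows, the result always has the same length as `pos`, which makes an accidental duplication impossible rather than merely unlikely.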
