Using a two-column join, populate columns in one pandas DataFrame from four other DataFrames

Question

The final pandas DataFrame needs to look like this.

        aggregate_FID   jurisdiction    FID       name           rate
2217    750             municipal       405       Auburn         0.093
2218    751             municipal       81        Bonney Lake    0.088
2219    752             municipal       405       Auburn         0.093
2220    753             municipal       171       Steilacoom     0.094
2221    754             municipal       235       Lakewood       0.094
2222    755             municipal       176       Fircrest       0.094
2223    750             state           1         Washington     0.065
2224    751             state           1         Washington     0.065

The starting point is a DataFrame with this structure.

        aggregate_FID   jurisdiction    FID
2217    750             municipal       405
2218    751             municipal       81
2219    752             municipal       405
2220    753             municipal       171
2221    754             municipal       235
2222    755             municipal       176
2223    750             state           1
2224    751             state           1

...and several DataFrames that I need to use to populate the name and rate fields.

    FID name        rate    jurisdiction
0   1   Waterville  0.082   municipal
1   2   Riverside   0.081   municipal
2   3   Pierce HBZ  0.079   municipal
3   4   Cle Elum    0.080   municipal
4   5   Pacific     0.095   municipal

    FID name        rate    jurisdiction
0   1   Washington  0.065   state

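I need to match the latter DataFrames to the first one on the jurisdiction and FID columns, and populate the name and rate columns. I have managed to create a single DataFrame and merge it with one of the latter DataFrames using: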

df_merge = pd.merge(left=df_aggregate, right=df_jurisdiction, how='left', on=['FID', 'jurisdiction'])

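...but this only works for one of the tables. Unfortunately, I need to do this for as few as one and as many as seven tables. This has been a pain for more than two days. If my question is not clear enough, please feel free to ask for clarification. Thanks in advance for your help.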

Answer 1

You can concatenate all of the jurisdiction tables first and then use merge. It would look like this.

j_all = pd.concat([j1, j2, j3, j4, j5, j6, j7])
df_merge = pd.merge(left=df_aggregate, right=j_all, how='left', on=['FID', 'jurisdiction'])
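
For a version you can run end to end, here is a minimal sketch of the same concat-then-merge idea. The sample rows are taken from the tables in the question; the names `j_municipal`, `j_state`, and `tables` are placeholders I made up, and `ignore_index=True` and `validate="many_to_one"` are optional additions that are not part of the two lines above.

```python
import pandas as pd

# Rows to be filled in (a small subset of the question's starting DataFrame).
df_aggregate = pd.DataFrame({
    "aggregate_FID": [750, 751, 750],
    "jurisdiction": ["municipal", "municipal", "state"],
    "FID": [405, 81, 1],
})

# Hypothetical lookup tables, one per jurisdiction type.
j_municipal = pd.DataFrame({
    "FID": [405, 81],
    "name": ["Auburn", "Bonney Lake"],
    "rate": [0.093, 0.088],
    "jurisdiction": ["municipal", "municipal"],
})
j_state = pd.DataFrame({
    "FID": [1],
    "name": ["Washington"],
    "rate": [0.065],
    "jurisdiction": ["state"],
})

# The list can hold anywhere from one to seven tables.
tables = [j_municipal, j_state]
j_all = pd.concat(tables, ignore_index=True)

# validate="many_to_one" raises if a (FID, jurisdiction) pair appears
# more than once in j_all, which would otherwise duplicate rows.
df_merge = pd.merge(
    left=df_aggregate,
    right=j_all,
    how="left",
    on=["FID", "jurisdiction"],
    validate="many_to_one",
)
print(df_merge)
```

Note that merge returns a new 0-based index rather than the original labels (2217, 2218, ...). If those labels matter, assigning `df_merge.index = df_aggregate.index` afterwards should restore them, since a left merge against unique keys keeps the row count and order of the left frame.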

