How to merge (using DataFrame) two data sets with the same inputs but in a different order

Question

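I have two data sets. One of them can essentially be thought of as a set of descriptors, and the other contains the information.

Here is a simple example of what I mean.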

import pandas as pd

The first data set, the descriptors:

df1 = pd.DataFrame({"color": ["blue", "yellow", "red"],
                    "abbv": ["b", "y", "r"]})

The second data set:

df2 = pd.DataFrame({"color_1": ["blue", "red", "yellow"],
                    "color_2": ["yellow", "blue", "red"],
                    "total": ["green", "purple", "orange"]})

What I would like to do is use pd.merge to combine the two data sets so that the final data set looks like this:

 | color_1 | color_2 | total | abbv_1 | abbv_2 |
 | ------- | ------- | ----- | ------ | ------ |
 |  blue   |  yellow | green |   b    |   y    |
      .          .       .       .        .
      .          .       .       .        .

Answer 1

A mapping Series can be created from df1 with set_index. Then the new columns can be added to df2 with Series.map:

# Create the Mapping Series
mapper = df1.set_index('color')['abbv']
# Add the New Columns
df2['abbv_1'] = df2['color_1'].map(mapper)
df2['abbv_2'] = df2['color_2'].map(mapper)
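
Because Series.map looks each value up in the index of mapper, the row order of df1 does not matter, and a color with no entry in df1 comes back as NaN. A minimal sketch of that behavior, where "green" is a hypothetical value that is not part of the original data:

```python
import pandas as pd

df1 = pd.DataFrame({"color": ["blue", "yellow", "red"],
                    "abbv": ["b", "y", "r"]})
mapper = df1.set_index("color")["abbv"]

# "green" is a hypothetical color with no row in df1
colors = pd.Series(["red", "green", "blue"])
mapped = colors.map(mapper)

# Known colors are translated; the unknown one becomes NaN
print(mapped)
```

If missing values are a concern, mapped.isna() can be used to find the rows that had no match.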

Alternatively, iterate over all of the color columns by filtering the column names with str.contains:

mapper = df1.set_index('color')['abbv']
for c in df2.columns[df2.columns.str.contains('color')]:
    df2[f'abbv_{c.rsplit("_", 1)[-1]}'] = df2[c].map(mapper)

df2:

  color_1 color_2   total abbv_1 abbv_2
0    blue  yellow   green      b      y
1     red    blue  purple      r      b
2  yellow     red  orange      y      r
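
Since the question asks specifically about pd.merge, the same result can probably also be reached with two left merges. This is only a sketch, not part of the answer above: the names result and lookup are made up for illustration, and it assumes every color appears exactly once in df1 (duplicates would multiply rows).

```python
import pandas as pd

df1 = pd.DataFrame({"color": ["blue", "yellow", "red"],
                    "abbv": ["b", "y", "r"]})
df2 = pd.DataFrame({"color_1": ["blue", "red", "yellow"],
                    "color_2": ["yellow", "blue", "red"],
                    "total": ["green", "purple", "orange"]})

result = df2.copy()
for n in (1, 2):
    # Rename df1's columns so they line up with color_n / abbv_n
    lookup = df1.rename(columns={"color": f"color_{n}", "abbv": f"abbv_{n}"})
    # how="left" keeps every row of df2, in df2's original order
    result = result.merge(lookup, on=f"color_{n}", how="left")

print(result)
```

For a simple one-column lookup like this, Series.map is usually the shorter option; merge tends to pay off when several descriptor columns need to be brought over at once.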

merge

dataframe

pandas