How to merge two datasets with different indexes but one common ID factor?

I am working with two different datasets: one with COVID-19 statistics and another with demographic characteristics of cities.

The COVID-19 dataset, covid.df, looks like this:

Note: Date, City ID, City, and State are all part of the index.

Date City ID City State Population mean Population_2019 mean Confirmed_rate_100k mean Confirmed_rate_100k std death_rate mean death_rate std new_confirmed new_deaths
2020-02 120385 Los Angeles CA 9559699 45959669 0.653 0.556 0.6 0.01 33 5
2020-02 120054 Houtson Texas 3304040 3343560 0.543 0.043 22.34 1.6 60 9
... ... .... ... ... ... ... ... ... ... ... ...
2022-05 120385 Los Angeles CA 9559483 45966549 0.672 0.032 2.3 0.5 22 12

The dataset with the demographic information, demo.df, contains the following:

City ID HDI Education Mobility Poverty
120385 0.54 72.5 55.522 33.21
120054 0.33 66.2 76.433 12.504

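I want to include the information from demo.df in covid.df. However, because the two datasets have different indexes, the concat() function has been giving me a hard time.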

How can I merge the two datasets so that covid.df looks like this:

Date City ID City State HDI Education Mobility Poverty Population mean Population_2019 mean Confirmed_rate_100k mean Confirmed_rate_100k std death_rate mean death_rate std new_confirmed new_deaths
2020-02 120385 Los Angeles CA 0.54 72.5 55.522 33.21 9559699 45959669 0.653 0.556 0.6 0.01 33 5
2020-02 120054 Houston TX 0.33 66.2 76.433 12.504 3304040 3343560 0.543 0.043 22.34 1.6 60 9
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2022-05 120385 Los Angeles CA 0.54 72.5 55.522 33.21 9559483 45966549 0.672 0.032 2.3 0.5 22 12

Thanks!

You only need this:

covid = covid.merge(demo, how='left', on='City ID')
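As a side note on that one-liner: a left join keeps every row of covid and repeats the matching demo row for each date. The sketch below is only an illustration on made-up miniature frames (the column subset and the names `checked` and `unmatched` are assumptions, not part of the question). It uses the `validate` and `indicator` parameters of `DataFrame.merge` to check that each City ID appears at most once in demo and to find covid rows with no demographic match.

```python
import pandas as pd

# Hypothetical miniature frames; only a few of the question's columns are kept.
covid = pd.DataFrame(
    {
        "Date": ["2020-02", "2020-02", "2022-05"],
        "City ID": [120385, 120054, 120385],
        "new_confirmed": [33, 60, 22],
    }
)
demo = pd.DataFrame({"City ID": [120385, 120054], "HDI": [0.54, 0.33]})

# validate="many_to_one" raises a MergeError if a City ID is duplicated in demo.
# indicator=True adds a "_merge" column saying where each row found its match.
checked = covid.merge(
    demo, how="left", on="City ID", validate="many_to_one", indicator=True
)

# Rows of covid whose City ID does not occur in demo (empty for this toy data).
unmatched = checked[checked["_merge"] == "left_only"]
print(checked)
```

Once the check passes, the `_merge` column can be dropped with `checked.drop(columns="_merge")`.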

For example, suppose we have this input (note the different indexes: 88, 99 versus 'fish', 'fowl'):

covid.df:
       Date  City ID          City    State  Population mean  Population_2019 mean  Confirmed_rate_100k mean  Confirmed_rate_100k std  death_rate mean  death_rate std  new_confirmed  new_deaths
88  2020-02   120385  Los Angeles       CA           9559699              45959669                     0.653                    0.556             0.60            0.01             33           5
99  2020-02   120054      Houtson    Texas           3304040               3343560                     0.543                    0.043            22.34            1.60             60           9
demo.df:
      City ID   HDI  Education  Mobility  Poverty
fish   120385  0.54       72.5    55.522   33.210
fowl   120054  0.33       66.2    76.433   12.504

The output will be:

      Date  City ID          City    State  Population mean  Population_2019 mean  Confirmed_rate_100k mean  ...  death_rate std  new_confirmed  new_deaths   HDI  Education  Mobility  Poverty
0  2020-02   120385  Los Angeles       CA           9559699              45959669                     0.653  ...            0.01             33           5  0.54       72.5    55.522   33.210
1  2020-02   120054      Houtson    Texas           3304040               3343560                     0.543  ...            1.60             60           9  0.33       66.2    76.433   12.504

[2 rows x 16 columns]
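The example above has City ID as an ordinary column, whereas the question says Date, City ID, City and State are index levels in covid.df. A minimal sketch for that case, assuming small made-up frames with only two value columns, is to move the index levels to columns with `reset_index()`, merge, and then restore the index. The names `index_cols` and `merged` are introduced here for illustration only.

```python
import pandas as pd

# Hypothetical miniature covid frame with the four-level index from the question.
covid = pd.DataFrame(
    {
        "Date": ["2020-02", "2020-02"],
        "City ID": [120385, 120054],
        "City": ["Los Angeles", "Houston"],
        "State": ["CA", "TX"],
        "new_confirmed": [33, 60],
        "new_deaths": [5, 9],
    }
).set_index(["Date", "City ID", "City", "State"])

# demo keeps City ID as a column and has an unrelated index.
demo = pd.DataFrame(
    {
        "City ID": [120385, 120054],
        "HDI": [0.54, 0.33],
        "Education": [72.5, 66.2],
        "Mobility": [55.522, 76.433],
        "Poverty": [33.21, 12.504],
    },
    index=["fish", "fowl"],
)

# Remember the index level names so the index can be rebuilt after the merge.
index_cols = list(covid.index.names)

merged = (
    covid.reset_index()                     # index levels become plain columns
    .merge(demo, how="left", on="City ID")  # join on the shared ID column
    .set_index(index_cols)                  # restore the original index levels
)
print(merged)
```

Resetting the index first is a cautious choice: it avoids depending on how `merge` treats the remaining index levels when only one of them is used as the key.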