How to (correctly) merge 2 Pandas DataFrames and scatter-plot

Thanks in advance for your answers.

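I have a DataFrame 'inq' with country inequality (GINI index) and another DataFrame 'corr' with the country corruption index. My ultimate goal is to make a scatter plot with corruption as the explanatory variable (x axis, from DataFrame 'corr') and inequality as the dependent variable (y axis, from DataFrame 'inq'). Any tips on joining these two DataFrames into one informative table (DataFrame) would be much appreciated.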

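import pandas as pd
from numpy import nan

# Inequality (GINI index): one row per country, one column per year
inq = pd.DataFrame(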
    {
        "country": {0: "Angola", 1: "Albania", 2: "United Arab Emirates"},
        "1975": {0: nan, 1: nan, 2: nan},
        "1976": {0: nan, 1: nan, 2: nan},
        "2017": {0: nan, 1: 33.2, 2: nan},
        "2018": {0: 51.3, 1: nan, 2: nan},
    }
)

# Corruption index: one row per country, one column per year
corr = pd.DataFrame(
    {
        "country": {0: "Afghanistan", 1: "Angola", 2: "Albania"},
        "1975": {0: 44.8, 1: 48.1, 2: 75.1},
        "1976": {0: 44.8, 1: 48.1, 2: 75.1},
        "2018": {0: 24.2, 1: 40.4, 2: 28.4},
        "2019": {0: 40.5, 1: 37.6, 2: 35.9},
    }
)

I concatenated and manipulated them like this:

cm = pd.concat([inq, corr], axis=0, keys=["Inequality", "Corruption"]).reset_index(
    level=1, drop=True
)

and got a new DataFrame:

pd.DataFrame(
    {
        "indicator": {0: "Inequality", 1: "Inequality", 2: "Inequality"},
        "country": {0: "Angola", 1: "Albania", 2: "United Arab Emirates"},
        "1967": {0: nan, 1: nan, 2: nan},
        "1969": {0: nan, 1: nan, 2: nan},
        "2018": {0: 51.3, 1: nan, 2: nan},
        "2019": {0: nan, 1: nan, 2: nan},
    }
)

You should concatenate your DataFrames differently:

df = (
    pd.concat(
        [inq.set_index("country"), corr.set_index("country")],
        axis=1,
        keys=["Inequality", "Corruption"],
    )
    .stack(level=1)
)

Output:
                  Inequality  Corruption
country                                 
Angola      1975         NaN        48.1
            1976         NaN        48.1
            2018        51.3        40.4
            2019         NaN        37.6
Albania     1975         NaN        75.1
            1976         NaN        75.1
            2017        33.2         NaN
            2018         NaN        28.4
            2019         NaN        35.9
Afghanistan 1975         NaN        44.8
            1976         NaN        44.8
            2018         NaN        24.2
            2019         NaN        40.5
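
As a quick check on the table above, the sketch below rebuilds `df` from the sample data in the question and keeps only the rows where both indicators are present. The name `points` is my own and not part of the original answer; the idea is simply that only rows with both values can show up as points in a scatter plot.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question.
inq = pd.DataFrame({
    "country": ["Angola", "Albania", "United Arab Emirates"],
    "1975": [np.nan, np.nan, np.nan],
    "1976": [np.nan, np.nan, np.nan],
    "2017": [np.nan, 33.2, np.nan],
    "2018": [51.3, np.nan, np.nan],
})
corr = pd.DataFrame({
    "country": ["Afghanistan", "Angola", "Albania"],
    "1975": [44.8, 48.1, 75.1],
    "1976": [44.8, 48.1, 75.1],
    "2018": [24.2, 40.4, 28.4],
    "2019": [40.5, 37.6, 35.9],
})

# Same reshaping as above: side-by-side concat, then move years into the index.
df = pd.concat(
    [inq.set_index("country"), corr.set_index("country")],
    axis=1,
    keys=["Inequality", "Corruption"],
).stack(level=1)

# Keep only (country, year) rows that have both values.
points = df.dropna(subset=["Inequality", "Corruption"])
print(points)
```

With the three-country sample, a single row remains (Angola, 2018).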

Then plot:

df.plot.scatter(x='Corruption', y='Inequality')

Note: only one point is plotted, because most of your data is NaN.
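
Since the title asks about merging, here is an alternative sketch that reaches the same kind of table with `melt` and `merge` instead of `concat` and `stack`. This is a different technique from the one in the answer above, and the helper `to_long` and the names `long_df` and `both` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question (wide format: one column per year).
inq = pd.DataFrame({
    "country": ["Angola", "Albania", "United Arab Emirates"],
    "1975": [np.nan, np.nan, np.nan],
    "1976": [np.nan, np.nan, np.nan],
    "2017": [np.nan, 33.2, np.nan],
    "2018": [51.3, np.nan, np.nan],
})
corr = pd.DataFrame({
    "country": ["Afghanistan", "Angola", "Albania"],
    "1975": [44.8, 48.1, 75.1],
    "1976": [44.8, 48.1, 75.1],
    "2018": [24.2, 40.4, 28.4],
    "2019": [40.5, 37.6, 35.9],
})


def to_long(wide, value_name):
    """Reshape a wide country-by-year frame to one row per (country, year)."""
    return wide.melt(id_vars="country", var_name="year", value_name=value_name)


# Outer merge keeps every (country, year) pair found in either frame.
long_df = to_long(inq, "Inequality").merge(
    to_long(corr, "Corruption"), on=["country", "year"], how="outer"
)

# Keep only the pairs where both indicators were observed.
both = long_df.dropna(subset=["Inequality", "Corruption"])
print(both)
```

Here `country` and `year` stay as ordinary columns rather than index levels, which may be more convenient if you later want to filter or label the points by year.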