如何(正确)合并 2 Pandas DataFrames 和散点图
How to (correctly) merge 2 Pandas DataFrames and scatter-plot
提前感谢您的回答。
我的最终目标是制作一个散点图——腐败作为解释变量(x 轴,来自 DataFrame 'corr'),不平等作为因变量(y 轴,来自 DataFrame [=23] =]).
非常感谢通过加入这两个 Dataframes 来生成信息 table (DataFrame) 的提示
我有一个数据框 'inq' 用于国家不平等(GINI 指数),另一个数据框 'corr' 用于国家腐败指数。
pd.DataFrame(
{
"country": {0: "Angola", 1: "Albania", 2: "United Arab Emirates"},
"1975": {0: nan, 1: nan, 2: nan},
"1976": {0: nan, 1: nan, 2: nan},
"2017": {0: nan, 1: 33.2, 2: nan},
"2018": {0: 51.3, 1: nan, 2: nan},
}
)
pd.DataFrame(
{
"country": {0: "Afghanistan", 1: "Angola", 2: "Albania"},
"1975": {0: 44.8, 1: 48.1, 2: 75.1},
"1976": {0: 44.8, 1: 48.1, 2: 75.1},
"2018": {0: 24.2, 1: 40.4, 2: 28.4},
"2019": {0: 40.5, 1: 37.6, 2: 35.9},
}
)
我连接并操作并得到
cm = pd.concat([inq, corr], axis=0, keys=["Inequality", "Corruption"]).reset_index(
level=1, drop=True
)
一个新的数据框
pd.DataFrame(
{
"indicator": {0: "Inequality", 1: "Inequality", 2: "Inequality"},
"country": {0: "Angola", 1: "Albania", 2: "United Arab Emirates"},
"1967": {0: nan, 1: nan, 2: nan},
"1969": {0: nan, 1: nan, 2: nan},
"2018": {0: 51.3, 1: nan, 2: nan},
"2019": {0: nan, 1: nan, 2: nan},
}
)
您应该以不同的方式连接您的数据框:
df = (pd.concat([inq.set_index('country'),
corr.set_index('country')],
axis=1,
keys=["Inequality", "Corruption"]
)
.stack(level=1)
)
Inequality Corruption
country
Angola 1975 NaN 48.1
1976 NaN 48.1
2018 51.3 40.4
2019 NaN 37.6
Albania 1975 NaN 75.1
1976 NaN 75.1
2017 33.2 NaN
2018 NaN 28.4
2019 NaN 35.9
Afghanistan 1975 NaN 44.8
1976 NaN 44.8
2018 NaN 24.2
2019 NaN 40.5
然后绘制:
df.plot.scatter(x='Corruption', y='Inequality')
注意。只有一点,因为你的大部分数据都是 NaN
提前感谢您的回答。
我的最终目标是制作一个散点图——腐败作为解释变量(x 轴,来自 DataFrame 'corr'),不平等作为因变量(y 轴,来自 DataFrame [=23] =]). 非常感谢通过加入这两个 Dataframes 来生成信息 table (DataFrame) 的提示 我有一个数据框 'inq' 用于国家不平等(GINI 指数),另一个数据框 'corr' 用于国家腐败指数。
pd.DataFrame(
{
"country": {0: "Angola", 1: "Albania", 2: "United Arab Emirates"},
"1975": {0: nan, 1: nan, 2: nan},
"1976": {0: nan, 1: nan, 2: nan},
"2017": {0: nan, 1: 33.2, 2: nan},
"2018": {0: 51.3, 1: nan, 2: nan},
}
)
pd.DataFrame(
{
"country": {0: "Afghanistan", 1: "Angola", 2: "Albania"},
"1975": {0: 44.8, 1: 48.1, 2: 75.1},
"1976": {0: 44.8, 1: 48.1, 2: 75.1},
"2018": {0: 24.2, 1: 40.4, 2: 28.4},
"2019": {0: 40.5, 1: 37.6, 2: 35.9},
}
)
我连接并操作并得到
cm = pd.concat([inq, corr], axis=0, keys=["Inequality", "Corruption"]).reset_index(
level=1, drop=True
)
一个新的数据框
pd.DataFrame(
{
"indicator": {0: "Inequality", 1: "Inequality", 2: "Inequality"},
"country": {0: "Angola", 1: "Albania", 2: "United Arab Emirates"},
"1967": {0: nan, 1: nan, 2: nan},
"1969": {0: nan, 1: nan, 2: nan},
"2018": {0: 51.3, 1: nan, 2: nan},
"2019": {0: nan, 1: nan, 2: nan},
}
)
您应该以不同的方式连接您的数据框:
df = (pd.concat([inq.set_index('country'),
corr.set_index('country')],
axis=1,
keys=["Inequality", "Corruption"]
)
.stack(level=1)
)
Inequality Corruption
country
Angola 1975 NaN 48.1
1976 NaN 48.1
2018 51.3 40.4
2019 NaN 37.6
Albania 1975 NaN 75.1
1976 NaN 75.1
2017 33.2 NaN
2018 NaN 28.4
2019 NaN 35.9
Afghanistan 1975 NaN 44.8
1976 NaN 44.8
2018 NaN 24.2
2019 NaN 40.5
然后绘制:
df.plot.scatter(x='Corruption', y='Inequality')
注意。只有一点,因为你的大部分数据都是 NaN