Grouping pandas series into bins
I have the following pandas Series:
Asia China 19.7549
Japan 10.2328
India 14.9691
South Korea 2.27935
Iran 5.70772
North America United States 11.571
Canada 61.9454
Europe United Kingdom 10.6005
Russian Federation 17.2887
Germany 17.9015
France 17.0203
Italy 33.6672
Spain 37.9686
Australia Australia 11.8108
South America Brazil 69.648
Name: % Renewable, dtype: object
I have binned this data into 5 bins:
binning = pd.cut(Reducedset['% Renewable'],5)
Then I want to count the number of countries in each of the bins:
df.groupby(binning)['% Renewable'].agg(['count'])
So the final DataFrame should have only 'continents' as the index, not the countries.
However, this formula doesn't work.
My current output looks like this:
count
binning
(2.212, 15.753] 7
(15.753, 29.227] 4
(29.227, 42.701] 2
(56.174, 69.648] 2
I would like the 'Continent' index to be shown here...
Can anybody help me?
Make sure you are not making a silly mistake, such as using the wrong DataFrame name:
Reducedset.groupby(binning)['% Renewable'].agg(['count'])
As I understand it, you have:
- a DataFrame (not a Series) named Reducedset,
- with a column named % Renewable,
- with a 2-level MultiIndex (continents and countries).
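Because the bin of each individual row will be needed later, even after some
changes to the index, it is better to save binning as another column: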
Reducedset['binning'] = pd.cut(Reducedset['% Renewable'], 5)
The result is:
% Renewable binning
continents countries
Asia China 19.75490 (15.753, 29.227]
Japan 10.23280 (2.212, 15.753]
India 14.96910 (2.212, 15.753]
South Korea 2.27935 (2.212, 15.753]
Iran 5.70772 (2.212, 15.753]
North America United States 11.57100 (2.212, 15.753]
Canada 61.94540 (56.174, 69.648]
Europe United Kingdom 10.60050 (2.212, 15.753]
Russian Federation 17.28870 (15.753, 29.227]
Germany 17.90150 (15.753, 29.227]
France 17.02030 (15.753, 29.227]
Italy 33.66720 (29.227, 42.701]
Spain 37.96860 (29.227, 42.701]
Australia Australia 11.81080 (2.212, 15.753]
South America Brazil 69.64800 (56.174, 69.648]
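To reproduce the step above from scratch, here is a minimal sketch. It assumes the index levels are named continents and countries (as in the printout) and that the values are plain floats; the Series in the question shows dtype object, so a conversion such as pd.to_numeric might be needed first on the real data.

```python
import pandas as pd

# Values copied from the question. The index level names 'continents' and
# 'countries' are assumed from the printout above.
rows = [
    ("Asia", "China", 19.7549), ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691), ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772), ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454), ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887), ("Europe", "Germany", 17.9015),
    ("Europe", "France", 17.0203), ("Europe", "Italy", 33.6672),
    ("Europe", "Spain", 37.9686), ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]
index = pd.MultiIndex.from_tuples(
    [(continent, country) for continent, country, _ in rows],
    names=["continents", "countries"],
)
Reducedset = pd.DataFrame(
    {"% Renewable": [value for _, _, value in rows]}, index=index
)

# Keep each row's bin as an ordinary column so it survives index changes.
Reducedset["binning"] = pd.cut(Reducedset["% Renewable"], 5)
print(Reducedset)
```

Because binning is now a column, it stays attached to each row regardless of what is later done to the index.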
If you want to have only continents in the index, you can run:
Reducedset.reset_index('countries', inplace=True)
You can then print it, sorted by binning. The result is:
countries % Renewable binning
continents
Asia Japan 10.23280 (2.212, 15.753]
Asia India 14.96910 (2.212, 15.753]
Asia South Korea 2.27935 (2.212, 15.753]
Asia Iran 5.70772 (2.212, 15.753]
North America United States 11.57100 (2.212, 15.753]
Europe United Kingdom 10.60050 (2.212, 15.753]
Australia Australia 11.81080 (2.212, 15.753]
Asia China 19.75490 (15.753, 29.227]
Europe Russian Federation 17.28870 (15.753, 29.227]
Europe Germany 17.90150 (15.753, 29.227]
Europe France 17.02030 (15.753, 29.227]
Europe Italy 33.66720 (29.227, 42.701]
Europe Spain 37.96860 (29.227, 42.701]
North America Canada 61.94540 (56.174, 69.648]
South America Brazil 69.64800 (56.174, 69.648]
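The sorted printout can be produced along these lines. This is only a sketch: it rebuilds the data so that it runs on its own, uses the non-inplace form of reset_index, and the names flat and by_bin are made up for illustration.

```python
import pandas as pd

# Same data as in the question; level names are assumed from the printout.
rows = [
    ("Asia", "China", 19.7549), ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691), ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772), ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454), ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887), ("Europe", "Germany", 17.9015),
    ("Europe", "France", 17.0203), ("Europe", "Italy", 33.6672),
    ("Europe", "Spain", 37.9686), ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]
index = pd.MultiIndex.from_tuples(
    [(continent, country) for continent, country, _ in rows],
    names=["continents", "countries"],
)
Reducedset = pd.DataFrame(
    {"% Renewable": [value for _, _, value in rows]}, index=index
)
Reducedset["binning"] = pd.cut(Reducedset["% Renewable"], 5)

# Move 'countries' out of the index; only 'continents' remains as the index.
flat = Reducedset.reset_index("countries")

# The bins are an ordered categorical, so sorting follows the bin order.
# mergesort is stable, which keeps the original row order inside each bin.
by_bin = flat.sort_values("binning", kind="mergesort")
print(by_bin)
```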
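As you can see, the (2.212, 15.753] bin contains countries from
4 continents, so the information about countries is still needed
(although you can keep it as a "regular" column).
Now you can also perform the aggregation, but with a slight change: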
Reducedset.groupby('binning')['% Renewable'].agg(['count'])
(note Reducedset instead of df, and the quotes around binning,
because it is now a column in your DataFrame).
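If the goal is to see the continents next to the bin counts, one possible approach (my own guess at what is wanted, not something stated in the question) is to group by both the continents index level and the binning column. The sketch below rebuilds the data so it runs on its own; per_bin and per_continent are illustrative names, and observed=True is passed so that empty bins are left out, as in the output shown in the question.

```python
import pandas as pd

# Same data as in the question; level names are assumed from the printout.
rows = [
    ("Asia", "China", 19.7549), ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691), ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772), ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454), ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887), ("Europe", "Germany", 17.9015),
    ("Europe", "France", 17.0203), ("Europe", "Italy", 33.6672),
    ("Europe", "Spain", 37.9686), ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]
index = pd.MultiIndex.from_tuples(
    [(continent, country) for continent, country, _ in rows],
    names=["continents", "countries"],
)
Reducedset = pd.DataFrame(
    {"% Renewable": [value for _, _, value in rows]}, index=index
)
Reducedset["binning"] = pd.cut(Reducedset["% Renewable"], 5)

# Count of countries per bin (the aggregation from the answer).
per_bin = Reducedset.groupby("binning", observed=True)["% Renewable"].agg(["count"])
print(per_bin)

# Count of countries per continent and bin: 'continents' is an index level,
# 'binning' is a column, and groupby accepts both by name.
per_continent = Reducedset.groupby(
    ["continents", "binning"], observed=True
)["% Renewable"].count()
print(per_continent)
```

The second result has a two-level index (continents, then bin), so each count can be traced back to a continent.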