Grouping pandas series into bins

I have the following pandas Series:

Asia           China                 19.7549
               Japan                 10.2328
               India                 14.9691
               South Korea           2.27935
               Iran                  5.70772
North America  United States          11.571
               Canada                61.9454
Europe         United Kingdom        10.6005
               Russian Federation    17.2887
               Germany               17.9015
               France                17.0203
               Italy                 33.6672
               Spain                 37.9686
Australia      Australia             11.8108
South America  Brazil                 69.648
Name: % Renewable, dtype: object

I have binned this data into 5 bins:

binning = pd.cut(Reducedset['% Renewable'],5)

Then I want to count the number of countries in each bin:

df.groupby(binning)['% Renewable'].agg(['count'])

The final DataFrame should have only 'continents' as the index, not the countries.

However, this formula does not work.

My current output looks like this:

                     count
binning                
(2.212, 15.753]       7
(15.753, 29.227]      4
(29.227, 42.701]      2
(56.174, 69.648]      2

I would like to show the 'Continent' index here.

Can anyone help me?

Make sure you are not making a simple mistake, such as using the wrong DataFrame name:

Reducedset.groupby(binning)['% Renewable'].agg(['count'])

As I understand it, you have:

  • a DataFrame (not a Series) named Reducedset,
  • with a column named % Renewable,
  • with a 2-level MultiIndex (continents, countries).
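
For anyone who wants to reproduce this, here is a minimal sketch of such a DataFrame. The values are copied from the question; the level names continents and countries are taken from the output further down, and storing the values as floats is an assumption (the question shows dtype object, in which case a conversion such as pd.to_numeric may be needed first).

```python
import pandas as pd

# Values copied from the question. Level names and float dtype are assumptions.
rows = [
    ("Asia", "China", 19.7549),
    ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691),
    ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772),
    ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454),
    ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887),
    ("Europe", "Germany", 17.9015),
    ("Europe", "France", 17.0203),
    ("Europe", "Italy", 33.6672),
    ("Europe", "Spain", 37.9686),
    ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]

# Build a flat frame first, then move the two label columns into the index.
Reducedset = pd.DataFrame(
    rows, columns=["continents", "countries", "% Renewable"]
).set_index(["continents", "countries"])

print(Reducedset.index.names)  # names of the two index levels
print(Reducedset.head())
```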

Because you will need the bin of each individual row later, even after the index has changed, it is better to save binning as another column:

Reducedset['binning'] = pd.cut(Reducedset['% Renewable'], 5)

The result is:

                                  % Renewable           binning
continents    countries                                        
Asia          China                  19.75490  (15.753, 29.227]
              Japan                  10.23280   (2.212, 15.753]
              India                  14.96910   (2.212, 15.753]
              South Korea             2.27935   (2.212, 15.753]
              Iran                    5.70772   (2.212, 15.753]
North America United States          11.57100   (2.212, 15.753]
              Canada                 61.94540  (56.174, 69.648]
Europe        United Kingdom         10.60050   (2.212, 15.753]
              Russian Federation     17.28870  (15.753, 29.227]
              Germany                17.90150  (15.753, 29.227]
              France                 17.02030  (15.753, 29.227]
              Italy                  33.66720  (29.227, 42.701]
              Spain                  37.96860  (29.227, 42.701]
Australia     Australia              11.81080   (2.212, 15.753]
South America Brazil                 69.64800  (56.174, 69.648]
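
As a quick check of the bin edges shown above, the following sketch repeats the pd.cut call on the bare values from the question (the index is left out here for brevity, which is a simplification). It also shows that one of the five equal-width bins contains no rows.

```python
import pandas as pd

# Values copied from the question; the MultiIndex is omitted in this sketch.
values = [19.7549, 10.2328, 14.9691, 2.27935, 5.70772, 11.571, 61.9454,
          10.6005, 17.2887, 17.9015, 17.0203, 33.6672, 37.9686, 11.8108, 69.648]
Reducedset = pd.DataFrame({"% Renewable": values})

# pd.cut with an integer splits the value range into that many equal-width bins.
Reducedset["binning"] = pd.cut(Reducedset["% Renewable"], 5)

# The column is categorical, so all five intervals are kept as categories,
# including the one that no value falls into.
print(Reducedset["binning"].cat.categories)

# Number of rows per bin code (0 is the lowest bin, 4 the highest).
codes = Reducedset["binning"].cat.codes
print(codes.value_counts().sort_index())
```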

If you want only continents in the index, you can run:

Reducedset.reset_index('countries', inplace=True)

Printed out and sorted by binning, the result is:

                        countries  % Renewable           binning
continents                                                      
Asia                        Japan     10.23280   (2.212, 15.753]
Asia                        India     14.96910   (2.212, 15.753]
Asia                  South Korea      2.27935   (2.212, 15.753]
Asia                         Iran      5.70772   (2.212, 15.753]
North America       United States     11.57100   (2.212, 15.753]
Europe             United Kingdom     10.60050   (2.212, 15.753]
Australia               Australia     11.81080   (2.212, 15.753]
Asia                        China     19.75490  (15.753, 29.227]
Europe         Russian Federation     17.28870  (15.753, 29.227]
Europe                    Germany     17.90150  (15.753, 29.227]
Europe                     France     17.02030  (15.753, 29.227]
Europe                      Italy     33.66720  (29.227, 42.701]
Europe                      Spain     37.96860  (29.227, 42.701]
North America              Canada     61.94540  (56.174, 69.648]
South America              Brazil     69.64800  (56.174, 69.648]
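
The reset and the sort can be sketched as follows. This uses only four of the rows from the question to stay short, so the bin edges differ from the ones above, and it uses the non-inplace form of reset_index with a hypothetical variable name flat so that the original frame stays untouched.

```python
import pandas as pd

# A four-row subset of the question's data, for illustration only.
index = pd.MultiIndex.from_tuples(
    [("Asia", "China"), ("Asia", "Japan"),
     ("North America", "Canada"), ("Europe", "Italy")],
    names=["continents", "countries"],
)
Reducedset = pd.DataFrame(
    {"% Renewable": [19.7549, 10.2328, 61.9454, 33.6672]}, index=index
)
Reducedset["binning"] = pd.cut(Reducedset["% Renewable"], 5)

# Move only the 'countries' level out of the index into a regular column;
# 'continents' stays as the index.
flat = Reducedset.reset_index("countries")

# The bins form an ordered categorical, so sorting follows the interval order.
sorted_flat = flat.sort_values("binning")
print(sorted_flat)
```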

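As you can see, the (2.212, 15.753] bin contains countries from 4 continents, so the information about countries is still needed (although you can keep it as a "regular" column).

Now you can also perform the aggregation, with a slight change: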

Reducedset.groupby('binning')['% Renewable'].agg(['count'])

(Note Reducedset instead of df, and the apostrophes around binning, because it is now a column in your DataFrame.)
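
Putting the pieces together, here is a minimal end-to-end sketch on the data from the question. The second grouping, by continent and bin together, is not part of the answer above; it is one possible reading of "show the continent in the index", offered as an assumption. Passing observed=True is also a choice made here, so that bins with no rows are left out.

```python
import pandas as pd

# Values copied from the question; level names are assumed.
rows = [
    ("Asia", "China", 19.7549), ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691), ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772),
    ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454),
    ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887),
    ("Europe", "Germany", 17.9015), ("Europe", "France", 17.0203),
    ("Europe", "Italy", 33.6672), ("Europe", "Spain", 37.9686),
    ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]
Reducedset = pd.DataFrame(
    rows, columns=["continents", "countries", "% Renewable"]
).set_index(["continents", "countries"])
Reducedset["binning"] = pd.cut(Reducedset["% Renewable"], 5)

# Group by the column name 'binning' and count the countries in each bin.
counts = Reducedset.groupby("binning", observed=True)["% Renewable"].agg(["count"])
print(counts)

# Assumption: to see continents in the result, group by the index level
# 'continents' together with the 'binning' column.
per_continent = Reducedset.groupby(["continents", "binning"], observed=True).size()
print(per_continent)
```

The first result has one row per non-empty bin, and the second has one row per (continent, bin) pair that actually occurs in the data.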