Issue with joining repeated values/rows
I'm new to Python and can't figure out how to proceed.
After binning and editing my DataFrame, I came up with this:
Continents % Renewable Country
0 Asia (15.753, 29.227] China
1 North America (2.212, 15.753] United States
2 Asia (2.212, 15.753] Japan
3 Europe (2.212, 15.753] United Kingdom
4 Europe (15.753, 29.227] Russian Federation
5 North America (56.174, 69.648] Canada
6 Europe (15.753, 29.227] Germany
7 Asia (2.212, 15.753] India
8 Europe (15.753, 29.227] France
9 Asia (2.212, 15.753] South Korea
10 Europe (29.227, 42.701] Italy
11 Europe (29.227, 42.701] Spain
12 Asia (2.212, 15.753] Iran
13 Australia (2.212, 15.753] Australia
14 South America (56.174, 69.648] Brazil
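The snippets in this thread can be run against a reconstruction of the frame above. The sketch below is hypothetical: it assumes the '% Renewable' bins are stored as plain strings. The raw percentages fed to the binning step aren't shown, and the string-sorted index order in the outputs (for example '(15.753, ...' listed before '(2.212, ...') suggests strings.

```python
import pandas as pd

# Hypothetical reconstruction of the question's Top15 frame.
# Assumption: '% Renewable' holds the bin labels as plain strings.
rows = [
    ('Asia', '(15.753, 29.227]', 'China'),
    ('North America', '(2.212, 15.753]', 'United States'),
    ('Asia', '(2.212, 15.753]', 'Japan'),
    ('Europe', '(2.212, 15.753]', 'United Kingdom'),
    ('Europe', '(15.753, 29.227]', 'Russian Federation'),
    ('North America', '(56.174, 69.648]', 'Canada'),
    ('Europe', '(15.753, 29.227]', 'Germany'),
    ('Asia', '(2.212, 15.753]', 'India'),
    ('Europe', '(15.753, 29.227]', 'France'),
    ('Asia', '(2.212, 15.753]', 'South Korea'),
    ('Europe', '(29.227, 42.701]', 'Italy'),
    ('Europe', '(29.227, 42.701]', 'Spain'),
    ('Asia', '(2.212, 15.753]', 'Iran'),
    ('Australia', '(2.212, 15.753]', 'Australia'),
    ('South America', '(56.174, 69.648]', 'Brazil'),
]
Top15 = pd.DataFrame(rows, columns=['Continents', '% Renewable', 'Country'])
print(Top15)
```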
Now, when I set Continents and % Renewable as a MultiIndex using:
Top15 = Top15.groupby(by=['Continents', '% Renewable']).sum()
I get the following:
Country
Continents % Renewable
Asia (15.753, 29.227] China
(2.212, 15.753] JapanIndiaSouth KoreaIran
Australia (2.212, 15.753] Australia
Europe (15.753, 29.227] Russian FederationGermanyFrance
(2.212, 15.753] United Kingdom
(29.227, 42.701] ItalySpain
North America (2.212, 15.753] United States
(56.174, 69.648] Canada
South America (56.174, 69.648] Brazil
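The country names run together because `sum()` on a string (object) column concatenates the strings with no separator. A minimal sketch on a subset of the question's rows (bins again assumed to be strings) shows that, plus a more readable variant with `agg(', '.join)`; the comma separator is just one choice.

```python
import pandas as pd

# Subset of the question's rows; bins assumed to be plain strings.
Top15 = pd.DataFrame(
    [('Asia', '(15.753, 29.227]', 'China'),
     ('Asia', '(2.212, 15.753]', 'Japan'),
     ('Asia', '(2.212, 15.753]', 'India'),
     ('Asia', '(2.212, 15.753]', 'South Korea'),
     ('Asia', '(2.212, 15.753]', 'Iran')],
    columns=['Continents', '% Renewable', 'Country'],
)

grouped = Top15.groupby(['Continents', '% Renewable'])['Country']

# sum() on strings concatenates them, e.g. 'JapanIndiaSouth KoreaIran'
summed = grouped.sum()
# agg(', '.join) joins each group's names with a separator instead
joined = grouped.agg(', '.join)

print(summed)
print(joined)
```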
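Now I want a column that gives me the number of countries in each index group, i.e.:
first row: China = 1
second row: Japan, India, South Korea, Iran = 4
So in the end I want something like this: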
Asia (2.212, 15.753] 4
(15.753, 29.227] 1
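I just can't figure out how to get there.
Also, the numbers need to be sorted in descending order while still keeping the index groups intact.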
Top15.groupby(['Continents', '% Renewable']).Country.count()
Continents % Renewable
Asia (15.753, 29.227] 1
(2.212, 15.753] 4
Australia (2.212, 15.753] 1
Europe (15.753, 29.227] 3
(2.212, 15.753] 1
(29.227, 42.701] 2
North America (2.212, 15.753] 1
(56.174, 69.648] 1
South America (56.174, 69.648] 1
Name: Country, dtype: int64
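Since the question asks for a column, the counted Series can also be turned into a one-column DataFrame with `to_frame`. A minimal sketch on the Asian rows only; the column name 'Number of countries' is a hypothetical choice.

```python
import pandas as pd

# Asian rows from the question; bins assumed to be plain strings.
Top15 = pd.DataFrame(
    [('Asia', '(15.753, 29.227]', 'China'),
     ('Asia', '(2.212, 15.753]', 'Japan'),
     ('Asia', '(2.212, 15.753]', 'India'),
     ('Asia', '(2.212, 15.753]', 'South Korea'),
     ('Asia', '(2.212, 15.753]', 'Iran')],
    columns=['Continents', '% Renewable', 'Country'],
)

counts = Top15.groupby(['Continents', '% Renewable']).Country.count()

# to_frame turns the Series into a DataFrame whose single column
# gets the given (hypothetical) name; the MultiIndex is kept.
result = counts.to_frame('Number of countries')
print(result)
```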
Sort in whatever order you like:
Top15_count = Top15.groupby(['Continents', '% Renewable']).Country.count()
Top15_count.reset_index() \
.sort_values(
['Continents', 'Country'],
ascending=[True, False]
).set_index(['Continents', '% Renewable']).Country
Continents % Renewable
Asia (2.212, 15.753] 4
(15.753, 29.227] 1
Australia (2.212, 15.753] 1
Europe (15.753, 29.227] 3
(29.227, 42.701] 2
(2.212, 15.753] 1
North America (2.212, 15.753] 1
(56.174, 69.648] 1
South America (56.174, 69.648] 1
Name: Country, dtype: int64
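If you'd rather keep the MultiIndex throughout instead of going through `reset_index`/`set_index`, one alternative (not the answer's method) is to sort each continent's block separately with a group-wise `apply`. A minimal sketch on the Asian and European rows:

```python
import pandas as pd

rows = [
    ('Asia', '(15.753, 29.227]', 'China'),
    ('Asia', '(2.212, 15.753]', 'Japan'),
    ('Asia', '(2.212, 15.753]', 'India'),
    ('Europe', '(2.212, 15.753]', 'United Kingdom'),
    ('Europe', '(15.753, 29.227]', 'Russian Federation'),
    ('Europe', '(15.753, 29.227]', 'Germany'),
    ('Europe', '(15.753, 29.227]', 'France'),
    ('Europe', '(29.227, 42.701]', 'Italy'),
    ('Europe', '(29.227, 42.701]', 'Spain'),
]
Top15 = pd.DataFrame(rows, columns=['Continents', '% Renewable', 'Country'])

counts = Top15.groupby(['Continents', '% Renewable']).Country.count()

# Sort each continent's rows by count, descending. group_keys=False keeps
# the original MultiIndex instead of prepending the continent a second time.
result = counts.groupby(level='Continents', group_keys=False).apply(
    lambda s: s.sort_values(ascending=False)
)
print(result)
```

Continents stay in alphabetical order because `groupby` sorts its keys by default; only the rows inside each continent are reordered.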
Solution with size:
print (Top15.groupby(['Continents', '% Renewable']).size())
Continents % Renewable
Asia (15.753, 29.227] 1
(2.212, 15.753] 4
Australia (2.212, 15.753] 1
Europe (15.753, 29.227] 3
(2.212, 15.753] 1
(29.227, 42.701] 2
North America (2.212, 15.753] 1
(56.174, 69.648] 1
South America (56.174, 69.648] 1
dtype: int64
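The two answers differ in one detail: `count()` counts only non-missing values in the selected column, while `size()` counts rows, so they agree here only because no country name is missing. `size()` also returns an unnamed Series, which is why its output ends with a bare `dtype: int64` and why the next snippet passes `name='COUNT'` to `reset_index`. A small sketch with one deliberately missing (hypothetical) country name:

```python
import pandas as pd

# Hypothetical data: the second Asian row has a missing country name.
df = pd.DataFrame({
    'Continents': ['Asia', 'Asia', 'Europe'],
    '% Renewable': ['(2.212, 15.753]', '(2.212, 15.753]', '(15.753, 29.227]'],
    'Country': ['Japan', None, 'France'],
})

g = df.groupby(['Continents', '% Renewable'])
sizes = g.size()               # rows per group, missing values included
counts = g['Country'].count()  # non-missing Country values per group

print(sizes)
print(counts)
```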
Use sort_values if you need to change the order; to sort by the count, first turn the result into a DataFrame with reset_index, and finally, if you need the MultiIndex back, add set_index:
print (Top15.groupby(['Continents', '% Renewable']) \
.size() \
.reset_index(name='COUNT') \
.sort_values(['Continents', 'COUNT'], ascending=[True, False]) \
.set_index(['Continents','% Renewable']).COUNT)
Continents % Renewable
Asia (2.212, 15.753] 4
(15.753, 29.227] 1
Australia (2.212, 15.753] 1
Europe (15.753, 29.227] 3
(29.227, 42.701] 2
(2.212, 15.753] 1
North America (2.212, 15.753] 1
(56.174, 69.648] 1
South America (56.174, 69.648] 1
Name: COUNT, dtype: int64
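If '% Renewable' is still the categorical column that `pd.cut` returns (the question doesn't say whether it was converted to strings), grouping with `observed=False` expands the result to every continent/bin combination and `size()` fills the empty ones with 0. Passing `observed=True` keeps only the combinations that actually occur. A minimal sketch; the percentages and bin edges are hypothetical.

```python
import pandas as pd

# Hypothetical raw percentages; the question does not show the real values.
df = pd.DataFrame({
    'Continents': ['Asia', 'Asia', 'Europe', 'Europe'],
    'Pct': [10.0, 20.0, 12.0, 35.0],
})
# pd.cut returns a categorical with Interval categories (0, 15], (15, 30], (30, 45]
df['% Renewable'] = pd.cut(df['Pct'], bins=[0, 15, 30, 45])

# Only the (continent, bin) pairs present in the data
observed = df.groupby(['Continents', '% Renewable'], observed=True).size()
# Full product: 2 continents x 3 bins, unobserved pairs counted as 0
full = df.groupby(['Continents', '% Renewable'], observed=False).size()

print(observed)
print(full)
```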