How to plot the top ten for each column in Python
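I want to plot the top 10 countries (shown in the row index) according to the values in each column.
The columns are the features I use to evaluate the countries: "feature1", "feature2", "feature3", "feature4", "feature5", "feature6", "feature7", "feature8", "feature9", "feature10".
So that would be 10 charts, each showing a top 10. Then I would like a "global" top 10 that takes all columns into account (assuming each column has the same coefficient).
I was thinking of building a new df that shows the countries appearing most often across those top-10 dfs, but I don't know how to do that.
I was struggling, so I started creating new dataframes from the original large dataset, named "data_etude", which I copied as "data_etude_copy".
For each new dataframe "data_ind", I added a new column to show the top 10 according to each feature/column I am analysing (columns are features, rows are the values taken by the countries).
Then I wrote a script to build another dataframe from these dataframes, showing only the rank, the value, and the parameter for the top 10. I know this is laborious; as a beginner, I did not manage to turn this into a loop...
Original dataset: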
data_etude_copy = data_etude.copy()
Dataframes of the top 10 countries for each feature (this should be a loop, it is too laborious):
data_ind1 = data_etude_copy.sort_values(by=['feature1'], ascending=False).head(10)
data_ind2 = data_etude_copy.sort_values(by=['feature2'], ascending=False).head(10)
data_ind3 = data_etude_copy.sort_values(by=['feature3'], ascending=False).head(10)
data_ind4 = data_etude_copy.sort_values(by=['feature4'], ascending=False).head(10)
data_ind5 = data_etude_copy.sort_values(by=['feature5'], ascending=False).head(10)
data_ind6 = data_etude_copy.sort_values(by=['feature6'], ascending=False).head(10)
data_ind7 = data_etude_copy.sort_values(by=['feature7'], ascending=False).head(10)
data_ind8 = data_etude_copy.sort_values(by=['feature8'], ascending=False).head(10)
data_ind9 = data_etude_copy.sort_values(by=['feature9'], ascending=False).head(10)
data_ind10 = data_etude_copy.sort_values(by=['feature10'], ascending=False).head(10)
And the simplified top-10 dfs for each feature (I know I need a loop...):
data_ind1.drop(data_ind1.loc[:,data_ind1.columns!="feature1"], inplace=True, axis = 1)
data_ind2.drop(data_ind2.loc[:,data_ind2.columns!="feature2"], inplace=True, axis = 1)
data_ind3.drop(data_ind3.loc[:,data_ind3.columns!="feature3"], inplace=True, axis = 1)
data_ind4.drop(data_ind4.loc[:,data_ind4.columns!="feature4"], inplace=True, axis = 1)
data_ind5.drop(data_ind5.loc[:,data_ind5.columns!="feature5"], inplace=True, axis = 1)
data_ind6.drop(data_ind6.loc[:,data_ind6.columns!="feature6"], inplace=True, axis = 1)
data_ind7.drop(data_ind7.loc[:,data_ind7.columns!="feature7"], inplace=True, axis = 1)
data_ind8.drop(data_ind8.loc[:,data_ind8.columns!="feature8"], inplace=True, axis = 1)
data_ind9.drop(data_ind9.loc[:,data_ind9.columns!="feature9"], inplace=True, axis = 1)
data_ind10.drop(data_ind10.loc[:,data_ind10.columns!="feature10"], inplace=True, axis = 1)
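How can I turn this into a loop and plot the target result? That is:
- plot the top 10 countries for each feature
- then a final "top 10 countries" that takes all 10 features into account (the countries appearing most often across the dfs, or the best-ranked countries if all features have the same coefficient)?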
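I think this is what you are asking for? I put your code into for-loop form and added code to rank the countries overall. The overall ranking is based on all features, not only on the top-10 lists, but if you prefer it the other way, you would just switch the order of the commented blocks in the first for loop. I was also unsure how you wanted to display it, so for now it just prints the final DataFrame. It may not be the cleanest code ever, but I hope it helps!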
import pandas as pd
import numpy as np
# Random sample data: 12 countries x 10 features
data = np.random.randint(100, size=(12, 10))
countries = [
    'Country1',
    'Country2',
    'Country3',
    'Country4',
    'Country5',
    'Country6',
    'Country7',
    'Country8',
    'Country9',
    'Country10',
    'Country11',
    'Country12',
]
feature_names_weights = {
    'feature1': 1.0,
    'feature2': 1.0,
    'feature3': 1.0,
    'feature4': 1.0,
    'feature5': 1.0,
    'feature6': 1.0,
    'feature7': 1.0,
    'feature8': 1.0,
    'feature9': 1.0,
    'feature10': 1.0,
}
feature_names = list(feature_names_weights.keys())
df = pd.DataFrame(data=data, index=countries, columns=feature_names)
data_etude_copy = df
data_sorted_by_feature = {}
# One running score per country, starting at zero
country_scores = pd.Series(0.0, index=countries)
for feature in feature_names:
    # Adds to each country's score and multiplies by weight factor for each feature
    for country in countries:
        country_scores[country] += data_etude_copy.loc[country, feature] * feature_names_weights[feature]
    # Sorts the countries by feature (your code in loop form)
    data_sorted_by_feature[feature] = data_etude_copy.sort_values(by=[feature], ascending=False).head(10)
    # Keeps only the column for this feature
    data_sorted_by_feature[feature] = data_sorted_by_feature[feature][[feature]]
# Sort country total scores
ranked_countries = country_scores.sort_values(ascending=False).head(10)
## Put everything into one DataFrame
# Create empty DataFrame
outputDF = pd.DataFrame('', index=range(10), columns=feature_names + ['Overall'])
# Add entries for all features
for feature in feature_names:
    for index in range(10):
        country = data_sorted_by_feature[feature].index[index]
        # .loc avoids chained assignment, which may not write through in recent pandas
        outputDF.loc[index, feature] = f'{country}: {data_sorted_by_feature[feature].loc[country, feature]}'
# Add column for overall country score
for index in range(10):
    country = ranked_countries.index[index]
    outputDF.loc[index, 'Overall'] = f'{country}: {ranked_countries[country]}'
# Print DataFrame
print(outputDF)
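The same calculation can also be sketched more compactly with vectorized pandas calls. This is only an illustration under assumptions: the helper names `top_n_per_feature`, `weighted_overall`, and `top_n_appearances` are hypothetical and not part of the code above, and the tiny fixed DataFrame stands in for `data_etude_copy`. The last helper covers the other option raised in the question, counting how often each country appears in the per-feature top lists.

```python
import pandas as pd

def top_n_per_feature(df, n=10):
    """Return {feature: Series of the n largest values, indexed by country}."""
    return {col: df[col].nlargest(n) for col in df.columns}

def weighted_overall(df, weights, n=10):
    """Weighted sum across features for each country, n best first."""
    w = pd.Series(weights)
    return df[w.index].mul(w, axis=1).sum(axis=1).nlargest(n)

def top_n_appearances(df, n=10):
    """Count how many per-feature top-n lists each country appears in."""
    counts = pd.Series(0, index=df.index)
    for top in top_n_per_feature(df, n).values():
        counts.loc[top.index] += 1
    return counts.sort_values(ascending=False)

# Tiny hypothetical dataset (3 features, 4 countries) standing in for data_etude_copy
demo = pd.DataFrame(
    {"feature1": [40, 93, 24, 90],
     "feature2": [31, 20, 34, 21],
     "feature3": [5, 41, 87, 93]},
    index=["Country1", "Country2", "Country3", "Country4"],
)
weights = {"feature1": 1.0, "feature2": 1.0, "feature3": 1.0}

tops = top_n_per_feature(demo, n=2)
overall = weighted_overall(demo, weights, n=2)
appearances = top_n_appearances(demo, n=2)
print(tops["feature1"])
print(overall)
print(appearances)
```

With equal weights, `weighted_overall` reduces to a plain row sum, so unequal coefficients can be tried just by editing the `weights` dictionary.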
Sample input data:
           feature1  feature2  feature3  feature4  feature5  feature6  feature7  feature8  feature9  feature10
Country1         40        31         5         6         4        67        65        57        52         96
Country2         93        20        41        65        44        21        91        25        43         75
Country3         93        34        87        69         0        25        65        71        17         91
Country4         24        20        41        68        46         1        94        87        11         97
Country5         90        21        93         0        72        20        44        87        16         42
Country6         93        17        33        40        96        53         1        97        51         20
Country7         82        50        34        27        44        38        49        85         7         70
Country8         33        81        14         5        72        13        13        53        39         47
Country9         18        38        20        32        52        96        51        93        53         16
Country10        75        94        91        59        39        24         7         0        96         57
Country11        62         9        33        89         5        77        37        63        42         29
Country12         7        98        43        71        98        81        48        13        61         69
Corresponding output:
feature1 feature2 feature3 feature4 feature5 feature6 feature7 feature8 feature9 feature10 Overall
0 Country2: 93 Country12: 98 Country5: 93 Country11: 89 Country12: 98 Country9: 96 Country4: 94 Country6: 97 Country10: 96 Country4: 97 Country12: 589.0
1 Country3: 93 Country10: 94 Country10: 91 Country12: 71 Country6: 96 Country12: 81 Country2: 91 Country9: 93 Country12: 61 Country1: 96 Country3: 552.0
2 Country6: 93 Country8: 81 Country3: 87 Country3: 69 Country5: 72 Country11: 77 Country1: 65 Country4: 87 Country9: 53 Country3: 91 Country10: 542.0
3 Country5: 90 Country7: 50 Country12: 43 Country4: 68 Country8: 72 Country1: 67 Country3: 65 Country5: 87 Country1: 52 Country2: 75 Country2: 518.0
4 Country7: 82 Country9: 38 Country2: 41 Country2: 65 Country9: 52 Country6: 53 Country9: 51 Country7: 85 Country6: 51 Country7: 70 Country6: 501.0
5 Country10: 75 Country3: 34 Country4: 41 Country10: 59 Country4: 46 Country7: 38 Country7: 49 Country3: 71 Country2: 43 Country12: 69 Country4: 489.0
6 Country11: 62 Country1: 31 Country7: 34 Country6: 40 Country2: 44 Country3: 25 Country12: 48 Country11: 63 Country11: 42 Country10: 57 Country7: 486.0
7 Country1: 40 Country5: 21 Country6: 33 Country9: 32 Country7: 44 Country10: 24 Country5: 44 Country1: 57 Country8: 39 Country8: 47 Country5: 485.0
8 Country8: 33 Country2: 20 Country11: 33 Country7: 27 Country10: 39 Country2: 21 Country11: 37 Country8: 53 Country3: 17 Country5: 42 Country9: 469.0
9 Country4: 24 Country4: 20 Country9: 20 Country1: 6 Country11: 5 Country5: 20 Country8: 13 Country2: 25 Country5: 16 Country11: 29 Country11: 446.0
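Since the question asks for plots and the code above only prints a table, here is a minimal plotting sketch. It assumes matplotlib is installed; the function name `plot_top_n` and the grid layout are hypothetical choices, and the random data merely stands in for the real dataset. It draws one horizontal bar chart per feature, plus one for the equal-weight overall score.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_top_n(df, n=10, ncols=4):
    """Draw one horizontal bar chart per column, plus one for the row sums."""
    panels = {col: df[col].nlargest(n) for col in df.columns}
    panels["Overall"] = df.sum(axis=1).nlargest(n)  # equal weight per feature
    nrows = -(-len(panels) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
    flat_axes = axes.ravel()
    for ax, (title, top) in zip(flat_axes, panels.items()):
        top.sort_values().plot.barh(ax=ax)  # ascending sort puts the best at the top
        ax.set_title(title)
    for ax in flat_axes[len(panels):]:
        ax.set_visible(False)  # hide unused grid cells
    fig.tight_layout()
    return fig, panels

# Random stand-in data shaped like the example: 12 countries x 10 features
rng = np.random.default_rng(0)
countries = [f"Country{i}" for i in range(1, 13)]
features = [f"feature{i}" for i in range(1, 11)]
df = pd.DataFrame(rng.integers(0, 100, size=(12, 10)), index=countries, columns=features)

fig, panels = plot_top_n(df)
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()`; otherwise `fig.savefig(...)` writes the grid to a file of your choosing.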