How to plot the top ten for each column in Python

Question

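I want to plot the top 10 countries (listed in the row index) according to the values in each column.

The columns are the features I use to evaluate the countries: "feature1", "feature2", "feature3", "feature4", "feature5", "feature6", "feature7", "feature8", "feature9", "feature10". So that would be 10 plots, each showing a top 10. Then I would like a "global" top 10 that takes all columns into account (assuming every column has the same coefficient).

I was thinking of building a new df that shows the countries appearing most often across those top-10 dfs, but I don't know how to do that.

I was struggling, so I started by creating new dataframes from the original large dataset named "data_etude", which I copied as "data_etude_copy". For each new dataframe "data_ind", I added a new column to show the top 10 for each feature/column I am analyzing (columns are the features, rows are the values taken by each country).

Then I wrote a script that creates another dataframe from each of these dataframes, showing only the top-10 rank, value and parameter. I know this is laborious; as a beginner, I did not manage to turn it into a loop... The original dataset: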

    data_etude_copy = data_etude.copy()

Dataframes with the top 10 countries for each feature (this really should be a loop; it is too laborious this way):

    data_ind1 = data_etude_copy.sort_values(by=['feature1'], ascending=False).head(10)
    data_ind2 = data_etude_copy.sort_values(by=['feature2'], ascending=False).head(10)
    data_ind3 = data_etude_copy.sort_values(by=['feature3'], ascending=False).head(10)
    data_ind4 = data_etude_copy.sort_values(by=['feature4'], ascending=False).head(10)
    data_ind5 = data_etude_copy.sort_values(by=['feature5'], ascending=False).head(10)
    data_ind6 = data_etude_copy.sort_values(by=['feature6'], ascending=False).head(10)
    data_ind7 = data_etude_copy.sort_values(by=['feature7'], ascending=False).head(10)
    data_ind8 = data_etude_copy.sort_values(by=['feature8'], ascending=False).head(10)
    data_ind9 = data_etude_copy.sort_values(by=['feature9'], ascending=False).head(10)
    data_ind10 = data_etude_copy.sort_values(by=['feature10'], ascending=False).head(10)

And the simplified top-10 dfs for each feature (I know I need a loop...):

    data_ind1.drop(data_ind1.loc[:,data_ind1.columns!="feature1"], inplace=True, axis = 1)
    data_ind2.drop(data_ind2.loc[:,data_ind2.columns!="feature2"], inplace=True, axis = 1)
    data_ind3.drop(data_ind3.loc[:,data_ind3.columns!="feature3"], inplace=True, axis = 1)
    data_ind4.drop(data_ind4.loc[:,data_ind4.columns!="feature4"], inplace=True, axis = 1)
    data_ind5.drop(data_ind5.loc[:,data_ind5.columns!="feature5"], inplace=True, axis = 1)
    data_ind6.drop(data_ind6.loc[:,data_ind6.columns!="feature6"], inplace=True, axis = 1)
    data_ind7.drop(data_ind7.loc[:,data_ind7.columns!="feature7"], inplace=True, axis = 1)
    data_ind8.drop(data_ind8.loc[:,data_ind8.columns!="feature8"], inplace=True, axis = 1)
    data_ind9.drop(data_ind9.loc[:,data_ind9.columns!="feature9"], inplace=True, axis = 1)
    data_ind10.drop(data_ind10.loc[:,data_ind10.columns!="feature10"], inplace=True, axis = 1)

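How can I turn this into a loop and plot the target result? That is:

- Plot the top 10 countries for each feature

- Then a final "top 10 countries" that takes all 10 features into account (the countries that appear most often across the dfs, or the best-ranked countries if all features have the same coefficient)?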

Answer 1

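I think this is what you are after? I put your code into for-loop form and added code that ranks the countries overall. The overall ranking is based on the values of all features, not only on the top-10 lists; if you prefer the other way, just swap the order of the commented blocks in the first for loop. I also was not sure how you wanted to display the result, so for now it only prints the final dataframe. It may not be the cleanest code ever, but I hope it helps!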
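For the "other way" mentioned above (ranking countries by how many top-10 lists they appear in), here is a minimal sketch that is separate from the full script below. The seeded random data and the names `top10_names`, `appearances` and `most_frequent` are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 12 countries x 10 features, seeded for repeatability
rng = np.random.default_rng(0)
countries = [f"Country{i}" for i in range(1, 13)]
features = [f"feature{i}" for i in range(1, 11)]
df = pd.DataFrame(rng.integers(0, 100, size=(12, 10)),
                  index=countries, columns=features)

# For each feature, keep the names of its 10 highest-valued countries
top10_names = {feature: df[feature].nlargest(10).index for feature in features}

# Count how many of the top-10 lists each country appears in
appearances = pd.Series(0, index=df.index)
for names in top10_names.values():
    appearances.loc[names] += 1

# Countries that appear in the most lists come first
most_frequent = appearances.sort_values(ascending=False).head(10)
print(most_frequent)
```

With only 12 countries, almost every country lands in most of the lists, so ties are likely; on a larger dataset the counts should separate better, and a tie-break such as the summed score may still be needed.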

import pandas as pd
import numpy as np

data = np.random.randint(100,size=(12,10))

countries = [
    'Country1',
    'Country2',
    'Country3',
    'Country4',
    'Country5',
    'Country6',
    'Country7',
    'Country8',
    'Country9',
    'Country10',
    'Country11',
    'Country12',
]
feature_names_weights = {
    'feature1'  :1.0,
    'feature2'  :1.0,
    'feature3'  :1.0,
    'feature4'  :1.0,
    'feature5'  :1.0,
    'feature6'  :1.0,
    'feature7'  :1.0,
    'feature8'  :1.0,
    'feature9'  :1.0,
    'feature10' :1.0,
}
feature_names = list(feature_names_weights.keys())

df = pd.DataFrame(data=data, index=countries, columns=feature_names)
data_etude_copy = df

data_sorted_by_feature = {}
country_scores = pd.Series(0.0, index=countries)

for feature in feature_names:
    #Adds to each country's score and multiplies by weight factor for each feature
    for country in countries:
        country_scores[country] += data_etude_copy[feature][country]*(feature_names_weights[feature])
    #Sorts the countries by feature (your code in loop form)
    data_sorted_by_feature[feature] = data_etude_copy[[feature]].sort_values(by=feature, ascending=False).head(10)

#sort country total scores
ranked_countries = country_scores.sort_values(ascending=False).head(10)

##Put everything into one DataFrame
#Create empty DataFrame
empty_data=np.empty((10,11),str)
outputDF = pd.DataFrame(data=empty_data,columns=((feature_names)+['Overall']))
#Add entries for all features
for feature in feature_names:
    for index in range(10):
        country = data_sorted_by_feature[feature].index[index]
        # Assign with .loc; chained indexing (outputDF[feature][index] = ...) may not update outputDF
        outputDF.loc[index, feature] = f'{country}: {data_sorted_by_feature[feature][feature][country]}'
#Add column for overall country score
for index in range(10):
    country = ranked_countries.index[index]
    outputDF.loc[index, 'Overall'] = f'{country}: {ranked_countries[country]}'

#Print DataFrame
print(outputDF)

Example input data:

           feature1  feature2  feature3  feature4  feature5  feature6  feature7  feature8  feature9  feature10
Country1         40        31         5         6         4        67        65        57        52         96
Country2         93        20        41        65        44        21        91        25        43         75
Country3         93        34        87        69         0        25        65        71        17         91
Country4         24        20        41        68        46         1        94        87        11         97
Country5         90        21        93         0        72        20        44        87        16         42
Country6         93        17        33        40        96        53         1        97        51         20
Country7         82        50        34        27        44        38        49        85         7         70
Country8         33        81        14         5        72        13        13        53        39         47
Country9         18        38        20        32        52        96        51        93        53         16
Country10        75        94        91        59        39        24         7         0        96         57
Country11        62         9        33        89         5        77        37        63        42         29
Country12         7        98        43        71        98        81        48        13        61         69

Corresponding output:

        feature1       feature2       feature3       feature4       feature5       feature6       feature7       feature8       feature9      feature10           Overall
0   Country2: 93  Country12: 98   Country5: 93  Country11: 89  Country12: 98   Country9: 96   Country4: 94   Country6: 97  Country10: 96   Country4: 97  Country12: 589.0
1   Country3: 93  Country10: 94  Country10: 91  Country12: 71   Country6: 96  Country12: 81   Country2: 91   Country9: 93  Country12: 61   Country1: 96   Country3: 552.0
2   Country6: 93   Country8: 81   Country3: 87   Country3: 69   Country5: 72  Country11: 77   Country1: 65   Country4: 87   Country9: 53   Country3: 91  Country10: 542.0
3   Country5: 90   Country7: 50  Country12: 43   Country4: 68   Country8: 72   Country1: 67   Country3: 65   Country5: 87   Country1: 52   Country2: 75   Country2: 518.0
4   Country7: 82   Country9: 38   Country2: 41   Country2: 65   Country9: 52   Country6: 53   Country9: 51   Country7: 85   Country6: 51   Country7: 70   Country6: 501.0
5  Country10: 75   Country3: 34   Country4: 41  Country10: 59   Country4: 46   Country7: 38   Country7: 49   Country3: 71   Country2: 43  Country12: 69   Country4: 489.0
6  Country11: 62   Country1: 31   Country7: 34   Country6: 40   Country2: 44   Country3: 25  Country12: 48  Country11: 63  Country11: 42  Country10: 57   Country7: 486.0
7   Country1: 40   Country5: 21   Country6: 33   Country9: 32   Country7: 44  Country10: 24   Country5: 44   Country1: 57   Country8: 39   Country8: 47   Country5: 485.0
8   Country8: 33   Country2: 20  Country11: 33   Country7: 27  Country10: 39   Country2: 21  Country11: 37   Country8: 53   Country3: 17   Country5: 42   Country9: 469.0
9   Country4: 24   Country4: 20   Country9: 20    Country1: 6   Country11: 5   Country5: 20   Country8: 13   Country2: 25   Country5: 16  Country11: 29  Country11: 446.0
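
The script above only prints the rankings, while the question also asks for plots. One possible way to draw them with matplotlib is sketched below: one horizontal bar chart per feature plus one for the overall score. The seeded random data, the 3 x 4 grid layout and the output file name `top10_per_feature.png` are assumptions made for illustration, and the overall score is a plain row sum, which matches the equal weights of 1.0 used above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 12 countries x 10 features, seeded for repeatability
rng = np.random.default_rng(0)
countries = [f"Country{i}" for i in range(1, 13)]
features = [f"feature{i}" for i in range(1, 11)]
df = pd.DataFrame(rng.integers(0, 100, size=(12, 10)),
                  index=countries, columns=features)

# Equal weights, so the overall score is the plain sum over all features
overall = df.sum(axis=1).rename("Overall")

# One panel per feature plus one for the overall score: 11 panels in a 3 x 4 grid
panels = [df[feature] for feature in features] + [overall]
fig, axes = plt.subplots(3, 4, figsize=(20, 12))
axes_flat = axes.ravel()

for ax, series in zip(axes_flat, panels):
    top10 = series.nlargest(10)
    # Reverse the order so the highest value is drawn at the top of the chart
    top10.iloc[::-1].plot.barh(ax=ax)
    ax.set_title(f"Top 10 by {series.name}")

# Hide the grid cells that have no panel
for ax in axes_flat[len(panels):]:
    ax.set_visible(False)

fig.tight_layout()
fig.savefig("top10_per_feature.png")  # hypothetical output file name
```

To look at the figure interactively instead, remove the `matplotlib.use("Agg")` line and call `plt.show()` rather than `fig.savefig(...)`.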
