Implementing functions with dataframes in python

Question

I have been stuck on this problem for many days.

I have this function:

def cal_score(research, citations, teaching, international, income):
    # weighted sum: the weights 0.3 + 0.3 + 0.3 + 0.075 + 0.025 add up to 1
    return .3 * research + .3 * citations + .3 * teaching + .075 * international + .025 * income

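Here "research", "citations", "teaching", "international" and "income" are columns of the dataset. I want to add a new column to the dataset whose values are calculated with the function above. I have tried different approaches, but none of them worked.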
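A minimal sketch, assuming the five columns are already numeric (the two-row frame below is hypothetical, built from rows 510 and 832 of the sample data further down). Because pandas arithmetic is element-wise, a function like this can be called once with whole columns instead of row by row, provided the weights use `*` (multiplication) rather than `**` (power):

```python
import pandas as pd

def cal_score(research, citations, teaching, international, income):
    # Weighted sum; works the same on plain floats and on whole pandas Series
    return (0.3 * research + 0.3 * citations + 0.3 * teaching
            + 0.075 * international + 0.025 * income)

# Hypothetical frame: numeric copies of rows 510 and 832 from the sample data
df = pd.DataFrame({
    'research': [15.7, 45.3],
    'citations': [38.8, 39.0],
    'teaching': [43.8, 44.2],
    'international': [14.3, 16.1],
    'income': [24.2, 72.4],
}, index=[510, 832])

# Pass the columns themselves; each arithmetic step is applied row by row
df['Total Score'] = cal_score(df['research'], df['citations'], df['teaching'],
                              df['international'], df['income'])
print(df['Total Score'].round(4).tolist())  # [31.1675, 41.5675]
```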

Example: if we have a row like this

university_name  Indian Institute of Technology Bombay


teaching  43.8

international  14.3

research  24.2

citations  8,327

income   14.9

Total Score Ranking

then the total score should be calculated as

Total Score = .3 * research + .3 * citations + .3 * teaching + .075 * international + .025 * income

This should be done for every row in the dataset.
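As a quick check of the arithmetic, here is the formula evaluated by hand for a single row. This sketch uses the numeric columns of row 510 from the sample data posted below (the example row above seems to mix in values from other columns; for instance 8,327 is that row's num_students). Since the weights add up to 1, the score is a weighted average of the five columns:

```python
# Weights from the formula above; they sum to 1.0
weights = {'research': 0.3, 'citations': 0.3, 'teaching': 0.3,
           'international': 0.075, 'income': 0.025}

# Numeric values of row 510 in the sample data
row = {'research': 15.7, 'citations': 38.8, 'teaching': 43.8,
       'international': 14.3, 'income': 24.2}

# 0.3*15.7 + 0.3*38.8 + 0.3*43.8 + 0.075*14.3 + 0.025*24.2
total = sum(weights[k] * row[k] for k in weights)
print(round(total, 4))  # 31.1675
```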

Can anyone help me implement this? I have been stuck on it for quite a while now. :-(

Indian_univ.head(10).to_dict()

{'citations': {510: 38.799999999999997,
  832: 39.0,
  856: 45.600000000000001,
  959: 45.799999999999997,
  1232: 84.700000000000003,
  1360: 38.5,
  1361: 41.799999999999997,
  1362: 35.299999999999997,
  1363: 53.600000000000001,
  1679: 51.600000000000001},
 'country': {510: 'India',
  832: 'India',
  856: 'India',
  959: 'India',
  1232: 'India',
  1360: 'India',
  1361: 'India',
  1362: 'India',
  1363: 'India',
  1679: 'India'},
 'female_male_ratio': {510: '16 : 84',
  832: '15 : 85',
  856: '16 : 84',
  959: '17 : 83',
  1232: '46 : 54',
  1360: '18 : 82',
  1361: '13 : 87',
  1362: '15 : 85',
  1363: '17 : 83',
  1679: '19 : 81'},
 'income': {510: '24.2',
  832: '72.4',
  856: '52.7',
  959: '70.4',
  1232: '28.4',
  1360: '-',
  1361: '42.4',
  1362: '-',
  1363: '64.8',
  1679: '37.9'},
 'international': {510: '14.3',
  832: '16.1',
  856: '19.9',
  959: '15.6',
  1232: '29.3',
  1360: '15.3',
  1361: '17.3',
  1362: '14.7',
  1363: '15.6',
  1679: '18.2'},
 'international_students': {510: '1%',
  832: '0%',
  856: '1%',
  959: '1%',
  1232: '1%',
  1360: '1%',
  1361: '0%',
  1362: '0%',
  1363: '1%',
  1679: '1%'},
 'num_students': {510: '8,327',
  832: '9,928',
  856: '8,327',
  959: '8,061',
  1232: '16,691',
  1360: '8,371',
  1361: '6,167',
  1362: '9,928',
  1363: '8,061',
  1679: '3,318'},
 'research': {510: 15.699999999999999,
  832: 45.299999999999997,
  856: 33.100000000000001,
  959: 13.699999999999999,
  1232: 14.0,
  1360: 23.0,
  1361: 25.199999999999999,
  1362: 30.0,
  1363: 12.300000000000001,
  1679: 39.5},
 'student_staff_ratio': {510: 14.9,
  832: 17.5,
  856: 14.9,
  959: 18.699999999999999,
  1232: 23.899999999999999,
  1360: 17.300000000000001,
  1361: 12.199999999999999,
  1362: 17.5,
  1363: 18.699999999999999,
  1679: 8.1999999999999993},
 'teaching': {510: 43.799999999999997,
  832: 44.200000000000003,
  856: 47.299999999999997,
  959: 30.399999999999999,
  1232: 25.800000000000001,
  1360: 33.799999999999997,
  1361: 31.300000000000001,
  1362: 39.299999999999997,
  1363: 25.100000000000001,
  1679: 32.600000000000001},
 'total_score': {510: 29.489999999999995,
  832: 38.549999999999997,
  856: 37.799999999999997,
  959: 26.969999999999999,
  1232: 37.350000000000001,
  1360: 28.589999999999996,
  1361: 29.489999999999998,
  1362: 31.379999999999995,
  1363: 27.299999999999997,
  1679: 37.109999999999999},
 'university_name': {510: 'Indian Institute of Technology Bombay',
  832: 'Indian Institute of Technology Kharagpur',
  856: 'Indian Institute of Technology Bombay',
  959: 'Indian Institute of Technology Roorkee',
  1232: 'Panjab University',
  1360: 'Indian Institute of Technology Delhi',
  1361: 'Indian Institute of Technology Kanpur',
  1362: 'Indian Institute of Technology Kharagpur',
  1363: 'Indian Institute of Technology Roorkee',
  1679: 'Indian Institute of Science'},
 'world_rank': {510: '301-350',
  832: '226-250',
  856: '251-275',
  959: '351-400',
  1232: '226-250',
  1360: '351-400',
  1361: '351-400',
  1362: '351-400',
  1363: '351-400',
  1679: '276-300'},
 'year': {510: 2012,
  832: 2013,
  856: 2013,
  959: 2013,
  1232: 2014,
  1360: 2014,
  1361: 2014,
  1362: 2014,
  1363: 2014,
  1679: 2015}}

Answer 1

I think you can use:

df['Total Score'] = (.3 * df.research +
                     .3 * df.citations +
                     .3 * df.teaching +
                     .075 * df.international +
                     .025 * df.income)
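The same weighted sum can also be written as a matrix-vector product with DataFrame.dot, keeping the weights in one Series keyed by column name. This is an alternative technique, not what the answer uses; the two-row frame (rows 510 and 856) is a hypothetical numeric copy of the sample data:

```python
import pandas as pd

# Weights keyed by column name; dot() matches them to columns by label
weights = pd.Series({'research': 0.3, 'citations': 0.3, 'teaching': 0.3,
                     'international': 0.075, 'income': 0.025})

df = pd.DataFrame({
    'research': [15.7, 33.1],
    'citations': [38.8, 45.6],
    'teaching': [43.8, 47.3],
    'international': [14.3, 19.9],
    'income': [24.2, 52.7],
}, index=[510, 856])

# Select only the weighted columns, then take the row-wise weighted sum
df['Total Score'] = df[weights.index].dot(weights)
print(df['Total Score'].round(4))
```

Changing a weight then means editing one entry of the Series instead of the expression.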

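If you really need apply with a function (it is usually slower):

def cal_score(x):
    # x is one row of the DataFrame (a Series) when called via apply(axis=1)
    return (.3 * x.research +
            .3 * x.citations +
            .3 * x.teaching +
            .075 * x.international +
            .025 * x.income)

df['Total Score'] = df.apply(cal_score, axis=1)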
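A small sketch showing that the row-wise and the vectorized versions give the same numbers. Because the function only does arithmetic on x['research'] and similar lookups, the same function can also be passed the whole DataFrame, in which case each lookup returns a full column. The frame (rows 959 and 1232) is a hypothetical numeric copy of the sample data:

```python
import numpy as np
import pandas as pd

def cal_score(x):
    # x is a row (Series) under apply(axis=1), or the whole DataFrame otherwise
    return (0.3 * x['research'] + 0.3 * x['citations'] + 0.3 * x['teaching']
            + 0.075 * x['international'] + 0.025 * x['income'])

df = pd.DataFrame({
    'research': [13.7, 14.0],
    'citations': [45.8, 84.7],
    'teaching': [30.4, 25.8],
    'international': [15.6, 29.3],
    'income': [70.4, 28.4],
}, index=[959, 1232])

row_wise = df.apply(cal_score, axis=1)    # one Python call per row
vectorized = cal_score(df)                # one call, whole columns at once
print(np.allclose(row_wise, vectorized))  # True
```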

EDIT, after seeing your data:

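You first need to replace the non-numeric characters in columns num_students (the thousands separator ,) and income (the placeholder -), and then convert the columns to float with astype: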
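An alternative sketch of that cleanup uses pd.to_numeric, where errors='coerce' turns anything unparseable (such as '-') into NaN. Filling NaN with 0 mirrors the replace('-', '0') choice in this answer; whether a missing income should really count as 0 is an assumption about the data, and keeping NaN instead would make the total score NaN for those rows. The small frame is a hypothetical excerpt of the sample data:

```python
import pandas as pd

# String columns as they appear in the sample data (row 1360 has '-')
df = pd.DataFrame({
    'income': ['24.2', '-', '42.4'],
    'num_students': ['8,327', '8,371', '6,167'],
}, index=[510, 1360, 1361])

# Drop the thousands separator, then parse the digits as numbers
df['num_students'] = pd.to_numeric(df['num_students'].str.replace(',', '', regex=False))
# '-' cannot be parsed, so it becomes NaN; then treat missing income as 0
df['income'] = pd.to_numeric(df['income'], errors='coerce').fillna(0)
print(df.dtypes)
```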

EDIT2, with your data sample:

import pandas as pd

df = pd.DataFrame({'citations': {510: 38.799999999999997, 832: 39.0, 856: 45.600000000000001, 959: 45.799999999999997, 1232: 84.700000000000003, 1360: 38.5, 1361: 41.799999999999997, 1362: 35.299999999999997, 1363: 53.600000000000001, 1679: 51.600000000000001}, 'country': {510: 'India', 832: 'India', 856: 'India', 959: 'India', 1232: 'India', 1360: 'India', 1361: 'India', 1362: 'India', 1363: 'India', 1679: 'India'}, 'female_male_ratio': {510: '16 : 84', 832: '15 : 85', 856: '16 : 84', 959: '17 : 83', 1232: '46 : 54', 1360: '18 : 82', 1361: '13 : 87', 1362: '15 : 85', 1363: '17 : 83', 1679: '19 : 81'}, 'income': {510: '24.2', 832: '72.4', 856: '52.7', 959: '70.4', 1232: '28.4', 1360: '-', 1361: '42.4', 1362: '-', 1363: '64.8', 1679: '37.9'}, 'international': {510: '14.3', 832: '16.1', 856: '19.9', 959: '15.6', 1232: '29.3', 1360: '15.3', 1361: '17.3', 1362: '14.7', 1363: '15.6', 1679: '18.2'}, 'international_students': {510: '1%', 832: '0%', 856: '1%', 959: '1%', 1232: '1%', 1360: '1%', 1361: '0%', 1362: '0%', 1363: '1%', 1679: '1%'}, 'num_students': {510: '8,327', 832: '9,928', 856: '8,327', 959: '8,061', 1232: '16,691', 1360: '8,371', 1361: '6,167', 1362: '9,928', 1363: '8,061', 1679: '3,318'}, 'research': {510: 15.699999999999999, 832: 45.299999999999997, 856: 33.100000000000001, 959: 13.699999999999999, 1232: 14.0, 1360: 23.0, 1361: 25.199999999999999, 1362: 30.0, 1363: 12.300000000000001, 1679: 39.5}, 'student_staff_ratio': {510: 14.9, 832: 17.5, 856: 14.9, 959: 18.699999999999999, 1232: 23.899999999999999, 1360: 17.300000000000001, 1361: 12.199999999999999, 1362: 17.5, 1363: 18.699999999999999, 1679: 8.1999999999999993}, 'teaching': {510: 43.799999999999997, 832: 44.200000000000003, 856: 47.299999999999997, 959: 30.399999999999999, 1232: 25.800000000000001, 1360: 33.799999999999997, 1361: 31.300000000000001, 1362: 39.299999999999997, 1363: 25.100000000000001, 1679: 32.600000000000001}, 'total_score': {510: 29.489999999999995, 832: 38.549999999999997, 856: 
37.799999999999997, 959: 26.969999999999999, 1232: 37.350000000000001, 1360: 28.589999999999996, 1361: 29.489999999999998, 1362: 31.379999999999995, 1363: 27.299999999999997, 1679: 37.109999999999999}, 'university_name': {510: 'Indian Institute of Technology Bombay', 832: 'Indian Institute of Technology Kharagpur', 856: 'Indian Institute of Technology Bombay', 959: 'Indian Institute of Technology Roorkee', 1232: 'Panjab University', 1360: 'Indian Institute of Technology Delhi', 1361: 'Indian Institute of Technology Kanpur', 1362: 'Indian Institute of Technology Kharagpur', 1363: 'Indian Institute of Technology Roorkee', 1679: 'Indian Institute of Science'}, 'world_rank': {510: '301-350', 832: '226-250', 856: '251-275', 959: '351-400', 1232: '226-250', 1360: '351-400', 1361: '351-400', 1362: '351-400', 1363: '351-400', 1679: '276-300'}, 'year': {510: 2012, 832: 2013, 856: 2013, 959: 2013, 1232: 2014, 1360: 2014, 1361: 2014, 1362: 2014, 1363: 2014, 1679: 2015}})

#replace , with empty string
df['num_students'] = df.num_students.str.replace(',', '')
#replace - with '0'
df['income'] = df['income'].str.replace('-', '0')

#convert columns to float
cols = ['teaching', 'international', 'research', 'citations', 'income']
df[cols] = df[cols].astype(float)

df['Total Score'] = (.3 * df.research +
                     .3 * df.citations +
                     .3 * df.teaching +
                     .075 * df.international +
                     .025 * df.income)

print(df)

      citations country female_male_ratio  income  international  \
510        38.8   India           16 : 84    24.2           14.3   
832        39.0   India           15 : 85    72.4           16.1   
856        45.6   India           16 : 84    52.7           19.9   
959        45.8   India           17 : 83    70.4           15.6   
1232       84.7   India           46 : 54    28.4           29.3   
1360       38.5   India           18 : 82     0.0           15.3   
1361       41.8   India           13 : 87    42.4           17.3   
1362       35.3   India           15 : 85     0.0           14.7   
1363       53.6   India           17 : 83    64.8           15.6   
1679       51.6   India           19 : 81    37.9           18.2   

     international_students num_students  research  student_staff_ratio  \
510                      1%         8327      15.7                 14.9   
832                      0%         9928      45.3                 17.5   
856                      1%         8327      33.1                 14.9   
959                      1%         8061      13.7                 18.7   
1232                     1%        16691      14.0                 23.9   
1360                     1%         8371      23.0                 17.3   
1361                     0%         6167      25.2                 12.2   
1362                     0%         9928      30.0                 17.5   
1363                     1%         8061      12.3                 18.7   
1679                     1%         3318      39.5                  8.2   

      teaching  total_score                           university_name  \
510       43.8        29.49     Indian Institute of Technology Bombay   
832       44.2        38.55  Indian Institute of Technology Kharagpur   
856       47.3        37.80     Indian Institute of Technology Bombay   
959       30.4        26.97    Indian Institute of Technology Roorkee   
1232      25.8        37.35                         Panjab University   
1360      33.8        28.59      Indian Institute of Technology Delhi   
1361      31.3        29.49     Indian Institute of Technology Kanpur   
1362      39.3        31.38  Indian Institute of Technology Kharagpur   
1363      25.1        27.30    Indian Institute of Technology Roorkee   
1679      32.6        37.11               Indian Institute of Science   

     world_rank  year  Total Score  
510     301-350  2012      31.1675  
832     226-250  2013      41.5675  
856     251-275  2013      40.6100  
959     351-400  2013      29.9000  
1232    226-250  2014      40.2575  
1360    351-400  2014      29.7375  
1361    351-400  2014      31.8475  
1362    351-400  2014      32.4825  
1363    351-400  2014      30.0900  
1679    276-300  2015      39.4225
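Once the columns are numeric, DataFrame.eval is another option: bare names in the expression string are resolved as column names, and an expression without an assignment returns a Series. A sketch on a hypothetical numeric copy of rows 1361 and 1679 (eval uses numexpr when it is installed and falls back to plain Python otherwise):

```python
import pandas as pd

df = pd.DataFrame({
    'research': [25.2, 39.5],
    'citations': [41.8, 51.6],
    'teaching': [31.3, 32.6],
    'international': [17.3, 18.2],
    'income': [42.4, 37.9],
}, index=[1361, 1679])

# Column names are used directly inside the expression string
df['Total Score'] = df.eval(
    '0.3 * research + 0.3 * citations + 0.3 * teaching'
    ' + 0.075 * international + 0.025 * income'
)
print(df['Total Score'].round(4).tolist())  # [31.8475, 39.4225]
```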

Answer 2

Here is the most direct way:

df = df.assign(TotalScore=.3 * df.research + .3 * df.citations + .3 * df.teaching + .075 * df.international + .025 * df.income)
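Note that assign returns a new DataFrame rather than modifying df, so the result has to be stored. Keyword arguments are evaluated in order, and a lambda receives the frame as built so far, so the type conversion and the score can be chained in one call. A sketch on a hypothetical excerpt of the sample data, assuming a missing income ('-') should count as 0:

```python
import pandas as pd

df = pd.DataFrame({
    'research': [15.7, 23.0],
    'citations': [38.8, 38.5],
    'teaching': [43.8, 33.8],
    'international': ['14.3', '15.3'],  # strings, as in the sample data
    'income': ['24.2', '-'],
}, index=[510, 1360])

scored = df.assign(
    # convert first; later lambdas see these numeric columns
    international=lambda d: pd.to_numeric(d['international']),
    income=lambda d: pd.to_numeric(d['income'], errors='coerce').fillna(0),
    TotalScore=lambda d: (0.3 * d['research'] + 0.3 * d['citations']
                          + 0.3 * d['teaching'] + 0.075 * d['international']
                          + 0.025 * d['income']),
)
print(scored['TotalScore'].round(4).tolist())  # [31.1675, 29.7375]
```

The original df is left untouched; only scored holds the converted columns and the new score.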
