Implementing functions with dataframes in python
I have been stuck on this problem for many days.
I have this function:
def cal_score(research, citations, teaching, international, income):
    return .3 * research + .3 * citations + .3 * teaching + .075 * international + .025 * income
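Here 'research', 'citations', 'teaching', 'international' and 'income' are columns of the dataset. I want to add a new column to the dataset whose values are calculated with the function above. I have tried different approaches, but none of them worked.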
For example, if we have a row like this:
university_name   Indian Institute of Technology Bombay
teaching          43.8
international     14.3
research          15.7
citations         38.8
income            24.2
Total Score       ?
Ranking           ?
then the total score should be calculated as
Total Score = .3 * research + .3 * citations + .3 * teaching + .075 * international + .025 * income
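To check the intended arithmetic, here is the weighted sum worked out for one row. It uses the values of row 510 from the sample data below and assumes the weights are multipliers, which is how the formula reads. Because the five weights add up to 1.0, the score is a weighted average of the five indicators.

```python
# Values taken from row 510 of the sample data (Indian Institute of Technology Bombay, 2012)
row = {"research": 15.7, "citations": 38.8, "teaching": 43.8,
       "international": 14.3, "income": 24.2}

# Weights from the formula; 0.3 * 3 + 0.075 + 0.025 == 1.0
weights = {"research": 0.3, "citations": 0.3, "teaching": 0.3,
           "international": 0.075, "income": 0.025}

# Multiply each indicator by its weight and add everything up
total_score = sum(weights[k] * row[k] for k in weights)
print(round(total_score, 4))  # 31.1675
```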
This should be done for every row in the dataset.
Can anyone help me implement this? I have been stuck on it for a while now. :-(
Indian_univ.head(10).to_dict()
{'citations': {510: 38.799999999999997,
832: 39.0,
856: 45.600000000000001,
959: 45.799999999999997,
1232: 84.700000000000003,
1360: 38.5,
1361: 41.799999999999997,
1362: 35.299999999999997,
1363: 53.600000000000001,
1679: 51.600000000000001},
'country': {510: 'India',
832: 'India',
856: 'India',
959: 'India',
1232: 'India',
1360: 'India',
1361: 'India',
1362: 'India',
1363: 'India',
1679: 'India'},
'female_male_ratio': {510: '16 : 84',
832: '15 : 85',
856: '16 : 84',
959: '17 : 83',
1232: '46 : 54',
1360: '18 : 82',
1361: '13 : 87',
1362: '15 : 85',
1363: '17 : 83',
1679: '19 : 81'},
'income': {510: '24.2',
832: '72.4',
856: '52.7',
959: '70.4',
1232: '28.4',
1360: '-',
1361: '42.4',
1362: '-',
1363: '64.8',
1679: '37.9'},
'international': {510: '14.3',
832: '16.1',
856: '19.9',
959: '15.6',
1232: '29.3',
1360: '15.3',
1361: '17.3',
1362: '14.7',
1363: '15.6',
1679: '18.2'},
'international_students': {510: '1%',
832: '0%',
856: '1%',
959: '1%',
1232: '1%',
1360: '1%',
1361: '0%',
1362: '0%',
1363: '1%',
1679: '1%'},
'num_students': {510: '8,327',
832: '9,928',
856: '8,327',
959: '8,061',
1232: '16,691',
1360: '8,371',
1361: '6,167',
1362: '9,928',
1363: '8,061',
1679: '3,318'},
'research': {510: 15.699999999999999,
832: 45.299999999999997,
856: 33.100000000000001,
959: 13.699999999999999,
1232: 14.0,
1360: 23.0,
1361: 25.199999999999999,
1362: 30.0,
1363: 12.300000000000001,
1679: 39.5},
'student_staff_ratio': {510: 14.9,
832: 17.5,
856: 14.9,
959: 18.699999999999999,
1232: 23.899999999999999,
1360: 17.300000000000001,
1361: 12.199999999999999,
1362: 17.5,
1363: 18.699999999999999,
1679: 8.1999999999999993},
'teaching': {510: 43.799999999999997,
832: 44.200000000000003,
856: 47.299999999999997,
959: 30.399999999999999,
1232: 25.800000000000001,
1360: 33.799999999999997,
1361: 31.300000000000001,
1362: 39.299999999999997,
1363: 25.100000000000001,
1679: 32.600000000000001},
'total_score': {510: 29.489999999999995,
832: 38.549999999999997,
856: 37.799999999999997,
959: 26.969999999999999,
1232: 37.350000000000001,
1360: 28.589999999999996,
1361: 29.489999999999998,
1362: 31.379999999999995,
1363: 27.299999999999997,
1679: 37.109999999999999},
'university_name': {510: 'Indian Institute of Technology Bombay',
832: 'Indian Institute of Technology Kharagpur',
856: 'Indian Institute of Technology Bombay',
959: 'Indian Institute of Technology Roorkee',
1232: 'Panjab University',
1360: 'Indian Institute of Technology Delhi',
1361: 'Indian Institute of Technology Kanpur',
1362: 'Indian Institute of Technology Kharagpur',
1363: 'Indian Institute of Technology Roorkee',
1679: 'Indian Institute of Science'},
'world_rank': {510: '301-350',
832: '226-250',
856: '251-275',
959: '351-400',
1232: '226-250',
1360: '351-400',
1361: '351-400',
1362: '351-400',
1363: '351-400',
1679: '276-300'},
'year': {510: 2012,
832: 2013,
856: 2013,
959: 2013,
1232: 2014,
1360: 2014,
1361: 2014,
1362: 2014,
1363: 2014,
1679: 2015}}
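The dictionary above hints at why plain arithmetic on the columns fails. 'international' and 'income' are stored as strings, and 'income' uses '-' for missing values. The sketch below is a minimal check on a two-row subset (rows 1360 and 1361) copied from the sample by hand. It uses `pd.api.types.is_numeric_dtype` to flag the columns that cannot take part in the formula as they are.

```python
import pandas as pd

# Two rows copied from the sample above; 'international' and 'income' stay strings
df = pd.DataFrame({
    "research": {1360: 23.0, 1361: 25.2},
    "citations": {1360: 38.5, 1361: 41.8},
    "teaching": {1360: 33.8, 1361: 31.3},
    "international": {1360: "15.3", 1361: "17.3"},
    "income": {1360: "-", 1361: "42.4"},
})

print(df.dtypes)

# True only for columns that can be used in arithmetic directly
numeric = {col: pd.api.types.is_numeric_dtype(df[col]) for col in df.columns}
print(numeric)
```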
I think you can use:
df['Total Score'] = (.3 * df.research +
                     .3 * df.citations +
                     .3 * df.teaching +
                     .075 * df.international +
                     .025 * df.income)
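In pandas, `+` and `*` on Series work element-wise and align on the index. That means the function from the question can also be reused unchanged by passing whole columns instead of scalars. This is a sketch on a small hand-made frame (rows 510 and 832 of the sample), and it assumes the columns are already numeric.

```python
import pandas as pd

def cal_score(research, citations, teaching, international, income):
    # Weighted sum; works for plain floats and for whole Series alike
    return (.3 * research + .3 * citations + .3 * teaching
            + .075 * international + .025 * income)

# Numeric values for rows 510 and 832 of the sample data
df = pd.DataFrame({
    "research": [15.7, 45.3],
    "citations": [38.8, 39.0],
    "teaching": [43.8, 44.2],
    "international": [14.3, 16.1],
    "income": [24.2, 72.4],
}, index=[510, 832])

# Each argument is a Series, so the result is a Series with the same index
df["Total Score"] = cal_score(df["research"], df["citations"], df["teaching"],
                              df["international"], df["income"])
print(df["Total Score"].round(4))
```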
If you need to use apply with a function (this is usually slower):
def cal_score(x):
    return (.3 * x.research +
            .3 * x.citations +
            .3 * x.teaching +
            .075 * x.international +
            .025 * x.income)

df['Total Score'] = df.apply(cal_score, axis=1)
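The sketch below checks that the row-wise apply version and the vectorized expression give the same numbers. apply is slower because it calls the Python function once per row, while the vectorized form computes each column operation once. The frame size and random values are arbitrary; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data using the question's column names
rng = np.random.default_rng(0)
cols = ["research", "citations", "teaching", "international", "income"]
df = pd.DataFrame(rng.uniform(0, 100, size=(1000, 5)), columns=cols)

def cal_score(x):
    # x is one row (a Series), so attribute access returns scalars
    return (.3 * x.research + .3 * x.citations + .3 * x.teaching
            + .075 * x.international + .025 * x.income)

# Row by row: one Python call per row
by_apply = df.apply(cal_score, axis=1)

# Vectorized: the same expression evaluated on whole columns at once
vectorized = (.3 * df.research + .3 * df.citations + .3 * df.teaching
              + .075 * df.international + .025 * df.income)

print(np.allclose(by_apply, vectorized))  # True
```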
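EDIT, based on the posted data:
You first need to replace the unwanted characters in the columns num_students and income, and then convert the columns to float with astype.
EDIT2, with the data sample: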
import pandas as pd
df = pd.DataFrame({'citations': {510: 38.799999999999997, 832: 39.0, 856: 45.600000000000001, 959: 45.799999999999997, 1232: 84.700000000000003, 1360: 38.5, 1361: 41.799999999999997, 1362: 35.299999999999997, 1363: 53.600000000000001, 1679: 51.600000000000001}, 'country': {510: 'India', 832: 'India', 856: 'India', 959: 'India', 1232: 'India', 1360: 'India', 1361: 'India', 1362: 'India', 1363: 'India', 1679: 'India'}, 'female_male_ratio': {510: '16 : 84', 832: '15 : 85', 856: '16 : 84', 959: '17 : 83', 1232: '46 : 54', 1360: '18 : 82', 1361: '13 : 87', 1362: '15 : 85', 1363: '17 : 83', 1679: '19 : 81'}, 'income': {510: '24.2', 832: '72.4', 856: '52.7', 959: '70.4', 1232: '28.4', 1360: '-', 1361: '42.4', 1362: '-', 1363: '64.8', 1679: '37.9'}, 'international': {510: '14.3', 832: '16.1', 856: '19.9', 959: '15.6', 1232: '29.3', 1360: '15.3', 1361: '17.3', 1362: '14.7', 1363: '15.6', 1679: '18.2'}, 'international_students': {510: '1%', 832: '0%', 856: '1%', 959: '1%', 1232: '1%', 1360: '1%', 1361: '0%', 1362: '0%', 1363: '1%', 1679: '1%'}, 'num_students': {510: '8,327', 832: '9,928', 856: '8,327', 959: '8,061', 1232: '16,691', 1360: '8,371', 1361: '6,167', 1362: '9,928', 1363: '8,061', 1679: '3,318'}, 'research': {510: 15.699999999999999, 832: 45.299999999999997, 856: 33.100000000000001, 959: 13.699999999999999, 1232: 14.0, 1360: 23.0, 1361: 25.199999999999999, 1362: 30.0, 1363: 12.300000000000001, 1679: 39.5}, 'student_staff_ratio': {510: 14.9, 832: 17.5, 856: 14.9, 959: 18.699999999999999, 1232: 23.899999999999999, 1360: 17.300000000000001, 1361: 12.199999999999999, 1362: 17.5, 1363: 18.699999999999999, 1679: 8.1999999999999993}, 'teaching': {510: 43.799999999999997, 832: 44.200000000000003, 856: 47.299999999999997, 959: 30.399999999999999, 1232: 25.800000000000001, 1360: 33.799999999999997, 1361: 31.300000000000001, 1362: 39.299999999999997, 1363: 25.100000000000001, 1679: 32.600000000000001}, 'total_score': {510: 29.489999999999995, 832: 38.549999999999997, 856: 
37.799999999999997, 959: 26.969999999999999, 1232: 37.350000000000001, 1360: 28.589999999999996, 1361: 29.489999999999998, 1362: 31.379999999999995, 1363: 27.299999999999997, 1679: 37.109999999999999}, 'university_name': {510: 'Indian Institute of Technology Bombay', 832: 'Indian Institute of Technology Kharagpur', 856: 'Indian Institute of Technology Bombay', 959: 'Indian Institute of Technology Roorkee', 1232: 'Panjab University', 1360: 'Indian Institute of Technology Delhi', 1361: 'Indian Institute of Technology Kanpur', 1362: 'Indian Institute of Technology Kharagpur', 1363: 'Indian Institute of Technology Roorkee', 1679: 'Indian Institute of Science'}, 'world_rank': {510: '301-350', 832: '226-250', 856: '251-275', 959: '351-400', 1232: '226-250', 1360: '351-400', 1361: '351-400', 1362: '351-400', 1363: '351-400', 1679: '276-300'}, 'year': {510: 2012, 832: 2013, 856: 2013, 959: 2013, 1232: 2014, 1360: 2014, 1361: 2014, 1362: 2014, 1363: 2014, 1679: 2015}})
# remove the thousands separator, e.g. '8,327' -> '8327'
df['num_students'] = df.num_students.str.replace(',', '')
# replace the missing-value marker '-' with '0'
df['income'] = df['income'].str.replace('-', '0')
# convert the columns used in the formula to float
cols = ['teaching', 'international', 'research', 'citations', 'income']
df[cols] = df[cols].astype(float)
df['Total Score'] = (.3 * df.research +
                     .3 * df.citations +
                     .3 * df.teaching +
                     .075 * df.international +
                     .025 * df.income)
print(df)
citations country female_male_ratio income international \
510 38.8 India 16 : 84 24.2 14.3
832 39.0 India 15 : 85 72.4 16.1
856 45.6 India 16 : 84 52.7 19.9
959 45.8 India 17 : 83 70.4 15.6
1232 84.7 India 46 : 54 28.4 29.3
1360 38.5 India 18 : 82 0.0 15.3
1361 41.8 India 13 : 87 42.4 17.3
1362 35.3 India 15 : 85 0.0 14.7
1363 53.6 India 17 : 83 64.8 15.6
1679 51.6 India 19 : 81 37.9 18.2
international_students num_students research student_staff_ratio \
510 1% 8327 15.7 14.9
832 0% 9928 45.3 17.5
856 1% 8327 33.1 14.9
959 1% 8061 13.7 18.7
1232 1% 16691 14.0 23.9
1360 1% 8371 23.0 17.3
1361 0% 6167 25.2 12.2
1362 0% 9928 30.0 17.5
1363 1% 8061 12.3 18.7
1679 1% 3318 39.5 8.2
teaching total_score university_name \
510 43.8 29.49 Indian Institute of Technology Bombay
832 44.2 38.55 Indian Institute of Technology Kharagpur
856 47.3 37.80 Indian Institute of Technology Bombay
959 30.4 26.97 Indian Institute of Technology Roorkee
1232 25.8 37.35 Panjab University
1360 33.8 28.59 Indian Institute of Technology Delhi
1361 31.3 29.49 Indian Institute of Technology Kanpur
1362 39.3 31.38 Indian Institute of Technology Kharagpur
1363 25.1 27.30 Indian Institute of Technology Roorkee
1679 32.6 37.11 Indian Institute of Science
world_rank year Total Score
510 301-350 2012 31.1675
832 226-250 2013 41.5675
856 251-275 2013 40.6100
959 351-400 2013 29.9000
1232 226-250 2014 40.2575
1360 351-400 2014 29.7375
1361 351-400 2014 31.8475
1362 351-400 2014 32.4825
1363 351-400 2014 30.0900
1679 276-300 2015 39.4225
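An alternative to the string replacements is `pd.to_numeric` with `errors="coerce"`. This is a substitute technique, not what the answer uses. It turns anything unparsable, such as '-', into NaN, and `fillna(0)` then reproduces the answer's choice of treating a missing income as 0. Whether 0 or NaN is the better choice depends on how missing values should affect the score. Commas in num_students still need to be stripped first, because `to_numeric` does not understand thousands separators. The sketch below works on two rows of the sample.

```python
import pandas as pd

# Two rows of the sample, with the original string values
df = pd.DataFrame({
    "income": ["24.2", "-"],
    "international": ["14.3", "15.3"],
    "num_students": ["8,327", "8,371"],
}, index=[510, 1360])

# Unparsable entries such as '-' become NaN instead of raising, then NaN -> 0
df["income"] = pd.to_numeric(df["income"], errors="coerce").fillna(0)
df["international"] = pd.to_numeric(df["international"], errors="coerce")

# Strip the thousands separator before converting
df["num_students"] = pd.to_numeric(df["num_students"].str.replace(",", "", regex=False))

print(df.dtypes)
```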
Here is the most direct way (assign returns a new DataFrame, so keep the result):
df = df.assign(TotalScore=.3 * df.research + .3 * df.citations + .3 * df.teaching + .075 * df.international + .025 * df.income)
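The formula is a plain weighted sum, so another option is to keep the weights in a Series indexed by column name and use `DataFrame.dot`. This is not taken from the answers above. `dot` matches the weights to the columns by label, so changing a weight only touches the weights Series. The sketch assumes the five columns are already numeric.

```python
import pandas as pd

# Weights keyed by column name; dot() aligns them with the DataFrame's columns
weights = pd.Series({"research": .3, "citations": .3, "teaching": .3,
                     "international": .075, "income": .025})

# Numeric values for rows 510 and 959 of the sample data
df = pd.DataFrame({
    "research": [15.7, 13.7],
    "citations": [38.8, 45.8],
    "teaching": [43.8, 30.4],
    "international": [14.3, 15.6],
    "income": [24.2, 70.4],
}, index=[510, 959])

# Select the weighted columns explicitly, then take the row-wise weighted sum
df["Total Score"] = df[weights.index].dot(weights)
print(df["Total Score"].round(4))
```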