Pandas Pivot table and Rank
I have the following DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame()
df['SubjectArea'] = ["a","b","a","c","a","s","d","b","s","a","s","c","s","z","a"]
df['Articles'] = [10, 20,5,58,98,15,35,89,47,15,25,145,89,689,25]
df['NoOfReading'] = [30, 40,45,25,35,88,68,98,45,125,255,np.nan,75,125,265]
df
SubjectArea Articles NoOfReading
0 a 10 30.0
1 b 20 40.0
2 a 5 45.0
3 c 58 25.0
4 a 98 35.0
5 s 15 88.0
6 d 35 68.0
7 b 89 98.0
8 s 47 45.0
9 a 15 125.0
10 s 25 255.0
11 c 145 NaN
12 s 89 75.0
13 z 689 125.0
14 a 25 265.0
For each subject area, I want to create a DataFrame like the one below, with a rank based on the weighted average.
df.fillna(0, inplace=True)
df["weightedAverage"] = df["Articles"]*0.35 + df["NoOfReading"]*0.65
df2 = df[df["SubjectArea"]=="a"].copy()  # copy so that adding a column does not hit a view of df
df2 = df2.sort_values(by="weightedAverage", ascending=False)
df2['Rank'] = df2['weightedAverage'].rank(method='dense', ascending=False)
df2 = df2.reset_index(drop=True)
df2
SubjectArea Articles NoOfReading weightedAverage Rank
0 a 25 265.0 181.00 1.0
1 a 15 125.0 86.50 2.0
2 a 98 35.0 57.05 3.0
3 a 5 45.0 31.00 4.0
4 a 10 30.0 23.00 5.0
So I want to build this for every "SubjectArea": the rows of each subject together with their weighted average and rank, as shown below.
SubjectArea Articles NoOfReading weightedAverage Rank
0 a 25 265.0 181.00 1.0
1 a 15 125.0 86.50 2.0
2 a 98 35.0 57.05 3.0
3 a 5 45.0 31.00 4.0
4 a 10 30.0 23.00 5.0
SubjectArea Articles NoOfReading weightedAverage Rank
0 b 89 98.0 94.85 1.0
1 b 20 40.0 33.00 2.0
...
Is it possible to build something like this with a pandas pivot table and rank? Or with any other method?
Any help would be appreciated. Thanks in advance!
You can assign the ranks without splitting the data by subject, using groupby:
df["weightedAverage"] = df["Articles"]*0.35 + df["NoOfReading"]*0.65
df['Rank'] = df.groupby('SubjectArea')['weightedAverage'].rank()
df = df.sort_values(['SubjectArea', 'Rank'])
Output:
SubjectArea Articles NoOfReading weightedAverage Rank
0 a 10 30.0 23.00 1.0
2 a 5 45.0 31.00 2.0
4 a 98 35.0 57.05 3.0
9 a 15 125.0 86.50 4.0
14 a 25 265.0 181.00 5.0
1 b 20 40.0 33.00 1.0
7 b 89 98.0 94.85 2.0
3 c 58 25.0 36.55 1.0
11 c 145 NaN NaN NaN
6 d 35 68.0 56.45 1.0
8 s 47 45.0 45.70 1.0
5 s 15 88.0 62.45 2.0
12 s 89 75.0 79.90 3.0
10 s 25 255.0 174.50 4.0
13 z 689 125.0 322.40 1.0
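The output above ranks in ascending order and leaves the row with a missing NoOfReading unranked, whereas the layout in the question puts the highest weighted average first. A minimal sketch of that variant follows, assuming (as the question does) that a missing reading count can be treated as 0; the `ranked` name is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "SubjectArea": ["a", "b", "a", "c", "a", "s", "d", "b", "s", "a", "s", "c", "s", "z", "a"],
    "Articles": [10, 20, 5, 58, 98, 15, 35, 89, 47, 15, 25, 145, 89, 689, 25],
    "NoOfReading": [30, 40, 45, 25, 35, 88, 68, 98, 45, 125, 255, np.nan, 75, 125, 265],
})

# Assumption taken from the question: a missing reading count counts as 0.
df["NoOfReading"] = df["NoOfReading"].fillna(0)
df["weightedAverage"] = df["Articles"] * 0.35 + df["NoOfReading"] * 0.65

# Rank within each subject; the largest weighted average gets rank 1.
df["Rank"] = (
    df.groupby("SubjectArea")["weightedAverage"]
      .rank(method="dense", ascending=False)
)

# `ranked` is an illustrative name: one frame, subjects stacked in order.
ranked = df.sort_values(["SubjectArea", "Rank"]).reset_index(drop=True)
print(ranked)
```

With method='dense', tied weighted averages within a subject share a rank and the next rank is not skipped, which is the same choice the question makes for subject "a".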
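Note: in general, if you need the sub-DataFrame for each value of a column, looping over groupby is faster than filtering the DataFrame separately for each value: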
for subject, data in df.groupby('SubjectArea'):
    # do something with `data`
    pass
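If a separate frame per subject really is needed, as in the question, the loop can collect them. The sketch below is one possible way, using a small stand-in frame with a few of the weighted averages from above; the `per_subject` dictionary is a hypothetical name.

```python
import pandas as pd

# Small stand-in frame: same column names as above, only a few rows.
df = pd.DataFrame({
    "SubjectArea": ["a", "b", "a", "b", "c"],
    "weightedAverage": [23.0, 33.0, 181.0, 94.85, 36.55],
})

# groupby yields (key, sub-DataFrame) pairs, one per distinct SubjectArea.
per_subject = {}
for subject, data in df.groupby("SubjectArea"):
    per_subject[subject] = (
        data.sort_values("weightedAverage", ascending=False)
            .reset_index(drop=True)
    )

print(per_subject["a"])
```

Each value in `per_subject` is an independent DataFrame with its own 0-based index, so it can be ranked or exported on its own.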