R - Add row index to a data frame but handle ties with minimum rank
I have successfully used the answers in this SO thread, r-how-to-add-row-index-to-a-data-frame-based-on-combination-of-factors, but I need to handle the case where two (or more) rows can be tied.
df <- data.frame(
season = c(2014,2014,2014,2014,2014,2014, 2014, 2014),
week = c(1,1,1,1,2,2,2,2),
player.name = c("Matt Ryan","Peyton Manning","Cam Newton","Matthew Stafford","Carson Palmer","Andrew Luck", "Aaron Rodgers", "Chad Henne"),
fant.pts.passing = c(28,19,29,28,18,22,29,22)
)
df <- df[order(-df$season, df$week, -df$fant.pts.passing),]
df$Index <- ave( 1:nrow(df), df$season, df$week, FUN=function(x) 1:length(x) )
df
In this example, for week 1, Matt Ryan and Matthew Stafford should both get index 2, and Peyton Manning should then get index 4.
Assuming you want to rank within each season and week, this is easy to do with dplyr's min_rank:
library(dplyr)
df %>% group_by(season, week) %>%
mutate(indx = min_rank(desc(fant.pts.passing)))
# season week player.name fant.pts.passing Index indx
# 1 2014 1 Cam Newton 29 1 1
# 2 2014 1 Matt Ryan 28 2 2
# 3 2014 1 Matthew Stafford 28 3 2
# 4 2014 1 Peyton Manning 19 4 4
# 5 2014 2 Aaron Rodgers 29 1 1
# 6 2014 2 Andrew Luck 22 2 2
# 7 2014 2 Chad Henne 22 3 2
# 8 2014 2 Carson Palmer 18 4 4
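As a small aside, min_rank leaves a gap after a tie (2, 2, 4). If gap-free ranks (2, 2, 3) were wanted instead, dplyr's dense_rank is the usual alternative. The sketch below uses only the week 1 scores from the question; the names pts and ranks are made up for illustration.

```r
library(dplyr)

# Week 1 passing points from the question, in their original row order
pts <- c(28, 19, 29, 28)

ranks <- data.frame(pts = pts) %>%
  mutate(
    min_rk   = min_rank(desc(pts)),   # ties share the lowest rank, gap afterwards
    dense_rk = dense_rank(desc(pts))  # ties share a rank, no gap afterwards
  )

print(ranks)
```

Here min_rk comes out as 2, 4, 1, 2 and dense_rk as 2, 3, 1, 2, so only min_rank matches the output the question asks for.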
You want to use the rank function with ties.method="min" inside your ave call:
df$Index <- ave(-df$fant.pts.passing, df$season, df$week,
FUN=function(x) rank(x, ties.method="min"))
df
# season week player.name fant.pts.passing Index
# 3 2014 1 Cam Newton 29 1
# 1 2014 1 Matt Ryan 28 2
# 4 2014 1 Matthew Stafford 28 2
# 2 2014 1 Peyton Manning 19 4
# 7 2014 2 Aaron Rodgers 29 1
# 6 2014 2 Andrew Luck 22 2
# 8 2014 2 Chad Henne 22 2
# 5 2014 2 Carson Palmer 18 4
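To see why ties.method="min" is the right choice here, it may help to compare it with two other tie-breaking rules that base R's rank offers. This is a minimal sketch on the week 1 scores only; the variable names are illustrative and not part of the original answer.

```r
# Week 1 passing points from the question, in their original row order
pts <- c(28, 19, 29, 28)

# Negate the scores so that the highest score receives rank 1
rank_min   <- rank(-pts, ties.method = "min")      # ties share the lowest rank
rank_first <- rank(-pts, ties.method = "first")    # ties broken by position
rank_avg   <- rank(-pts, ties.method = "average")  # default: ties share the mean rank

print(rank_min)    # 2 4 1 2
print(rank_first)  # 2 4 1 3
print(rank_avg)    # 2.5 4.0 1.0 2.5
```

Because rank works on the values themselves, this approach does not depend on the data frame having been sorted first, unlike the 1:length(x) index in the question.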
You could use the faster frank from data.table and assign the column by reference (:=):
library(data.table)#v1.9.5+
setDT(df)[, indx := frank(-fant.pts.passing, ties.method='min'), .(season, week)]
# season week player.name fant.pts.passing indx
#1: 2014 1 Cam Newton 29 1
#2: 2014 1 Matt Ryan 28 2
#3: 2014 1 Matthew Stafford 28 2
#4: 2014 1 Peyton Manning 19 4
#5: 2014 2 Aaron Rodgers 29 1
#6: 2014 2 Andrew Luck 22 2
#7: 2014 2 Chad Henne 22 2
#8: 2014 2 Carson Palmer 18 4
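Note that setDT converts df to a data.table in place. If the original data frame should stay untouched, one option is to work on a copy made with as.data.table, as in this sketch. It rebuilds a reduced version of the question's data (without player names); the name dt is an assumption for illustration.

```r
library(data.table)

# Reduced version of the question's data, in its original (unsorted) row order
df <- data.frame(
  season = rep(2014, 8),
  week = c(1, 1, 1, 1, 2, 2, 2, 2),
  fant.pts.passing = c(28, 19, 29, 28, 18, 22, 29, 22)
)

# as.data.table() returns a copy, so df itself remains a plain data.frame
dt <- as.data.table(df)

# Rank within each season/week group; tied scores share the lowest rank
dt[, indx := frank(-fant.pts.passing, ties.method = "min"), by = .(season, week)]

print(dt)
```

The new indx column exists only on dt; df keeps its original three columns.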