R - Add row index to a data frame but handle ties with minimum rank

I have successfully used the answer from this SO thread, r-how-to-add-row-index-to-a-data-frame-based-on-combination-of-factors, but I need to handle the case where two (or more) rows can be tied.

df <- data.frame(
season = c(2014,2014,2014,2014,2014,2014, 2014, 2014), 
week = c(1,1,1,1,2,2,2,2), 
player.name = c("Matt Ryan","Peyton Manning","Cam Newton","Matthew Stafford","Carson Palmer","Andrew Luck", "Aaron Rodgers", "Chad Henne"), 
fant.pts.passing = c(28,19,29,28,18,22,29,22)
)

df <- df[order(-df$season, df$week, -df$fant.pts.passing),]

df$Index <- ave( 1:nrow(df), df$season, df$week, FUN=function(x) 1:length(x) )

df

In this example, for week 1, Matt Ryan and Matthew Stafford should both be ranked 2, and Peyton Manning should then be ranked 4.

Assuming you want to rank within each season and week, this is easy to do with dplyr's min_rank:

library(dplyr)

df %>% group_by(season, week) %>%
  mutate(indx = min_rank(desc(fant.pts.passing)))

#   season week      player.name fant.pts.passing Index indx
# 1   2014    1       Cam Newton               29     1    1
# 2   2014    1        Matt Ryan               28     2    2
# 3   2014    1 Matthew Stafford               28     3    2
# 4   2014    1   Peyton Manning               19     4    4
# 5   2014    2    Aaron Rodgers               29     1    1
# 6   2014    2      Andrew Luck               22     2    2
# 7   2014    2       Chad Henne               22     3    2
# 8   2014    2    Carson Palmer               18     4    4
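To make the tie handling concrete, here is a minimal sketch comparing min_rank with two other dplyr ranking helpers, dense_rank and row_number, on the week 1 scores only. The data frame name wk1 and the column names rk_min, rk_dense and rk_first are made up for this illustration.

```r
library(dplyr)

# Week 1 scores from the question, in their original (unsorted) row order.
# `wk1` and the rk_* column names are hypothetical, used only for this sketch.
wk1 <- data.frame(
  player.name = c("Matt Ryan", "Peyton Manning", "Cam Newton", "Matthew Stafford"),
  fant.pts.passing = c(28, 19, 29, 28)
)

ranked <- wk1 %>%
  mutate(
    rk_min   = min_rank(desc(fant.pts.passing)),    # ties share the lowest rank, gap after: 2 4 1 2
    rk_dense = dense_rank(desc(fant.pts.passing)),  # ties share a rank, no gap after:       2 3 1 2
    rk_first = row_number(desc(fant.pts.passing))   # ties broken by row order:              2 4 1 3
  )

ranked
```

Only min_rank gives the 1, 2, 2, 4 pattern asked for in the question; dense_rank would put Peyton Manning at 3 instead of 4.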

You want to use the rank function with ties.method="min" inside your ave call:

df$Index <- ave(-df$fant.pts.passing, df$season, df$week,
                FUN=function(x) rank(x, ties.method="min"))
df
#   season week      player.name fant.pts.passing Index
# 3   2014    1       Cam Newton               29     1
# 1   2014    1        Matt Ryan               28     2
# 4   2014    1 Matthew Stafford               28     2
# 2   2014    1   Peyton Manning               19     4
# 7   2014    2    Aaron Rodgers               29     1
# 6   2014    2      Andrew Luck               22     2
# 8   2014    2       Chad Henne               22     2
# 5   2014    2    Carson Palmer               18     4
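As a rough illustration of what ties.method changes, the sketch below applies base R's rank to the week 1 scores with three of its documented tie methods. The vector name pts and the rank_* names are made up for this example.

```r
# Week 1 scores from the question, in their original (unsorted) row order.
# `pts` and the rank_* names are hypothetical, used only for this sketch.
pts <- c("Matt Ryan" = 28, "Peyton Manning" = 19,
         "Cam Newton" = 29, "Matthew Stafford" = 28)

# rank() ranks in ascending order, so negate to give the highest score rank 1.
rank_min   <- rank(-pts, ties.method = "min")      # 2 4 1 2   (ties share the lowest rank)
rank_first <- rank(-pts, ties.method = "first")    # 2 4 1 3   (ties broken by position)
rank_avg   <- rank(-pts, ties.method = "average")  # 2.5 4 1 2.5 (the default)

rbind(rank_min, rank_first, rank_avg)
```

Because rank works on the values themselves rather than on row positions, the data frame does not need to be sorted before the ave call.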

You can use the faster frank from data.table and assign the column by reference (:=):

library(data.table)  # v1.9.5+
setDT(df)[, indx := frank(-fant.pts.passing, ties.method='min'), .(season, week)]
 #   season week      player.name fant.pts.passing indx
 #1:   2014    1       Cam Newton               29    1
 #2:   2014    1        Matt Ryan               28    2
 #3:   2014    1 Matthew Stafford               28    2
 #4:   2014    1   Peyton Manning               19    4
 #5:   2014    2    Aaron Rodgers               29    1
 #6:   2014    2      Andrew Luck               22    2
 #7:   2014    2       Chad Henne               22    2
 #8:   2014    2    Carson Palmer               18    4