R - Adding a flag in a new column only for rows of top/bottom ranks

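(R beginner level, RStudio on Win7)

I have a data frame ranked by state. I'd like to flag the lowest rank (rank 1) as "best" and the highest rank as "worst", but each subset has a different number of members, so I have to compute the maximum rank for each state and then update the column "level". I can do this for "best", but I can't identify "worst", and I don't want to use a loop: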

mystate<- c(rep("TX",5),rep("AL",3),rep("NM",7))
mycounty<-c("TX1" ,"TX2", "TX3", "TX4", "TX5", "AL1", "AL2", "AL3", "NM1", "NM2", "NM3", "NM4", "NM5", "NM6", "NM7")
mycrime<-c(5,6,22,5,12,17,4,16,3,7,3,5,3,NA,16)
mydf<-data.frame(mystate,mycounty,mycrime)
mydf$rank<-NA
mydf <- transform(mydf,rank = ave(mycrime, mystate,FUN = function(x) rank(x, ties.method = "first")))
mydf$level <- NA
mydf[mydf$rank==1,"level"]<-"best"
# flag worst next

The result should look like this:

    mystate mycounty mycrime rank level
 1       TX      TX1       5    1  best
 2       TX      TX2       6    3  <NA>
 3       TX      TX3      22    5  worst
 4       TX      TX4       5    2  <NA>
 5       TX      TX5      12    4  <NA>
 6       AL      AL1      17    3  worst
 7       AL      AL2       4    1  best
 8       AL      AL3      16    2  <NA>
 9       NM      NM1       3    1  best
 10      NM      NM2       7    5  <NA>
 11      NM      NM3       3    2  <NA>
 12      NM      NM4       5    4  <NA>
 13      NM      NM5       3    3  <NA>
 14      NM      NM6      NA    7  <NA>
 15      NM      NM7      16    6  worst 

Thanks for your help.

1) no packages: Use ave to compute a 0/1 vector that is 1 for the worst row in each state and 0 otherwise, then use ifelse to set the value of level:

is.max <- function(x) seq_along(x) == which.max(x)
worst <- with(mydf, ave(mycrime, mystate, FUN = is.max))
transform(mydf, level = ifelse(worst, "worst", level))

giving:

   mystate mycounty mycrime rank level
1       TX      TX1       5    1  best
2       TX      TX2       6    3  <NA>
3       TX      TX3      22    5 worst
4       TX      TX4       5    2  <NA>
5       TX      TX5      12    4  <NA>
6       AL      AL1      17    3 worst
7       AL      AL2       4    1  best
8       AL      AL3      16    2  <NA>
9       NM      NM1       3    1  best
10      NM      NM2       7    5  <NA>
11      NM      NM3       3    2  <NA>
12      NM      NM4       5    4  <NA>
13      NM      NM5       3    3  <NA>
14      NM      NM6      NA    7  <NA>
15      NM      NM7      16    6 worst
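
As a quick check of why NM6 (the NA row) is not flagged and why only one row per state is marked, here is a minimal sketch of how is.max behaves on a few made-up vectors. It relies on which.max skipping NA and returning the first maximum; the input vectors are illustrative only.

```r
# is.max as defined above
is.max <- function(x) seq_along(x) == which.max(x)

# NA is skipped: the largest non-missing value is flagged
with_na <- is.max(c(3, 7, NA, 16))
with_na
# [1] FALSE FALSE FALSE  TRUE

# Ties: only the first occurrence of the maximum is flagged
with_ties <- is.max(c(5, 9, 9))
with_ties
# [1] FALSE  TRUE FALSE

# Caveat: if every value is NA, which.max() returns integer(0),
# so the comparison has length zero rather than one value per row
all_na <- is.max(c(NA_real_, NA_real_))
length(all_na)
# [1] 0
```

The last case suggests that a state with no non-missing crime values would need separate handling before is.max is passed to ave.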

2) dplyr: Using dplyr and is.max from above, it can be done like this:

library(dplyr)
mydf %>% 
     group_by(mystate) %>% 
     mutate(level = ifelse(is.max(mycrime), "worst", level))

3) data.table: Using data.table and is.max from above:

library(data.table)
mydt <- as.data.table(mydf)
mydt[, level := ifelse(is.max(mycrime), "worst", level), by = "mystate"]

base R: Here is a way to get both "worst" and "best" at once:

mydf <- data.frame(mystate, mycounty, mycrime)

z = ave(mydf$mycrime, mydf$mystate, FUN = function(x) {
  # na.last="keep" leaves NA unranked, so it is never flagged as "worst"
  r = rank(x, ties.method="first", na.last="keep")
  factor(r, levels = range(r, na.rm=TRUE))
})

mydf$level = factor(z, labels = c("best", "worst"))

ave can't do the whole job on its own because it can't return a factor (as far as I know).
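
To illustrate that point, here is a minimal sketch on made-up data (the vectors x and g are hypothetical, not from the question). ave writes each group's result back into a copy of its numeric input, so a factor returned by FUN is reduced to its integer codes, which is why the second factor() call with labels is needed.

```r
x <- c(10, 20, 30, 40)
g <- c("a", "a", "b", "b")

# FUN returns a factor, but ave() stores the result in a numeric vector,
# so only the underlying integer codes survive
codes <- ave(x, g, FUN = function(v) factor(v, levels = range(v)))
codes
# [1] 1 2 1 2
is.factor(codes)
# [1] FALSE

# Re-labelling the codes afterwards restores a proper factor
relabelled <- factor(codes, labels = c("best", "worst"))
```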


dplyr and data.table analogues

library(dplyr)
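mydf %>% group_by(mystate) %>% mutate(
  # na.last="keep" leaves NA unranked, so it is never flagged as "worst"
  r     = rank(mycrime, ties.method="first", na.last="keep"),
  level = factor(r, levels = range(r, na.rm=TRUE), labels = c("best", "worst")),
  r     = NULL
)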

# or...
library(data.table)
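setDT(mydf)[, level := {
  # na.last="keep" leaves NA unranked, so it is never flagged as "worst"
  r = rank(mycrime, ties.method="first", na.last="keep")
  factor(r, levels = range(r, na.rm=TRUE), labels = c("best", "worst"))
}, by=mystate]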
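
For comparison, the same flags can be sketched in base R without going through a factor. This is only an illustration, assuming that rows with a missing crime value should stay unflagged as in the expected output of the question; the helper names r, top and level are introduced here and are not part of the answers above.

```r
mystate <- c(rep("TX", 5), rep("AL", 3), rep("NM", 7))
mycrime <- c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16)

# Rank within state; na.last = "keep" leaves the missing value unranked
r <- ave(mycrime, mystate,
         FUN = function(x) rank(x, ties.method = "first", na.last = "keep"))

# Highest rank per state, ignoring the unranked row
top <- ave(r, mystate, FUN = function(v) max(v, na.rm = TRUE))

# Rank 1 is "best", the highest rank is "worst", everything else stays NA
level <- ifelse(r == 1, "best",
                ifelse(r == top, "worst", NA_character_))

which(level == "best")
# [1] 1 7 9
which(level == "worst")
# [1]  3  6 15
```

The flagged row numbers match the "best" and "worst" rows in the expected output of the question.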