R - Adding a flag in a new column only for rows of top/bottom ranks
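(R beginner level, RStudio on Win7)
I have a data frame ranked within each state. I want to flag the top rank (rank 1) as "best" and the bottom rank as "worst", but each state has a different number of counties, so I have to find the maximum rank per state and then update the column "level". I can do this for "best", but I can't identify "worst", and I don't want to use a loop: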
mystate<- c(rep("TX",5),rep("AL",3),rep("NM",7))
mycounty<-c("TX1" ,"TX2", "TX3", "TX4", "TX5", "AL1", "AL2", "AL3", "NM1", "NM2", "NM3", "NM4", "NM5", "NM6", "NM7")
mycrime<-c(5,6,22,5,12,17,4,16,3,7,3,5,3,NA,16)
mydf<-data.frame(mystate,mycounty,mycrime)
mydf$rank<-NA
mydf <- transform(mydf,rank = ave(mycrime, mystate,FUN = function(x) rank(x, ties.method = "first")))
mydf$level <- NA
mydf[mydf$rank==1,"level"]<-"best"
# flag worst next
The result should look like this:
mystate mycounty mycrime rank level
1 TX TX1 5 1 best
2 TX TX2 6 3 <NA>
3 TX TX3 22 5 worst
4 TX TX4 5 2 <NA>
5 TX TX5 12 4 <NA>
6 AL AL1 17 3 worst
7 AL AL2 4 1 best
8 AL AL3 16 2 <NA>
9 NM NM1 3 1 best
10 NM NM2 7 5 <NA>
11 NM NM3 3 2 <NA>
12 NM NM4 5 4 <NA>
13 NM NM5 3 3 <NA>
14 NM NM6 NA 7 <NA>
15 NM NM7 16 6 worst
Thanks for your help.
1) No packages. Using ave, compute a 0/1 vector that is 1 for the worst county in each state and 0 otherwise, then use ifelse to set the value of level:
is.max <- function(x) seq_along(x) == which.max(x)
worst <- with(mydf, ave(mycrime, mystate, FUN = is.max))
transform(mydf, level = ifelse(worst, "worst", level))
giving:
mystate mycounty mycrime rank level
1 TX TX1 5 1 best
2 TX TX2 6 3 <NA>
3 TX TX3 22 5 worst
4 TX TX4 5 2 <NA>
5 TX TX5 12 4 <NA>
6 AL AL1 17 3 worst
7 AL AL2 4 1 best
8 AL AL3 16 2 <NA>
9 NM NM1 3 1 best
10 NM NM2 7 5 <NA>
11 NM NM3 3 2 <NA>
12 NM NM4 5 4 <NA>
13 NM NM5 3 3 <NA>
14 NM NM6 NA 7 <NA>
15 NM NM7 16 6 worst
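A self-contained sketch of the approach above that rebuilds the question's data and sets both flags in one pass. It relies on two documented properties of which.max(): with ties it returns the index of the first maximum, and it ignores NA values, which is why NM6 (crime NA) is not flagged. The is.min helper is a hypothetical counterpart introduced only for this illustration.

```r
mystate  <- c(rep("TX", 5), rep("AL", 3), rep("NM", 7))
mycounty <- c("TX1", "TX2", "TX3", "TX4", "TX5", "AL1", "AL2", "AL3",
              "NM1", "NM2", "NM3", "NM4", "NM5", "NM6", "NM7")
mycrime  <- c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16)
mydf <- data.frame(mystate, mycounty, mycrime)

# TRUE only at the first maximum (or minimum); NA values are skipped
is.max <- function(x) seq_along(x) == which.max(x)
is.min <- function(x) seq_along(x) == which.min(x)  # hypothetical helper

# ave() writes the logical results back into a numeric vector, giving 0/1
worst <- with(mydf, ave(mycrime, mystate, FUN = is.max))
best  <- with(mydf, ave(mycrime, mystate, FUN = is.min))

mydf$level <- ifelse(worst == 1, "worst", ifelse(best == 1, "best", NA))
mydf
```

Because which.min() also takes the first minimum, TX1 (not TX4) is "best" in Texas, matching the rank column built with ties.method = "first".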
2) dplyr. Using dplyr and is.max from above, it can be done like this:
library(dplyr)
mydf %>%
group_by(mystate) %>%
mutate(level = ifelse(is.max(mycrime), "worst", level))
3) data.table. Using data.table and is.max from above:
library(data.table)
mydt <- as.data.table(mydf)
mydt[, level := ifelse(is.max(mycrime), "worst", level), by = "mystate"]
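The question's own plan (find the largest rank within each state, then compare each row's rank to it) also works without a loop. One catch: rank() ranks NA last by default (na.last = TRUE), so NM6 would get rank 7 and be flagged instead of NM7. A minimal sketch, assuming it is acceptable to leave the NA row unranked via na.last = "keep" (so NM6's rank becomes NA rather than the 7 shown in the expected output):

```r
mystate  <- c(rep("TX", 5), rep("AL", 3), rep("NM", 7))
mycounty <- c("TX1", "TX2", "TX3", "TX4", "TX5", "AL1", "AL2", "AL3",
              "NM1", "NM2", "NM3", "NM4", "NM5", "NM6", "NM7")
mycrime  <- c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16)
mydf <- data.frame(mystate, mycounty, mycrime)

# na.last = "keep" leaves NA crime values unranked instead of ranking them last
mydf$rank <- ave(mydf$mycrime, mydf$mystate,
                 FUN = function(x) rank(x, ties.method = "first", na.last = "keep"))

# Largest non-missing rank in each state, repeated on every row of that state
maxrank <- ave(mydf$rank, mydf$mystate, FUN = function(r) max(r, na.rm = TRUE))

# which() drops the NA comparisons produced by the unranked row
mydf$level <- NA
mydf$level[which(mydf$rank == 1)]       <- "best"
mydf$level[which(mydf$rank == maxrank)] <- "worst"
mydf
```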
Base R. Here is a way to get both "worst" and "best" at once:
mydf <- data.frame(mystate, mycounty, mycrime)
z = ave(mydf$mycrime, mydf$mystate, FUN = function(x) {
r = rank(x, ties.method="first", na.last="keep") # keep NA unranked so NM6 is not flagged "worst"
factor(r, levels = range(r, na.rm = TRUE))
})
mydf$level = factor(z, labels = c("best", "worst"))
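ave can't do the whole job by itself because it can't return a factor (as far as I know): the per-state factors come back as their integer codes, which is why the final factor() call is needed to attach the "best"/"worst" labels.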
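A quick check of what ave actually returns in the base R version. The sketch assumes na.last = "keep" and range(r, na.rm = TRUE) so that the NA row (NM6) is not treated as the worst county; z ends up as a plain numeric vector holding 1 (first level, "best"), 2 (second level, "worst") or NA.

```r
mystate <- c(rep("TX", 5), rep("AL", 3), rep("NM", 7))
mycrime <- c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16)

z <- ave(mycrime, mystate, FUN = function(x) {
  r <- rank(x, ties.method = "first", na.last = "keep")
  factor(r, levels = range(r, na.rm = TRUE))
})

# The factor class is lost when ave() writes into the numeric vector,
# so only the integer codes (1, 2) and NA remain
class(z)
z

# Turning the codes back into labels
level <- factor(z, labels = c("best", "worst"))
level
```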
dplyr and data.table analogues:
library(dplyr)
mydf %>% group_by(mystate) %>% mutate(
r = rank(mycrime, ties.method="first", na.last="keep"),
level = factor(r, levels = range(r, na.rm = TRUE), labels = c("best", "worst")),
r = NULL
)
# or...
library(data.table)
setDT(mydf)[, level := {
r = rank(mycrime, ties.method="first", na.last="keep")
factor(r, levels = range(r, na.rm = TRUE), labels = c("best", "worst"))
}, by=mystate]
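One edge case the data above does not hit: if a state had a single county, range(r) would be c(1, 1), and recent R versions make factor() stop with an error about a duplicated level. A minimal sketch of a guard for the base R version only, assuming a hypothetical one-county state "WY" with a made-up crime value of 9; unique() collapses the levels to a single one, so that county is flagged "best".

```r
# "WY" and its crime value are hypothetical, added only to trigger the edge case
mystate <- c(rep("TX", 5), rep("AL", 3), rep("NM", 7), "WY")
mycrime <- c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16, 9)

flag <- function(x) {
  r <- rank(x, ties.method = "first", na.last = "keep")
  # unique() turns c(1, 1) into 1 for single-row groups, avoiding duplicated levels
  factor(r, levels = unique(range(r, na.rm = TRUE)))
}

z <- ave(mycrime, mystate, FUN = flag)
level <- factor(z, labels = c("best", "worst"))
level
```

The dplyr and data.table versions pass labels = c("best", "worst") directly to factor(), so they would need a different guard (the label vector must match the number of levels).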