Negating levels for fct_collapse
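I have a list of levels that should not be collapsed ("Alberta", "British Columbia", "Ontario", "Quebec"), and it is shorter than the list of levels that should be (all the others). I can't work out how to negate the levels in fct_collapse, i.e. collapse everything except the levels listed. The code below shows what I am trying to do; it does not work. Any suggestions?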
df$`Province group` %<>% fct_collapse(df$Province, `Smaller provinces` = !c("Alberta", "British Columbia", "Ontario", "Quebec"))
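I'm a bit confused by some of the syntax you use here, but this solution should work for you! It uses dplyr's pipe structure, and underscores instead of spaces in variable names (i.e. variable_name rather than "variable name").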
library(dplyr)
library(forcats)
#What I imagine your df$Province variable looks like
df <- tibble(Province = rep(c("Ontario", "Alberta", "Quebec", "British Columbia", "PEI", "Manitoba", "Nova Scotia"), 10))
#Define your big provinces in this vector
big_provinces <- c("Ontario", "Alberta", "Quebec", "British Columbia")
#Modify the dataset (i.e. do the fct_collapse)
df %>%
  mutate(Province_group = fct_collapse(
    Province, #For the variable "Province"
    "Smaller provinces" = unique(Province[!(Province %in% big_provinces)]) #"Smaller provinces" is any province not in the vector big_provinces.
  ) #end of fct_collapse
  ) #mutate
If Province is a factor variable, you will need to convert it to a character variable first.
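The conversion mentioned above could look something like the following. This is only a sketch on the same made-up data; the helper column name Province_chr is an assumption introduced here for illustration, not something from the question.

```r
library(dplyr)
library(forcats)

# Same toy data as above, but with Province stored as a factor
df <- tibble(Province = factor(rep(c("Ontario", "Alberta", "Quebec", "British Columbia",
                                     "PEI", "Manitoba", "Nova Scotia"), 10)))
big_provinces <- c("Ontario", "Alberta", "Quebec", "British Columbia")

result <- df %>%
  mutate(
    # Province_chr is a hypothetical helper column holding the character version
    Province_chr = as.character(Province),
    Province_group = fct_collapse(
      Province_chr,
      "Smaller provinces" = unique(Province_chr[!(Province_chr %in% big_provinces)])
    )
  )

# Inspect how many rows fall into each group
count(result, Province_group)
```

The helper column can be dropped afterwards with select(-Province_chr) if it is not needed.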
P.S. Greetings from Quebec
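Here is a solution that uses levels to get the factor levels, then subsets the values that should be collapsed by negating %in%.
First, recreate the data set from user @R me matey's answer.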
library(magrittr)
library(dplyr)
library(forcats)
df <- tibble(Province = rep(c("Ontario", "Alberta", "Quebec", "British Columbia", "PEI", "Manitoba", "Nova Scotia"), 10))
df$Province <- factor(df$Province)
Now the question.
big_provinces <- c("Alberta", "British Columbia", "Ontario", "Quebec")
df %<>%
  mutate(Province = fct_collapse(Province, `Smaller provinces` = levels(Province)[!levels(Province) %in% big_provinces]))
df
## A tibble: 70 x 1
# Province
# <fct>
# 1 Ontario
# 2 Alberta
# 3 Quebec
# 4 British Columbia
# 5 Smaller provinces
# 6 Smaller provinces
# 7 Smaller provinces
# 8 Ontario
# 9 Alberta
#10 Quebec
## ... with 60 more rows
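fct_lump is the best solution for this particular problem, but only because the logic of the question amounts to keeping the 4 provinces with the largest n and lumping the rest. If anyone finds a shorter solution than Rui Barradas's, I would still be interested for future work with factors.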
df %>%
  mutate(`Compared to smaller provinces` = fct_lump(Province, n = 4)) %>%
  count(`Compared to smaller provinces`)
This produces 5 groups, where "Other" contains all the remaining provinces with smaller n.
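For cases where the levels to keep are not the most frequent ones, another option (not used in any of the answers above, so treat it as a suggestion rather than the accepted approach) is forcats' fct_other, which takes the levels to keep directly and lumps everything else into one level. A minimal sketch on the same made-up data:

```r
library(forcats)

# Same toy provinces as in the answers above; each appears 10 times
provinces <- factor(rep(c("Ontario", "Alberta", "Quebec", "British Columbia",
                          "PEI", "Manitoba", "Nova Scotia"), 10))
big_provinces <- c("Alberta", "British Columbia", "Ontario", "Quebec")

# Keep the listed levels and replace every other level with "Smaller provinces"
grouped <- fct_other(provinces, keep = big_provinces, other_level = "Smaller provinces")

# Tabulate the resulting groups
table(grouped)
```

Because the levels to keep are named explicitly, this does not depend on how often each province occurs. In this toy data every province appears equally often, so a frequency-based rule has nothing to rank by.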