Negating levels for fct_collapse

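I have a list of levels that should not be collapsed ("Alberta", "British Columbia", "Ontario", "Quebec"), and it is shorter than the list of levels that should be (all the others). I can't work out how to negate the levels in fct_collapse. The code below is only a target example of what I'm after: collapse every level except the ones listed. Any suggestions?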

    df$`Province group` %<>% fct_collapse(df$Province, `Smaller provinces` = !c("Alberta", "British Columbia", "Ontario", "Quebec"))

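I'm a little confused by some of the syntax you use here, but this solution should work for you! It uses dplyr's pipe structure, and it uses underscores instead of spaces in variable names (i.e. `variable_name` rather than `variable name`).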

    library(dplyr)
    library(forcats)

    #What I imagine your df$Province variable looks like
    df <- tibble(Province = rep(c("Ontario", "Alberta", "Quebec", "British Columbia", "PEI", "Manitoba", "Nova Scotia"), 10))

    #Define your big provinces in this vector
    big_provinces <- c("Ontario", "Alberta", "Quebec", "British Columbia")

    #Modify the dataset (i.e. do the fct_collapse)
    df %>%
      mutate(Province_group =  fct_collapse(
                 Province, #For the variable "Province"
                 "Smaller provinces" = unique(Province[!(Province %in% big_provinces)]) #"Smaller provinces" is any province not in the vector big_provinces.
                 ) #end of fct_collapse
             ) #mutate

If `Province` is a factor variable, you will need to convert it to a character variable first.

P.S. Greetings from Quebec
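The conversion step mentioned in this answer can be sketched as follows. This is only an illustration under assumptions: the data frame is the made-up example from the answer, with `Province` stored as a factor, and `Province_chr` is a hypothetical helper column introduced here for clarity.

```r
library(dplyr)
library(forcats)

# Illustrative data, with Province stored as a factor this time
df <- tibble(Province = factor(rep(c("Ontario", "Alberta", "Quebec", "British Columbia",
                                     "PEI", "Manitoba", "Nova Scotia"), 10)))
big_provinces <- c("Ontario", "Alberta", "Quebec", "British Columbia")

df <- df %>%
  mutate(
    # Hypothetical helper column: character copy of the factor
    Province_chr = as.character(Province),
    # Collapse every province that is not in big_provinces
    Province_group = fct_collapse(
      Province_chr,
      "Smaller provinces" = unique(Province_chr[!(Province_chr %in% big_provinces)])
    )
  )

count(df, Province_group)
```

The helper column can be dropped afterwards with `select(-Province_chr)` if it is not needed.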

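Here is a solution that uses `levels` to get the factor levels. It then negates `%in%` to pick out the levels that are not in the do-not-collapse list.

First, recreate the data set from user @R me matey's answer.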

    library(magrittr)
    library(dplyr)
    library(forcats)

    df <- tibble(Province = rep(c("Ontario", "Alberta", "Quebec", "British Columbia", "PEI", "Manitoba", "Nova Scotia"), 10))
    df$Province <- factor(df$Province)

Now the question.

    big_provinces <- c("Alberta", "British Columbia", "Ontario", "Quebec")

    df %<>%
      mutate(Province = fct_collapse(Province, `Smaller provinces` = levels(Province)[!levels(Province) %in% big_provinces]))

    df
    ## A tibble: 70 x 1
    #   Province         
    #   <fct>            
    # 1 Ontario          
    # 2 Alberta          
    # 3 Quebec           
    # 4 British Columbia 
    # 5 Smaller provinces
    # 6 Smaller provinces
    # 7 Smaller provinces
    # 8 Ontario          
    # 9 Alberta          
    #10 Quebec           
    ## ... with 60 more rows
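As an alternative sketch, not part of the answer above: forcats also provides `fct_other()`, whose `keep` argument lists the levels to leave untouched, so the negation does not have to be written by hand. The data is the same illustrative vector, and the name `province_group` is an assumption made for this example.

```r
library(forcats)

# Same illustrative provinces as above, stored as a factor
province <- factor(rep(c("Ontario", "Alberta", "Quebec", "British Columbia",
                         "PEI", "Manitoba", "Nova Scotia"), 10))
big_provinces <- c("Alberta", "British Columbia", "Ontario", "Quebec")

# keep = levels to leave as they are; every other level becomes other_level
province_group <- fct_other(province, keep = big_provinces,
                            other_level = "Smaller provinces")

table(province_group)
```

Inside a pipeline the same call would go in `mutate()`, for example `mutate(Province = fct_other(Province, keep = big_provinces, other_level = "Smaller provinces"))`.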

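fct_lump is the best solution for this particular problem, but only because the logic of the problem amounts to negating the 4 provinces with the largest n. If anyone finds a shorter solution than Rui Barradas's, I would still be interested for future work with factors.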

    df %>%
      mutate(`Compared to smaller provinces` = fct_lump(Province, n = 4)) %>%
      count(`Compared to smaller provinces`)

This produces 5 groups, where "Other" holds all the remaining provinces with smaller n.
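To make the frequency dependence concrete, here is a small sketch with invented counts (the numbers are assumptions for illustration, not real data). `fct_lump()` keeps the `n` most frequent levels, so it only matches the intended grouping when the four provinces to keep are also the four largest.

```r
library(forcats)

# Invented counts: the four large provinces are also the four most frequent
province <- factor(rep(
  c("Ontario", "Quebec", "British Columbia", "Alberta", "Manitoba", "Nova Scotia", "PEI"),
  times = c(50, 40, 30, 20, 5, 4, 3)
))

# Keep the 4 most frequent levels and lump the rest under a custom label
lumped <- fct_lump(province, n = 4, other_level = "Smaller provinces")

table(lumped)
```

If the level to keep is not among the most frequent ones, selecting levels by name (as in the other answers) is the safer route.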