How to use case_when and grep together to define a new variable

I have a data frame like the following, which can be built with this code:

df<-structure(list(Gender = c("M", "F", "M", "F", "F"), Location = c("Cleveland, OH", 
"New Olreans, LA", "Chicago, IL", "Strongsville, OH", "Boston, MA"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))

I want to create a variable "Comment" according to these rules:

If Gender == "F" and "OH" is found in Location, then Comment = "Female in OH".
If Gender == "F" and "OH" is not found in Location, then Comment = "Female in Other".
If Gender == "M" and "OH" is found in Location, then Comment = "Male in OH".
If Gender == "M" and "OH" is not found in Location, then Comment = "Male in Other".

So my code is:

 df<-df %>% 
     mutate(Comment = case_when(Gender=="F" & grep("OH", df$Location)~"Female in OH",
                            Gender=="F" & !grep("OH", df$Location)~ "Female in Other",                        
                            Gender=="M" & grep("OH", df$Location2)~ "Male in OH",
                            Gender=="M" & !grep("OH", df$Location)~ "Male in other)",
                            TRUE~NA))

It doesn't work. Can anyone give me some guidance?

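Use grepl instead of grep. grep returns the integer indices of the matching elements, whereas grepl returns one logical TRUE/FALSE per element, which is what case_when needs for its conditions. For example (also fixing the other typos):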

library(dplyr)

df %>% 
  mutate(Comment = case_when(Gender == "F" & grepl("OH", Location) ~ "Female in OH",
                             Gender == "F" & !grepl("OH", Location) ~ "Female in Other",
                             Gender == "M" & grepl("OH", Location) ~ "Male in OH",
                             Gender == "M" & !grepl("OH", Location) ~ "Male in Other"))

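I removed the NA branch because your conditions already cover every case, and NA is the default result when nothing else matches. If you do want it explicitly, use the typed character version of NA, NA_character_, so that it matches the type of the other results.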

df %>% 
  mutate(Comment = case_when(Gender=="F" & grepl("OH", Location)~"Female in OH",
                             Gender=="F" & !grepl("OH", Location)~ "Female in Other",                        
                             Gender=="M" & grepl("OH", Location)~ "Male in OH",
                             Gender=="M" & !grepl("OH", Location)~ "Male in Other",
                             TRUE~NA_character_))
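
To see why the grep version cannot work, it may help to compare what the two functions return on the Location values from the question. This is a small base R sketch; the `location` vector is just a copy of that column made for illustration.

```r
# Location values copied from the question's data
location <- c("Cleveland, OH", "New Olreans, LA", "Chicago, IL",
              "Strongsville, OH", "Boston, MA")

# grep() returns the positions of the matching elements
idx <- grep("OH", location)
idx
#> [1] 1 4

# grepl() returns one TRUE/FALSE per element, so it lines up with Gender == "F"
hit <- grepl("OH", location)
hit
#> [1]  TRUE FALSE FALSE  TRUE FALSE

# Negating the index vector does not mean "no match": non-zero numbers are
# coerced to TRUE, so the result is a length-2 vector of FALSE
neg <- !grep("OH", location)
neg
#> [1] FALSE FALSE
```

Because grep gives a vector of length 2 here rather than length 5, it cannot be combined element by element with a condition on Gender.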

I think this can be simplified a bit, so that you don't have to check every possible combination of conditions.

vec <- c('M' = 'Male', 'F' = 'Female')

transform(df, Comment = paste(vec[Gender], 
                       ifelse(grepl('OH', Location), 'in OH', 'in Other')))

#  Gender Location         Comment        
#  <chr>  <chr>            <chr>          
#1 M      Cleveland, OH    Male in OH     
#2 F      New Olreans, LA  Female in Other
#3 M      Chicago, IL      Male in Other  
#4 F      Strongsville, OH Female in OH   
#5 F      Boston, MA       Female in Other
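
The approach above builds the comment from two independent pieces: a named vector used as a lookup table for the gender label, and an ifelse for the location part. Here is a base R sketch of those pieces on their own, with a check against the four rules written out explicitly. The `gender`, `location`, `place`, `comment` and `expected` names are only for illustration.

```r
# Values copied from the question's data
gender   <- c("M", "F", "M", "F", "F")
location <- c("Cleveland, OH", "New Olreans, LA", "Chicago, IL",
              "Strongsville, OH", "Boston, MA")

# Named vector as a lookup table: indexing by name maps "M"/"F" to a label
vec <- c(M = "Male", F = "Female")
unname(vec[gender])   # Male, Female, Male, Female, Female

# Location part: one of two suffixes, depending on whether "OH" is present
place <- ifelse(grepl("OH", location), "in OH", "in Other")

# paste() joins the two parts with a space
comment <- paste(vec[gender], place)

# The same four rules spelled out one by one, for comparison
expected <- ifelse(gender == "F",
                   ifelse(grepl("OH", location), "Female in OH", "Female in Other"),
                   ifelse(grepl("OH", location), "Male in OH", "Male in Other"))

identical(comment, expected)
#> [1] TRUE
```

With two genders and two location outcomes this saves only a few lines, but the number of case_when branches grows with every extra category, while the lookup version stays the same size.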

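This is really just a variation of @Ronak Shah's answer.
The state abbreviation is extracted with str_extract, and the state of interest ("OH" by default) is a parameter.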

library(dplyr)
library(stringr)

gender_vec <- c('M' = 'Male', 'F' = 'Female')
# Keep the abbreviation if it is the target state, otherwise label it "Other"
state_map <- function(s, target = "OH") if_else(s == target, s, "Other")

df %>%
  mutate(Comment = str_c(recode(Gender, !!!gender_vec), "in", 
                         state_map(str_extract(Location, "\\w{2}$")), 
                         sep = " "))
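
To show what the extraction step returns and how the target state can be changed, here is a short sketch on the Location values from the question. It assumes dplyr and stringr are installed; `location` and `state` are names made up for the example. Note that the backslash in the pattern has to be doubled inside an R string.

```r
library(dplyr)    # if_else()
library(stringr)  # str_extract()

# Location values copied from the question's data
location <- c("Cleveland, OH", "New Olreans, LA", "Chicago, IL",
              "Strongsville, OH", "Boston, MA")

# "\\w{2}$" matches the last two word characters of each string
state <- str_extract(location, "\\w{2}$")
state   # OH, LA, IL, OH, MA

# Same helper as in the answer: keep the target state, relabel the rest
state_map <- function(s, target = "OH") if_else(s == target, s, "Other")

state_map(state)                 # "OH" for rows 1 and 4, "Other" elsewhere
state_map(state, target = "MA")  # "MA" for row 5, "Other" elsewhere
```

This relies on every Location ending in a two-letter state code; a value with trailing whitespace or a longer suffix would need a different pattern.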