How to use case_when and grep together to define a new variable
I have data like this, which can be built with the following code:
df<-structure(list(Gender = c("M", "F", "M", "F", "F"), Location = c("Cleveland, OH",
"New Olreans, LA", "Chicago, IL", "Strongsville, OH", "Boston, MA"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
I want to build a variable "Comment". The rules are:
If Gender=="F" and "OH" is found in Location, then Comment = "Female in OH"
If Gender=="F" and "OH" is not found in Location, then Comment = "Female in Other"
If Gender=="M" and "OH" is found in Location, then Comment = "Male in OH"
If Gender=="M" and "OH" is not found in Location, then Comment = "Male in Other"
So my code is:
df<-df %>%
mutate(Comment = case_when(Gender=="F" & grep("OH", df$Location)~"Female in OH",
Gender=="F" & !grep("OH", df$Location)~ "Female in Other",
Gender=="M" & grep("OH", df$Location2)~ "Male in OH",
Gender=="M" & !grep("OH", df$Location)~ "Male in other)",
TRUE~NA))
It doesn't work. Can anyone give me some guidance?
Use grepl instead of grep to get a logical TRUE/FALSE value for each row rather than the indices of the matches. For example (also fixing the other typos):
library(dplyr)
df %>%
  mutate(Comment = case_when(
    Gender == "F" & grepl("OH", Location) ~ "Female in OH",
    Gender == "F" & !grepl("OH", Location) ~ "Female in Other",
    Gender == "M" & grepl("OH", Location) ~ "Male in OH",
    Gender == "M" & !grepl("OH", Location) ~ "Male in Other"
  ))
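To see why the original attempt fails, here is a small base R sketch using only the Location values from the question. The variable names location, idx and hit are just for illustration.

```r
# Location values copied from the question's data
location <- c("Cleveland, OH", "New Olreans, LA", "Chicago, IL",
              "Strongsville, OH", "Boston, MA")

# grep() returns the integer positions of the matching elements,
# so its length (2 here) does not line up with the 5 rows
idx <- grep("OH", location)
print(idx)   # 1 4

# grepl() returns one TRUE/FALSE per element, which can be
# combined row by row with Gender == "F" inside case_when()
hit <- grepl("OH", location)
print(hit)   # TRUE FALSE FALSE TRUE FALSE
```

Negating the result of grep() with ! does not help either, because it only negates the position numbers, not a per-row match.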
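I removed the NA part because your conditions already cover every case, and NA is the default result when no condition matches. If you do want it explicitly, you should use the typed version of NA for character: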
df %>%
  mutate(Comment = case_when(
    Gender == "F" & grepl("OH", Location) ~ "Female in OH",
    Gender == "F" & !grepl("OH", Location) ~ "Female in Other",
    Gender == "M" & grepl("OH", Location) ~ "Male in OH",
    Gender == "M" & !grepl("OH", Location) ~ "Male in Other",
    TRUE ~ NA_character_
  ))
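A minimal sketch of the default behaviour mentioned above, assuming dplyr is installed. The value "X" is a made-up gender code, not part of the question's data, added only so that one element matches no condition.

```r
library(dplyr)

# "X" is a hypothetical value that none of the conditions cover
gender <- c("M", "F", "X")

res <- case_when(gender == "F" ~ "Female",
                 gender == "M" ~ "Male")

print(res)          # "Male" "Female" NA
print(class(res))   # "character"
```

The unmatched element becomes NA of the same type as the other results, so the explicit TRUE ~ NA_character_ branch is only needed if you want to spell that out.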
I think this can be simplified a bit, rather than checking every possible combination of conditions.
vec <- c('M' = 'Male', 'F' = 'Female')
transform(df, Comment = paste(vec[Gender],
ifelse(grepl('OH', Location), 'in OH', 'in Other')))
# Gender Location Comment
# <chr> <chr> <chr>
#1 M Cleveland, OH Male in OH
#2 F New Olreans, LA Female in Other
#3 M Chicago, IL Male in Other
#4 F Strongsville, OH Female in OH
#5 F Boston, MA Female in Other
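The approach above relies on two independent pieces: a named-vector lookup for the gender label and a single ifelse for the location part. A base R sketch of those two steps on plain vectors (values copied from the question; label, where and comment are illustrative names):

```r
# Values copied from the question's data
gender   <- c("M", "F", "M", "F", "F")
location <- c("Cleveland, OH", "New Olreans, LA", "Chicago, IL",
              "Strongsville, OH", "Boston, MA")

# Indexing a named vector by a character vector looks each value up by name
vec   <- c(M = "Male", F = "Female")
label <- unname(vec[gender])

# One test for the location, independent of gender
where <- ifelse(grepl("OH", location), "in OH", "in Other")

# paste() joins the two parts with a single space
comment <- paste(label, where)
print(comment)
```

Because the two parts are built separately, adding a third gender label or a second state does not multiply the number of conditions the way the four-branch case_when does.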
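This is really just a variation on @Ronak Shah's answer. The state abbreviation is extracted with str_extract, and the target state ("OH" by default) is a parameter rather than being hard-coded.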
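library(dplyr)
library(stringr)

gender_vec <- c('M' = 'Male', 'F' = 'Female')
# Keep the state if it is the target, otherwise label it "Other"
state_map <- function(s, target = "OH") if_else(s == target, s, "Other")
df %>%
  mutate(Comment = str_c(recode(Gender, !!!gender_vec), "in",
                         # the backslash must be doubled inside an R string
                         state_map(str_extract(Location, "(\\w{2})$")),
                         sep = " "))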
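As a quick sketch of what the parameterization buys you, the helper can be called on its own with a different target. This assumes dplyr and stringr are installed; state_map is redefined here so the snippet runs by itself, and the target "MA" is just an example.

```r
library(dplyr)
library(stringr)

# Same helper as in the answer, repeated so this snippet is self-contained
state_map <- function(s, target = "OH") if_else(s == target, s, "Other")

# Location values copied from the question's data
location <- c("Cleveland, OH", "New Olreans, LA", "Chicago, IL",
              "Strongsville, OH", "Boston, MA")

# Pull the two word characters at the end of each string
states <- str_extract(location, "(\\w{2})$")
print(states)                            # "OH" "LA" "IL" "OH" "MA"

# Default target is "OH"; any other state can be passed instead
print(state_map(states))                 # "OH" "Other" "Other" "OH" "Other"
print(state_map(states, target = "MA"))  # "Other" "Other" "Other" "Other" "MA"
```

Note that this version matches the abbreviation at the end of the string exactly, whereas grepl("OH", Location) would also match "OH" anywhere else in the text.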