Can you combine case_when and startsWith in R for complex groupings?
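I'm trying to group together some complex categories whose strings share a similar beginning.
Here is an example of the first case_when clause; as you can see, it is very long (I have redacted the strings for brevity).
Is there a way to write a case_when statement that groups all values beginning with 'Conditions'? (The same would apply to the remaining clauses, which begin with 'Mental health etc etc'.)
Thanks, all!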
mutate(condition = case_when(
  health_conditions == 'Conditions ABC' |
    health_conditions == 'Conditions DEF' |
    health_conditions == 'HIJ' |
    health_conditions == 'Conditions KLM, Parkinsons)' |
    health_conditions == 'Conditions NOP' ~ 'Conditions'))
We can combine these cases using a regular expression with grepl/str_detect:
library(dplyr)
library(stringr)

df1 %>%
  mutate(condition = case_when(
    str_detect(health_conditions, "^Conditions") | health_conditions == "HIJ" ~ "Conditions"
  ))
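The `^` in the pattern is what restricts the match to the start of the string. A minimal base R sketch of the difference, using made-up values (the vector `x` is hypothetical, not taken from the question's data):

```r
# Hypothetical example values; the second one contains "Conditions"
# but does not start with it.
x <- c("Conditions ABC", "Other Conditions", "HIJ")

# Anchored pattern: TRUE only when the string begins with "Conditions".
anchored <- grepl("^Conditions", x)

# Unanchored pattern: TRUE when "Conditions" appears anywhere.
unanchored <- grepl("Conditions", x)

print(data.frame(x, anchored, unanchored))
```

Without the anchor, "Other Conditions" would be swept into the group as well, which is probably not what is wanted when grouping by prefix.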
Alternatively, another option is startsWith:
df1 %>%
  mutate(condition = case_when(
    startsWith(health_conditions, "Conditions") | health_conditions == "HIJ" ~ "Conditions"
  ))
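To extend this to the other groups mentioned in the question, each prefix can get its own clause. The following is only a sketch: the sample data, the "Mental health" label, and the "Other" fallback are assumptions, since the real strings were redacted.

```r
library(dplyr)

# Hypothetical sample data standing in for the redacted values.
df1 <- data.frame(
  health_conditions = c("Conditions ABC", "Conditions DEF", "HIJ",
                        "Mental health XYZ", "Something else", NA),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  mutate(condition = case_when(
    # Keep missing inputs missing instead of sending them to the fallback.
    is.na(health_conditions) ~ NA_character_,
    # Prefix match plus the one exact value that lacks the prefix.
    startsWith(health_conditions, "Conditions") |
      health_conditions == "HIJ" ~ "Conditions",
    startsWith(health_conditions, "Mental health") ~ "Mental health",
    # Fallback label (assumed); drop this clause to leave unmatched rows as NA.
    TRUE ~ "Other"
  ))

print(result)
```

Clauses are evaluated in order and the first match wins, so the more specific conditions should come before the `TRUE` fallback.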
case_when() is a bit of a code smell here. The following base R idiom is simpler, using %in%:
conds <- c("Conditions DEF", "HIJ") # add extra as required
df1$condition[df1$health_conditions %in% conds] <- "Conditions"
Alternatively, as suggested in the other answer, a regular expression may help:
df1$condition[grepl("^Conditions", df1$health_conditions)] <- "Conditions"
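If there are several prefixes, the same base R idea can be driven by a small lookup. This is a rough sketch rather than anything from the answers above: the `prefixes` and `exact` vectors and the sample data are hypothetical.

```r
# Hypothetical sample data standing in for the redacted values.
df1 <- data.frame(
  health_conditions = c("Conditions ABC", "HIJ", "Mental health XYZ",
                        "Something else", NA),
  stringsAsFactors = FALSE
)

# Assumed mappings: names are what to match, values are the group labels.
prefixes <- c("Conditions" = "Conditions", "Mental health" = "Mental health")
exact    <- c("HIJ" = "Conditions")

# Start with every row ungrouped.
df1$condition <- NA_character_

# Label rows whose value starts with each prefix; which() drops NA matches.
for (p in names(prefixes)) {
  hit <- which(startsWith(df1$health_conditions, p))
  df1$condition[hit] <- prefixes[[p]]
}

# Label the exact values that do not share a prefix.
for (v in names(exact)) {
  df1$condition[which(df1$health_conditions == v)] <- exact[[v]]
}

print(df1)
```

Rows that match nothing, including missing values, are left as NA, and adding a group means adding one entry to a vector rather than another clause.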