R: any perfect alternative to case_when() when detecting strings with multiple conditions and replacing them?

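I applied case_when to several thousand rows of text data to detect strings under multiple conditions and replace them, but I got wrong results: once one condition is satisfied for a row, case_when does not evaluate the remaining conditions for that row. I found one solution, but it does not handle multiple conditions like the ones in my data.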
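To make the behaviour concrete, here is a minimal sketch; the vector `x` and the labels "first" and "second" are made up for illustration. For each element, case_when() returns the value of the first condition that is TRUE and ignores any later condition that would also match.

```r
library(dplyr)

# Hypothetical text: the first element matches both patterns below
x <- c("police and transport", "transport only", "nothing relevant")

res <- case_when(
  grepl("police", x)    ~ "first",   # checked first, wins for element 1
  grepl("transport", x) ~ "second",  # only used where "police" did not match
  TRUE                  ~ NA_character_
)
res
```

The first element gets "first" even though it also contains "transport", so the order of the conditions decides the result.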

Any alternative to case_when would be greatly appreciated.

Here is some dummy data:

statement <- structure(list(stmt = c("diabetes is common", "police not my friend",
  "transport is cheap", "english is my language", "education is my right")),
  class = "data.frame", row.names = c(NA, -5L))

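I tried to adapt the first solution given there, but could not really figure it out.

I want to detect strings in the text of column stmt and recode that column into these five domains: APC, PDP, APGA, APP, SDP. The strings to detect are below: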

APC <- c("addiction|mental|Diabetes|health|healthy|Oranga|unwell|AOD| well| surgery|dental|recovery|oranga|Mirimiri|asthma|anger|checks|alcohol|pregnant|clinical|clinic")

PDP <- c("whanau direct|whānau direct|money|transport|home|repairs|social|budget|job|housing|house|financial|finance|Ohanga|furniture|accommodation|welfare|living|work|babies arrival|AT hop card|Entitlements|ohunga|bills|electricity|water|employment")

APGA <- c("Kaupapa|Te reo|language|Tikanga|Iwi|relationship|Tikinga|Reunite")

APP <- c("Studying|training|NCEA|ECE|Counseling|counsel|Knowledge|School|Education|matauranga|parenting|skills")

rangatiratanga <- c("self-management|Rangitiratanga|custody|police|court|CYFS|advocacy|Oranga Tamariki|rangatiratanga|section 101|EPOA|Familly issues")

You can use case_when with grepl and regex alternation:

library(dplyr)  # provides case_when()

statement$col <- case_when(
    grepl("(addiction|mental|Diabetes|health|healthy|Oranga|unwell|AOD| well| surgery|dental|recovery|oranga|Mirimiri|asthma|anger|checks|alcohol|pregnant|clinical|clinic)", statement$stmt) ~ "APC",
    grepl("(whanau direct|whānau direct|money|transport|home|repairs|social|budget|job|housing|house|financial|finance|Ohanga|furniture|accommodation|welfare|living|work|babies arrival|AT hop card|Entitlements|ohunga|bills|electricity|water|employment)", statement$stmt) ~ "PDP",
    grepl("(Kaupapa|Te reo|language|Tikanga|Iwi|relationship|Tikinga|Reunite)", statement$stmt) ~ "APGA",
    grepl("(Studying|training|NCEA|ECE|Counseling|counsel|Knowledge|School|Education|matauranga|parenting|skills)", statement$stmt) ~ "APP",
    grepl("(self-management|Rangitiratanga|custody|police|court|CYFS|advocacy|Oranga Tamariki|rangatiratanga|section 101|EPOA|Familly issues)", statement$stmt) ~ "rangatiratanga",
    TRUE ~ NA_character_
)
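If a row can belong to several domains at once, one option is to skip case_when() and record every matching domain. The following is only a sketch in base R: the patterns are shortened, hypothetical versions of the ones above, the sixth row is an extra example added to show a double match, and the column name `all_domains` is an assumption.

```r
# Dummy data from the question, plus one hypothetical row matching two domains
statement <- data.frame(
  stmt = c("diabetes is common", "police not my friend", "transport is cheap",
           "english is my language", "education is my right",
           "police took my money")
)

# Shortened, hypothetical patterns: one regex per domain
patterns <- c(
  APC            = "diabetes|health|clinic",
  PDP            = "transport|money|housing",
  APGA           = "te reo|language|tikanga",
  APP            = "education|school|training",
  rangatiratanga = "police|court|custody"
)

# Logical matrix: one row per statement, one column per domain
hits <- sapply(patterns, grepl, x = statement$stmt, ignore.case = TRUE)

# Collapse the names of all matching domains into one string per row
statement$all_domains <- apply(hits, 1, function(row) {
  matched <- names(patterns)[row]
  if (length(matched) == 0) NA_character_ else paste(matched, collapse = ";")
})
statement
```

Here every pattern is tested against every row, so the last row is labelled with both PDP and rangatiratanga instead of only the first domain that happens to match.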

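Thanks @Tim Biegeleisen, but detecting strings with case_when() and grepl() can give wrong results if case is not ignored, because a pattern such as "Diabetes" will not match "diabetes". grepl() accepts an ignore.case = TRUE argument to make the string matching case-insensitive, as in the code below: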

statement$col <- case_when(
    grepl("(addiction|mental|Diabetes|health|healthy|Oranga|unwell|AOD| well| surgery|dental|recovery|oranga|Mirimiri|asthma|anger|checks|alcohol|pregnant|clinical|clinic)", statement$stmt, ignore.case = TRUE) ~ "APC",
    grepl("(whanau direct|whānau direct|money|transport|home|repairs|social|budget|job|housing|house|financial|finance|Ohanga|furniture|accommodation|welfare|living|work|babies arrival|AT hop card|Entitlements|ohunga|bills|electricity|water|employment)", statement$stmt, ignore.case = TRUE) ~ "PDP",
    grepl("(Kaupapa|Te reo|language|Tikanga|Iwi|relationship|Tikinga|Reunite)", statement$stmt, ignore.case = TRUE) ~ "APGA",
    grepl("(Studying|training|NCEA|ECE|Counseling|counsel|Knowledge|School|Education|matauranga|parenting|skills)", statement$stmt, ignore.case = TRUE) ~ "APP",
    grepl("(self-management|Rangitiratanga|custody|police|court|CYFS|advocacy|Oranga Tamariki|rangatiratanga|section 101|EPOA|Familly issues)", statement$stmt, ignore.case = TRUE) ~ "rangatiratanga",
    TRUE ~ NA_character_
)
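The effect of ignore.case can be checked in isolation. This is a small sketch using two of the dummy statements and a shortened, hypothetical pattern; the variable names `strict` and `relaxed` are only for illustration.

```r
stmt <- c("diabetes is common", "education is my right")

# Default, case-sensitive matching: the capitalised pattern misses lower-case text
strict <- grepl("Diabetes|Education", stmt)

# Case-insensitive matching: both statements are detected
relaxed <- grepl("Diabetes|Education", stmt, ignore.case = TRUE)

strict
relaxed
```

With the default setting both statements go undetected and would fall through to NA in case_when(), without any error or warning.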