Split string and concatenate removing whole word in R

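I am trying to remove the terms "Arts and Humanities" and "Social Sciences" from a string in which different subject areas are joined by "/", as shown here: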

string = "Arts and Humanities Other Topics/Social Sciences Other Topics/Arts and Humanities/Social Sciences/Sociology"

I have tried using the stringr package:

library(stringr)
sapply(strsplit(string, "/"), function(x) paste(str_remove(x, "\\bArts and Humanities\\b|\\bSocial Sciences\\b"), collapse = "/"))

But the resulting output is " Other Topics/ Other Topics///Sociology", whereas I need this output:

"Arts and Humanities Other Topics/Social Sciences Other Topics/Sociology"

Thanks in advance.

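One approach is to split the whole string on "/" and then exclude the components you are not interested in. Either of the following works: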

paste0(unlist(strsplit(string, '/'))[!unlist(strsplit(string, '/')) %in% c("Arts and Humanities", "Social Sciences")],
      collapse = '/')

paste0(base::setdiff(unlist(strsplit(string, '/')),
        c("Arts and Humanities", "Social Sciences")), collapse = '/')

#"Arts and Humanities Other Topics/Social Sciences Other Topics/Sociology"
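
The two variants above agree on the example string, but they can differ when a component occurs more than once, because setdiff() treats its arguments as sets and returns unique values. A minimal sketch of the difference, assuming a made-up input (dup_string, parts, unwanted and the kept_* names are illustrative and not part of the original question):

```r
# Hypothetical input in which "Sociology" appears twice.
dup_string <- "Sociology/Arts and Humanities/Sociology"
parts <- unlist(strsplit(dup_string, "/"))
unwanted <- c("Arts and Humanities", "Social Sciences")

# Logical subscripting keeps every remaining component, repeats included.
kept_in <- paste0(parts[!parts %in% unwanted], collapse = "/")
print(kept_in)       # "Sociology/Sociology"

# setdiff() works on sets, so the repeated component is dropped.
kept_setdiff <- paste0(setdiff(parts, unwanted), collapse = "/")
print(kept_setdiff)  # "Sociology"
```

If repeated subject areas should be preserved, the %in% form is probably the safer choice; if duplicates should be removed anyway, setdiff() does both steps at once.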

With only a small adjustment, this can be generalized so that strings is a vector of such strings.

Solution

sapply(
  # Split each string by "/" into its components.
  X = strsplit(x = strings, split = "/"),
  # Remove undesired components and then reassemble the strings.
  FUN = function(v){paste0(
    # Use subscripting to filter out matches.
    v[!grepl(x = v, pattern = "^\\s*(Arts and Humanities|Social Sciences)\\s*$")],
    # Reassemble components as separated by "/".
    collapse = "/"
  )},
  
  # Make the result a vector like the original 'string' (rather than a list).
  simplify = TRUE,
  USE.NAMES = FALSE
)

Result

Given a vector strings such as

strings <- c(
  "Arts and Humanities Other Topics/Social Sciences Other Topics/Arts and Humanities/Social Sciences/Sociology",
  "Sociology/Arts and Humanities"
)

this solution should produce the following result:

[1] "Arts and Humanities Other Topics/Social Sciences Other Topics/Sociology"
[2] "Sociology"

Note

When applied to a vector, solutions that use unlist() collapse everything into one giant string instead of reassembling each string in strings separately.
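
The difference can be seen by comparing the lengths of the two results. The following is a rough sketch only: it uses an exact %in% match instead of the regular expression, and the names unwanted, parts, collapsed and per_string are introduced here purely for illustration.

```r
strings <- c(
  "Arts and Humanities Other Topics/Social Sciences Other Topics/Arts and Humanities/Social Sciences/Sociology",
  "Sociology/Arts and Humanities"
)
unwanted <- c("Arts and Humanities", "Social Sciences")

# unlist() merges the components of both strings into one character vector,
# so the boundaries between the original elements are lost.
parts <- unlist(strsplit(strings, "/"))
collapsed <- paste0(parts[!parts %in% unwanted], collapse = "/")
print(length(collapsed))   # 1

# Filtering each list element separately keeps one result per input string.
per_string <- vapply(
  strsplit(strings, "/"),
  function(v) paste0(v[!v %in% unwanted], collapse = "/"),
  character(1)
)
print(length(per_string))  # 2
```

The exact match is stricter than the pattern in the solution: it would not drop a component that carries leading or trailing whitespace.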