Split string and concatenate removing whole word in R
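I am trying to remove the terms "Arts and Humanities" and "Social Sciences" from a string in which different subject areas are joined by "/", like this: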
string = "Arts and Humanities Other Topics/Social Sciences Other Topics/Arts and Humanities/Social Sciences/Sociology"
I have tried using the stringr package:
library(stringr)
sapply(strsplit(string, "/"), function(x) paste(str_remove(x, "\\bArts and Humanities\\b|\\bSocial Sciences\\b"), collapse = "/"))
However, the resulting output is " Other Topics/ Other Topics///Sociology", whereas I need this output:
"Arts and Humanities Other Topics/Social Sciences Other Topics/Sociology"
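Thanks in advance.
One approach is to split the whole string into its components and then exclude the ones you are not interested in: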
paste0(unlist(strsplit(string, '/'))[!unlist(strsplit(string, '/')) %in% c("Arts and Humanities", "Social Sciences")],
collapse = '/')
Or, using setdiff (which also drops any repeated components):
paste0(base::setdiff(unlist(strsplit(string, '/')),
c("Arts and Humanities", "Social Sciences")), collapse = '/')
# "Arts and Humanities Other Topics/Social Sciences Other Topics/Sociology"
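The two variants above can behave differently when a component repeats. The following is a minimal sketch of that difference, not part of the original answer; the names unwanted, parts and dup_parts are introduced here for illustration only.

```r
string <- "Arts and Humanities Other Topics/Social Sciences Other Topics/Arts and Humanities/Social Sciences/Sociology"
unwanted <- c("Arts and Humanities", "Social Sciences")  # hypothetical helper name

# Split once and reuse the pieces instead of calling strsplit() twice.
parts <- unlist(strsplit(string, "/", fixed = TRUE))

# Logical subscripting keeps the order and any duplicates of the remaining pieces.
result <- paste(parts[!parts %in% unwanted], collapse = "/")

# setdiff() treats its inputs as sets, so repeated components are dropped too.
dup_parts <- c("Sociology", "Arts and Humanities", "Sociology")
kept_by_index <- dup_parts[!dup_parts %in% unwanted]  # both "Sociology" entries survive
kept_by_setdiff <- setdiff(dup_parts, unwanted)       # only one "Sociology" survives
```

If repeated subject areas should be preserved, the subscripting form is the safer choice; if they should be deduplicated anyway, setdiff is shorter.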
With only a small tweak, this can be generalized to a vector of such strings, called strings here:
Solution
sapply(
  # Split each string by "/" into its components.
  X = strsplit(x = strings, split = "/"),
  # Remove undesired components and then reassemble the strings.
  FUN = function(v) {
    paste0(
      # Use subscripting to filter out matches.
      v[!grepl(x = v, pattern = "^\\s*(Arts and Humanities|Social Sciences)\\s*$")],
      # Reassemble components as separated by "/".
      collapse = "/"
    )
  },
  # Make the result a vector like the original 'strings' (rather than a list).
  simplify = TRUE,
  USE.NAMES = FALSE
)
Result
Given a vector strings such as
strings <- c(
"Arts and Humanities Other Topics/Social Sciences Other Topics/Arts and Humanities/Social Sciences/Sociology",
"Sociology/Arts and Humanities"
)
this solution should produce the following result:
[1] "Arts and Humanities Other Topics/Social Sciences Other Topics/Sociology"
[2] "Sociology"
Note
Solutions that use unlist() collapse everything into one giant string instead of reassembling each string in strings separately.
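To make the note about unlist() concrete, here is a small sketch comparing the two behaviours on the example vector. It is an illustration under stated assumptions rather than the answer's own code: unwanted, collapsed and per_string are names introduced here, and vapply with trimws is swapped in as one possible per-element variant.

```r
strings <- c(
  "Arts and Humanities Other Topics/Social Sciences Other Topics/Arts and Humanities/Social Sciences/Sociology",
  "Sociology/Arts and Humanities"
)
unwanted <- c("Arts and Humanities", "Social Sciences")  # hypothetical helper name

# unlist() flattens the pieces of all inputs into one vector, so the
# boundary between the two original strings is lost.
collapsed <- paste0(
  setdiff(unlist(strsplit(strings, "/")), unwanted),
  collapse = "/"
)

# Processing each element separately returns one result per input string.
per_string <- vapply(
  strsplit(strings, "/", fixed = TRUE),
  function(v) paste(v[!trimws(v) %in% unwanted], collapse = "/"),
  character(1)
)
```

Here collapsed has length 1, while per_string has length 2, matching the length of strings. vapply is used only because it enforces a character result of length one per element; sapply with simplify = TRUE would work the same way for this input.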