R - Merge unique values from two lists using stringr::str_split
I have a function that, given a list of strings, should return a vector containing all unique substrings of size N.
get_unique <- function (input_list, size = 3) {
    output = c()
    for (input in input_list) {
        current = stringr::str_replace(input, "[-_\\s]", "")
        current = trimws(gsub(paste0("(.{",size,"})"), "\\1 ", current))
        parts = stringr::str_split(current, "\\s", simplify = TRUE)[1,]
        output = union(output, parts)
    }
    return(output)
}
What I expect is:
get_unique(c("ABC", "ABCDEF", "GHIDEF"))
[1] "ABC" "DEF" "GHI"
But what I get is:
get_unique(c("ABC", "ABCDEF", "GHIDEF"))
[[1]]
[1] "ABC"
[[2]]
[1] "DEF"
[[3]]
[1] "GHI"
I'm new to R, so I'm having a hard time understanding what went wrong.
We can call unlist on the result at the end:
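get_unique <- function (input_list, size = 3) {
    output = c()
    for (input in input_list) {
        # remove the first hyphen, underscore, or whitespace character
        current = stringr::str_replace(input, "[-_\\s]", "")
        # insert a space after every `size` characters, then trim the trailing space
        current = trimws(gsub(paste0("(.{",size,"})"), "\\1 ", current))
        # split on whitespace to get the individual chunks
        parts = stringr::str_split(current, "\\s", simplify = TRUE)[1,]
        output = union(output, parts)
    }
    # flatten to a plain vector in case output is a list
    return(unlist(output))
}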
get_unique(c("ABC", "ABCDEF", "GHIDEF"))
#[1] "ABC" "DEF" "GHI"
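To illustrate why unlist helps here, the following minimal sketch starts from a list shaped like the output printed in the question and flattens it. The variable names as_list and as_vector are made up for this illustration and do not appear in the original code.

```r
# A list of one-element character vectors, shaped like the output in the question
# (as_list and as_vector are illustrative names only)
as_list <- list("ABC", "DEF", "GHI")

# unlist() flattens the list into a plain character vector
as_vector <- unlist(as_list)

is.list(as_list)         # TRUE
is.character(as_vector)  # TRUE
as_vector
#[1] "ABC" "DEF" "GHI"
```

A list prints with the [[1]], [[2]], ... markers, while a character vector prints on a single [1] line, which matches the difference between the actual and expected output above.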
We can also do this in one line, using a regex lookbehind to split after every 3 characters:
unique(unlist(strsplit(v1, "(?<=...)", perl = TRUE)))
#[1] "ABC" "DEF" "GHI"
Data
v1 <- c("ABC", "ABCDEF", "GHIDEF")
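The one-liner above hard-codes a chunk size of 3. As a rough sketch, the same lookbehind idea could be parameterized by building the pattern with sprintf. The function name split_unique and its size argument are assumptions made for this example, not part of the original answer.

```r
# Hypothetical helper: split each string after every `size` characters
# using a lookbehind, then keep the unique chunks.
split_unique <- function(x, size = 3) {
  # "(?<=.{3})" is equivalent to "(?<=...)" when size = 3
  pattern <- sprintf("(?<=.{%d})", size)
  unique(unlist(strsplit(x, pattern, perl = TRUE)))
}

v1 <- c("ABC", "ABCDEF", "GHIDEF")
split_unique(v1)
#[1] "ABC" "DEF" "GHI"
```

perl = TRUE is needed because lookbehind assertions are not supported by the default regex engine used by strsplit.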
A complete base R solution, using substr:
get_unique <- function(v) unique(unlist(sapply(v, function(x) sapply(1:(nchar(x)/3), function(y) substr(x, 3*(y-1)+1, 3*y) ))))
get_unique(v1)
[1] "ABC" "DEF" "GHI"
substr(x, 3*(y-1)+1, 3*y) extracts the y-th 3-character substring of x.
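The substr version above hard-codes the chunk size of 3 and assumes each string's length is a multiple of 3. A possible variant, sketched here under the assumption that a trailing shorter chunk should be kept, uses the vectorized substring function with a size argument. The name chunk_unique is hypothetical.

```r
# Hypothetical helper: cut each string into consecutive chunks of `size`
# characters with substring(), then keep the unique chunks.
chunk_unique <- function(v, size = 3) {
  chunks <- lapply(v, function(x) {
    # guard: seq() below would fail for an empty string
    if (nchar(x) == 0) return(character(0))
    # start positions 1, 1 + size, 1 + 2 * size, ...
    starts <- seq(1, nchar(x), by = size)
    # substring() is vectorized over the start and stop positions
    substring(x, starts, starts + size - 1)
  })
  unique(unlist(chunks))
}

v1 <- c("ABC", "ABCDEF", "GHIDEF")
chunk_unique(v1)
#[1] "ABC" "DEF" "GHI"
```

With this sketch a string such as "ABCDE" would yield "ABC" and "DE". Whether a short final chunk should be kept or dropped depends on the use case, which the question does not specify.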