R - Merge unique values from two lists using stringr::str_split

Question

I have a function that, given a list of strings, should return a vector containing all unique substrings of size N.

get_unique <- function (input_list, size = 3) {
  output = c()

  for (input in input_list) {
    current = stringr::str_replace(input, "[-_\\s]", "")
    current = trimws(gsub(paste0("(.{", size, "})"), "\\1 ", current))
    parts = stringr::str_split(current, "\\s", simplify = TRUE)[1,]
    output = union(output, parts)
  }

  return(output)
}

What I expect:

get_unique(c("ABC", "ABCDEF", "GHIDEF"))

[1] "ABC" "DEF" "GHI"

But what I get is:

get_unique(c("ABC", "ABCDEF", "GHIDEF"))

[[1]]
[1] "ABC"

[[2]]
[1] "DEF"

[[3]]
[1] "GHI"

I'm new to R, so I'm having a hard time understanding what went wrong.

Answer 1

We can call unlist on the result at the end:

get_unique <- function (input_list, size = 3) {
  output = c()

  for (input in input_list) {
    current = stringr::str_replace(input, "[-_\\s]", "")
    current = trimws(gsub(paste0("(.{", size, "})"), "\\1 ", current))
    parts = stringr::str_split(current, "\\s", simplify = TRUE)[1,]
    output = union(output, parts)
  }

  return(unlist(output))
}

get_unique(c("ABC", "ABCDEF", "GHIDEF"))
#[1] "ABC" "DEF" "GHI"

We can also do this in one line, using a regex lookaround (here, a lookbehind) to split after every 3 characters:

unique(unlist(strsplit(v1, "(?<=...)", perl = TRUE)))
#[1] "ABC" "DEF" "GHI"

Data

v1 <- c("ABC", "ABCDEF", "GHIDEF")
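
The one-liner above hard-codes a chunk width of 3. A minimal sketch of a parameterized variant follows. The function name get_unique_lookbehind and its size argument are illustrative assumptions, not part of the original answer. It relies on the way base R's strsplit() with perl = TRUE handles a zero-width lookbehind match, which ends up cutting the string after every size characters.

```r
# Hypothetical helper: the name and the `size` argument are assumptions
# introduced for illustration only.
get_unique_lookbehind <- function(x, size = 3) {
  # Build a fixed-width lookbehind, e.g. "(?<=.{3})" for size = 3
  pattern <- sprintf("(?<=.{%d})", size)
  # Split each string into chunks, flatten the list, keep first occurrences
  unique(unlist(strsplit(x, pattern, perl = TRUE)))
}

v1 <- c("ABC", "ABCDEF", "GHIDEF")
get_unique_lookbehind(v1)
#[1] "ABC" "DEF" "GHI"
```

The lookbehind must have a fixed width for the PCRE engine, which is why the pattern uses .{size} rather than an open-ended quantifier.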

Answer 2

A full base R solution, using substr:

get_unique <- function(v) unique(unlist(sapply(v, function(x) sapply(1:(nchar(x)/3), function(y) substr(x, 3*(y-1)+1, 3*y) ))))

get_unique(v1)
[1] "ABC" "DEF" "GHI"

substr(x, 3*(y-1)+1, 3*y) extracts the y-th 3-character substring of x: characters 1 to 3 for y = 1, 4 to 6 for y = 2, and so on.
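
The same index arithmetic can be written with a configurable chunk width. The sketch below is one possible way to do it, not the answerer's code. The names split_fixed and get_unique_substr and the size argument are hypothetical. It uses base R's substring(), which is vectorized over start and stop positions.

```r
# Hypothetical helpers: names and the `size` argument are assumptions.
split_fixed <- function(x, size = 3) {
  n <- nchar(x)
  if (n == 0) return(character(0))        # seq() below would fail on empty input
  starts <- seq(1, n, by = size)          # 1, 4, 7, ... when size = 3
  substring(x, starts, starts + size - 1) # same bounds as 3*(y-1)+1 and 3*y
}

get_unique_substr <- function(v, size = 3) {
  # Chunk every string, flatten, and keep the first occurrence of each chunk
  unique(unlist(lapply(v, split_fixed, size = size)))
}

get_unique_substr(c("ABC", "ABCDEF", "GHIDEF"))
#[1] "ABC" "DEF" "GHI"
```

One difference from the substr version above: when a string's length is not a multiple of size, this sketch keeps the shorter trailing chunk instead of dropping it.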
