How do I keep only unique words within each string in a vector?
I have data that looks like this:
vector = c("hello I like to code hello","Coding is fun", "fun fun fun")
I want to remove duplicated words (space-delimited) within each string, i.e. the output should look like this:
vector_cleaned
[1] "hello I like to code"
[2] "coding is fun"
[3] "fun"
Split each string up (strsplit on spaces), use unique (in lapply), and then paste it back together:
vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")
# [1] "hello I like to code" "Coding is fun" "fun"
## OR
vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))
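To see what each step contributes, the one-liner can be unrolled into intermediate values. This is only a sketch: the names words, deduped, vector_cleaned and the helper dedupe_ci are introduced here for illustration and are not part of the original answer. Note that unique is case-sensitive, so "Fun" and "fun" count as different words; the hypothetical dedupe_ci shows one way to compare case-insensitively while keeping the first spelling seen.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

# Step 1: split each string on single spaces (gives a list of character vectors)
words <- strsplit(vector, " ")

# Step 2: drop repeated words within each element, keeping first occurrences in order
deduped <- lapply(words, unique)

# Step 3: glue each element back into one string
vector_cleaned <- vapply(deduped, paste, character(1L), collapse = " ")
vector_cleaned
# elements: "hello I like to code", "Coding is fun", "fun"

# Hypothetical helper (an assumption, not from the answer): case-insensitive
# de-duplication that keeps the first spelling encountered
dedupe_ci <- function(x) x[!duplicated(tolower(x))]
vapply(lapply(strsplit("Fun fun FUN code", " "), dedupe_ci),
       paste, character(1L), collapse = " ")
# element: "Fun code"
```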
Update based on comments
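You can always write a custom function to use with vapply. For example, here is a function that takes a split string, drops words that are minLen characters or shorter, and makes the "unique" step a user choice.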
myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}
Compare the output of the following calls to see how it works.
vapply(strsplit(vector, " "), myFun, character(1L))
vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
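For reference, here is what those three calls should return on the sample data, worked through by hand as a sketch. The variable names default_out, keep_dupes, and no_min_len are introduced here for illustration only. Because the filter is a strict nchar(a) > minLen, the three-letter word "fun" is dropped under the default minLen = 3, which leaves an empty string for the third element.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}

# Defaults: unique words only, and only words longer than 3 characters
default_out <- vapply(strsplit(vector, " "), myFun, character(1L))
# elements: "hello like code", "Coding", ""

# Duplicates kept, length filter still applied
keep_dupes <- vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
# elements: "hello like code hello", "Coding", ""

# No effective length filter (every word has more than 0 characters)
no_min_len <- vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
# elements: "hello I like to code", "Coding is fun", "fun"
```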
I spent a while looking for a data frame, tidyverse-friendly version of this, so I figured I would paste my verbose solution:
library(tidyverse)
df <- data.frame(vector = c("hello I like to code hello",
                            "Coding is fun",
                            "fun fun fun"))
df %>%
  mutate(split = str_split(vector, " ")) %>%                  # split
  mutate(split = map(split, unique)) %>%                      # drop duplicates
  mutate(split = map_chr(split, ~ paste(.x, collapse = " "))) # recombine
Result:
#>                       vector                split
#> 1 hello I like to code hello hello I like to code
#> 2              Coding is fun        Coding is fun
#> 3                fun fun fun                  fun
Created on 2021-01-03 by the reprex package (v0.3.0)
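The three mutate steps above can also be folded into a single helper, which may be easier to reuse across columns. This is a sketch under assumptions: dedupe_words, df_cleaned, and the cleaned column are names made up for this example, and it loads dplyr, stringr, and purrr individually rather than the whole tidyverse.

```r
library(dplyr)
library(stringr)
library(purrr)

df <- data.frame(vector = c("hello I like to code hello",
                            "Coding is fun",
                            "fun fun fun"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: split on spaces, drop repeated words, recombine.
# map_chr guarantees one string back per input string.
dedupe_words <- function(x) {
  map_chr(str_split(x, " "), ~ paste(unique(.x), collapse = " "))
}

df_cleaned <- df %>%
  mutate(cleaned = dedupe_words(vector))

df_cleaned$cleaned
# elements: "hello I like to code", "Coding is fun", "fun"
```

Keeping the logic in one vectorised function means the column never holds an intermediate list, and the same helper works on a plain character vector outside of mutate.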