Return only the unique words

Suppose I have a string, and I want only the unique words in the sentence, each as a separate element.

a = "an apple is an apple"
word <- function(a){
  words <- c(strsplit(a, split = " "))
  return(unique(words))
}

word(a)

This returns

[[1]]
[1] "an"    "apple" "is"    "an"    "apple"

The output I expect is

'an','apple','is'

What am I doing wrong? Any help is much appreciated.

Cheers!

You can try

unique(unlist(strsplit(a, " ")))

Another possible solution, based on stringr::str_split:

library(tidyverse)

a %>% str_split("\\s+") %>% unlist %>% unique

#> [1] "an"    "apple" "is"

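The problem is that wrapping strsplit(.) in c(.) does not change the fact that the result is still a list, so unique operates at the list level, not the word level. With two identical sentences, for example, unique drops the duplicated list element but leaves the words inside it untouched: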

c(strsplit(rep(a, 2), "\\s+"))
# [[1]]
# [1] "an"    "apple" "is"    "an"    "apple"
# [[2]]
# [1] "an"    "apple" "is"    "an"    "apple"
unique(c(strsplit(rep(a, 2), "\\s+")))
# [[1]]
# [1] "an"    "apple" "is"    "an"    "apple"
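The list-versus-vector distinction can be checked directly. The following is a minimal sketch using only base R; the variable names `parts` and `words` are introduced here for illustration and are not from the question.

```r
a <- "an apple is an apple"

parts <- strsplit(a, " ")   # a list of length 1 holding one character vector
class(parts)                # "list"
class(c(parts))             # still "list": c() does not flatten a list

# unique() compares whole list elements, so the single element survives intact
length(unique(c(parts))[[1]])   # 5 words, duplicates included

# Extracting the character vector first lets unique() work on individual words
words <- parts[[1]]
unique(words)               # "an" "apple" "is"
```

Anything that turns the list into a plain character vector, such as `[[1]]` or `unlist()`, is enough for `unique` to deduplicate at the word level.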

Alternatives:

  1. If length(a) is always 1, then perhaps

    unique(strsplit(a, "\\s+")[[1]])
    # [1] "an"    "apple" "is"   
    
  2. If length(a) can be 2 or more, and you want a list of unique words for each sentence, then

    a2 <- c("an apple is an apple", "a pear is a pear", "an orange is an orange")
    lapply(strsplit(a2, "\\s+"), unique)
    # [[1]]
    # [1] "an"    "apple" "is"   
    # [[2]]
    # [1] "a"    "pear" "is"  
    # [[3]]
    # [1] "an"     "orange" "is"    
    

    (Note: this always returns a list, regardless of the number of sentences in the input.)

  3. If length(a) can be 2 or more, and you want a single set of unique words across all sentences, then

    unique(unlist(strsplit(a2, "\\s+")))
    # [1] "an"     "apple"  "is"     "a"      "pear"   "orange"
    

    (Note: this approach also works when length(a) is 1.)
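
For reuse, the third alternative could be wrapped in a function along the lines of the one in the question. This is only a sketch: the name `unique_words` is hypothetical, and dropping empty strings with `nzchar` is an added assumption to handle input with leading whitespace, where `strsplit` produces an empty first element.

```r
# Hypothetical helper: unique words across all elements of a character vector
unique_words <- function(x) {
  words <- unlist(strsplit(x, "\\s+"))  # flatten the list into one character vector
  unique(words[nzchar(words)])          # drop empty strings, then deduplicate
}

unique_words("an apple is an apple")
# [1] "an"    "apple" "is"

unique_words(c("an apple is an apple", "a pear is a pear"))
# [1] "an"    "apple" "is"    "a"     "pear"
```

Because it always returns a plain character vector, the result has the same shape whether the input holds one sentence or several.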