Is it possible to maintain the order of n-grams in the output of the textcnt function in R?

I am using the textcnt() function from the tau package to extract bigrams, as shown below:

library(tau)
sentence <- "A sample sentence in English for testing purpose"
english <- textcnt(sentence, method = "string", n = 2, tolower = FALSE)

The bigrams are returned in alphabetical order, like this:

 A sample     English for     for testing      in English sample sentence     sentence in testing purpose  

However, I am looking for a solution that returns the bigrams in the order in which they occur in the sentence. More precisely, the desired output is:

 A sample  sample sentence sentence in  in English  English for  for testing   testing purpose       

If this is not possible with textcnt(), is there an alternative that produces the desired output?

Try tokenize_ngrams() from the tokenizers package:

library(tokenizers)
# Note: tokenize_ngrams() lowercases by default; pass lowercase = FALSE to keep the original case.
tokenize_ngrams(sentence, n = 2L)
# [[1]]
# [1] "a sample"        "sample sentence" "sentence in"     "in english"      "english for"     "for testing"     "testing purpose"
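
If adding a package is not an option, the same ordered bigrams can probably be built in base R. The following is only a minimal sketch, assuming the input is a single sentence whose words are separated by whitespace; the variable names (words, bigrams, counts) are illustrative and not part of tau or tokenizers. It also shows one way to get textcnt-style counts without the alphabetical sorting.

```r
sentence <- "A sample sentence in English for testing purpose"

# Split on whitespace; strsplit() returns a list, so take its first element.
words <- strsplit(sentence, "\\s+")[[1]]

# Pair each word with its successor to form bigrams in sentence order.
bigrams <- paste(head(words, -1), tail(words, -1))

# Count each bigram while keeping first-appearance order:
# taking the factor levels from unique() stops table() from sorting alphabetically.
counts <- table(factor(bigrams, levels = unique(bigrams)))

print(bigrams)
print(counts)
```

Unlike the tokenizers call above, this sketch does not change case or strip punctuation, so any such cleaning would have to be done on the sentence beforehand.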