How to get unigrams and trigrams only?
I need to get unigrams and trigrams, without bigrams.
trigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 1, max = 3))
How can I edit this code to get that result?
One way is to use the dfm function from the quanteda package, as shown below:
library(quanteda)
dfm('I only want uni and trigrams', ngrams = c(1,3), verbose = FALSE)
#Document-feature matrix of: 1 document, 10 features.
#1 x 10 sparse Matrix of class "dfmSparse"
#       features
#docs    i only want uni and trigrams i_only_want only_want_uni want_uni_and uni_and_trigrams
#  text1 1    1    1   1   1        1           1             1            1                1
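If you would rather stay with RWeka, as in the question, another option is to call NGramTokenizer twice (once with min = max = 1, once with min = max = 3) and concatenate the results, because a single min/max range always includes every size in between. This is only a minimal sketch, assuming RWeka and a working Java installation; the name uniTrigramTokenizer is made up for illustration.

```r
library(RWeka)

# Hypothetical helper: combine unigrams and trigrams, skipping bigrams.
# A Weka_control(min = 1, max = 3) range would also return the bigrams,
# so each n-gram size is requested separately and the results are joined.
uniTrigramTokenizer <- function(x) {
  c(
    NGramTokenizer(x, Weka_control(min = 1, max = 1)),  # unigrams only
    NGramTokenizer(x, Weka_control(min = 3, max = 3))   # trigrams only
  )
}

tokens <- uniTrigramTokenizer("I only want uni and trigrams")
print(tokens)
```

The function can then be passed wherever the original trigramTokenizer was used, for example as the tokenize control option of tm's TermDocumentMatrix. As a side note, newer quanteda releases may no longer accept the ngrams argument in dfm(); there, the equivalent should be tokens() followed by tokens_ngrams(n = c(1, 3)) and then dfm().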