Quanteda: Document Feature Matrix with predefined set of features

Question

I am using quanteda to build two document-feature matrices:

library(quanteda)
DFM1 <- dfm("this is a rock")
#        features
# docs    this is a rock
#   text1    1  1 1    1
DFM2 <- dfm("this is music")
#        features
# docs    this is music
#   text1    1  1     1

However, I would like DFM2 to have a specific set of features, namely the features of DFM1:

DFM2 <- dfm("this is music", *magicargument* = featnames(DFM1))
#        features
# docs    this is a rock
#   text1    1  1 0    0

Is there a magic argument I am missing? Or is there another efficient way to achieve this for a large bag of words?

Answer 1

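The magic argument is pattern, to which you can supply a dfm whose features will be matched (including zeros for features that are not in the target dfm):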

dfm_select(DFM2, pattern = DFM1)
# Document-feature matrix of: 1 document, 4 features (50% sparse).
# 1 x 4 sparse Matrix of class "dfmSparse"
#        features
# docs    this is a rock
#   text1    1  1 0    0
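
For readers on a more recent quanteda release, the same result can be sketched with dfm_match(), which conforms one dfm to a given vector of feature names. This is a minimal sketch, not part of the original answer: it assumes a quanteda version where dfm_match() is available and where a dfm is built from tokens() rather than from raw text. The name DFM2_matched is introduced here only for illustration.

```r
library(quanteda)

# Assumption: a recent quanteda version, where dfm() takes tokens as input.
DFM1 <- dfm(tokens("this is a rock"))
DFM2 <- dfm(tokens("this is music"))

# Conform DFM2 to the feature set of DFM1:
# shared features keep their counts, features missing from DFM2
# are padded with zeros, and features not in DFM1 are dropped.
DFM2_matched <- dfm_match(DFM2, features = featnames(DFM1))

# Inspect the resulting features and counts.
print(featnames(DFM2_matched))
print(as.matrix(DFM2_matched))
```

Passing featnames(DFM1) rather than the dfm itself keeps the call explicit about what is being matched, which also makes it easy to reuse a stored vocabulary when scoring new documents against a previously fitted model.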
