Multiply two named vectors/matrices, applying an n-gram model (stringdist::qgrams)

I am trying to apply a character n-gram model to a string in order to compute the string's probability under that model.

I built a character bigram model with stringdist::qgrams():

library(tidyverse)
library(stringdist)

ref_corpus   <- c("This is a sample sentence", "Other sentences from the reference corpus", "Many other ones")
bigram_ref   <- qgrams(ref_corpus, q = 2)       # collecting all bigrams
bigram_model <- log(bigram_ref/sum(bigram_ref)) # computing the log probability of each bigram

bigram_model
#           Th        hi        is        s         sa        se        te        th
# V1 -4.356709 -4.356709 -3.663562 -3.258097 -4.356709 -3.663562 -3.663562 -3.258097

Now I want to use this model to compute the probability of a new string under the model:

bigram_string <- qgrams("This one", q = 2) 
bigram_string
#    Th hi is s  on ne  o
# V1  1  1  1  1  1  1  1

I can't figure out how to multiply these two named matrices/vectors so that each count in bigram_string is multiplied by the matching log probability in bigram_model.

Expected output:

bigram_string %*% bigram_model
#            Th        hi        is         s  ...
# V1  -4.356709 -4.356709 -3.663562 -3.258097  ...

# Actual output:
# Error in bigram_string %*% bigram_model : non-conformable arguments
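
A note on the error: `%*%` is matrix multiplication, so it needs the number of columns on the left to equal the number of rows on the right. Both objects here are 1-row matrices, which is presumably why R reports non-conformable arguments. A minimal sketch with toy stand-ins (the names `model` and `counts` and their values are made up for illustration, not taken from the corpus above):

```r
# Toy stand-ins shaped like the qgrams() output: 1-row matrices with named columns.
model  <- matrix(log(c(0.5, 0.3, 0.2)), nrow = 1,
                 dimnames = list("V1", c("Th", "hi", "is")))
counts <- matrix(c(1, 1), nrow = 1,
                 dimnames = list("V1", c("Th", "is")))

dim(counts)  # 1 row, 2 columns
dim(model)   # 1 row, 3 columns

# %*% requires ncol(counts) == nrow(model); here 2 != 1, so the call fails.
res <- try(counts %*% model, silent = TRUE)
failed <- inherits(res, "try-error")
```

Even with matching shapes, `%*%` would sum the products rather than pair values by column name, so it would not return the per-bigram row shown as the expected output.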

I've made some progress with subsetting:

bigram_model["V1",][bigram_string]

# But the output is:
#        Th        Th        Th        Th        Th        Th        Th 
# -4.356709 -4.356709 -4.356709 -4.356709 -4.356709 -4.356709 -4.356709
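
The repeated `Th` is most likely because the index is the count matrix itself: every count is 1, so R reads it as a numeric position and returns the first element each time. Indexing with the column names instead does a lookup by name. A small sketch with toy values (`model` and `counts` are illustrative stand-ins, not the objects above):

```r
# Toy named vector of log probabilities and a 1-row count matrix.
model  <- c(Th = -4.36, hi = -4.36, is = -3.66)
counts <- matrix(1, nrow = 1, dimnames = list("V1", c("Th", "is")))

by_value <- model[counts]            # numeric index c(1, 1): first element twice
by_name  <- model[colnames(counts)]  # character index: looks up "Th" and "is"

names(by_value)
names(by_name)
```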

Perhaps we need to subset by column names:

bigram_model[, colnames(bigram_string)] * bigram_string

Output:

        Th        hi        is        s         on        ne         o
V1 -4.356709 -4.356709 -3.663562 -3.258097 -4.356709 -4.356709 -3.663562
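
Two follow-ups may be worth considering. To get a single log probability for the string, the per-bigram products can be summed. Subsetting by column name also fails with a "subscript out of bounds" error if the new string contains a bigram the model has never seen. The sketch below handles both. `score_string` and its `unseen` argument are hypothetical names introduced here, and the default of `-Inf` for unseen bigrams is an assumption (a smoothed value may be preferable); the inputs are toy matrices shaped like the `qgrams()` output.

```r
# Hypothetical helper: `counts` and `model` are 1-row matrices with named
# columns. `unseen` is an assumed log probability for bigrams that are
# missing from the model.
score_string <- function(counts, model, unseen = -Inf) {
  grams <- colnames(counts)
  logp  <- rep(unseen, length(grams))
  known <- grams %in% colnames(model)
  logp[known] <- model[1, grams[known]]   # lookup by name, known bigrams only
  sum(counts[1, ] * logp)                 # total log probability of the string
}

model  <- matrix(log(c(0.5, 0.25, 0.25)), nrow = 1,
                 dimnames = list("V1", c("Th", "hi", "is")))
counts <- matrix(c(2, 1), nrow = 1,
                 dimnames = list("V1", c("Th", "is")))

total <- score_string(counts, model)      # 2 * log(0.5) + 1 * log(0.25)

# A string containing a bigram absent from the model.
counts_new <- matrix(c(1, 1), nrow = 1,
                     dimnames = list("V1", c("Th", "zz")))
total_new <- score_string(counts_new, model)
```

With the objects from the question, the call would presumably be `score_string(bigram_string, bigram_model)`.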