Multiply two named vectors/matrices, applying an n-gram model (stringdist::qgrams)
I am trying to apply a character n-gram model to a string in order to compute the string's probability under that model.
I created a character bigram model with stringdist::qgrams():
library(tidyverse)
library(stringdist)
ref_corpus <- c("This is a sample sentence", "Other sentences from the reference corpus", "Many other ones")
bigram_ref <- qgrams(ref_corpus, q = 2) # collecting all bigrams
bigram_model <- log(bigram_ref/sum(bigram_ref)) # computing the log probabilities of each
bigram_model
# Th hi is s sa se te th
# V1 -4.356709 -4.356709 -3.663562 -3.258097 -4.356709 -3.663562 -3.663562 -3.258097
Now I want to use this model to compute the probability of a new string under the model:
bigram_string <- qgrams("This one", q = 2)
bigram_string
# Th hi is s on ne o
# V1 1 1 1 1 1 1 1
I can't work out how to multiply these two named matrices/vectors so that the counts in bigram_string are multiplied by the matching log probabilities in bigram_model.
Expected output:
bigram_string %*% bigram_model
# Th hi is s ...
# V1 -4.356709 -4.356709 -3.663562 -3.258097 ...
# Actual output:
# Error in bigram_string %*% bigram_model : non-conformable arguments
I have made some progress with subsetting:
bigram_model["V1",][bigram_string]
# But output:
# Th Th Th Th Th Th Th
# -4.356709 -4.356709 -4.356709 -4.356709 -4.356709 -4.356709 -4.356709
Perhaps we need to subset by column names:
bigram_model[, colnames(bigram_string)] * bigram_string
-output
Th hi is s on ne o
V1 -4.356709 -4.356709 -3.663562 -3.258097 -4.356709 -4.356709 -3.663562
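The one-liner above works here because every bigram of "This one" happens to occur in the reference corpus. If the new string contains a bigram the model has never seen, indexing by that column name fails with a "subscript out of bounds" error. Here is a minimal sketch that aligns the two matrices on the names they share and also sums the products into a single log-probability. Note that score_string is a hypothetical helper written for this answer, not part of stringdist, and that skipping unseen bigrams is an assumption made for illustration: it overstates the probability, so a real model would more likely apply some form of smoothing.

```r
library(stringdist)

ref_corpus <- c("This is a sample sentence",
                "Other sentences from the reference corpus",
                "Many other ones")
bigram_ref   <- qgrams(ref_corpus, q = 2)           # bigram counts, one row ("V1")
bigram_model <- log(bigram_ref / sum(bigram_ref))   # log relative frequencies

# Hypothetical helper (not part of stringdist): score a string under the model.
score_string <- function(string, model, q = 2) {
  counts <- qgrams(string, q = q)
  grams  <- colnames(counts)
  seen   <- grams %in% colnames(model)
  # Index both matrices by the same names so counts and log-probabilities line up.
  per_gram <- counts[1, grams[seen]] * model[1, grams[seen]]
  list(
    log_prob = sum(per_gram),   # total log-probability over the seen q-grams
    per_gram = per_gram,        # count * log-probability for each seen q-gram
    unseen   = grams[!seen]     # q-grams absent from the model (skipped here)
  )
}

res <- score_string("This one", bigram_model)
res$per_gram   # same values as the subsetting one-liner
res$log_prob   # single score for the whole string
res$unseen     # empty: all bigrams of "This one" are in the model

# A string with bigrams the corpus does not contain no longer raises an error.
score_string("This xylophone", bigram_model)$unseen
```

Returning the unseen bigrams alongside the score makes it easy to check how much of the string the model actually covered before trusting the number.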