Using cpquery function for several pairs from dataset
I am a beginner in R and am trying to figure out how to use the cpquery function from the bnlearn package for all edges of a DAG.
First I created a bn object, fitted the network with bn.fit, and built a table of arc strengths.
library(bnlearn)
library(dplyr)  # needed for %>% and mutate() below
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
Then I tried to create a new variable in the sttbl data set that holds the result of the cpquery function.
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
sttbl[1,4] = cpquery(fit, `A` == 1, `D` == 1)
It looks quite good (especially on bigger data), but when I try to automate this process somehow, I get errors such as:
Error in sampling(fitted = fitted, event = event, evidence = evidence, :
logical vector for evidence is of length 1 instead of 10000.
Ideally, I need to create a function that fills the prob variable of the generated sttbl data set regardless of its size. I tried to do it with a for loop, but stumbled over the error above again and again. Unfortunately, I have deleted my failed attempts, but they were something like this:
for (i in 1:nrow(sttbl)) {
  j = sttbl[i, 1]
  k = sttbl[i, 2]
  sttbl[i, 4] = cpquery(fit, fit$j %in% sttbl[i, 1] == 1, fit$k %in% sttbl[i, 2] == 1)
}
Or this:
for (i in 1:nrow(sttbl)) {
  sttbl[i, 4] = cpquery(fit, sttbl[i, 1] == 1, sttbl[i, 2] == 1)
}
Now I think I am misunderstanding something in R or in the bnlearn package.
Could you show me how to accomplish this task by filling the column with multiple cpqueries? That would help my research a lot!
cpquery is quite difficult to work with programmatically. If you look at the examples in the help page, you can see that the author uses eval(parse(...)) to build the queries. I have added two approaches below: one using the method from the help page, and one using cpdist to draw samples and reweight them to obtain the probabilities.
Your example:
library(bnlearn); library(dplyr)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
This uses cpquery and the much-maligned eval(parse(...)) -- this is how the bnlearn author does it programmatically in the ?cpquery examples. Anyway:
# You want the evidence and event to be the same; in your question it is `1`
# but for example using learning.test data we use 'a'
state = "'a'" # note if the states are character then these need to be quoted
event = paste(sttbl$from, "==", state)
evidence = paste(sttbl$to, "==", state)
# loop through using code similar to that found in `cpquery`
set.seed(1) # to make sampling reproducible
for (i in 1:nrow(sttbl)) {
  qtxt = paste("cpquery(fit, ", event[i], ", ", evidence[i], ", n = 1e6)")
  sttbl$prob[i] = eval(parse(text = qtxt))
}
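Since the question asked for a function that works regardless of the table's size, the loop above can be wrapped into a reusable helper. This is just a sketch: the name arc_prob, its defaults, and the mapply call are my own, not part of bnlearn.

```r
# Hypothetical helper wrapping the eval(parse(...)) pattern from above;
# `arc_prob` is not a bnlearn function, just a convenience sketch.
arc_prob = function(fit, from, to, state = "'a'", n = 1e6) {
  qtxt = paste0("cpquery(fit, ", from, " == ", state,
                ", ", to, " == ", state, ", n = ", n, ")")
  eval(parse(text = qtxt))
}

# Fill the whole column in one call, whatever the number of rows:
# sttbl$prob = mapply(arc_prob, sttbl$from, sttbl$to, MoreArgs = list(fit = fit))
```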
I find it preferable to use cpdist, which generates random samples conditional on some evidence. You can then use these samples to build up queries. If you use likelihood weighting (method = "lw"), it is slightly easier to do this programmatically (and without evil(parse(...))).
The evidence is supplied as a named list, i.e. list(A = 'a').
# The following just gives a quick way to assign the same
# evidence state to all the evidence nodes.
evidence = setNames(replicate(nrow(sttbl), "a", simplify = FALSE), sttbl$to)
# Now loop though the queries
# As we are using likelihood weighting we need to reweight to get the probabilities
# (cpquery does this under the hood)
# Also note with this method that you could simulate from more than
# one variable (event) at a time if the evidence was the same.
for (i in 1:nrow(sttbl)) {
  temp = cpdist(fit, sttbl$from[i], evidence[i], method = "lw")
  w = attr(temp, "weights")
  sttbl$prob2[i] = sum(w[temp == 'a']) / sum(w)
}
sttbl
# from to strength prob prob2
# 1 A D -1938.9499 0.6186238 0.6233387
# 2 A B -1153.8796 0.6050552 0.6133448
# 3 C D -823.7605 0.7027782 0.7067417
# 4 B E -720.8266 0.7332107 0.7328657
# 5 F E -549.2300 0.5850828 0.5895373
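The reweighting step in the loop above, sum(w[temp == 'a']) / sum(w), is just a weighted proportion. As a sanity check, here is the same arithmetic on a tiny made-up set of samples and weights, independent of bnlearn (the values are illustrative, not drawn from cpdist):

```r
# Weighted proportion: the same computation the loop above performs
# on cpdist output, shown on toy data so the arithmetic is visible.
samples = c("a", "b", "a", "a", "b")
w       = c(0.5, 1.0, 0.25, 0.25, 1.0)

prob = sum(w[samples == "a"]) / sum(w)
prob  # (0.5 + 0.25 + 0.25) / 3 = 1/3
```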