Wrong product produced in pairwise error estimation in R
I want to compute a pairwise error from a melted matrix that looks like this:
pw.data = data.frame(true_tree = rep(c("maple","oak","pine"),3),
                     guess_tree = c(rep("maple",3),rep("oak",3),rep("pine",3)),
                     value = c(12,0,1,1,15,0,2,1,14))
true_tree  guess_tree  value
maple      maple       12
oak        maple       0
pine       maple       1
maple      oak         1
oak        oak         15
pine       oak         0
maple      pine        2
oak        pine        1
pine       pine        14
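So I want to estimate the pairwise error between the true tree species and the guessed tree species. The formula for this estimate should be: pairwise misassignments / all estimates for the two selected species.
To explain it better: the wrong guesses for maple and oak (the maple-oak and oak-maple comparisons) are 1 + 0. The number of all guesses is 12 + 1 + 2 (all counts where true_tree == "maple") + 0 + 15 + 1 (all counts where true_tree == "oak"). So the estimate is 1/31.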
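Written as a formula, with $n_{tg}$ denoting the value for true_tree $= t$ and guess_tree $= g$ (this notation is introduced here, not in the original question), the error for a pair of species $a, b$ and the maple/oak check are:

```latex
\mathrm{err}(a,b) = \frac{n_{ab} + n_{ba}}{\sum_{g} n_{ag} + \sum_{g} n_{bg}},
\qquad
\mathrm{err}(\text{maple},\text{oak}) = \frac{1 + 0}{(12 + 1 + 2) + (0 + 15 + 1)} = \frac{1}{31} \approx 0.0323
```

The formula is symmetric in $a$ and $b$, so each unordered pair only needs to be computed once.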
When I check one specific case, again maple and oak, I can estimate it manually like this:
sum(pw.data[((pw.data[,1] == "maple" & pw.data[,2] == "oak") |
             (pw.data[,1] == "oak" & pw.data[,2] == "maple")) &
            (pw.data[,1] != pw.data[,2]),3]) /
  (sum(pw.data[pw.data[,1] == "maple",3]) + sum(pw.data[pw.data[,1] == "oak",3]))
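This alternative is not from the original thread. Because the data are a melted confusion matrix, one option for larger data is to cast them back to a wide matrix with base R's `xtabs()` and compute every pair at once. The following is a minimal sketch that assumes the counts sit in the `value` column as shown. The names `lv`, `cm`, `rate_mat`, and `pw_err` are illustrative only.

```r
pw.data <- data.frame(true_tree = rep(c("maple", "oak", "pine"), 3),
                      guess_tree = c(rep("maple", 3), rep("oak", 3), rep("pine", 3)),
                      value = c(12, 0, 1, 1, 15, 0, 2, 1, 14))

# Use the same factor levels for both columns so the matrix is square and aligned
lv <- sort(unique(c(as.character(pw.data$true_tree), as.character(pw.data$guess_tree))))
cm <- xtabs(value ~ true_tree + guess_tree,
            data = transform(pw.data,
                             true_tree = factor(true_tree, levels = lv),
                             guess_tree = factor(guess_tree, levels = lv)))
cm <- unclass(cm)  # plain numeric matrix: rows = true_tree, columns = guess_tree

# Numerator: misassignments in both directions, n[a, b] + n[b, a]
misassigned <- cm + t(cm)
# Denominator: all counts whose true_tree is a, plus all counts whose true_tree is b
totals <- outer(rowSums(cm), rowSums(cm), "+")
rate_mat <- misassigned / totals

# Keep each unordered pair once (upper triangle); the diagonal is not meaningful here
idx <- which(upper.tri(rate_mat), arr.ind = TRUE)
pw_err <- data.frame(Pw_tree = paste(rownames(rate_mat)[idx[, 1]],
                                     colnames(rate_mat)[idx[, 2]], sep = "-"),
                     error_rate = rate_mat[idx],
                     stringsAsFactors = FALSE)
pw_err
```

The `rate_mat["maple", "oak"]` entry equals the 1/31 computed by hand above. The matrix is symmetric, so only the upper triangle is kept.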
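However, I want to run this estimate on larger data, so I'd like to create a for loop/function that performs the estimate for each pair and stores the results in a data frame, for example: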
Pw_tree value
Maple-oak 0.0123
....
I tried to use this logic in a for loop, as shown below, but it doesn't work at all.
for (i in pw.data[,1]) {
  for (j in pw.data[,2]) {
    x = sum( pw.data[((pw.data[,1] == i & pw.data[,2] == j ) |
                      (pw.data[,1] == j & pw.data[,2] == i)) &
                     (pw.data[,1] != pw.data[,2]),3])
    y = (sum(pw.data[pw.data[,1] == i,3]) + sum(pw.data[pw.data[,1] == j,3]))
    PWerr_data = data.frame( pw_tree = paste(i,j, sep = "-"), value = x/y)
  }
}
It would be great if I could see what I am doing wrong.
Thanks a lot!
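I usually approach this kind of problem in three steps. First I build the function I want to apply, which you have almost done. Then I build the data structure that is most convenient to apply it over. Finally I use one of the apply family of functions to iterate over that data structure and collect the results. This lets me avoid for loop constructs, which is good, because I'm the kind of programmer who always messes up the indexing in a double for loop.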
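For reference, the loop in the question does run, but it has two problems. First, `PWerr_data` is reassigned on every pass, so only the last pair survives; with this data, that pair is pine-pine with a value of 0. Second, it iterates over the raw columns, where each name appears three times. That gives 81 passes, which include self-pairs and both orderings of every pair. The following is a minimal sketch of a loop that avoids both problems. It is an illustrative rewrite, not code from the original thread, and the names `trees` and `results` are hypothetical.

```r
pw.data <- data.frame(true_tree = rep(c("maple", "oak", "pine"), 3),
                      guess_tree = c(rep("maple", 3), rep("oak", 3), rep("pine", 3)),
                      value = c(12, 0, 1, 1, 15, 0, 2, 1, 14),
                      stringsAsFactors = FALSE)

# Loop over each distinct name once, not over the repeated column values
trees <- unique(pw.data$true_tree)
results <- list()  # collect one row per pair instead of overwriting one data frame

for (a in seq_along(trees)) {
  for (b in seq_along(trees)) {
    if (b <= a) next  # skip self-pairs (maple-maple) and reversed pairs (oak-maple)
    i <- trees[a]
    j <- trees[b]
    # misassignments in both directions; i != j, so the diagonal is never counted
    x <- sum(pw.data$value[(pw.data$true_tree == i & pw.data$guess_tree == j) |
                           (pw.data$true_tree == j & pw.data$guess_tree == i)])
    # all counts whose true_tree is i or j
    y <- sum(pw.data$value[pw.data$true_tree %in% c(i, j)])
    results[[length(results) + 1]] <- data.frame(pw_tree = paste(i, j, sep = "-"),
                                                 value = x / y,
                                                 stringsAsFactors = FALSE)
  }
}

# Stack the one-row data frames into a single result
PWerr_data <- do.call(rbind, results)
PWerr_data
```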
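In your case, we can wrap your ratio of sums in a function that takes a data.frame and two tree names as arguments. Then we only need to create the set of pairs we want to use. A handy function for this is combn(), which gives you all combinations of size m from the elements of x. That is exactly the set of non-redundant pairs we need.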
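Here is a small illustration of what `combn()` returns for the three species. The vector is typed in by hand for the example.

```r
all.trees <- c("maple", "oak", "pine")

# 2 x 3 character matrix, one pair per column:
# pairs[, 1] is c("maple", "oak"), pairs[, 2] is c("maple", "pine"), pairs[, 3] is c("oak", "pine")
pairs <- combn(all.trees, 2)

# n choose 2 pairs, with no self-pairs and no reversed duplicates
ncol(pairs) == choose(length(all.trees), 2)
```

Reversed pairs such as oak-maple are not generated. That is fine here because the error rate is symmetric in the two species.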
Annotated example code below:
# Load your data
pw.data = data.frame(true_tree = rep(c("maple","oak","pine"),3),
                     guess_tree = c(rep("maple",3),rep("oak",3),rep("pine",3)),
                     value = c(12,0,1,1,15,0,2,1,14))
pw.data
#>   true_tree guess_tree value
#> 1     maple      maple    12
#> 2       oak      maple     0
#> 3      pine      maple     1
#> 4     maple        oak     1
#> 5       oak        oak    15
#> 6      pine        oak     0
#> 7     maple       pine     2
#> 8       oak       pine     1
#> 9      pine       pine    14
# build the function we will repeatedly apply
getErr <- function(t1, t2, data=pw.data) {
  # compute the rate as you wrote it, subsetting only the `data` argument
  rate <- sum(data[((data[,1] == t1 & data[,2] == t2) |
                    (data[,1] == t2 & data[,2] == t1)) &
                   (data[,1] != data[,2]),3]) /
    (sum(data[data[,1] == t1,3]) + sum(data[data[,1] == t2,3]))
  # output the items involved as a named list (useful for later)
  list(Pw_tree = paste(t1, t2, sep='-'), error_rate = rate)
}
# test it
getErr("maple", "oak")
#> $Pw_tree
#> [1] "maple-oak"
#>
#> $error_rate
#> [1] 0.03225806
# Good, this matches your manual calculation (1/31)
# build the data structure we will run the function across
all.trees <- unique(c(as.character(pw.data$true_tree), as.character(pw.data$guess_tree)))
all.name.combos <- combn(all.trees, 2)
# we will use the do.call(rbind, ls) trick, where we generate a list
# with the apply function and coerce it to a matrix
error_rates_df <- do.call(rbind, apply(all.name.combos, 2, function(row){getErr(row[1], row[2])}))
error_rates_df
#> Pw_tree error_rate
#> [1,] "maple-oak" 0.03225806
#> [2,] "maple-pine" 0.1
#> [3,] "oak-pine" 0.03225806
Created on 2018-10-30 by the reprex package (v0.2.1)
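Note that `do.call(rbind, ...)` above returns a matrix whose cells are list elements. The question asked for a data frame with a numeric column, which this is not. The following is a minimal sketch of one way to build a real data frame, assuming the same `pw.data` and `getErr()`. It relies on the fact that `combn()` also accepts a `FUN` argument and applies it to each pair directly. This is a variation, not the answer's original code.

```r
pw.data <- data.frame(true_tree = rep(c("maple", "oak", "pine"), 3),
                      guess_tree = c(rep("maple", 3), rep("oak", 3), rep("pine", 3)),
                      value = c(12, 0, 1, 1, 15, 0, 2, 1, 14))

# getErr() as in the answer, subsetting only the `data` argument
getErr <- function(t1, t2, data = pw.data) {
  rate <- sum(data[((data[, 1] == t1 & data[, 2] == t2) |
                    (data[, 1] == t2 & data[, 2] == t1)) &
                   (data[, 1] != data[, 2]), 3]) /
    (sum(data[data[, 1] == t1, 3]) + sum(data[data[, 1] == t2, 3]))
  list(Pw_tree = paste(t1, t2, sep = "-"), error_rate = rate)
}

all.trees <- unique(c(as.character(pw.data$true_tree), as.character(pw.data$guess_tree)))

# combn(x, m, FUN, ...) calls FUN on each pair; one value per pair simplifies
# to a plain vector, which becomes an ordinary data frame column
error_rates_df <- data.frame(
  Pw_tree    = combn(all.trees, 2, FUN = paste, collapse = "-"),
  error_rate = combn(all.trees, 2, FUN = function(p) getErr(p[1], p[2])$error_rate),
  stringsAsFactors = FALSE
)
error_rates_df
```

Calling `combn()` twice walks the pairs in the same order, so the two columns line up.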