R - 向量化函数
R - Vectorization function
----大家好,我的问题是我有下一个中奖组合和3"tickets".
winner <- c("L","L",rep("X",12))
[1] "L" "L" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X"
combinations
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
1 L L L L L L L L L L L L L L
2 X L L L L L L L L L L L L L
3 V L L L L L L L L L L L L L
dput(combinations)
structure(list(Var1 = structure(1:3, .Label = c("L", "X", "V"
), class = "factor"), Var2 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var3 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var4 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var5 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var6 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var7 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var8 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var9 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var10 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var11 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var12 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var13 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var14 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor")), .Names = c("Var1", "Var2", "Var3",
"Var4", "Var5", "Var6", "Var7", "Var8", "Var9", "Var10", "Var11",
"Var12", "Var13", "Var14"), out.attrs = structure(list(dim = c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), dimnames = structure(list(
Var1 = c("Var1=L", "Var1=X", "Var1=V"), Var2 = c("Var2=L",
"Var2=X", "Var2=V"), Var3 = c("Var3=L", "Var3=X", "Var3=V"
), Var4 = c("Var4=L", "Var4=X", "Var4=V"), Var5 = c("Var5=L",
"Var5=X", "Var5=V"), Var6 = c("Var6=L", "Var6=X", "Var6=V"
), Var7 = c("Var7=L", "Var7=X", "Var7=V"), Var8 = c("Var8=L",
"Var8=X", "Var8=V"), Var9 = c("Var9=L", "Var9=X", "Var9=V"
), Var10 = c("Var10=L", "Var10=X", "Var10=V"), Var11 = c("Var11=L",
"Var11=X", "Var11=V"), Var12 = c("Var12=L", "Var12=X", "Var12=V"
), Var13 = c("Var13=L", "Var13=X", "Var13=V"), Var14 = c("Var14=L",
"Var14=X", "Var14=V")), .Names = c("Var1", "Var2", "Var3",
"Var4", "Var5", "Var6", "Var7", "Var8", "Var9", "Var10", "Var11",
"Var12", "Var13", "Var14"))), .Names = c("dim", "dimnames")), row.names = c(NA,
3L), class = "data.frame")
接下来是票价
price
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
1 2.48 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
2 3.28 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
3 3.16 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
dput(price)
structure(list(Var1 = c(2.48, 3.28, 3.16), Var2 = c(1.79, 1.79,
1.79), Var3 = c(2.99, 2.99, 2.99), Var4 = c(2.31, 2.31, 2.31),
Var5 = c(4.03, 4.03, 4.03), Var6 = c(2.1, 2.1, 2.1), Var7 = c(2.71,
2.71, 2.71), Var8 = c(2.22, 2.22, 2.22), Var9 = c(2.7, 2.7,
2.7), Var10 = c(2.94, 2.94, 2.94), Var11 = c(2.01, 2.01,
2.01), Var12 = c(2.16, 2.16, 2.16), Var13 = c(3.41, 3.41,
3.41), Var14 = c(2.16, 2.16, 2.16)), .Names = c("Var1", "Var2",
"Var3", "Var4", "Var5", "Var6", "Var7", "Var8", "Var9", "Var10",
"Var11", "Var12", "Var13", "Var14"), out.attrs = structure(list(
dim = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), dimnames = structure(list(Var1 = c("Var1=2.48", "Var1=3.28",
"Var1=3.16"), Var2 = c("Var2=1.79", "Var2=4.04", "Var2=4.55"
), Var3 = c("Var3=2.99", "Var3=3.38", "Var3=2.54"), Var4 = c("Var4=2.31",
"Var4=3.39", "Var4=3.36"), Var5 = c("Var5=4.03", "Var5=3.22",
"Var5=2.14"), Var6 = c("Var6=2.10", "Var6=3.77", "Var6=3.60"
), Var7 = c("Var7=2.71", "Var7=3.22", "Var7=2.93"), Var8 = c("Var8=2.22",
"Var8=3.56", "Var8=3.47"), Var9 = c("Var9=2.70", "Var9=3.66",
"Var9=2.65"), Var10 = c("Var10=2.94", "Var10=3.30", "Var10=2.65"
), Var11 = c("Var11=2.01", "Var11=3.61", "Var11=4.09"), Var12 = c("Var12=2.16",
"Var12=3.15", "Var12=4.19"), Var13 = c("Var13=3.41", "Var13=3.43",
"Var13=2.27"), Var14 = c("Var14=2.16", "Var14=3.78", "Var14=3.43"
)), .Names = c("Var1", "Var2", "Var3", "Var4", "Var5", "Var6",
"Var7", "Var8", "Var9", "Var10", "Var11", "Var12", "Var13",
"Var14"))), .Names = c("dim", "dimnames")), row.names = c(NA,
3L), class = "data.frame")
我做了下一个公式来计算某张机票的价格(这是一个不切实际的价格,但我的书上是这么写的)
myfunction2 <- function(x,y){
ifelse(sum((x==winner))>=2,prod(((x==winner)*y)+((x==winner)<=0)*1),return(1))
}
myfunction2(combinations[1,],price[1,])
[1] 4.4392
myfunction2(combinations[2,],price[2,])
[1] 1.0000
myfunction2(combinations[3,],price[3,])
[1] 1.0000
公式是正确的,在第一种情况下价格是(2.48*1.79)=4.4392,在第二种情况和第三种情况下答案都是1,因为我至少需要两个正确答案才能支付超过1。
当我尝试 "vectorize" 公式时,答案不正确
R3 <- apply(combinations,1,myfunction2,y=price)
R3
1 2 3
595.0378 1.00000 1.00000
我可以用for循环解决这个问题,但是我有3^14
个组合而且很慢,我尝试用mapply
,但我有同样的问题(答案不正确)
欢迎任何帮助,非常感谢
要弄清楚为什么 apply
方法不起作用,您可以做的一件事是在您的函数中手动插入一个 browser()
并查看它实际看到的参数。使用上面的数据,这是添加了一行的函数:
myfunction2 <- function(x,y){
browser()
if (sum((x==winner))>=2) {
return( prod( ((x == winner) * price) + (( x == winner) == 0) * 1) )
} else {
return(1)
}
}
apply(combinations,1,myfunction2,price)
# Called from: FUN(newX[, i], ...)
# debug at #3: if (sum((x == winner)) >= 2) {
# return(prod(((x == winner) * price) + ((x == winner) == 0) *
# 1))
# } else {
# return(1)
# }
# Browse[2]>
x
# Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
# "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L"
到目前为止一切顺利。
# Browse[2]>
y
# Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
# 1 2.48 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
# 2 3.28 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
# 3 3.16 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
你的问题有一部分:它在每次调用时看到 price
的 all,而你的函数不知道要检查这个。
顺便说一句:您误读了我关于使用 ifelse
的建议。虽然你使用它的方式是有效的,但它是不正确的,以后会咬你的。我建议你googleifelse
和传统if ... else
的区别。长话短说:ifelse
需要假设其三个参数的长度相同(或容易回收),因此当您测试单个事物时(sum(...)>=2
,它的第一个参数),然后你应该使用 if ... else
.
此外,由于 combinations[1,]
从技术上讲返回的是 data.frame,您应该将其取消列出以一致地处理其他事情。 (与 price
相同。)将此视为您的功能的替代方案:
myfunction3 <- function(x,y){
i <- (unlist(x) == winner)
if (sum(i) >= 2) prod(ifelse(i, unlist(y), 1)) else 1
}
myfunction3(combinations[1,], price[1,])
# [1] 4.4392
myfunction3(combinations[2,], price[2,])
# [1] 1
myfunction3(combinations[3,], price[3,])
# [1] 1
最后,我非常喜欢避免 side-effects (wiki). Something similar to this is reaching out of a function's scope (wiki) 获取父环境或名称中的变量 space。虽然它有效,但它可能会出现问题。我将通过在函数参数中传递 winner
的值来解决这个问题。
myfunction4 <- function(x,y,w){
i <- (unlist(x) == w)
if (sum(i) >= 2) prod(ifelse(i, unlist(y), 1)) else 1
}
回到最初的一次性获取所有 price
的问题。虽然当然可以将其扭曲以使用 mapply
,但我认为它是 *apply
函数中唯一可以轻松应用于此问题的函数,即便如此,它也需要一些修改和转换data.frame 年代。相反,我建议使用这样的东西:
myfunction5 <- function(comb, pr, win) {
i <- (unlist(comb) == win)
if (sum(i) >= 2) prod(ifelse(i, unlist(pr), 1)) else 1
}
最后,我们可以做这样的事情了:
sapply(1:nrow(combinations),
function(i, comb, pr, win) myfunction5(comb[i,], pr[i,], win),
combinations, price, winner)
# [1] 4.4392 1.0000 1.0000
在这种情况下,i
只是 combinations
和 price
中行的索引。虽然看起来很迂腐,但编写完全自给自足的函数(对于未提供给它的变量,不要达到它们 space 的 "outside")通常会产生更强、更多的 "defensive"程序。
注意:所有这些仍然在规避一个大的潜在问题:当 x
是 data.frame。虽然它在这里有效(只是因为你的每个 data.frame 都是完全同质的),但它会在你最意想不到的时候咬你一口。问题是 data.frames 允许您添加一个不同于其他 class 的列,因此如果您出于任何原因向 price
变量添加了一列字符串,none 将不再起作用。
此外,如果您没有进行健全性检查,这往往会出现一些问题。例如,强制要求 combinations
和 price
必须具有相同的维度,并且 winner
的长度必须与其他两个中的列数相同。
----大家好,我的问题是我有下一个中奖组合和3"tickets".
winner <- c("L","L",rep("X",12))
[1] "L" "L" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X"
combinations
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
1 L L L L L L L L L L L L L L
2 X L L L L L L L L L L L L L
3 V L L L L L L L L L L L L L
dput(combinations)
structure(list(Var1 = structure(1:3, .Label = c("L", "X", "V"
), class = "factor"), Var2 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var3 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var4 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var5 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var6 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var7 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var8 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var9 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var10 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var11 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var12 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var13 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor"), Var14 = structure(c(1L, 1L, 1L), .Label = c("L",
"X", "V"), class = "factor")), .Names = c("Var1", "Var2", "Var3",
"Var4", "Var5", "Var6", "Var7", "Var8", "Var9", "Var10", "Var11",
"Var12", "Var13", "Var14"), out.attrs = structure(list(dim = c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), dimnames = structure(list(
Var1 = c("Var1=L", "Var1=X", "Var1=V"), Var2 = c("Var2=L",
"Var2=X", "Var2=V"), Var3 = c("Var3=L", "Var3=X", "Var3=V"
), Var4 = c("Var4=L", "Var4=X", "Var4=V"), Var5 = c("Var5=L",
"Var5=X", "Var5=V"), Var6 = c("Var6=L", "Var6=X", "Var6=V"
), Var7 = c("Var7=L", "Var7=X", "Var7=V"), Var8 = c("Var8=L",
"Var8=X", "Var8=V"), Var9 = c("Var9=L", "Var9=X", "Var9=V"
), Var10 = c("Var10=L", "Var10=X", "Var10=V"), Var11 = c("Var11=L",
"Var11=X", "Var11=V"), Var12 = c("Var12=L", "Var12=X", "Var12=V"
), Var13 = c("Var13=L", "Var13=X", "Var13=V"), Var14 = c("Var14=L",
"Var14=X", "Var14=V")), .Names = c("Var1", "Var2", "Var3",
"Var4", "Var5", "Var6", "Var7", "Var8", "Var9", "Var10", "Var11",
"Var12", "Var13", "Var14"))), .Names = c("dim", "dimnames")), row.names = c(NA,
3L), class = "data.frame")
接下来是票价
price
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
1 2.48 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
2 3.28 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
3 3.16 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
dput(price)
structure(list(Var1 = c(2.48, 3.28, 3.16), Var2 = c(1.79, 1.79,
1.79), Var3 = c(2.99, 2.99, 2.99), Var4 = c(2.31, 2.31, 2.31),
Var5 = c(4.03, 4.03, 4.03), Var6 = c(2.1, 2.1, 2.1), Var7 = c(2.71,
2.71, 2.71), Var8 = c(2.22, 2.22, 2.22), Var9 = c(2.7, 2.7,
2.7), Var10 = c(2.94, 2.94, 2.94), Var11 = c(2.01, 2.01,
2.01), Var12 = c(2.16, 2.16, 2.16), Var13 = c(3.41, 3.41,
3.41), Var14 = c(2.16, 2.16, 2.16)), .Names = c("Var1", "Var2",
"Var3", "Var4", "Var5", "Var6", "Var7", "Var8", "Var9", "Var10",
"Var11", "Var12", "Var13", "Var14"), out.attrs = structure(list(
dim = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), dimnames = structure(list(Var1 = c("Var1=2.48", "Var1=3.28",
"Var1=3.16"), Var2 = c("Var2=1.79", "Var2=4.04", "Var2=4.55"
), Var3 = c("Var3=2.99", "Var3=3.38", "Var3=2.54"), Var4 = c("Var4=2.31",
"Var4=3.39", "Var4=3.36"), Var5 = c("Var5=4.03", "Var5=3.22",
"Var5=2.14"), Var6 = c("Var6=2.10", "Var6=3.77", "Var6=3.60"
), Var7 = c("Var7=2.71", "Var7=3.22", "Var7=2.93"), Var8 = c("Var8=2.22",
"Var8=3.56", "Var8=3.47"), Var9 = c("Var9=2.70", "Var9=3.66",
"Var9=2.65"), Var10 = c("Var10=2.94", "Var10=3.30", "Var10=2.65"
), Var11 = c("Var11=2.01", "Var11=3.61", "Var11=4.09"), Var12 = c("Var12=2.16",
"Var12=3.15", "Var12=4.19"), Var13 = c("Var13=3.41", "Var13=3.43",
"Var13=2.27"), Var14 = c("Var14=2.16", "Var14=3.78", "Var14=3.43"
)), .Names = c("Var1", "Var2", "Var3", "Var4", "Var5", "Var6",
"Var7", "Var8", "Var9", "Var10", "Var11", "Var12", "Var13",
"Var14"))), .Names = c("dim", "dimnames")), row.names = c(NA,
3L), class = "data.frame")
我做了下一个公式来计算某张机票的价格(这是一个不切实际的价格,但我的书上是这么写的)
myfunction2 <- function(x,y){
ifelse(sum((x==winner))>=2,prod(((x==winner)*y)+((x==winner)<=0)*1),return(1))
}
myfunction2(combinations[1,],price[1,])
[1] 4.4392
myfunction2(combinations[2,],price[2,])
[1] 1.0000
myfunction2(combinations[3,],price[3,])
[1] 1.0000
公式是正确的,在第一种情况下价格是(2.48*1.79)=4.4392,在第二种情况和第三种情况下答案都是1,因为我至少需要两个正确答案才能支付超过1。
当我尝试 "vectorize" 公式时,答案不正确
R3 <- apply(combinations,1,myfunction2,y=price)
R3
1 2 3
595.0378 1.00000 1.00000
我可以用for循环解决这个问题,但是我有3^14
个组合而且很慢,我尝试用mapply
,但我有同样的问题(答案不正确)
欢迎任何帮助,非常感谢
要弄清楚为什么 apply
方法不起作用,您可以做的一件事是在您的函数中手动插入一个 browser()
并查看它实际看到的参数。使用上面的数据,这是添加了一行的函数:
myfunction2 <- function(x,y){
browser()
if (sum((x==winner))>=2) {
return( prod( ((x == winner) * price) + (( x == winner) == 0) * 1) )
} else {
return(1)
}
}
apply(combinations,1,myfunction2,price)
# Called from: FUN(newX[, i], ...)
# debug at #3: if (sum((x == winner)) >= 2) {
# return(prod(((x == winner) * price) + ((x == winner) == 0) *
# 1))
# } else {
# return(1)
# }
# Browse[2]>
x
# Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
# "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L"
到目前为止一切顺利。
# Browse[2]>
y
# Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
# 1 2.48 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
# 2 3.28 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
# 3 3.16 1.79 2.99 2.31 4.03 2.1 2.71 2.22 2.7 2.94 2.01 2.16 3.41 2.16
你的问题有一部分:它在每次调用时看到 price
的 all,而你的函数不知道要检查这个。
顺便说一句:您误读了我关于使用 ifelse
的建议。虽然你使用它的方式是有效的,但它是不正确的,以后会咬你的。我建议你googleifelse
和传统if ... else
的区别。长话短说:ifelse
需要假设其三个参数的长度相同(或容易回收),因此当您测试单个事物时(sum(...)>=2
,它的第一个参数),然后你应该使用 if ... else
.
此外,由于 combinations[1,]
从技术上讲返回的是 data.frame,您应该将其取消列出以一致地处理其他事情。 (与 price
相同。)将此视为您的功能的替代方案:
myfunction3 <- function(x,y){
i <- (unlist(x) == winner)
if (sum(i) >= 2) prod(ifelse(i, unlist(y), 1)) else 1
}
myfunction3(combinations[1,], price[1,])
# [1] 4.4392
myfunction3(combinations[2,], price[2,])
# [1] 1
myfunction3(combinations[3,], price[3,])
# [1] 1
最后,我非常喜欢避免 side-effects (wiki). Something similar to this is reaching out of a function's scope (wiki) 获取父环境或名称中的变量 space。虽然它有效,但它可能会出现问题。我将通过在函数参数中传递 winner
的值来解决这个问题。
myfunction4 <- function(x,y,w){
i <- (unlist(x) == w)
if (sum(i) >= 2) prod(ifelse(i, unlist(y), 1)) else 1
}
回到最初的一次性获取所有 price
的问题。虽然当然可以将其扭曲以使用 mapply
,但我认为它是 *apply
函数中唯一可以轻松应用于此问题的函数,即便如此,它也需要一些修改和转换data.frame 年代。相反,我建议使用这样的东西:
myfunction5 <- function(comb, pr, win) {
i <- (unlist(comb) == win)
if (sum(i) >= 2) prod(ifelse(i, unlist(pr), 1)) else 1
}
最后,我们可以做这样的事情了:
sapply(1:nrow(combinations),
function(i, comb, pr, win) myfunction5(comb[i,], pr[i,], win),
combinations, price, winner)
# [1] 4.4392 1.0000 1.0000
在这种情况下,i
只是 combinations
和 price
中行的索引。虽然看起来很迂腐,但编写完全自给自足的函数(对于未提供给它的变量,不要达到它们 space 的 "outside")通常会产生更强、更多的 "defensive"程序。
注意:所有这些仍然在规避一个大的潜在问题:当 x
是 data.frame。虽然它在这里有效(只是因为你的每个 data.frame 都是完全同质的),但它会在你最意想不到的时候咬你一口。问题是 data.frames 允许您添加一个不同于其他 class 的列,因此如果您出于任何原因向 price
变量添加了一列字符串,none 将不再起作用。
此外,如果您没有进行健全性检查,这往往会出现一些问题。例如,强制要求 combinations
和 price
必须具有相同的维度,并且 winner
的长度必须与其他两个中的列数相同。