R - 向量化函数

R - Vectorization function

----大家好,我的问题是我有下一个中奖组合和3"tickets".

winner <- c("L","L",rep("X",12))

[1] "L" "L" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X"

combinations

   Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
1    L    L    L    L    L    L    L    L    L     L     L     L     L     L
2    X    L    L    L    L    L    L    L    L     L     L     L     L     L
3    V    L    L    L    L    L    L    L    L     L     L     L     L     L

dput(combinations)

structure(list(Var1 = structure(1:3, .Label = c("L", "X", "V"
), class = "factor"), Var2 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var3 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var4 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var5 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var6 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var7 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var8 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var9 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var10 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var11 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var12 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var13 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor"), Var14 = structure(c(1L, 1L, 1L), .Label = c("L", 
"X", "V"), class = "factor")), .Names = c("Var1", "Var2", "Var3", 
"Var4", "Var5", "Var6", "Var7", "Var8", "Var9", "Var10", "Var11", 
"Var12", "Var13", "Var14"), out.attrs = structure(list(dim = c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), dimnames = structure(list(
    Var1 = c("Var1=L", "Var1=X", "Var1=V"), Var2 = c("Var2=L", 
    "Var2=X", "Var2=V"), Var3 = c("Var3=L", "Var3=X", "Var3=V"
    ), Var4 = c("Var4=L", "Var4=X", "Var4=V"), Var5 = c("Var5=L", 
    "Var5=X", "Var5=V"), Var6 = c("Var6=L", "Var6=X", "Var6=V"
    ), Var7 = c("Var7=L", "Var7=X", "Var7=V"), Var8 = c("Var8=L", 
    "Var8=X", "Var8=V"), Var9 = c("Var9=L", "Var9=X", "Var9=V"
    ), Var10 = c("Var10=L", "Var10=X", "Var10=V"), Var11 = c("Var11=L", 
    "Var11=X", "Var11=V"), Var12 = c("Var12=L", "Var12=X", "Var12=V"
    ), Var13 = c("Var13=L", "Var13=X", "Var13=V"), Var14 = c("Var14=L", 
    "Var14=X", "Var14=V")), .Names = c("Var1", "Var2", "Var3", 
"Var4", "Var5", "Var6", "Var7", "Var8", "Var9", "Var10", "Var11", 
"Var12", "Var13", "Var14"))), .Names = c("dim", "dimnames")), row.names = c(NA, 
3L), class = "data.frame")

接下来是票价

price

  Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
1 2.48 1.79 2.99 2.31 4.03  2.1 2.71 2.22  2.7  2.94  2.01  2.16  3.41  2.16
2 3.28 1.79 2.99 2.31 4.03  2.1 2.71 2.22  2.7  2.94  2.01  2.16  3.41  2.16
3 3.16 1.79 2.99 2.31 4.03  2.1 2.71 2.22  2.7  2.94  2.01  2.16  3.41  2.16

dput(price)

structure(list(Var1 = c(2.48, 3.28, 3.16), Var2 = c(1.79, 1.79, 
1.79), Var3 = c(2.99, 2.99, 2.99), Var4 = c(2.31, 2.31, 2.31), 
    Var5 = c(4.03, 4.03, 4.03), Var6 = c(2.1, 2.1, 2.1), Var7 = c(2.71, 
    2.71, 2.71), Var8 = c(2.22, 2.22, 2.22), Var9 = c(2.7, 2.7, 
    2.7), Var10 = c(2.94, 2.94, 2.94), Var11 = c(2.01, 2.01, 
    2.01), Var12 = c(2.16, 2.16, 2.16), Var13 = c(3.41, 3.41, 
    3.41), Var14 = c(2.16, 2.16, 2.16)), .Names = c("Var1", "Var2", 
"Var3", "Var4", "Var5", "Var6", "Var7", "Var8", "Var9", "Var10", 
"Var11", "Var12", "Var13", "Var14"), out.attrs = structure(list(
    dim = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L), dimnames = structure(list(Var1 = c("Var1=2.48", "Var1=3.28", 
    "Var1=3.16"), Var2 = c("Var2=1.79", "Var2=4.04", "Var2=4.55"
    ), Var3 = c("Var3=2.99", "Var3=3.38", "Var3=2.54"), Var4 = c("Var4=2.31", 
    "Var4=3.39", "Var4=3.36"), Var5 = c("Var5=4.03", "Var5=3.22", 
    "Var5=2.14"), Var6 = c("Var6=2.10", "Var6=3.77", "Var6=3.60"
    ), Var7 = c("Var7=2.71", "Var7=3.22", "Var7=2.93"), Var8 = c("Var8=2.22", 
    "Var8=3.56", "Var8=3.47"), Var9 = c("Var9=2.70", "Var9=3.66", 
    "Var9=2.65"), Var10 = c("Var10=2.94", "Var10=3.30", "Var10=2.65"
    ), Var11 = c("Var11=2.01", "Var11=3.61", "Var11=4.09"), Var12 = c("Var12=2.16", 
    "Var12=3.15", "Var12=4.19"), Var13 = c("Var13=3.41", "Var13=3.43", 
    "Var13=2.27"), Var14 = c("Var14=2.16", "Var14=3.78", "Var14=3.43"
    )), .Names = c("Var1", "Var2", "Var3", "Var4", "Var5", "Var6", 
    "Var7", "Var8", "Var9", "Var10", "Var11", "Var12", "Var13", 
    "Var14"))), .Names = c("dim", "dimnames")), row.names = c(NA, 
3L), class = "data.frame")

我做了下一个公式来计算某张机票的价格(这是一个不切实际的价格,但我的书上是这么写的)

myfunction2 <- function(x,y){
 ifelse(sum((x==winner))>=2,prod(((x==winner)*y)+((x==winner)<=0)*1),return(1))
                             }

  myfunction2(combinations[1,],price[1,])
  [1] 4.4392

  myfunction2(combinations[2,],price[2,])
  [1] 1.0000

  myfunction2(combinations[3,],price[3,])
  [1] 1.0000

公式是正确的,在第一种情况下价格是(2.48*1.79)=4.4392,在第二种情况和第三种情况下答案都是1,因为我至少需要两个正确答案才能支付超过1。

当我尝试 "vectorize" 公式时,答案不正确

R3 <- apply(combinations,1,myfunction2,y=price)

R3
    1        2        3 
  595.0378   1.00000  1.00000 

我可以用for循环解决这个问题,但是我有3^14个组合而且很慢,我尝试用mapply,但我有同样的问题(答案不正确)

欢迎任何帮助,非常感谢

要弄清楚为什么 apply 方法不起作用,您可以做的一件事是在您的函数中手动插入一个 browser() 并查看它实际看到的参数。使用上面的数据,这是添加了一行的函数:

myfunction2 <- function(x,y){
  browser()
  if (sum((x==winner))>=2) {
    return( prod( ((x == winner) * price) + (( x == winner) == 0) * 1) )
  } else {
    return(1)
  }
}

apply(combinations,1,myfunction2,price)
# Called from: FUN(newX[, i], ...)
# debug at #3: if (sum((x == winner)) >= 2) {
#     return(prod(((x == winner) * price) + ((x == winner) == 0) * 
#         1))
# } else {
#     return(1)
# }
# Browse[2]>
x
#  Var1  Var2  Var3  Var4  Var5  Var6  Var7  Var8  Var9 Var10 Var11 Var12 Var13 Var14 
#   "L"   "L"   "L"   "L"   "L"   "L"   "L"   "L"   "L"   "L"   "L"   "L"   "L"   "L" 

到目前为止一切顺利。

# Browse[2]>
y
#   Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14
# 1 2.48 1.79 2.99 2.31 4.03  2.1 2.71 2.22  2.7  2.94  2.01  2.16  3.41  2.16
# 2 3.28 1.79 2.99 2.31 4.03  2.1 2.71 2.22  2.7  2.94  2.01  2.16  3.41  2.16
# 3 3.16 1.79 2.99 2.31 4.03  2.1 2.71 2.22  2.7  2.94  2.01  2.16  3.41  2.16

你的问题有一部分:它在每次调用时看到 priceall,而你的函数不知道要检查这个。

顺便说一句:您误读了我关于使用 ifelse 的建议。虽然你使用它的方式是有效的,但它是不正确的,以后会咬你的。我建议你googleifelse和传统if ... else的区别。长话短说:ifelse 需要假设其三个参数的长度相同(或容易回收),因此当您测试单个事物时(sum(...)>=2,它的第一个参数),然后你应该使用 if ... else.

此外,由于 combinations[1,] 从技术上讲返回的是 data.frame,您应该将其取消列出以一致地处理其他事情。 (与 price 相同。)将此视为您的功能的替代方案:

myfunction3 <- function(x,y){
  i <- (unlist(x) == winner)
  if (sum(i) >= 2) prod(ifelse(i, unlist(y), 1)) else 1
}
myfunction3(combinations[1,], price[1,])
# [1] 4.4392
myfunction3(combinations[2,], price[2,])
# [1] 1
myfunction3(combinations[3,], price[3,])
# [1] 1

最后,我非常喜欢避免 side-effects (wiki). Something similar to this is reaching out of a function's scope (wiki) 获取父环境或名称中的变量 space。虽然它有效,但它可能会出现问题。我将通过在函数参数中传递 winner 的值来解决这个问题。

myfunction4 <- function(x,y,w){
  i <- (unlist(x) == w)
  if (sum(i) >= 2) prod(ifelse(i, unlist(y), 1)) else 1
}

回到最初的一次性获取所有 price 的问题。虽然当然可以将其扭曲以使用 mapply,但我认为它是 *apply 函数中唯一可以轻松应用于此问题的函数,即便如此,它也需要一些修改和转换data.frame 年代。相反,我建议使用这样的东西:

myfunction5 <- function(comb, pr, win) {
  i <- (unlist(comb) == win)
  if (sum(i) >= 2) prod(ifelse(i, unlist(pr), 1)) else 1
}

最后,我们可以做这样的事情了:

sapply(1:nrow(combinations),
       function(i, comb, pr, win) myfunction5(comb[i,], pr[i,], win),
       combinations, price, winner)
# [1] 4.4392 1.0000 1.0000

在这种情况下,i 只是 combinationsprice 中行的索引。虽然看起来很迂腐,但编写完全自给自足的函数(对于未提供给它的变量,不要达到它们 space 的 "outside")通常会产生更强、更多的 "defensive"程序。

注意:所有这些仍然在规避一个大的潜在问题:当 x 是 data.frame。虽然它在这里有效(只是因为你的每个 data.frame 都是完全同质的),但它会在你最意想不到的时候咬你一口。问题是 data.frames 允许您添加一个不同于其他 class 的列,因此如果您出于任何原因向 price 变量添加了一列字符串,none 将不再起作用。

此外,如果您没有进行健全性检查,这往往会出现一些问题。例如,强制要求 combinationsprice 必须具有相同的维度,并且 winner 的长度必须与其他两个中的列数相同。