Going from a for loop to a function in R

Question

I'm curious how to convert a for loop I wrote into a function in R. I have no experience writing my own functions in R. I looked here and here, but these did not seem to offer much help. I am aware that for loops are not necessary, and overall I'm trying to do something similar to this blog post.

Here is the for loop with reproducible data:

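library(caTools)                  # provides combs()

P <- 1:50
y <- length(P)
D <- as.data.frame(combs(P, 2))   # one pair per row, in columns V1 and V2
Z <- choose(y, 2)                 # number of pairs
Num <- numeric(Z)                 # preallocate the result vectors
Denom <- numeric(Z)
Diff <- numeric(Z)

for (n in 1:Z) {
  Num[n] <- abs(D$V1[n] - D$V2[n])     # absolute difference of the pair
  Denom[n] <- max(D$V1[n], D$V2[n])    # larger value of the pair
  Diff[n] <- Num[n] / Denom[n]
}
PV <- mean(Diff)
PV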

However, I am interested in calculating PV for each level in the following data:

DATA <- c(1:500)
NAME <- c("a", "b", "c", "d", "e")
mydf <- as.data.frame(cbind(DATA, NAME))

So the final code I would like to use is:

ANSWER <- tapply(mydf$DATA, mydf$NAME, MY.FUNCTION)

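So if I can turn the for loop above into a working function, I can run tapply to get PV for each level.

Any help, or any suggestion that differs from the approach I have proposed, would be appreciated.

Thanks!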

Answer 1

After loading the library:

library(caTools)

Here is a function you can run on your data:

mymeandiff <- function(values){
    df <- as.data.frame(combs(values, 2))
    diff <- abs(df$V1 - df$V2)/pmax(df$V1, df$V2)
    mean(diff)
}
mymeandiff(1:50)
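
If caTools is not available, the same calculation can probably be done in base R. The following is a minimal sketch rather than the code above: it assumes utils::combn() as a stand-in for combs(), and the name mean_pair_diff is hypothetical. Note that combn() returns one pair per column instead of one pair per row, so the indexing differs.

```r
# Hypothetical base-R variant of mymeandiff(); the function name is an assumption.
mean_pair_diff <- function(values) {
  pairs <- combn(values, 2)                 # 2-row matrix, one pair per column
  num <- abs(pairs[1, ] - pairs[2, ])       # absolute difference within each pair
  denom <- pmax(pairs[1, ], pairs[2, ])     # element-wise maximum within each pair
  mean(num / denom)
}

mean_pair_diff(1:50)
```

Because abs(), pmax() and the division all work element-wise on whole vectors, no explicit loop over the pairs is needed.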

Then we can run it on each group using dplyr. The data needs fixing first, because cbind() coerced DATA to a non-numeric type:

mydf$DATA <- as.numeric(as.character(mydf$DATA))

library(dplyr)
mydf %>% group_by(NAME) %>%
         summarise(mymeandiff(DATA))
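
The data fix above is needed because cbind() on a numeric and a character vector produces a character matrix, so DATA ends up as character (or as a factor in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE). The sketch below illustrates the coercion and one possible way to avoid it by building the data frame with data.frame() directly; the names m and mydf2 are hypothetical.

```r
DATA <- 1:500
NAME <- c("a", "b", "c", "d", "e")

# cbind() returns a matrix, and a matrix holds a single type,
# so the numbers are converted to strings.
m <- cbind(DATA, NAME)
typeof(m)

# data.frame() keeps each column's own type; NAME is recycled to length 500.
mydf2 <- data.frame(DATA, NAME)
is.numeric(mydf2$DATA)
```

With the data frame built this way, the as.numeric(as.character(...)) conversion should no longer be necessary.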

Or with tapply instead of dplyr:

tapply(mydf$DATA, mydf$NAME, FUN = mymeandiff)

Let's time them:

microbenchmark::microbenchmark(tapply = tapply(mydf$DATA, mydf$NAME, FUN=mymeandiff),
                               dplyr = mydf %>% group_by(NAME) %>%
                                                summarise(mymeandiff(DATA)))
Unit: milliseconds
   expr      min       lq     mean   median       uq       max neval
 tapply 60.36543 61.08658 63.81995 62.61182 66.13671  80.37819   100
  dplyr 61.84766 62.53751 67.33161 63.61270 67.58688 287.78364   100

tapply is slightly faster (median 62.6 ms versus 63.6 ms for dplyr), though the difference is small.
