Going from a for loop to a function in R
I am curious how to turn a for loop that I wrote into a function in R. I have no experience writing my own functions in R. I looked here and here, but these did not seem to offer much help. I am aware that for loops are not necessary, and overall I'm trying to do something similar to this blog post.
The for loop, with reproducible data, is here:
P <- c(1:50)
y <- length(P)
D <- as.data.frame(combs(P,2))
Z <- choose(y,2)
Num = NULL
Denom = NULL
Diff = NULL
for(n in 1:Z)
{
Num[n] = abs(D$V1[n]-D$V2[n])
Denom[n] = max(D$V1[n], D$V2[n])
Diff[n] = Num[n]/Denom[n]
}
PV=mean(Diff)
PV
However, I am interested in calculating PV for each level in the following data:
DATA <- c(1:500)
NAME <- c("a", "b", "c", "d", "e")
mydf <- as.data.frame(cbind(DATA, NAME))
So the final code I would like to use is:
ANSWER <- tapply(mydf$DATA, mydf$NAME, MY.FUNCTION)
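So if I can turn the for loop above into a working function, I can run tapply to get PV for each level.
Any help, or any suggestion that differs from the approach I have proposed, would be greatly appreciated.
Thanks!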
After loading the library:
library(caTools)
here is a function you can run on your data:
mymeandiff <- function(values){
df <- as.data.frame(combs(values, 2))
diff <- abs(df$V1 - df$V2)/pmax(df$V1, df$V2)
mean(diff)
}
mymeandiff(1:50)
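If you would rather not depend on caTools, the same statistic can probably be written with base R alone. The following is a minimal sketch, assuming combn() from the utils package as a stand-in for combs(); the name mean_rel_diff is hypothetical. It also cross-checks the vectorised result against an explicit loop like the one in the question.

```r
# Base-R sketch of the same statistic; mean_rel_diff is a hypothetical name.
mean_rel_diff <- function(values) {
  # combn() returns a 2 x choose(n, 2) matrix: one column per unordered pair
  pairs <- utils::combn(values, 2)
  num   <- abs(pairs[1, ] - pairs[2, ])
  denom <- pmax(pairs[1, ], pairs[2, ])   # element-wise maximum, unlike max()
  mean(num / denom)
}

# Cross-check against an explicit loop over the same pairs
P <- 1:50
pairs <- utils::combn(P, 2)
Diff <- numeric(ncol(pairs))              # preallocate instead of growing from NULL
for (n in seq_len(ncol(pairs))) {
  Diff[n] <- abs(pairs[1, n] - pairs[2, n]) / max(pairs[1, n], pairs[2, n])
}
all.equal(mean_rel_diff(P), mean(Diff))
```

The key change from the loop is pmax(), which compares two vectors element by element, whereas max() collapses everything to a single number.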
Then we can use dplyr to run it on each group (after correcting the data):
mydf$DATA <-as.numeric(as.character(mydf$DATA))
library(dplyr)
mydf %>% group_by(NAME) %>%
summarise(mymeandiff(DATA))
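The data needs correcting because cbind() on a numeric and a character vector produces a character matrix, so DATA stops being numeric before it ever reaches the data frame. A small sketch of the problem and one way to avoid it, assuming you are free to build the data frame differently (mydf2 and coerced are hypothetical names):

```r
DATA <- 1:500
NAME <- c("a", "b", "c", "d", "e")

# cbind() builds a character matrix first, so DATA is no longer numeric
coerced <- as.data.frame(cbind(DATA, NAME))
class(coerced$DATA)   # "character" in R >= 4.0, "factor" in older versions

# data.frame() keeps each column's own type; NAME is recycled to length 500
mydf2 <- data.frame(DATA = DATA, NAME = NAME)
is.numeric(mydf2$DATA)
```

Going through as.character() before as.numeric() matters when the column is a factor, because as.numeric() on a factor returns the internal level codes and not the original values.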
Or use tapply instead of dplyr:
tapply(mydf$DATA, mydf$NAME, FUN = mymeandiff)
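To make clear what tapply hands back, here is a self-contained sketch. It assumes a hypothetical base-R stand-in pv() for mymeandiff() (using combn() so that caTools is not needed) and a data frame built with data.frame() so DATA stays numeric. It compares tapply with an explicit split-then-apply, which should give the same per-group values.

```r
# pv() is a hypothetical base-R stand-in for mymeandiff()
pv <- function(v) {
  p <- utils::combn(v, 2)                      # one column per unordered pair
  mean(abs(p[1, ] - p[2, ]) / pmax(p[1, ], p[2, ]))
}

mydf <- data.frame(DATA = 1:500, NAME = c("a", "b", "c", "d", "e"))

# tapply: split DATA by NAME, apply pv() to each piece, return a named array
by_tapply <- tapply(mydf$DATA, mydf$NAME, FUN = pv)

# The same thing spelled out with split() and vapply()
by_split <- vapply(split(mydf$DATA, mydf$NAME), pv, numeric(1))

names(by_tapply)                               # one entry per level of NAME
all.equal(as.vector(by_tapply), unname(by_split))
by_tapply["a"]                                 # look up a single group by name
```

vapply() with numeric(1) is a slightly stricter choice than sapply(), since it fails loudly if the function ever returns something other than a single number.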
Let's time them:
microbenchmark::microbenchmark(tapply = tapply(mydf$DATA, mydf$NAME, FUN=mymeandiff),
dplyr = mydf %>% group_by(NAME) %>%
summarise(mymeandiff(DATA)))
Unit: milliseconds
   expr      min       lq     mean   median       uq       max neval
 tapply 60.36543 61.08658 63.81995 62.61182 66.13671  80.37819   100
  dplyr 61.84766 62.53751 67.33161 63.61270 67.58688 287.78364   100
tapply is slightly faster (median of about 62.6 ms versus 63.6 ms for dplyr).