R double for loop: outer or apply?

Question

I have the following code:

library(data.table)  # provides copy()

a <- c(1,2,2,3,4,5,6)
b <- c(4,5,6,7,8,8,9)
data <- data.frame(cbind(a,b))
trial <- copy(data)
for (j in 1:ncol(trial)) {
  for (i in 2:nrow(trial)) {
    # If a value equals the one in the row above, shift it by 0.001 * sd of its column
    if (!is.na(trial[i,j]) && !is.na(trial[i-1,j]) && trial[i,j] == trial[i-1,j]) {
      trial[i,j] <- trial[i-1,j] + (0.001*sd(trial[,j], na.rm = TRUE))
    }
  }
}

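The code works perfectly, but it is a bit slow on larger datasets. I would like to speed it up by using the apply or outer family of functions. The problems are: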

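I know how to replace a single loop with apply, but not two nested loops, especially in this case, where I need to conditionally replace a single value with another single value (the lagged one) plus a multiple of the standard deviation (which has to be computed over the whole column);
Besides that, I have no experience at all with outer or with using vectorized functions instead of loops.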

Answer 1

Does this work for you?

a <- c(1,2,2,3,4,5,6)
b <- c(4,5,6,7,8,8,9)
data <- data.frame(cbind(a,b))
trial <- data.frame(a,b)
for (j in seq_len(ncol(trial))) {
  # Find rows whose value equals the previous row's value.
  # diff() returns n-1 elements and we want n, so prepend FALSE
  # (the first row has no predecessor and never matches).
  matching <- c(FALSE, diff(trial[,j]) == 0)
  matching[is.na(matching)] <- FALSE  # comparisons involving NA count as no match
  trial[matching,j] <- data[matching,j] + (0.001*sd(trial[,j], na.rm = TRUE))
}

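I vectorized the inner loop, which should give a significant performance improvement. I did not test what happens to the sd calculation when there are multiple matching rows.
I will leave it to others to improve on this revision. There are other benefits to using data.table.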
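
To see what happens when more than two consecutive rows match, here is a minimal sketch that compares the loop from the question with the vectorized rule on a single column. The helper names bump_loop and bump_vec are made up for this illustration and are not part of the original code.

```r
# Hypothetical helpers for illustration only: the loop from the question and
# the vectorized rule from this answer, each applied to one numeric vector.
bump_loop <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (!is.na(x[i]) && !is.na(x[i - 1]) && x[i] == x[i - 1]) {
      # sd() is recomputed on the partially modified vector, as in the question
      x[i] <- x[i - 1] + 0.001 * sd(x, na.rm = TRUE)
    }
  }
  x
}

bump_vec <- function(x) {
  dup <- c(FALSE, diff(x) == 0)   # TRUE where a value repeats its predecessor
  dup[is.na(dup)] <- FALSE        # comparisons involving NA count as no match
  x[dup] <- x[dup] + 0.001 * sd(x, na.rm = TRUE)
  x
}

x <- c(1, 2, 2, 2, 3)             # a run of three equal values
res_loop <- bump_loop(x)
res_vec  <- bump_vec(x)
print(rbind(loop = res_loop, vectorized = res_vec))
```

In the loop, the last value of the run is compared against its already shifted predecessor, so it is left unchanged. The vectorized version shifts every repeated value. Whether that difference matters depends on the data.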

Answer 2

With data.table

library(data.table)
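f <- function(x) {
  dup <- x == shift(x)  # NA for the first element, which has no predecessor
  # Guard against NA so the first row keeps its value instead of becoming NA
  ifelse(!is.na(dup) & dup, x + 0.001 * sd(x, na.rm = TRUE), x)
}
setDT(data)[, lapply(.SD, f)]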

With dplyr

library(dplyr)
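f <- function(x) {
  dup <- x == lag(x)  # NA for the first element, which has no predecessor
  # Guard against NA so the first row keeps its value instead of becoming NA
  ifelse(!is.na(dup) & dup, x + 0.001 * sd(x, na.rm = TRUE), x)
}
data %>%
  mutate(across(everything(), f))  # replaces the deprecated mutate_each(funs(f))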
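
The same idea can also be sketched in base R with lapply, which is closer to the apply family mentioned in the question. This is only an illustrative sketch under the assumption that every column is numeric; bump_dups is a hypothetical name. It shifts every value that equals its predecessor, which can differ from the original loop when a value repeats more than twice in a row.

```r
a <- c(1, 2, 2, 3, 4, 5, 6)
b <- c(4, 5, 6, 7, 8, 8, 9)
df <- data.frame(a, b)

# Hypothetical helper: add 0.001 * sd(column) to each value that equals
# the value directly before it.
bump_dups <- function(x) {
  prev <- c(NA, head(x, -1))                    # base R lag: NA for the first element
  dup  <- !is.na(x) & !is.na(prev) & x == prev  # FALSE wherever an NA is involved
  x[dup] <- x[dup] + 0.001 * sd(x, na.rm = TRUE)
  x
}

result <- df
result[] <- lapply(df, bump_dups)  # column by column, keeping the data.frame shape
print(result)
```

Assigning into result[] keeps the column names and the data.frame class, so no outer loop over columns is needed.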
