How to match Row and Row +1 Using apply R

Question

I am trying to use the apply functions to replace an inefficient nested for loop that will not finish on a large data set.

    # Lookup table with one row per note
    unique <- cbind.data.frame(c(1,2,3))
    colnames(unique) <- "note"

    ptSeenSub <- rbind.data.frame(c(1,"a"), c(1,"b"), c(2,"a"), c(2,"d"), c(3,"e"), c(3,"f"))
    colnames(ptSeenSub) <- c("PARENT_EVENT_ID", "USER_NAME")

    uniqueRow <- nrow(unique)
    ptSeenSubRow <- nrow(ptSeenSub)

    for (note in 1:uniqueRow)
    {
       for (row in 1:ptSeenSubRow)
       {
         if (ptSeenSub$PARENT_EVENT_ID[row] == unique$note[note])
         {
           # First matching row is the attending, the row after it is the resident
           unique$attending_name[note] <- ptSeenSub$USER_NAME[row]
           unique$resident_name[note] <- ptSeenSub$USER_NAME[row + 1]
           break  # stop at the first match so later rows do not overwrite it
         }
       }
    }

I would like the result to look like this data frame:

    results <- rbind.data.frame(c(1, "a", "b"), c(2, "a", "d"), c(3,"e", "f"))
    colnames(results) <- c("note", "attending_name", "resident_name")

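The loop has to run over millions of rows and never finishes. How can I vectorize it so that it completes on a large data set? Any suggestions are greatly appreciated.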

Answer 1

It sounds like you are trying to reshape your data into wide format. I find dplyr and tidyr to be good tools for this task.

Define the data

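    library(tidyr)
    library(dplyr)
    ptSeenSub <- rbind.data.frame(c(1,"a"), c(1,"b"), c(2,"a"), c(2,"d"), c(3,"e"), c(3,"f"))
    # Name the columns so the reshape step can refer to them
    colnames(ptSeenSub) <- c("PARENT_EVENT_ID", "USER_NAME")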

Reshape

    result <- ptSeenSub %>%
      group_by(PARENT_EVENT_ID) %>%
      mutate(k = row_number()) %>%
      spread(k, USER_NAME)

You can then rename the columns as needed:

    names(result) <- c("note", "attending_name", "resident_name")
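
For readers on a recent tidyr (1.0.0 or later), where `spread()` is superseded, the same reshape can be sketched with `pivot_wider()`. This is only an illustration, not part of the original answer. It assumes each PARENT_EVENT_ID has at most two rows, attending first, and the helper column `role` is a name made up for this example.

```r
library(dplyr)
library(tidyr)

# Example data from the question, built with explicit column names.
ptSeenSub <- data.frame(
  PARENT_EVENT_ID = c(1, 1, 2, 2, 3, 3),
  USER_NAME       = c("a", "b", "a", "d", "e", "f"),
  stringsAsFactors = FALSE
)

result <- ptSeenSub %>%
  group_by(PARENT_EVENT_ID) %>%
  # Label the first row of each group as attending, the second as resident.
  # Assumption: no group has more than two rows.
  mutate(role = c("attending_name", "resident_name")[row_number()]) %>%
  ungroup() %>%
  # One column per role, one row per PARENT_EVENT_ID.
  pivot_wider(names_from = role, values_from = USER_NAME) %>%
  rename(note = PARENT_EVENT_ID)

print(result)
```

Naming the columns before pivoting avoids the separate renaming step, at the cost of hard-coding the two roles.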

Answer 2

You could also use dcast from reshape2 or from the devel version of data.table (should be fast), i.e. v1.9.5

    library(data.table)
    # N numbers the rows within each PARENT_EVENT_ID; dcast then spreads them into columns
    setnames(dcast(setDT(ptSeenSub)[, N:= 1:.N, PARENT_EVENT_ID], 
      PARENT_EVENT_ID~N, value.var='USER_NAME'), 
            c('note', 'attending_name', 'resident_name'))[]
    #   note attending_name resident_name
    #1:    1              a             b
    #2:    2              a             d
    #3:    3              e             f

If there are only two observations per 'PARENT_EVENT_ID'

    setDT(ptSeenSub)[,.(attending_name=USER_NAME[1L], 
           resident_name=USER_NAME[2L]) , .(note=PARENT_EVENT_ID)]
    #   note attending_name resident_name
    #1:    1              a             b
    #2:    2              a             d
    #3:    3              e             f
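
If no extra packages are available, the "Row and Row + 1" idea from the question can also be sketched directly in base R with `match()`. This is a minimal illustration rather than a method given in the thread. It assumes the rows for each PARENT_EVENT_ID are adjacent, with the attending first and the resident immediately after; the names `notes` and `first` are introduced here only for the example.

```r
# Example data from the question, built with explicit column names.
ptSeenSub <- data.frame(
  PARENT_EVENT_ID = c(1, 1, 2, 2, 3, 3),
  USER_NAME       = c("a", "b", "a", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# match() returns, for each unique id, the index of its first row.
notes <- unique(ptSeenSub$PARENT_EVENT_ID)
first <- match(notes, ptSeenSub$PARENT_EVENT_ID)

# Assumption: every id has its second row directly below its first row.
results <- data.frame(
  note           = notes,
  attending_name = ptSeenSub$USER_NAME[first],      # "Row"
  resident_name  = ptSeenSub$USER_NAME[first + 1],  # "Row + 1"
  stringsAsFactors = FALSE
)

print(results)
```

Both index lookups are vectorized, so there is no explicit loop. Note that if some id had only one row, `first + 1` would pick up the first row of the next id, so the adjacency assumption should be checked on real data.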
