Data Structure for Sequence of Event Analysis in R

The code below creates a sample data frame to illustrate my problem. I have a list of timestamped events.

set.seed(100)
n <- 10000  # number of events (time and event must have the same length)
mydf <- data.frame(time = 1:n, event = sample(1:10, n, replace = TRUE))

head(mydf, 20)
time event
1      6
2      5
3      7
4      8
5      4
6      2
7     10
8      9
9      4
10     6
11     4
12     3
13     8
14     3
15     9
16     1
17     7
18     3
19     8
20     10

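I am trying to create a new variable that lists the previous events within a specified window. Suppose the window size is 10. I want to produce the data frame below. My ultimate goal is to prepare the data for sequence-of-events analysis.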

time event eventList
1      6       NA
2      5       NA
3      7       NA
4      8       NA
5      4       NA
6      2       NA
7     10       NA
8      9       NA
9      4       NA
10     6       NA
11     4       {6,5,7,8,4,2,10,9,4,6}
12     3       {5,7,8,4,2,10,9,4,6,4}
13     8       {7,8,4,2,10,9,4,6,4,3}
14     3       {8,4,2,10,9,4,6,4,3,8}
15     9       {4,2,10,9,4,6,4,3,8,3}
16     1       {2,10,9,4,6,4,3,8,3,9}
17     7       {10,9,4,6,4,3,8,3,9,1}
18     3       {9,4,6,4,3,8,3,9,1,7}
19     8       {4,6,4,3,8,3,9,1,7,8}
20     10      {6,4,3,8,3,9,1,7,8,10}
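For sequence-of-events analysis, a wide layout with one column per lag may be easier to work with than a `{...}` string. Below is a minimal sketch of that structure using base R's `stats::embed()`. It assumes the window excludes the current event, as in the expected output above, and that rows without a full window get `NA`. The column names `lag10` to `lag1` are hypothetical labels chosen for illustration.

```r
# Example data: the first 20 events from the question
event <- c(6, 5, 7, 8, 4, 2, 10, 9, 4, 6, 4, 3, 8, 3, 9, 1, 7, 3, 8, 10)
w <- 10  # window size (assumed to exclude the current event)

# embed(x, w + 1) has one row per full window; column 1 is the current
# event and column k + 1 is the event k steps earlier. Reversing columns
# (w + 1):2 keeps only the w earlier events, ordered oldest to newest.
lags <- embed(event, w + 1)[, (w + 1):2, drop = FALSE]

# The first w rows have no complete window, so pad them with NA.
prev <- rbind(matrix(NA, nrow = w, ncol = w), lags)
colnames(prev) <- paste0("lag", w:1)  # hypothetical names: lag10 = oldest

wide <- data.frame(time = seq_along(event), event = event, prev)
```

Because each lag is an ordinary numeric column, the windows can be filtered or tabulated like any other variable.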

I expect someone will come up with a more idiomatic R approach that reduces your run time. In the meantime, you can try this:

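w <- 10  # window size
eventList <- vector("list", nrow(mydf))  # preallocate one element per row
for (i in seq_len(nrow(mydf))) {
  if (i <= w) {
    eventList[[i]] <- NA  # not enough earlier events for a full window
  } else {
    # the w events strictly before row i
    eventList[[i]] <- mydf$event[(i - w):(i - 1)]
  }
}
mydf$eventList <- eventList  # stored as a list column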
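The same list column can also be built in a single `lapply()` call instead of filling it row by row in a `for` loop. This is a minimal sketch that assumes, as in the question, that each window holds the `w` events strictly before the current one and that rows without a full window get `NA`.

```r
# Regenerate the example data (10000 events, as in the question)
set.seed(100)
n <- 10000
mydf <- data.frame(time = 1:n, event = sample(1:10, n, replace = TRUE))

w <- 10  # window size

# Element i holds the events of rows (i - w):(i - 1),
# or NA when fewer than w earlier events exist.
mydf$eventList <- lapply(seq_len(nrow(mydf)), function(i) {
  if (i <= w) NA else mydf$event[(i - w):(i - 1)]
})
```

Keeping each window as an integer vector rather than a string avoids having to parse it back later.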

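The last two rows of your expected output (time 19 and 20) don't match the data, so please check it. Note that the code below includes the current event in each window, so the first full window appears at time 10.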

mydf=read.table(text="
time event
1      6
2      5
3      7
4      8
5      4
6      2
7     10
8      9
9      4
10     6
11     4
12     3
13     8
14     3
15     9
16     1
17     7
18     3
19     8
20     10",header=TRUE,stringsAsFactors=FALSE)


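windowSize <- 10
# For each row x (time equals the row number here), take the last windowSize
# events up to and including row x; rows before the first full window get NA.
mydf$eventList <- vapply(mydf$time, function(x) {
  if (x < windowSize) return(NA_character_)
  paste0("{", paste(mydf[tail(seq_len(x), windowSize), "event"], collapse = ","), "}")
}, character(1))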

mydf
#   time event              eventList
#1     1     6                   <NA>
#2     2     5                   <NA>
#3     3     7                   <NA>
#4     4     8                   <NA>
#5     5     4                   <NA>
#6     6     2                   <NA>
#7     7    10                   <NA>
#8     8     9                   <NA>
#9     9     4                   <NA>
#10   10     6 {6,5,7,8,4,2,10,9,4,6}
#11   11     4 {5,7,8,4,2,10,9,4,6,4}
#12   12     3 {7,8,4,2,10,9,4,6,4,3}
#13   13     8 {8,4,2,10,9,4,6,4,3,8}
#14   14     3 {4,2,10,9,4,6,4,3,8,3}
#15   15     9 {2,10,9,4,6,4,3,8,3,9}
#16   16     1 {10,9,4,6,4,3,8,3,9,1}
#17   17     7  {9,4,6,4,3,8,3,9,1,7}
#18   18     3  {4,6,4,3,8,3,9,1,7,3}
#19   19     8  {6,4,3,8,3,9,1,7,3,8}
#20   20    10 {4,3,8,3,9,1,7,3,8,10}
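The output above ends each window with the current event. To match the convention in the question instead (the 10 events strictly before the current one, with the first window at time 11), one option is to shift the index range by one. This is a minimal sketch that assumes `time` equals the row number, as in the example data.

```r
# The 20 example events from the question
event <- c(6, 5, 7, 8, 4, 2, 10, 9, 4, 6, 4, 3, 8, 3, 9, 1, 7, 3, 8, 10)
mydf <- data.frame(time = seq_along(event), event = event)
windowSize <- 10

# For row x, collect rows (x - windowSize):(x - 1), i.e. the windowSize
# events before the current one; earlier rows get NA.
mydf$eventList <- vapply(seq_len(nrow(mydf)), function(x) {
  if (x <= windowSize) return(NA_character_)
  prev <- mydf$event[(x - windowSize):(x - 1)]
  paste0("{", paste(prev, collapse = ","), "}")
}, character(1))
```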