Data Structure for Sequence of Event Analysis in R
The code below creates a sample data frame to illustrate my problem. I have a list of timestamped events.
set.seed(100)
mydf <- data.frame(time = 1:1000, event = sample(1:10, 1000, replace = TRUE))
mydf
time event
1 6
2 5
3 7
4 8
5 4
6 2
7 10
8 9
9 4
10 6
11 4
12 3
13 8
14 3
15 9
16 1
17 7
18 3
19 8
20 10
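I am trying to create a new variable that lists the previous events within a specified window. Suppose the window size is 10. I would like to produce the data frame below. My ultimate goal is to prepare the data for event sequence analysis.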
time event eventList
1 6 NA
2 5 NA
3 7 NA
4 8 NA
5 4 NA
6 2 NA
7 10 NA
8 9 NA
9 4 NA
10 6 NA
11 4 {6,5,7,8,4,2,10,9,4,6}
12 3 {5,7,8,4,2,10,9,4,6,4}
13 8 {7,8,4,2,10,9,4,6,4,3}
14 3 {8,4,2,10,9,4,6,4,3,8}
15 9 {4,2,10,9,4,6,4,3,8,3}
16 1 {2,10,9,4,6,4,3,8,3,9}
17 7 {10,9,4,6,4,3,8,3,9,1}
18 3 {9,4,6,4,3,8,3,9,1,7}
19 8 {4,6,4,3,8,3,9,1,7,8}
20 10 {6,4,3,8,3,9,1,7,8,10}
I assume someone will come up with a more R-like way that reduces your run time. In the meantime, you can try this:
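w <- 10  # window size (the original snippet left w undefined)
mydf$eventList <- rep(list(NA), nrow(mydf))  # list column, NA by default
for (i in seq_len(nrow(mydf))) {
  if (i > w) {
    # the w events strictly before row i, excluding row i itself
    mydf$eventList[[i]] <- mydf$event[(i - w):(i - 1)]
  }
}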
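The same idea can be sketched without an explicit loop by using `lapply`. This is only a minimal sketch: `previous_events` is a hypothetical helper name, the 20 events are copied from the question's printed rows rather than regenerated with `sample`, and it assumes the window should hold the `w` events strictly before each row, as in the question's expected output.

```r
# First 20 events as printed in the question (hard-coded for reproducibility).
events <- c(6, 5, 7, 8, 4, 2, 10, 9, 4, 6, 4, 3, 8, 3, 9, 1, 7, 3, 8, 10)
mydf <- data.frame(time = seq_along(events), event = events)

# Hypothetical helper: for each position i, return the w values before it,
# or NA when fewer than w earlier values exist.
previous_events <- function(x, w) {
  lapply(seq_along(x), function(i) {
    if (i <= w) NA else x[(i - w):(i - 1)]
  })
}

# Store the result as a list column, one vector (or NA) per row.
mydf$eventList <- previous_events(mydf$event, w = 10)
mydf$eventList[[11]]
```

Keeping the windows as a list column, rather than a pasted string, leaves each window available as a numeric vector for later analysis.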
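The last three rows do not match; please check your expected output. Note that the version below includes the current event in each window, so the first non-NA row is row 10 rather than row 11.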
mydf=read.table(text="
time event
1 6
2 5
3 7
4 8
5 4
6 2
7 10
8 9
9 4
10 6
11 4
12 3
13 8
14 3
15 9
16 1
17 7
18 3
19 8
20 10",header=TRUE,stringsAsFactors=FALSE)
windowSize = 10
mydf$eventList = do.call(rbind,lapply(mydf$time,function(x) {
ifelse(x<windowSize,NA,paste0("{", paste0(mydf[ tail(1:x,windowSize) ,"event"],collapse=",") , "}"))
}))
mydf
# time event eventList
#1 1 6 <NA>
#2 2 5 <NA>
#3 3 7 <NA>
#4 4 8 <NA>
#5 5 4 <NA>
#6 6 2 <NA>
#7 7 10 <NA>
#8 8 9 <NA>
#9 9 4 <NA>
#10 10 6 {6,5,7,8,4,2,10,9,4,6}
#11 11 4 {5,7,8,4,2,10,9,4,6,4}
#12 12 3 {7,8,4,2,10,9,4,6,4,3}
#13 13 8 {8,4,2,10,9,4,6,4,3,8}
#14 14 3 {4,2,10,9,4,6,4,3,8,3}
#15 15 9 {2,10,9,4,6,4,3,8,3,9}
#16 16 1 {10,9,4,6,4,3,8,3,9,1}
#17 17 7 {9,4,6,4,3,8,3,9,1,7}
#18 18 3 {4,6,4,3,8,3,9,1,7,3}
#19 19 8 {6,4,3,8,3,9,1,7,3,8}
#20 20 10 {4,3,8,3,9,1,7,3,8,10}
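If the window should contain only the events before each row, as the question describes, one possible alternative is base R's `embed()`, which builds all sliding windows at once. The following is a sketch under assumptions, not the answerer's method: the events are hard-coded from the table above, the variable names are illustrative, and it assumes there are more rows than the window size.

```r
# Events from the 20-row example above.
events <- c(6, 5, 7, 8, 4, 2, 10, 9, 4, 6, 4, 3, 8, 3, 9, 1, 7, 3, 8, 10)
w <- 10
n <- length(events)

# embed() returns an (n - w + 1) x w matrix; each row is one window,
# with the newest value in the first column.
windows <- embed(events, w)

# Reverse the columns so each window reads oldest to newest.
windows <- windows[, w:1, drop = FALSE]

# Format every window as "{a,b,...}".
labels <- apply(windows, 1, function(r) paste0("{", paste(r, collapse = ","), "}"))

# The window ending at row i - 1 describes the events preceding row i,
# so shift the labels down by w rows and pad the start with NA.
eventList <- c(rep(NA_character_, w), labels[seq_len(n - w)])

mydf <- data.frame(time = seq_len(n), event = events, eventList = eventList,
                   stringsAsFactors = FALSE)
```

Here row 11 is the first row with a full window, which matches the layout requested in the question.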