Creating an array of lagged values with rxDataStep and mclapply
I have an xdf file named probe03Seq that contains a list of events.
Variable information:
Var 1: cm_mac_address 55718 factor levels:
Var 2: time, Type: POSIXct
Var 3: status 3 factor levels:
Var 4: duration_disc 10 factor levels:
Var 5: down_power_disc 10 factor levels:
Var 6: down_snr_disc 10 factor levels:
Var 7: down_speed_disc 10 factor levels
Var 8: latency_disc 1 factor levels:
Var 9: up_power_disc 10 factor levels:
Var 10: up_speed_disc 10 factor levels:
Var 11: Sequence 34777 factor levels:
I then split this large xdf into many small xdf files by the cm_mac_address column and sort each of them by time.
library(parallel)  # provides mclapply

nocSplit <- rxSplit(inData = "probe03Seq.xdf",
                    outFilesBase = file.path(tempdir(), "MACAddress"),
                    splitByFactor = "cm_mac_address")

# Sort each per-MAC-address xdf by time, in place
mclapply(nocSplit, FUN = function(xdf) {
  rxSort(inData = xdf,
         outFile = xdf,
         sortByVars = "time",
         overwrite = TRUE)
})
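I am now trying to figure out how to create a new variable across these small xdf files. I would like to be able to set a window size and build an array of previous values from the Sequence column. For example, with one observation per hour and a window size of 10 hours, I would like to see something like the SequenceList column below.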
window.size = 10 hrs
time Sequence SequenceList
1 6 NA
2 5 NA
3 7 NA
4 8 NA
5 4 NA
6 2 NA
7 10 NA
8 9 NA
9 4 NA
10 6 {6,5,7,8,4,2,10,9,4,6}
11 4 {5,7,8,4,2,10,9,4,6,4}
12 3 {7,8,4,2,10,9,4,6,4,3}
13 8 {8,4,2,10,9,4,6,4,3,8}
14 3 {4,2,10,9,4,6,4,3,8,3}
15 9 {2,10,9,4,6,4,3,8,3,9}
16 1 {10,9,4,6,4,3,8,3,9,1}
17 7 {9,4,6,4,3,8,3,9,1,7}
18 3 {4,6,4,3,8,3,9,1,7,3}
19 8 {6,4,3,8,3,9,1,7,3,8}
20 10 {4,3,8,3,9,1,7,3,8,10}
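To make the target concrete, here is a minimal base-R sketch of the SequenceList column for an in-memory vector. The helper name rolling_window is made up for this illustration, and it stores each window as one comma-separated string (an assumption, chosen because a plain character value is easier to store than a list).

```r
# Hypothetical helper (not from the post): for each position i, return the
# current value plus the (w - 1) values before it as one string, or NA when
# fewer than w values are available yet.
rolling_window <- function(x, w) {
  x <- as.character(x)  # also turns factor codes into their labels
  vapply(seq_along(x), function(i) {
    if (i < w) NA_character_ else paste(x[(i - w + 1):i], collapse = ",")
  }, character(1))
}

# First 12 Sequence values from the table above
sequence <- c(6, 5, 7, 8, 4, 2, 10, 9, 4, 6, 4, 3)
windows <- rolling_window(sequence, 10)
print(windows[10:12])
```

Rows 1 to 9 come back as NA and row 10 is the first complete window, which matches the layout of the table.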
Matt Parker of the Azure team has a nice piece of code that lags a variable by one row:
https://gist.github.com/mmparker/8aca803eae5410875a21
lagVar <- function(dataList) {
  if (.rxStartRow == 1) {
    dataList[[newName]] <- c(NA, dataList[[varToLag]][-.rxNumRows])
  } else {
    dataList[[newName]] <- c(.rxGet("lastValue"),
                             dataList[[varToLag]][-.rxNumRows])
  }
  .rxSet("lastValue", dataList[[varToLag]][.rxNumRows])
  dataList
}

lapply(djiaSplit, FUN = function(xdf) {
  rxDataStep(inData = xdf,
             outFile = xdf,
             transformObjects = list(
               varToLag = "Open",
               newName = "previousOpen"),
             transformFunc = lagVar,
             # append = "cols",
             overwrite = TRUE)
})
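The lagVar code carries one value from chunk to chunk with .rxSet and .rxGet. A windowed version would presumably need to carry the last (window size - 1) values instead. The sketch below is only an illustration of that idea: window_chunk and lagVarWindowChunked are hypothetical names, the chunking is simulated with a plain loop so it runs in base R, and the RevoScaleR wrapper at the end is defined but not called and is untested against RevoScaleR. It also assumes window_chunk could be handed to the transform through transformObjects.

```r
# Hypothetical helper: compute windows for one chunk, given the tail of the
# values seen in earlier chunks ("carry").
window_chunk <- function(values, carry, w) {
  values <- as.character(values)
  all_vals <- c(carry, values)   # earlier rows first, then this chunk
  offset <- length(carry)
  out <- vapply(seq_along(values), function(i) {
    j <- i + offset              # position of row i within all_vals
    if (j < w) NA_character_ else paste(all_vals[(j - w + 1):j], collapse = ",")
  }, character(1))
  # keep at most the last (w - 1) values for the next chunk
  keep <- min(w - 1, length(all_vals))
  list(result = out, carry = tail(all_vals, keep))
}

# Simulate three chunks of four rows each in plain R
x <- c(6, 5, 7, 8, 4, 2, 10, 9, 4, 6, 4, 3)
carry <- character(0)
out <- character(0)
for (chunk in split(x, rep(1:3, each = 4))) {
  res <- window_chunk(chunk, carry, 10)
  out <- c(out, res$result)
  carry <- res$carry
}
print(out[10:12])

# Untested wiring into a transform function, following the lagVar pattern.
# Assumes varToLag, newName, window.size and window_chunk are all supplied
# through transformObjects.
lagVarWindowChunked <- function(dataList) {
  carry <- if (.rxStartRow == 1) character(0) else .rxGet("carry")
  res <- window_chunk(dataList[[varToLag]], carry, window.size)
  .rxSet("carry", res$carry)
  dataList[[newName]] <- res$result
  dataList
}
```

The simulated run gives the same windows as a single pass over the whole vector, so windows that straddle a chunk boundary are not lost.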
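I think the same approach of wrapping rxDataStep with a custom transform function inside mclapply could be used again. I am just having trouble coming up with the function. Any help would be greatly appreciated!
This is what I have so far. I came up with a function that works on a regular data frame: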
set.seed(100)
# time and event now have the same length (the original mixed 1000 and 10000)
mydf <- data.frame(time = 1:1000, event = sample(1:10, 1000, replace = TRUE))
w <- 10
for (i in 1:nrow(mydf)) {
  if (i <= w) {
    mydf$eventList[i] <- NA
  } else {
    mydf$eventList[i] <- list(mydf$event[(i - w):i])
  }
}
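One detail worth checking in the loop: (i - w):i selects w + 1 values, and i <= w leaves row w as NA, whereas the SequenceList table shows the first window on row 10 with exactly 10 values. The sketch below is one way to build the list column without an explicit loop, using (i - w + 1):i and i < w so the result lines up with the table. The 20 event values are copied from the table; the rest is an assumption about what is wanted.

```r
# Sequence values for rows 1 to 20 of the table above
event <- c(6, 5, 7, 8, 4, 2, 10, 9, 4, 6, 4, 3, 8, 3, 9, 1, 7, 3, 8, 10)
mydf <- data.frame(time = seq_along(event), event = event)
w <- 10

# One list element per row: NA until a full window exists, then the
# current value together with the (w - 1) values before it
mydf$eventList <- lapply(seq_len(nrow(mydf)), function(i) {
  if (i < w) NA else mydf$event[(i - w + 1):i]
})

print(mydf$eventList[[10]])
print(mydf$eventList[[20]])
```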
However, when I modify it to work on an xdf file, I get an error.
lagVarWindow <- function(dataList) {
  for (i in 1:.rxNumRows) {
    if (i <= window.size) {
      dataList[[newName]][i] <- NA
    } else {
      dataList[[newName]][i] <- list(dataList[[varToLag]][(i - window.size):i])
    }
  }
  dataList
}

mclapply(nocSplit, FUN = function(xdf) {
  rxDataStep(inData = xdf,
             outFile = xdf,
             transformObjects = list(
               window.size = 10,
               varToLag = "Sequence",
               newName = "Sequence2"),
             transformFunc = lagVarWindow,
             # append = "cols",
             overwrite = TRUE)
})
Error in doTryCatch(return(expr), name, parentenv, handler) :
Found list tag in the middle of data: '<list=Sequence2&2190:1>
I was able to get rid of the error by wrapping the list in paste():
lagVarWindow <- function(dataList) {
  for (i in 1:.rxNumRows) {
    if (i <= window.size) {
      dataList[[newName]][i] <- NA
    } else {
      dataList[[newName]][i] <- paste(list(dataList[[varToLag]][(i - window.size):i]))
    }
  }
  dataList
}
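A possible caveat with the paste() workaround, offered as an observation rather than a confirmed diagnosis: paste(list(x)) turns the list element into text by deparsing it, so the stored string may look like a piece of R code, and for a factor such as Sequence it may hold the underlying integer codes rather than the labels. The sketch below compares that with collapsing the labels directly. The three values are illustrative only.

```r
# Illustrative values; in the post the column is the Sequence factor
v <- factor(c("6", "5", "7"))

# paste() on a list deparses the element into a single string
deparsed <- paste(list(v))

# Collapsing the labels gives one delimited string for the window
collapsed <- paste(as.character(v), collapse = ",")

print(deparsed)
print(collapsed)
```

Either way the new column holds a single character value per row, which is presumably why the "list tag" error no longer appears.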