R - Timer based on column data (condition has been true for x time) for large data sets
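Disclaimer: I am new to R, and I have searched for an answer. There are similar questions, but I had trouble translating what I read into something that made sense for my implementation.
I am trying to add a conditional timer column that measures how long sampleCondition has been TRUE. If the condition returns to FALSE, the timer should reset. Note: I am trying to get away from the for loop. I currently compute ConditionTime in seconds, although the samples may be spaced minutes apart.
The end result should look like the ConditionTime column.
I am still learning, and so far every attempt I have made to improve this for a large data set (roughly 1 million rows) has ended up breaking everything. Can someone provide a sample solution or point me in the right direction? Any help is greatly appreciated. :)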
# create sample DateTime
DateTime <- c("2017-09-01 09:37:04", "2017-09-01 09:38:04", "2017-09-01 09:39:04", "2017-09-01 09:40:04", "2017-09-01 09:41:04", "2017-09-01 09:42:04", "2017-09-01 09:43:04")
# create sample condition
sampleCondition <- c(0,1,0,0,1,1,0)
# create sample DF
sampleDF <- data.frame(DateTime, sampleCondition)
# calculate the time diff from data point to data point (0 for the first row)
sampleDF$rowTimeDiff <- c(0, difftime(sampleDF$DateTime[-1], sampleDF$DateTime[-nrow(sampleDF)], units = "secs"))
# check if condition is true (else NA), check if condition was true in the last row. ConditionTime = sum of ConditionTime[previous row] and rowTimeDiff
sampleDF$ConditionTime <- NA_real_  # create the column before indexing into it
for (i in seq_len(nrow(sampleDF))) {
  if (sampleDF$sampleCondition[i] == 1) {
    # the first row has no previous row, so treat it as NA
    previous <- if (i > 1) sampleDF$ConditionTime[i - 1] else NA_real_
    sampleDF$ConditionTime[i] <- if (is.na(previous)) sampleDF$rowTimeDiff[i] else previous + sampleDF$rowTimeDiff[i]
  }
}
Thanks again!
Edit: added more data to the sample for clarity.
Try this:
x <- sampleDF$sampleCondition
(cumsum(x)-cummax((!x)*cumsum(x)))*60
[1] 0 60 0 0 60 120 0
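The expression above multiplies a row count by a fixed 60 seconds, so it assumes evenly spaced samples. As a rough sketch of the same cumsum/cummax idea using the actual timestamps instead, something like the following might work. The helper name `condition_timer` and its `units` argument are illustrative and not part of the original answer, and the sketch assumes the timestamps are sorted in ascending order and contain no missing values.

```r
# Hypothetical helper: time for which `condition` has been continuously true,
# measured from the real timestamps rather than a fixed 60-second step.
condition_timer <- function(date_time, condition, units = "secs") {
  t <- as.POSIXct(date_time, tz = "UTC")
  # Elapsed time since the previous row; the first row has no predecessor.
  dt <- c(0, as.numeric(diff(t), units = units))
  active <- condition == 1
  # Running total of the time accumulated on rows where the condition holds.
  total <- cumsum(dt * active)
  # Value of that running total at the most recent row where the condition
  # was false; subtracting it restarts the timer after every false row.
  baseline <- cummax((!active) * total)
  total - baseline
}

DateTime <- c("2017-09-01 09:37:04", "2017-09-01 09:38:04", "2017-09-01 09:39:04",
              "2017-09-01 09:40:04", "2017-09-01 09:41:04", "2017-09-01 09:42:04",
              "2017-09-01 09:43:04")
sampleCondition <- c(0, 1, 0, 0, 1, 1, 0)

timer_secs <- condition_timer(DateTime, sampleCondition)
timer_mins <- condition_timer(DateTime, sampleCondition, units = "mins")
print(timer_secs)  # 0 60 0 0 60 120 0
```

The subtraction relies on the running total never decreasing, which holds as long as the time differences are non-negative. Unlike the loop in the question, rows where the condition is false get 0 rather than NA.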
Timing test:
library(microbenchmark)
microbenchmark(
  (cumsum(x)-cummax((!x)*cumsum(x)))*60
)
Unit: nanoseconds
expr min lq mean median uq max neval
60 973 989.5 1357.09 1060 1139.5 23265 100
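The benchmark above runs on the 7-row sample, while the question mentions roughly 1 million rows. A minimal sketch of how one might check the vectorized expression at that scale is shown below. It uses synthetic 0/1 data (not the asker's data), assumes a fixed 60 seconds between rows, and times with base R's `system.time` so no extra package is needed. The plain loop is there only as a reference to validate the result.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
n <- 1e6
x <- rbinom(n, 1, 0.7)            # synthetic 0/1 condition, not real data

# Vectorized timer from the answer, assuming 60 seconds between rows.
timing <- system.time(
  fast <- (cumsum(x) - cummax((!x) * cumsum(x))) * 60
)

# Reference loop on a preallocated vector, used only to validate `fast`.
slow <- numeric(n)
run <- 0
for (i in seq_len(n)) {
  run <- if (x[i] == 1) run + 60 else 0
  slow[i] <- run
}

stopifnot(isTRUE(all.equal(fast, slow)))
print(timing)
```

Timings depend on the machine, so no expected numbers are given here.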
Sample data:
sampleDF <- data.frame(
  DateTime=c("2017-09-01 09:37:04", "2017-09-01 09:38:04", "2017-09-01 09:39:04", "2017-09-01 09:40:04", "2017-09-01 09:41:04", "2017-09-01 09:42:04", "2017-09-01 09:43:04"),
  sampleCondition=c(0,1,0,0,1,1,0)
)