How to use a for loop to create a new variable based on differences between POSIXct values in log files
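I am trying to loop through a dataset of log files to add a variable that stores the server session number for each observation. For the first row, I want to create a new variable 'session number' with the value 1. After that, if 'ResearchNumber' differs from the previous row, I want the next row to get a different session number. If 'ResearchNumber' is the same, I want to check whether the difference in the POSIXct variable is greater than 30 minutes (1800 seconds). If so, I want a different session number (the previous one increased by 1). In all other cases, the session number should stay the same as in the previous row. In short, I want to create session numbers based on more than 30 minutes of inactivity per participant.
I have tried several approaches, but my code does not seem to loop through all rows, and in other solutions the time difference was not calculated correctly.
I hope someone can help me with this. Any help is appreciated!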
# create example data
ResearchNumber <- c("AL001","AL002","AL003")
DateTimeTag <- c(
  as.POSIXct('2014-09-29 10:35:40', tz='GMT'),
  as.POSIXct('2014-09-29 10:35:42', tz='GMT'),
  as.POSIXct('2014-09-29 10:38:18', tz='GMT')
)
logdata <- data.frame(ResearchNumber, DateTimeTag)
# loop through logdata to add variable to every observation with a server session number
linecount <- 1
for (lines in logdata) {
  if (linecount == 1) {
    session_number <- 1
    logdata$session_number <- session_number
    datetime <- logdata$DateTimeTag
    participantbefore <- logdata$ResearchNumber
    linecount <- (linecount + 1)
  } else if (linecount > 1) {
    difference <- (logdata$DateTimeTag - datetime)
    if (logdata$ResearchNumber != participantbefore) {
      logdata$session_number <- (session_number + 1)
      participantbefore <- logdata$ResearchNumber
      session_number <- (session_number + 1)
      datetime <- logdata$DateTimeTag
    } else if (difference > 18000) {
      logdata$session_number <- (session_number + 1)
      participantbefore <- logdata$ResearchNumber
      session_number <- (session_number + 1)
      datetime <- logdata$DateTimeTag
    } else {
      logdata$session_number <- (session_number)
      participantbefore <- logdata$ResearchNumber
      datetime <- logdata$DateTimeTag
    }
  }
}
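Two things in the loop above may explain the symptoms described. `for (lines in logdata)` iterates over the columns of the data frame (two here), not over its rows, and `logdata$session_number <- ...` assigns the whole column at once. In addition, subtracting two POSIXct values returns a difftime whose unit R picks automatically, so comparing it with a fixed number such as 18000 is unreliable (and 30 minutes is 1800 seconds, not 18000). The following is only a minimal sketch of both points; the names `iterations`, `short_gap`, `long_gap`, and `gap_secs` are introduced here for illustration.

```r
# Same three-row example data as in the question.
logdata <- data.frame(
  ResearchNumber = c("AL001", "AL002", "AL003"),
  DateTimeTag = as.POSIXct(
    c("2014-09-29 10:35:40", "2014-09-29 10:35:42", "2014-09-29 10:38:18"),
    tz = "GMT"
  )
)

# 1. Looping over a data frame visits its columns, not its rows.
iterations <- 0
for (column in logdata) {
  iterations <- iterations + 1
}
iterations     # equals ncol(logdata)
nrow(logdata)  # the number of rows the loop was meant to visit

# 2. Subtracting POSIXct values lets R choose the unit automatically.
short_gap <- logdata$DateTimeTag[2] - logdata$DateTimeTag[1]  # 2 seconds apart
long_gap  <- logdata$DateTimeTag[3] - logdata$DateTimeTag[2]  # 156 seconds apart
units(short_gap)  # "secs"
units(long_gap)   # "mins"

# Requesting a fixed unit makes the threshold comparison unambiguous.
gap_secs <- as.numeric(
  difftime(logdata$DateTimeTag[3], logdata$DateTimeTag[2], units = "secs")
)
gap_secs > 30 * 60  # FALSE: 156 seconds is below the 30-minute threshold
```

Iterating with `seq_len(nrow(logdata))` and indexing single elements, together with `difftime(..., units = "secs")`, would address both issues.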
You beat me to it, @docendo discimus!
Here is a dplyr solution.
library(tidyverse) # brings in dplyr library
# make better example data
ResearchNumber <- c("AL001","AL002","AL003", "AL003", "AL003")
DateTimeTag <- c(
  as.POSIXct('2014-09-29 10:35:40', tz='GMT'),
  as.POSIXct('2014-09-29 10:35:42', tz='GMT'),
  as.POSIXct('2014-09-29 10:38:18', tz='GMT'),
  as.POSIXct('2014-09-29 12:00:00', tz='GMT'),
  as.POSIXct('2014-09-29 12:15:18', tz='GMT')
)
logdata <- data.frame(ResearchNumber, DateTimeTag)
logdata
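logdata <- logdata %>%
  # sort by time within each participant so lag() compares consecutive events
  arrange(ResearchNumber, DateTimeTag) %>%
  group_by(ResearchNumber) %>%
  mutate(
    difftime = difftime(DateTimeTag, lag(DateTimeTag), units = "mins"),
    # a new session starts on a participant's first row or after more than 30 idle minutes
    DiffSess = case_when(
      is.na(difftime) ~ TRUE,
      difftime > 30 ~ TRUE,
      TRUE ~ FALSE
    )
  ) %>%
  ungroup() %>%
  # every TRUE increments the running session counter
  mutate(session_number = cumsum(DiffSess))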
Result:
  ResearchNumber DateTimeTag         session_number difftime  DiffSess
  <fct>          <dttm>                       <int> <drtn>    <lgl>
1 AL001          2014-09-29 10:35:40              1 NA mins   TRUE
2 AL002          2014-09-29 10:35:42              2 NA mins   TRUE
3 AL003          2014-09-29 10:38:18              3 NA mins   TRUE
4 AL003          2014-09-29 12:00:00              4 81.7 mins TRUE
5 AL003          2014-09-29 12:15:18              4 15.3 mins FALSE
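For anyone who prefers to keep the row-by-row logic from the question, the same rule can also be sketched as a base R loop. This is one possible version rather than the only way to do it: it assumes the rows are already ordered by participant and then by time, and the names `timeout_secs`, `new_participant`, and `gap_secs` are made up for this illustration.

```r
# Same five-row example data as in the dplyr answer.
ResearchNumber <- c("AL001", "AL002", "AL003", "AL003", "AL003")
DateTimeTag <- as.POSIXct(
  c("2014-09-29 10:35:40", "2014-09-29 10:35:42", "2014-09-29 10:38:18",
    "2014-09-29 12:00:00", "2014-09-29 12:15:18"),
  tz = "GMT"
)
logdata <- data.frame(ResearchNumber, DateTimeTag)

timeout_secs <- 30 * 60  # 30 minutes of inactivity, expressed in seconds

session_number <- integer(nrow(logdata))
for (i in seq_len(nrow(logdata))) {  # iterate over row indices, not columns
  if (i == 1) {
    session_number[i] <- 1L          # the first row opens session 1
    next
  }
  new_participant <- logdata$ResearchNumber[i] != logdata$ResearchNumber[i - 1]
  # fixed unit, so the comparison below does not depend on automatic unit choice
  gap_secs <- as.numeric(
    difftime(logdata$DateTimeTag[i], logdata$DateTimeTag[i - 1], units = "secs")
  )
  if (new_participant || gap_secs > timeout_secs) {
    session_number[i] <- session_number[i - 1] + 1L  # start a new session
  } else {
    session_number[i] <- session_number[i - 1]       # continue the current session
  }
}
logdata$session_number <- session_number
logdata$session_number  # 1 2 3 4 4, matching the dplyr result above
```

On large log files the grouped `cumsum()` approach is likely to be faster, since it avoids an explicit loop over rows.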