dplyr time difference for each row in a group
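I am trying to calculate how long I spend reading my emails. The emails are grouped by ID; in this example they all belong to group A.
I want to calculate the email reading duration for group A. The code I currently use computes, in seconds, the difference between the last end time and the first start time.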
data <- rawdata %>%
  group_by(ID) %>%
  summarize(diff = difftime(
    last(as.POSIXct(Endtime, format = "%m/%d/%Y %I:%M:%S %p")),
    first(as.POSIXct(Starttime, format = "%m/%d/%Y %I:%M:%S %p")),
    units = "secs"))
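However, I don't think this accurately reflects my email reading time, because it also counts the gaps between emails. Instead, I want the time difference for each row to get a more accurate reading. The desired output is shown below: it gives the time difference for each row, which lets me sum the whole diff column to get my total email duration in seconds.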
Starttime             Endtime                ID  diff
12/18/2019 4:06:59PM  12/18/2019 4:07:05 PM  A   6 secs
12/18/2019 4:07:26PM  12/18/2019 4:07:28 PM  A   2 secs
12/17/2019 6:48:06PM  12/17/2019 6:48:07PM   A   1 secs
12/17/2019 6:25:16PM  12/17/2019 6:25:22PM   A   6 secs
Any help is appreciated. I will keep researching this!
If you want the difference between the start and end time of each email read, you can do:
library(dplyr)
rawdata %>%
  mutate_at(vars(ends_with('time')), lubridate::mdy_hms) %>%
  mutate(diff = difftime(Endtime, Starttime, units = "secs"))
#            Starttime             Endtime ID   diff
#1 2019-12-18 16:06:59 2019-12-18 16:07:05  A 6 secs
#2 2019-12-18 16:07:26 2019-12-18 16:07:28  A 2 secs
#3 2019-12-17 18:48:06 2019-12-17 18:48:07  A 1 secs
#4 2019-12-17 18:25:16 2019-12-17 18:25:22  A 6 secs
Or in base R:
transform(transform(rawdata,
    Starttime = as.POSIXct(Starttime, format = "%m/%d/%Y %I:%M:%S %p"),
    Endtime = as.POSIXct(Endtime, format = "%m/%d/%Y %I:%M:%S %p")),
  diff = difftime(Endtime, Starttime, units = "secs"))
Data
rawdata <- structure(list(Starttime = structure(c(3L, 4L, 2L, 1L),
.Label = c("12/17/2019 6:25:16PM", "12/17/2019 6:48:06PM", "12/18/2019 4:06:59PM",
"12/18/2019 4:07:26PM"), class = "factor"), Endtime = structure(c(3L, 4L, 2L, 1L),
.Label = c("12/17/2019 6:25:22PM", "12/17/2019 6:48:07PM", "12/18/2019 4:07:05 PM",
"12/18/2019 4:07:28 PM"), class = "factor"), ID = structure(c(1L, 1L, 1L, 1L),
.Label = "A", class = "factor")), row.names = c(NA, -4L), class = "data.frame")
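The question also mentions summing the per-row differences to get a total duration per ID. A minimal sketch of that step follows, assuming the same four rows stored as plain character strings, a fixed `tz = "UTC"` (my assumption, to avoid daylight-saving surprises), and an English or C locale so that `%p` recognises AM/PM. The names `fmt` and `total_secs` are illustrative, not from the original thread.

```r
library(dplyr)

# Same sample rows as above, stored as character rather than factor
rawdata <- data.frame(
  Starttime = c("12/18/2019 4:06:59PM", "12/18/2019 4:07:26PM",
                "12/17/2019 6:48:06PM", "12/17/2019 6:25:16PM"),
  Endtime   = c("12/18/2019 4:07:05 PM", "12/18/2019 4:07:28 PM",
                "12/17/2019 6:48:07PM", "12/17/2019 6:25:22PM"),
  ID        = "A",
  stringsAsFactors = FALSE
)

# The space before %p matches zero or more spaces in the input,
# so both "4:06:59PM" and "4:07:05 PM" parse with this format
fmt <- "%m/%d/%Y %I:%M:%S %p"

totals <- rawdata %>%
  mutate(
    Starttime = as.POSIXct(Starttime, format = fmt, tz = "UTC"),
    Endtime   = as.POSIXct(Endtime, format = fmt, tz = "UTC"),
    # per-row difference, converted to plain numeric seconds
    diff      = as.numeric(difftime(Endtime, Starttime, units = "secs"))
  ) %>%
  group_by(ID) %>%
  # total reading time per ID, in seconds
  summarise(total_secs = sum(diff))

print(totals)
```

Converting `diff` to numeric before summing keeps the total as an ordinary number of seconds, which is usually easier to work with downstream than a `difftime` object.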