使用日期时间列投射数据框
Casting a dataframe with datetime columns
我用 R 构建了一个数据框,从我的电子邮件中读取一些事件。
基本上,最终数据帧的结构如下:
'data.frame': 74 obs. of 7 variables:
$ process_name : Factor w/ 2 levels : 1 1 1 1 2 2 2 2 2 2 ...
$ job_code : chr "TRB1619825404" "TRB1619825404" "TRB1619825404" "TRB1619825404" ...
$ phase : Factor w/ 7 levels ,..: 4 4 6 6 4 5 7 1 3 2 ...
$ stage : Factor w/ 2 levels "End","Start": 2 1 2 1 2 1 2 2 2 2 ...
$ date : POSIXct, format: "2021-04-30 23:30:04" "2021-05-01 01:57:26" "2021-05-01 01:57:26" "2021-05-01 02:25:26" ...
$ execution_date: Date, format: "2021-04-30" "2021-05-01" "2021-05-01" "2021-05-01" ...
$ execution_time : 'hms' num 23:30:04 01:57:26 01:57:26 02:25:26 ...
..- attr(*, "units")= chr "secs"
每个事件都有一个关联的开始和结束日期和时间。
我想要做的(为了计算事件的持续时间)是将数据帧转换成这样:
'data.frame'
$ process_name
$ job_code
$ phase
$ start_date
$ end_date
$ duration
我尝试使用 dcast,但它使用默认聚合函数,我只想重塑数据框。
有什么想法吗?
首先,我猜你的数据是这样的:
dat <- structure(list(process_name = structure(c(1L, 1L, 1L, 1L), .Label = c("L01", "L02"), class = "factor"), job_code = c("TRB1619825404", "TRB1619825404", "TRB1619825404", "TRB1619825404"), phase = structure(c(3L, 3L, 5L, 5L), .Label = c("L02", "L03", "L04", "L05", "L06", "L07"), class = "factor"), stage = structure(c(2L, 1L, 2L, 1L), .Label = c("End", "Start" ), class = "factor"), date = structure(c(1619825404, 1619834246, 1619834246, 1619835926), class = c("POSIXct", "POSIXt"), tzone = "UTC"), execution_date = structure(c(18747, 18748, 18748, 18748), class = "Date"), execution_time = c("30:04", "01:57:26", "01:57:26", "02:25:26" )), row.names = c(NA, 4L), class = "data.frame")
(如果没有,请提供您的数据以供我们使用。)
从这里开始:
out <- reshape2::dcast(dat[,1:5], process_name + job_code + phase ~ stage, value.var = "date")
out[,c("End","Start")] <- lapply(out[,c("End","Start")], as.POSIXct, origin = "1970-01-01", tz = "UTC")
out$duration <- difftime(out$End, out$Start, units = "mins")
out
# process_name job_code phase End Start duration
# 1 L01 TRB1619825404 L04 2021-05-01 01:57:26 2021-04-30 23:30:04 147.3667 mins
# 2 L01 TRB1619825404 L06 2021-05-01 02:25:26 2021-05-01 01:57:26 28.0000 mins
我不得不重新as.POSIXct
它,因为dcast
将它们转换为numeric
(不知道如何解决)。
我用 R 构建了一个数据框,从我的电子邮件中读取一些事件。 基本上,最终数据帧的结构如下:
'data.frame': 74 obs. of 7 variables:
$ process_name : Factor w/ 2 levels : 1 1 1 1 2 2 2 2 2 2 ...
$ job_code : chr "TRB1619825404" "TRB1619825404" "TRB1619825404" "TRB1619825404" ...
$ phase : Factor w/ 7 levels ,..: 4 4 6 6 4 5 7 1 3 2 ...
$ stage : Factor w/ 2 levels "End","Start": 2 1 2 1 2 1 2 2 2 2 ...
$ date : POSIXct, format: "2021-04-30 23:30:04" "2021-05-01 01:57:26" "2021-05-01 01:57:26" "2021-05-01 02:25:26" ...
$ execution_date: Date, format: "2021-04-30" "2021-05-01" "2021-05-01" "2021-05-01" ...
$ execution_time : 'hms' num 23:30:04 01:57:26 01:57:26 02:25:26 ...
..- attr(*, "units")= chr "secs"
每个事件都有一个关联的开始和结束日期和时间。 我想要做的(为了计算事件的持续时间)是将数据帧转换成这样:
'data.frame'
$ process_name
$ job_code
$ phase
$ start_date
$ end_date
$ duration
我尝试使用 dcast,但它使用默认聚合函数,我只想重塑数据框。 有什么想法吗?
首先,我猜你的数据是这样的:
dat <- structure(list(process_name = structure(c(1L, 1L, 1L, 1L), .Label = c("L01", "L02"), class = "factor"), job_code = c("TRB1619825404", "TRB1619825404", "TRB1619825404", "TRB1619825404"), phase = structure(c(3L, 3L, 5L, 5L), .Label = c("L02", "L03", "L04", "L05", "L06", "L07"), class = "factor"), stage = structure(c(2L, 1L, 2L, 1L), .Label = c("End", "Start" ), class = "factor"), date = structure(c(1619825404, 1619834246, 1619834246, 1619835926), class = c("POSIXct", "POSIXt"), tzone = "UTC"), execution_date = structure(c(18747, 18748, 18748, 18748), class = "Date"), execution_time = c("30:04", "01:57:26", "01:57:26", "02:25:26" )), row.names = c(NA, 4L), class = "data.frame")
(如果没有,请提供您的数据以供我们使用。)
从这里开始:
out <- reshape2::dcast(dat[,1:5], process_name + job_code + phase ~ stage, value.var = "date")
out[,c("End","Start")] <- lapply(out[,c("End","Start")], as.POSIXct, origin = "1970-01-01", tz = "UTC")
out$duration <- difftime(out$End, out$Start, units = "mins")
out
# process_name job_code phase End Start duration
# 1 L01 TRB1619825404 L04 2021-05-01 01:57:26 2021-04-30 23:30:04 147.3667 mins
# 2 L01 TRB1619825404 L06 2021-05-01 02:25:26 2021-05-01 01:57:26 28.0000 mins
我不得不重新as.POSIXct
它,因为dcast
将它们转换为numeric
(不知道如何解决)。