Using dcast.data.table with date values and aggregation
I'm trying to figure this out. Suppose you have a data.table:
dt <- data.table(
  person = c('bob', 'bob', 'bob'),
  door = c('front door', 'front door', 'front door'),
  type = c('timeIn', 'timeIn', 'timeOut'),
  time = c(
    as.POSIXct('2016 12 02 06 05 01', format = '%Y %m %d %H %M %S'),
    as.POSIXct('2016 12 02 06 05 02', format = '%Y %m %d %H %M %S'),
    as.POSIXct('2016 12 02 06 05 03', format = '%Y %m %d %H %M %S')
  )
)
I'd like to pivot it so that it looks like this:
person door timeIn timeOut
bob front door min(<date/time>) max(<date/time>)
I can't seem to get the syntax right for dcast.data.table. I've tried
dcast.data.table(
  dt, person + door ~ type,
  value.var = 'time',
  fun.aggregate = function(x) ifelse(type == 'timeIn', min(x), max(x))
)
This throws an error:
Aggregating function(s) should take vector inputs and return a single value (length=1).
I've also tried:
dcast.data.table(dt, person + door ~ type, value.var = 'time')
But the result drops my dates:
   person       door timeIn timeOut
1:    bob front door      2       1
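Any suggestions would be greatly appreciated. TIA
This would be one way to achieve your goal. I modified your dt and created the following data set. For each person, I looked for the minimum time for timeIn and the maximum time for timeOut. Then I applied dcast() to the result.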
# person door type time
#1: bob front door timeIn 2016-12-02 06:05:01
#2: bob front door timeIn 2016-12-02 06:05:02
#3: bob front door timeOut 2016-12-02 06:05:03
#4: bob front door timeOut 2016-12-02 06:05:05
#5: ana front door timeIn 2016-12-02 07:06:01
#6: ana front door timeIn 2016-12-02 07:06:02
#7: ana front door timeOut 2016-12-02 07:06:03
#8: ana front door timeOut 2016-12-02 07:06:05
library(data.table)
dcast(
  dt[, .SD[(type == "timeIn" & time == min(time)) |
           (type == "timeOut" & time == max(time))], by = person],
  person + door ~ type)
# person door timeIn timeOut
#1: ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
#2: bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05
Data
dt <- structure(list(person = c("bob", "bob", "bob", "bob", "ana",
"ana", "ana", "ana"), door = c("front door", "front door", "front door",
"front door", "front door", "front door", "front door", "front door"
), type = c("timeIn", "timeIn", "timeOut", "timeOut", "timeIn",
"timeIn", "timeOut", "timeOut"), time = structure(c(1480658701,
1480658702, 1480658703, 1480658705, 1480662361, 1480662362, 1480662363,
1480662365), class = c("POSIXct", "POSIXt"))), .Names = c("person",
"door", "type", "time"), row.names = c(NA, -8L), class = c("data.table",
"data.frame"))
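For comparison, the same aggregate-then-reshape idea can be sketched without the row filter: reduce each person/door/type group to a single time first, then reshape. This is only a sketch on the data above; the names `agg` and `res` are made up for illustration, and the data is rebuilt with the time zone set to UTC so the printed times do not depend on the local time zone.

```r
library(data.table)

# Same values as the data block above, with an explicit UTC time zone.
dt <- data.table(
  person = rep(c("bob", "ana"), each = 4),
  door   = "front door",
  type   = rep(c("timeIn", "timeIn", "timeOut", "timeOut"), times = 2),
  time   = as.POSIXct(
    c(1480658701, 1480658702, 1480658703, 1480658705,
      1480662361, 1480662362, 1480662363, 1480662365),
    origin = "1970-01-01", tz = "UTC"
  )
)

# Aggregate first: one row per person/door/type.
# Inside each group `type` has length 1, so a plain if/else is enough,
# and unlike ifelse() it keeps the POSIXct class.
agg <- dt[, .(time = if (type == "timeIn") min(time) else max(time)),
          by = .(person, door, type)]

# Every cell now holds exactly one value, so no fun.aggregate is needed.
res <- dcast(agg, person + door ~ type, value.var = "time")
res
```

Because the aggregation happens before reshaping, dcast() never has to fall back to counting rows, which is what replaced the dates with 2 and 1 in the question.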
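There are several ways to get the desired result using dcast. jazzurro's solution aggregates before reshaping the result. The approaches here use dcast directly but may require some post-processing. We are using jazzurro's data, adjusted to use the UTC time zone, together with CRAN version 1.10.0 of data.table.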
1. Getting ifelse to work
As mentioned in the question,
dcast(
  dt, person + door ~ type,
  value.var = 'time',
  fun.aggregate = function(x) ifelse(type == 'timeIn', min(x), max(x))
)
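returns an error message. The full text of the error message includes a hint to use the fill parameter. Unfortunately, ifelse() does not preserve the POSIXct class (see ?ifelse for details), so the result has to be coerced back to a date-time.
With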
dcast(
  dt, person + door ~ type,
  value.var = 'time',
  fun.aggregate = function(x)
    lubridate::as_datetime(ifelse(type == 'timeIn', min(x), max(x))),
  fill = 0
)
we do get
# person door timeIn timeOut
#1: ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
#2: bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05
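To see why the coercion is needed, here is a small stand-alone sketch outside of dcast. It uses two of the time stamps from the data; the names `x`, `raw` and `restored` are made up for illustration, and the last step shows a base R way to restore the class as an alternative to lubridate::as_datetime().

```r
# A POSIXct value is a number of seconds since 1970-01-01 plus a class attribute.
x <- as.POSIXct(c(1480658701, 1480658705), origin = "1970-01-01", tz = "UTC")

# ifelse() builds its result from the logical test, so the class of
# min(x) and max(x) is lost and a bare number comes back.
raw <- ifelse(TRUE, min(x), max(x))
class(raw)
# "numeric"

# Restore the class explicitly with base R.
restored <- as.POSIXct(raw, origin = "1970-01-01", tz = "UTC")
class(restored)
# "POSIXct" "POSIXt"
format(restored, "%Y-%m-%d %H:%M:%S")
# "2016-12-02 06:05:01"
```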
2. An alternative to ifelse
The help page of ifelse suggests
(tmp <- yes; tmp[!test] <- no[!test]; tmp)
as an alternative. Following this advice,
dcast(
  dt, person + door ~ type,
  value.var = 'time',
  fun.aggregate = function(x) {
    test <- type == "timeIn"
    tmp <- min(x)
    tmp[!test] <- max(x)[!test]
    tmp
  }
)
returns
# person door timeIn timeOut
#1: ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
#2: bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05
Note that neither the fill parameter nor the coercion to POSIXct is needed.
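The reason no coercion is needed can be seen on plain vectors: the idiom from the help page starts from `yes`, which already is POSIXct, and only overwrites some of its elements. A minimal sketch with made-up vectors `yes`, `no` and `test`, using time stamps from the data:

```r
yes  <- as.POSIXct(c(1480658701, 1480662361), origin = "1970-01-01", tz = "UTC")
no   <- as.POSIXct(c(1480658705, 1480662365), origin = "1970-01-01", tz = "UTC")
test <- c(TRUE, FALSE)

# Start from `yes` and replace the elements where the test is FALSE.
# The result inherits the class and time zone of `yes`.
tmp <- yes
tmp[!test] <- no[!test]

class(tmp)
# "POSIXct" "POSIXt"
format(tmp, "%Y-%m-%d %H:%M:%S")
# "2016-12-02 06:05:01" "2016-12-02 07:06:05"
```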
3. Using enhanced dcast
With recent versions of dcast.data.table, we can provide a list of functions to fun.aggregate:
dcast(dt, person + door ~ type, value.var = 'time', fun = list(min, max))
returns
# person door time_min_timeIn time_min_timeOut time_max_timeIn time_max_timeOut
#1: ana front door 2016-12-02 07:06:01 2016-12-02 07:06:03 2016-12-02 07:06:02 2016-12-02 07:06:05
#2: bob front door 2016-12-02 06:05:01 2016-12-02 06:05:03 2016-12-02 06:05:02 2016-12-02 06:05:05
We can remove the unwanted columns and rename the others by
dcast(dt, person + door ~ type, value.var = 'time', fun = list(min, max))[
, .(person, door, timeIn = time_min_timeIn, timeOut = time_max_timeOut)]
which gives us
# person door timeIn timeOut
#1: ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
#2: bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05
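The same clean-up can also be sketched with setnames() instead of a `.()` selection. The sketch assumes the generated column names follow the pattern shown above (time_min_timeIn and so on); if your data.table version names them differently, adjust the `keep` vector. The names `wide`, `keep` and `res` are made up for illustration.

```r
library(data.table)

# Same values as the data used in this answer, in UTC.
dt <- data.table(
  person = rep(c("bob", "ana"), each = 4),
  door   = "front door",
  type   = rep(c("timeIn", "timeIn", "timeOut", "timeOut"), times = 2),
  time   = as.POSIXct(
    c(1480658701, 1480658702, 1480658703, 1480658705,
      1480662361, 1480662362, 1480662363, 1480662365),
    origin = "1970-01-01", tz = "UTC"
  )
)

# One column per function/type combination.
wide <- dcast(dt, person + door ~ type, value.var = "time",
              fun.aggregate = list(min, max))

# Keep only earliest timeIn and latest timeOut (assumed column names).
keep <- c("person", "door", "time_min_timeIn", "time_max_timeOut")
res <- wide[, keep, with = FALSE]

# Rename by reference; `res` is a copy, so `wide` is left untouched.
setnames(res, c("time_min_timeIn", "time_max_timeOut"), c("timeIn", "timeOut"))
res
```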
Data
As mentioned above, we are using jazzurro's data
dt <- structure(list(person = c("bob", "bob", "bob", "bob", "ana",
"ana", "ana", "ana"), door = c("front door", "front door", "front door",
"front door", "front door", "front door", "front door", "front door"
), type = c("timeIn", "timeIn", "timeOut", "timeOut", "timeIn",
"timeIn", "timeOut", "timeOut"), time = structure(c(1480658701,
1480658702, 1480658703, 1480658705, 1480662361, 1480662362, 1480662363,
1480662365), class = c("POSIXct", "POSIXt"))), .Names = c("person",
"door", "type", "time"), row.names = c(NA, -8L), class = c("data.table",
"data.frame"))
but with the time zone coerced to UTC.
With
dt[, time := lubridate::with_tz(time, "UTC")]
we have
dt
# person door type time
#1: bob front door timeIn 2016-12-02 06:05:01
#2: bob front door timeIn 2016-12-02 06:05:02
#3: bob front door timeOut 2016-12-02 06:05:03
#4: bob front door timeOut 2016-12-02 06:05:05
#5: ana front door timeIn 2016-12-02 07:06:01
#6: ana front door timeIn 2016-12-02 07:06:02
#7: ana front door timeOut 2016-12-02 07:06:03
#8: ana front door timeOut 2016-12-02 07:06:05
independent of the local time zone.
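If lubridate is not available, the same effect can be sketched in base R by setting the "tzone" attribute, which changes only how the time is printed and not the instant it represents. The names `x` and `y` are made up for illustration; `x` mimics one value of the data above, which carries no time zone attribute.

```r
# One time stamp as stored in the data: seconds since 1970-01-01, no tzone.
x <- structure(1480658701, class = c("POSIXct", "POSIXt"))

# Set the display time zone to UTC; the underlying number is unchanged.
y <- x
attr(y, "tzone") <- "UTC"

as.numeric(x) == as.numeric(y)
# TRUE
format(y, "%Y-%m-%d %H:%M:%S")
# "2016-12-02 06:05:01"
```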