Collapse rows from 0 to 0
For a dataset like this:
Incident.ID.. date product
INCFI0000029582 2014-09-25 08:39:45 foo
INCFI0000029582 2014-09-25 08:39:48 bar
INCFI0000029582 2014-09-25 08:40:44 foo
INCFI0000029582 2014-10-10 23:04:00 foo
INCFI0000029587 2014-09-25 08:33:32 bar
INCFI0000029587 2014-09-25 08:34:41 bar
INCFI0000029587 2014-09-25 08:35:24 bar
INCFI0000029587 2014-10-10 23:04:00 foo
df <- structure(list(Incident.ID.. = c("INCFI0000029582", "INCFI0000029582",
"INCFI0000029582", "INCFI0000029582", "INCFI0000029587", "INCFI0000029587",
"INCFI0000029587", "INCFI0000029587"), date = c("2014-09-25 08:39:45",
"2014-09-25 08:39:48", "2014-09-25 08:40:44", "2014-10-10 23:04:00",
"2014-09-25 08:33:32", "2014-09-25 08:34:41", "2014-09-25 08:35:24",
"2014-10-10 23:04:00"), product =
c("foo","bar","foo","foo","bar","bar","bar","foo")),
class = "data.frame", row.names = c(NA,
-8L))
I'm using mutate to compute the time difference between consecutive rows within each ID, as follows:
library(dplyr)
library(lubridate)
df1 <- df %>%
  group_by(Incident.ID..) %>%
  mutate(diff = c(0, diff(ymd_hms(date))))
This creates a column 'diff', as follows:
Incident.ID.. date product diff
INCFI0000029582 2014-09-25 08:39:45 foo 0
INCFI0000029582 2014-09-25 08:39:48 bar 3
INCFI0000029582 2014-09-25 08:40:44 foo 56
INCFI0000029582 2014-10-10 23:04:00 foo 1347796
INCFI0000029587 2014-09-25 08:33:32 bar 0
INCFI0000029587 2014-09-25 08:34:41 bar 69
INCFI0000029587 2014-09-25 08:35:24 bar 43
INCFI0000029587 2014-10-10 23:04:00 foo 1348116
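One caveat about the code above: diff() on POSIXct values returns a difftime whose unit is picked automatically from the smallest gap (seconds if it is under a minute, otherwise minutes, hours or days), and c(0, ...) then drops that unit. The values above happen to be seconds only because each incident has a gap shorter than a minute. The following is a minimal sketch that pins every gap to seconds, assuming the same data and packages as in the question:

```r
library(dplyr)
library(lubridate)

# Same data as in the question, written with data.frame() for brevity
df <- data.frame(
  Incident.ID.. = rep(c("INCFI0000029582", "INCFI0000029587"), each = 4),
  date = c("2014-09-25 08:39:45", "2014-09-25 08:39:48", "2014-09-25 08:40:44",
           "2014-10-10 23:04:00", "2014-09-25 08:33:32", "2014-09-25 08:34:41",
           "2014-09-25 08:35:24", "2014-10-10 23:04:00"),
  product = c("foo", "bar", "foo", "foo", "bar", "bar", "bar", "foo"),
  stringsAsFactors = FALSE
)

df1 <- df %>%
  group_by(Incident.ID..) %>%
  # as.numeric(<difftime>, units = "secs") converts each gap to seconds,
  # whatever unit diff() picked for that group
  mutate(diff = c(0, as.numeric(diff(ymd_hms(date)), units = "secs"))) %>%
  ungroup()

df1
```

With explicit units, a group whose smallest gap is, say, two hours still reports its diff in seconds instead of silently switching to hours.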
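Now my goal is to aggregate/collapse the rows from zero to zero (i.e. from one row where diff is 0 up to the next). The expected final dataset looks like this: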
Incident.ID.. DateMin DateMax product
INCFI0000029582 2014-09-25 08:39:45 2014-10-10 23:04:00 foo,bar,foo,foo
INCFI0000029587 2014-09-25 08:33:32 2014-10-10 23:04:00 bar,bar,bar,foo
I'm not sure how to collapse the rows into minimum and maximum date columns and would appreciate some help. Thanks in advance.
The group_by attribute is retained after mutate, so we can summarise by group to get the min and max of 'date', and collapse 'product' by pasting its elements together (toString is a convenient wrapper for paste(., collapse = ", ")).
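library(dplyr)
library(lubridate)
df %>%
  group_by(Incident.ID..) %>%
  mutate(diff = c(0, diff(ymd_hms(date)))) %>%
  # min()/max() on the character 'date' column work here because the
  # "YYYY-MM-DD HH:MM:SS" format sorts chronologically as text
  summarise(DateMin = min(date),
            DateMax = max(date),
            product = toString(product))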
# A tibble: 2 x 4
# Incident.ID.. DateMin DateMax product
# <chr> <chr> <chr> <chr>
#1 INCFI0000029582 2014-09-25 08:39:45 2014-10-10 23:04:00 foo, bar, foo, foo
#2 INCFI0000029587 2014-09-25 08:33:32 2014-10-10 23:04:00 bar, bar, bar, foo
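The answer groups by Incident.ID.. because, in this data, diff is 0 only on the first row of each incident. If "from zero to zero" should instead mean that a new block starts at every row where diff is 0 (for example, if one incident could contain several such runs), a sketch under that assumption numbers the runs with cumsum(diff == 0) and groups by the run as well. It also uses paste(..., collapse = ",") to match the comma-without-space format of the expected output, and parses 'date' once as POSIXct so min/max compare real timestamps rather than strings. The column name 'run' is a hypothetical helper, not part of the question:

```r
library(dplyr)
library(lubridate)

df <- data.frame(
  Incident.ID.. = rep(c("INCFI0000029582", "INCFI0000029587"), each = 4),
  date = c("2014-09-25 08:39:45", "2014-09-25 08:39:48", "2014-09-25 08:40:44",
           "2014-10-10 23:04:00", "2014-09-25 08:33:32", "2014-09-25 08:34:41",
           "2014-09-25 08:35:24", "2014-10-10 23:04:00"),
  product = c("foo", "bar", "foo", "foo", "bar", "bar", "bar", "foo"),
  stringsAsFactors = FALSE
)

out <- df %>%
  mutate(date = ymd_hms(date)) %>%          # parse once to POSIXct (UTC)
  group_by(Incident.ID..) %>%
  mutate(diff = c(0, as.numeric(diff(date), units = "secs")),
         run = cumsum(diff == 0)) %>%       # hypothetical run id: a new run starts at each 0
  group_by(Incident.ID.., run) %>%
  summarise(DateMin = min(date),
            DateMax = max(date),
            product = paste(product, collapse = ","),  # "foo,bar,..." with no space
            .groups = "drop") %>%
  select(-run)

out
```

For this dataset the result has the same two rows as grouping by Incident.ID.. alone; the run column only changes the outcome when a diff of 0 appears in the middle of an incident.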