do() 被取代了!替代方法是使用 across()、nest_by() 和总结,如何?
do() superseded! Alternative is to use across(), nest_by(), and summarise, how?
我正在做一些很简单的事情。给定特定时间段的开始日期和结束日期的数据框,我想 expand/create 每个时间段的完整序列(每行都有一个因子),然后将其输出到一个大数据框中。
例如:
library(tidyverse)
library(lubridate)
# Dataset
start_dates = ymd_hms(c("2019-05-08 00:00:00",
"2020-01-17 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00"), tz = "UTC")
end_dates = ymd_hms(c( "2019-10-24 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00",
"2022-01-19 00:00:00"), tz = "UTC")
df1 = data.frame(studying = paste0("period",seq(1:7),sep = ""),start_dates,end_dates)
有人建议我使用 do(),它目前工作正常,但我讨厌它被取代。我也有一种使用 map2 的方法。但是阅读文件(https://dplyr.tidyverse.org/reference/do.html)建议您可以使用 nest_by()、across() 和 summarise() 来完成与 do() 相同的工作,我将如何获得相同的结果?我已经尝试了很多东西,但我似乎无法得到它。
# do() way to do it
df1 %>%
group_by(studying) %>%
do(data.frame(week=seq(.$start_dates,.$end_dates,by="1 week")))
# transmute() way to do it
df1 %>%
transmute(weeks = map2(start_dates,end_dates, seq, by = "1 week"), studying)
%>% unnest(cols = c(weeks))
不确定这是否正是您要找的,但这是我对 rowwise
和 unnest
的尝试
df1 %>%
rowwise() %>%
mutate(week = list(seq(start_dates, end_dates, by = "1 week"))) %>%
select(studying, week) %>%
unnest(cols = c(week))
你也可以使用tidyr::complete
:
df1 %>%
group_by(studying) %>%
complete(start_dates = seq(from = start_dates, to = end_dates, by = "1 week")) %>%
select(-end_dates, weeks = start_dates)
# A tibble: 134 x 2
# Groups: studying [7]
studying weeks
<chr> <dttm>
1 period1 2019-05-08 00:00:00
2 period1 2019-05-15 00:00:00
3 period1 2019-05-22 00:00:00
4 period1 2019-05-29 00:00:00
5 period1 2019-06-05 00:00:00
6 period1 2019-06-12 00:00:00
7 period1 2019-06-19 00:00:00
8 period1 2019-06-26 00:00:00
9 period1 2019-07-03 00:00:00
10 period1 2019-07-10 00:00:00
# ... with 124 more rows
正如 ?do
的文档所建议的,我们现在可以使用 summarise
并将 .
替换为 across()
:
library(tidyverse)
library(lubridate)
df1 %>%
group_by(studying) %>%
summarise(week = seq(across()$start_dates,
across()$end_dates,
by = "1 week"))
#> `summarise()` has grouped output by 'studying'. You can override using the
#> `.groups` argument.
#> # A tibble: 134 x 2
#> # Groups: studying [7]
#> studying week
#> <chr> <dttm>
#> 1 period1 2019-05-08 00:00:00
#> 2 period1 2019-05-15 00:00:00
#> 3 period1 2019-05-22 00:00:00
#> 4 period1 2019-05-29 00:00:00
#> 5 period1 2019-06-05 00:00:00
#> 6 period1 2019-06-12 00:00:00
#> 7 period1 2019-06-19 00:00:00
#> 8 period1 2019-06-26 00:00:00
#> 9 period1 2019-07-03 00:00:00
#> 10 period1 2019-07-10 00:00:00
#> # … with 124 more rows
由 reprex package (v0.3.0)
于 2022 年 1 月 19 日创建
虽然标记为 实验性 group_modify
的帮助文件确实说
‘group_modify()’ is an evolution of ‘do()’
事实上,问题中使用 group_modify
的示例代码与 do
几乎相同。
# with group_modify
df2 <- df1 %>%
group_by(studying) %>%
group_modify(~ data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
# with do
df0 <- df1 %>%
group_by(studying) %>%
do(data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
identical(df2, df0)
## [1] TRUE
另一种方法:
library(tidyverse)
df1 %>%
group_by(studying) %>%
summarise(df = tibble(weeks = seq(start_dates, end_dates, by = 'week'))) %>%
unnest(df)
#> `summarise()` has grouped output by 'studying'. You can override using the `.groups` argument.
#> # A tibble: 134 × 2
#> # Groups: studying [7]
#> studying weeks
#> <chr> <dttm>
#> 1 period1 2019-05-08 00:00:00
#> 2 period1 2019-05-15 00:00:00
#> 3 period1 2019-05-22 00:00:00
#> 4 period1 2019-05-29 00:00:00
#> 5 period1 2019-06-05 00:00:00
#> 6 period1 2019-06-12 00:00:00
#> 7 period1 2019-06-19 00:00:00
#> 8 period1 2019-06-26 00:00:00
#> 9 period1 2019-07-03 00:00:00
#> 10 period1 2019-07-10 00:00:00
#> # … with 124 more rows
由 reprex package (v2.0.1)
创建于 2022-01-20
我正在做一些很简单的事情。给定特定时间段的开始日期和结束日期的数据框,我想 expand/create 每个时间段的完整序列(每行都有一个因子),然后将其输出到一个大数据框中。
例如:
library(tidyverse)
library(lubridate)
# Dataset
start_dates = ymd_hms(c("2019-05-08 00:00:00",
"2020-01-17 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00"), tz = "UTC")
end_dates = ymd_hms(c( "2019-10-24 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00",
"2022-01-19 00:00:00"), tz = "UTC")
df1 = data.frame(studying = paste0("period",seq(1:7),sep = ""),start_dates,end_dates)
有人建议我使用 do(),它目前工作正常,但我讨厌它被取代。我也有一种使用 map2 的方法。但是阅读文件(https://dplyr.tidyverse.org/reference/do.html)建议您可以使用 nest_by()、across() 和 summarise() 来完成与 do() 相同的工作,我将如何获得相同的结果?我已经尝试了很多东西,但我似乎无法得到它。
# do() way to do it
df1 %>%
group_by(studying) %>%
do(data.frame(week=seq(.$start_dates,.$end_dates,by="1 week")))
# transmute() way to do it
df1 %>%
transmute(weeks = map2(start_dates,end_dates, seq, by = "1 week"), studying)
%>% unnest(cols = c(weeks))
不确定这是否正是您要找的,但这是我对 rowwise
和 unnest
df1 %>%
rowwise() %>%
mutate(week = list(seq(start_dates, end_dates, by = "1 week"))) %>%
select(studying, week) %>%
unnest(cols = c(week))
你也可以使用tidyr::complete
:
df1 %>%
group_by(studying) %>%
complete(start_dates = seq(from = start_dates, to = end_dates, by = "1 week")) %>%
select(-end_dates, weeks = start_dates)
# A tibble: 134 x 2
# Groups: studying [7]
studying weeks
<chr> <dttm>
1 period1 2019-05-08 00:00:00
2 period1 2019-05-15 00:00:00
3 period1 2019-05-22 00:00:00
4 period1 2019-05-29 00:00:00
5 period1 2019-06-05 00:00:00
6 period1 2019-06-12 00:00:00
7 period1 2019-06-19 00:00:00
8 period1 2019-06-26 00:00:00
9 period1 2019-07-03 00:00:00
10 period1 2019-07-10 00:00:00
# ... with 124 more rows
正如 ?do
的文档所建议的,我们现在可以使用 summarise
并将 .
替换为 across()
:
library(tidyverse)
library(lubridate)
df1 %>%
group_by(studying) %>%
summarise(week = seq(across()$start_dates,
across()$end_dates,
by = "1 week"))
#> `summarise()` has grouped output by 'studying'. You can override using the
#> `.groups` argument.
#> # A tibble: 134 x 2
#> # Groups: studying [7]
#> studying week
#> <chr> <dttm>
#> 1 period1 2019-05-08 00:00:00
#> 2 period1 2019-05-15 00:00:00
#> 3 period1 2019-05-22 00:00:00
#> 4 period1 2019-05-29 00:00:00
#> 5 period1 2019-06-05 00:00:00
#> 6 period1 2019-06-12 00:00:00
#> 7 period1 2019-06-19 00:00:00
#> 8 period1 2019-06-26 00:00:00
#> 9 period1 2019-07-03 00:00:00
#> 10 period1 2019-07-10 00:00:00
#> # … with 124 more rows
由 reprex package (v0.3.0)
于 2022 年 1 月 19 日创建虽然标记为 实验性 group_modify
的帮助文件确实说
‘group_modify()’ is an evolution of ‘do()’
事实上,问题中使用 group_modify
的示例代码与 do
几乎相同。
# with group_modify
df2 <- df1 %>%
group_by(studying) %>%
group_modify(~ data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
# with do
df0 <- df1 %>%
group_by(studying) %>%
do(data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
identical(df2, df0)
## [1] TRUE
另一种方法:
library(tidyverse)
df1 %>%
group_by(studying) %>%
summarise(df = tibble(weeks = seq(start_dates, end_dates, by = 'week'))) %>%
unnest(df)
#> `summarise()` has grouped output by 'studying'. You can override using the `.groups` argument.
#> # A tibble: 134 × 2
#> # Groups: studying [7]
#> studying weeks
#> <chr> <dttm>
#> 1 period1 2019-05-08 00:00:00
#> 2 period1 2019-05-15 00:00:00
#> 3 period1 2019-05-22 00:00:00
#> 4 period1 2019-05-29 00:00:00
#> 5 period1 2019-06-05 00:00:00
#> 6 period1 2019-06-12 00:00:00
#> 7 period1 2019-06-19 00:00:00
#> 8 period1 2019-06-26 00:00:00
#> 9 period1 2019-07-03 00:00:00
#> 10 period1 2019-07-10 00:00:00
#> # … with 124 more rows
由 reprex package (v2.0.1)
创建于 2022-01-20