Pad within grouped dates in R
library(tidyverse)
library(lubridate)
library(padr)
df <- tibble(`Action Item ID` = c("ABC", "DEF", "GHI", "JKL", "MNO", "PQR"),
`Date Created` = as.Date(c("2019-01-01", "2019-01-01",
"2019-06-01", "2019-06-01",
"2019-08-01", "2019-08-01")),
`Date Closed` = as.Date(c("2019-01-15", "2019-05-31",
"2019-06-15", "2019-07-05",
"2019-08-15", NA)),
`Current Status` = c(rep("Closed", 5), "Open")) %>%
pivot_longer(-c(`Action Item ID`, `Current Status`),
names_to = "Type",
values_to = "Date")
#> # A tibble: 12 x 4
#> `Action Item ID` `Current Status` Type Date
#> <chr> <chr> <chr> <date>
#> 1 ABC Closed Date Created 2019-01-01
#> 2 ABC Closed Date Closed 2019-01-15
#> 3 DEF Closed Date Created 2019-01-01
#> 4 DEF Closed Date Closed 2019-05-31
#> 5 GHI Closed Date Created 2019-06-01
#> 6 GHI Closed Date Closed 2019-06-15
#> 7 JKL Closed Date Created 2019-06-01
#> 8 JKL Closed Date Closed 2019-07-05
#> 9 MNO Closed Date Created 2019-08-01
#> 10 MNO Closed Date Closed 2019-08-15
#> 11 PQR Open Date Created 2019-08-01
#> 12 PQR Open Date Closed NA
My data frame is shown above. I'm trying to use the padr R package to pad the dates within each group.
df %>% group_by(`Action Item ID`) %>% pad()
#> Error: Not all grouping variables are column names of x.
This error doesn't mean much to me. I'm looking for output like this:
#> # A tibble: ? x 4
#> `Action Item ID` `Current Status` Type Date
#> <chr> <chr> <chr> <date>
#> ABC Closed Date Created 2019-01-01
#> ABC NA NA 2019-01-02
#> ABC NA NA 2019-01-03
#> ... ... ... ...
#> ABC Closed Date Closed 2019-01-15
#> DEF Closed Date Created 2019-01-01
#> DEF NA NA 2019-01-02
#> ... ... ... ...
#> DEF NA NA 2019-05-30
#> DEF Closed Date Closed 2019-05-31
#> GHI Closed Date Created 2019-06-01
#> ... ... ... ...
Does anyone know what's going wrong?
According to ?pad, there is a group argument:
group - Optional character vector that specifies the grouping variable(s). Padding will take place within the different groups. When interval is not specified, it will be determined applying get_interval on the datetime variable as a whole, ignoring groups (see last example).
So it's better to use that argument instead of group_by():
library(dplyr)
library(padr)
df %>%
pad(group = "Action Item ID")
# A tibble: 233 x 4
# `Action Item ID` `Current Status` Type Date
# <chr> <chr> <chr> <date>
# 1 ABC Closed Date Created 2019-01-01
# 2 ABC <NA> <NA> 2019-01-02
# 3 ABC <NA> <NA> 2019-01-03
# 4 ABC <NA> <NA> 2019-01-04
# 5 ABC <NA> <NA> 2019-01-05
# 6 ABC <NA> <NA> 2019-01-06
# 7 ABC <NA> <NA> 2019-01-07
# 8 ABC <NA> <NA> 2019-01-08
# 9 ABC <NA> <NA> 2019-01-09
#10 ABC <NA> <NA> 2019-01-10
# … with 223 more rows
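The documentation quoted above notes that, when no interval is given, pad() infers it from the whole datetime column and ignores the groups. If you already know the interval you want, you can pass it explicitly. The sketch below assumes daily padding is what's wanted (interval = "day" is my assumption; the question implies it but doesn't state it) and rebuilds the question's data so it runs on its own. The final count() only shows how many rows each group ends up with.

```r
library(dplyr)
library(tidyr)
library(padr)

# Rebuild the example data from the question (pivot_longer() comes from tidyr)
df <- tibble(`Action Item ID` = c("ABC", "DEF", "GHI", "JKL", "MNO", "PQR"),
             `Date Created` = as.Date(c("2019-01-01", "2019-01-01",
                                        "2019-06-01", "2019-06-01",
                                        "2019-08-01", "2019-08-01")),
             `Date Closed` = as.Date(c("2019-01-15", "2019-05-31",
                                       "2019-06-15", "2019-07-05",
                                       "2019-08-15", NA)),
             `Current Status` = c(rep("Closed", 5), "Open")) %>%
  pivot_longer(-c(`Action Item ID`, `Current Status`),
               names_to = "Type",
               values_to = "Date")

# Pad within each Action Item ID, stating the interval explicitly
# instead of letting pad() infer it from the whole Date column
padded <- df %>%
  pad(interval = "day", group = "Action Item ID")

# Number of rows each group ends up with after padding
padded %>% count(`Action Item ID`)
```

For this data, DEF runs from 2019-01-01 to 2019-05-31, so it should come back as 151 daily rows.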