For each ID, return the earliest date from the start column and the latest date from the end column in R
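I have a dataset in which each ID has multiple start dates and end dates. I want to get the earliest date from the startDate column and the latest date from the endDate column.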
data = data.frame(ID = c(1, 1, 1, 1, 2, 2, 2),
                  startDate = c("2018-01-31", "2018-01-31", "2018-01-31", "2019-06-06",
                                "2002-06-07", "2002-06-07", "2002-09-12"),
                  endDate = c(NA, NA, NA, "2019-07-09", NA, NA, "2002-10-02"))
This is the output I would like to get:
data = data.frame(ID = c(1, 2),
                  startDate = c("2018-01-31", "2002-06-07"),
                  endDate = c("2019-07-09", "2002-10-02"))
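After some trial and error I worked out how to do this with the code below, but I would prefer a more efficient approach if there is one. I need to do this all the time, and I would rather not have to create two separate data frames. Thanks for your help!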
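library(dplyr)

# Earliest start date per ID: sort ascending, keep the first row of each group
data_start <- data %>%
  group_by(ID) %>%
  arrange(startDate) %>%
  slice(1L)

# Latest end date per ID: sort descending (arrange() puts NA last), keep the first row
data_end <- data %>%
  group_by(ID) %>%
  arrange(desc(endDate)) %>%
  slice(1L)

# Join the two per-ID results back together
data <- left_join(data_start[, c(1, 2)], data_end[, c(1, 3)], by = "ID")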
You can use min and max after converting the variables to dates:
library(dplyr)

data %>%
  group_by(ID) %>%
  summarise(startDate = min(as.Date(startDate), na.rm = TRUE),
            endDate = max(as.Date(endDate), na.rm = TRUE))
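One caveat with the min/max approach: max(..., na.rm = TRUE) emits a warning when a group contains no non-missing values at all. The sketch below is one possible way to guard against that. It reuses the example data from the question; the helpers safe_min and safe_max are hypothetical names introduced here for illustration, not part of dplyr or of the original answer.

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2),
  startDate = c("2018-01-31", "2018-01-31", "2018-01-31", "2019-06-06",
                "2002-06-07", "2002-06-07", "2002-09-12"),
  endDate = c(NA, NA, NA, "2019-07-09", NA, NA, "2002-10-02")
)

# Hypothetical helpers: return a missing Date when every value in the group is NA,
# otherwise the usual min/max with missing values ignored
safe_min <- function(x) if (all(is.na(x))) as.Date(NA) else min(x, na.rm = TRUE)
safe_max <- function(x) if (all(is.na(x))) as.Date(NA) else max(x, na.rm = TRUE)

result <- data %>%
  mutate(startDate = as.Date(startDate),   # character -> Date
         endDate = as.Date(endDate)) %>%
  group_by(ID) %>%
  summarise(startDate = safe_min(startDate),
            endDate = safe_max(endDate))

print(result)
```

For the example data every ID has at least one end date, so the result matches the plain min/max version; the helpers only matter for IDs whose endDate values are all NA.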
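Or use first and last. These pick values by position, so this relies on the rows within each ID already being sorted by date, as they are in the example: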
library(dplyr)
data %>%
  group_by(ID) %>%
  summarise(
    startDate = first(startDate),
    endDate = last(endDate)
  )
# A tibble: 2 x 3
ID startDate endDate
* <dbl> <chr> <chr>
1 1 2018-01-31 2019-07-09
2 2 2002-06-07 2002-10-02
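Because first and last work by position, the result above depends on row order. If the rows might not be sorted, one option (a sketch, not part of the original answer) is to sort inside summarise. It assumes the dates are ISO-formatted strings (YYYY-MM-DD), which sort chronologically as plain text, and it uses a reshuffled copy of the example data, made up here to show the effect.

```r
library(dplyr)

# Same values as the question's data, but with rows deliberately out of order
# (this reshuffled copy is an assumption for illustration)
shuffled <- data.frame(
  ID = c(2, 1, 1, 2, 1, 2, 1),
  startDate = c("2002-09-12", "2019-06-06", "2018-01-31", "2002-06-07",
                "2018-01-31", "2002-06-07", "2018-01-31"),
  endDate = c("2002-10-02", "2019-07-09", NA, NA, NA, NA, NA)
)

result <- shuffled %>%
  group_by(ID) %>%
  summarise(
    # sort() drops NA by default, so first/last see only real dates
    startDate = first(sort(startDate)),
    endDate = last(sort(endDate))
  )

print(result)
```

The columns stay character here, as in the first/last answer; wrap them in as.Date() if real Date columns are needed.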