Left join and select the next observation in time in R
Suppose I have two data frames:
df <- data.frame(ID=c("Ana", "Lola", "Ana"),
                 Date=c("2020-06-06", "2020-06-06", "2020-06-07"),
                 meat=c("fish", "poultry", "poultry"),
                 time_ordered=c("2020-06-06 12:24:39", "2020-06-06 12:34:36", "2020-06-07 12:24:39"))
df2 <- data.frame(ID=c("Ana", "Ana", "Lola", "Ana"),
                  Date=c("2020-06-06", "2020-06-06", "2020-06-06", "2020-06-07"),
                  meat=c("fish", "fish", "poultry", "poultry"),
                  time_received=c("2020-06-06 12:24:40", "2020-06-06 12:26:49", "2020-06-07 12:36:39", "2020-06-07 13:04:39"))
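Suppose I want to join these two data frames on ID and meat.
Then, for a given observation, I want to match time_ordered with the first time_received that follows it.
For example, I should get a row with "ID = Ana, Date = 2020-06-06, meat = fish, time_ordered = 2020-06-06 12:24:39, time_received = 2020-06-06 12:24:40".
So the time_received "2020-06-06 12:26:49" would not be matched with anything.
In other words, for each (ID, meat, time_ordered), I want a unique match to the smallest time_received with the same ID and meat that is later than time_ordered.
Thanks in advance!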
Join df with df2 by ID, meat and Date, keep only the rows where time_received > time_ordered, arrange the rows by time_received, and keep only the unique rows.
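library(dplyr)
library(lubridate)

df %>%
  # Attach every candidate receipt with the same ID, meat and Date
  left_join(df2, by = c('ID', 'meat', 'Date')) %>%
  # Parse the character columns into date and date-time values
  mutate(Date = ymd(Date),
         across(c(time_ordered, time_received), ymd_hms)) %>%
  # Keep receipts that come after the order (orders with no such receipt are dropped)
  filter(time_received > time_ordered) %>%
  # Sort so the earliest receipt comes first within each group
  arrange(ID, Date, meat, time_received) %>%
  # Keep the first row for each ID, Date and meat
  distinct(ID, Date, meat, .keep_all = TRUE)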
#    ID       Date    meat        time_ordered       time_received
#1  Ana 2020-06-06    fish 2020-06-06 12:24:39 2020-06-06 12:24:40
#2  Ana 2020-06-07 poultry 2020-06-07 12:24:39 2020-06-07 13:04:39
#3 Lola 2020-06-06 poultry 2020-06-06 12:34:36 2020-06-07 12:36:39
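Note that the pipeline above keeps one row per (ID, Date, meat), so two orders placed by the same person for the same meat on the same day would collapse into a single row. A minimal sketch of a variant that joins on ID and meat only, as the question asks, and picks the earliest later receipt per individual order might look like this. The names orders, receipts and matched are hypothetical, the Date column is left out for brevity, and the timestamps are assumed to be in UTC.

```r
library(dplyr)

# Example data from the question, with timestamps parsed up front (UTC is an assumption)
orders <- data.frame(
  ID = c("Ana", "Lola", "Ana"),
  meat = c("fish", "poultry", "poultry"),
  time_ordered = as.POSIXct(
    c("2020-06-06 12:24:39", "2020-06-06 12:34:36", "2020-06-07 12:24:39"),
    tz = "UTC"
  )
)
receipts <- data.frame(
  ID = c("Ana", "Ana", "Lola", "Ana"),
  meat = c("fish", "fish", "poultry", "poultry"),
  time_received = as.POSIXct(
    c("2020-06-06 12:24:40", "2020-06-06 12:26:49",
      "2020-06-07 12:36:39", "2020-06-07 13:04:39"),
    tz = "UTC"
  )
)

matched <- orders %>%
  # Pair every order with every receipt for the same ID and meat
  left_join(receipts, by = c("ID", "meat")) %>%
  # One group per individual order, not per day
  group_by(ID, meat, time_ordered) %>%
  # Earliest receipt strictly after the order; NA if there is none
  summarise(
    time_received = sort(time_received[time_received > time_ordered])[1],
    .groups = "drop"
  )

print(matched)
```

With the question's data this returns three rows, and Ana's fish order is paired with 2020-06-06 12:24:40 rather than 12:26:49. Unlike the filter() step, an order with no later receipt stays in the result with a missing time_received. The sketch does not stop two orders from sharing the same receipt, so a stricter one-to-one pairing would need extra logic.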