How to merge by two columns, aggregating one of them

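I'm struggling with how to merge using two columns. I have a data frame containing measures of how much palette was used on certain dates. I have another data frame containing the distance each car traveled. I need to merge the two, and the join conditions are: the car, and the sum of that car's distance up to the date on which the palette measure took place. Here is a toy example: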

#palette measure dataframe
measure = data.frame(car = c("A", "A", "A", "B"), data1 = c("20-09-2020", "15-10-2020", "13-05-2021", "20-10-2021"), palette = c(5,4,3,5))
#> measure
#  car      data1 palette
#1   A 20-09-2020       5
#2   A 15-10-2020       4
#3   A 13-05-2021       3
#4   B 20-10-2021       5

#the distance dataframe
dist_ = data.frame(car = c("A", "C", "B", "A", "A", "A"), data2 = c("20-09-2020", "14-05-2020", "20-10-2021", "10-01-2021", "11-01-2021", "13-01-2021"), distance = c(10, 20, 10, 5, 3,8))
#> dist_
#  car      data2 distance
#1   A 20-09-2020       10
#2   C 14-05-2020       20
#3   B 20-10-2021       10
#4   A 10-01-2021        5
#5   A 11-01-2021        3
#6   A 13-01-2021        8

#for result I'd like something like
#  car      data1 palette distance
#1   A 20-09-2020       5       10
#2   A 15-10-2020       4        0
#3   A 13-05-2021       3       16
#4   B 20-10-2021       5       10

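Note that the distance is summed up to each date on which I have a palette measure, counting only the distance traveled since the previous measure. So I can say that a car traveled 16 km and its palette is 3 cm.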

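I guess I could use something like merge(x = measure, y = dist_, by.x = c("car", "data1"), by.y = c("car", "data2"), all.x = TRUE), but I don't know how to sum the distance values up to the date of the palette measure for a given car.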

Any hints on how I could do this?

Something like this would work:

library(tidyverse)
library(lubridate)

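# Pair every measure with every distance record of the same car
# (dplyr >= 1.1.0 warns about a many-to-many join here, which is expected)
result <- left_join(measure, dist_, by = c("car")) %>%
  mutate(across(c("data1", "data2"), dmy)) %>%   # parse dd-mm-yyyy strings as dates
  filter(data1 >= data2) %>%                     # keep distances on or before the measure
  group_by(car, data2) %>%
  mutate(threshold = min(data1)) %>%             # earliest measure covering each distance
  ungroup() %>%
  filter(data1 == threshold) %>%                 # count each distance only once
  group_by(car, data1, palette) %>%
  summarise(distance = sum(distance))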

result
# A tibble: 3 x 4
# Groups:   car, data1 [3]
  car   data1      palette distance
  <chr> <date>       <dbl>    <dbl>
1 A     2020-09-20       5       10
2 A     2021-05-13       3       16
3 B     2021-10-20       5       10

If you want to keep the unmatched rows, you can join back onto measure like this:

result.final <- measure %>%
  mutate(data1 = dmy(data1)) %>%   # convert to Date so it matches result$data1
  left_join(result, by = c("car", "data1", "palette"))

result.final
  car      data1 palette distance
1   A 2020-09-20       5       10
2   A 2020-10-15       4       NA
3   A 2021-05-13       3       16
4   B 2021-10-20       5       10
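
The expected output in the question shows 0 rather than NA for the measure with no distance before it. One possible way to get that, sketched here under the assumption that a missing distance really does mean "no distance traveled", is to replace the NA values with dplyr's coalesce(). The result.zero name is only illustrative, and the data frame below is rebuilt by hand from the output above so the snippet runs on its own:

```r
library(dplyr)

# Hand-built copy of result.final from the output above,
# so this snippet does not depend on the earlier steps
result.final <- data.frame(
  car = c("A", "A", "A", "B"),
  data1 = as.Date(c("2020-09-20", "2020-10-15", "2021-05-13", "2021-10-20")),
  palette = c(5, 4, 3, 5),
  distance = c(10, NA, 16, 10)
)

# Replace missing distances with 0 (assumption: NA means no distance recorded)
result.zero <- result.final %>%
  mutate(distance = coalesce(distance, 0))

result.zero
```

If NA and 0 should stay distinguishable (for example, "no data" versus "the car did not move"), it is probably better to skip this step and keep the NA.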