R: Rolling Counts of Additions or Deletions to a List

df <- data.frame(date = as.Date(c(rep("2022-01-01", 3), 
                                  rep("2022-02-01", 3),
                                  rep("2022-03-01", 4))),
                 flavor = c("Almond", "Apple", "Apricot", 
                            "Almond", "Maple", "Mint",
                            "Apricot", "Pecan", "Praline", "Pumpkin"))
#>          date  flavor
#> 1  2022-01-01  Almond
#> 2  2022-01-01   Apple
#> 3  2022-01-01 Apricot
#> 4  2022-02-01  Almond
#> 5  2022-02-01   Maple
#> 6  2022-02-01    Mint
#> 7  2022-03-01 Apricot
#> 8  2022-03-01   Pecan
#> 9  2022-03-01 Praline
#> 10 2022-03-01 Pumpkin

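The R data frame above tracks the ice cream flavors offered by a shop from month to month. In February, two flavors that were not offered in January were added (Maple, Mint), and two flavors that were offered in January were removed (Apple, Apricot). In March, four flavors that were not offered in February were added (Apricot, Pecan, Praline, Pumpkin), and three flavors that were offered in February were removed (Almond, Maple, Mint). I would like to produce this summary: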

#>          date  flavors.added  flavors.removed
#> 1  2022-01-01           <NA>             <NA>
#> 2  2022-02-01              2                2
#> 3  2022-03-01              4                3

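How can I write an R script that computes the summary data frame above? That is, for each month I want a rolling count of the flavors added that were not present in the previous month, and a count of the flavors removed that were present in the previous month.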
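Both counts are plain set differences between consecutive months. As a minimal sketch using only base R's `setdiff()` (the vectors `jan` and `feb` are illustrative names whose contents are copied from the example data), the February row works out like this:

```r
# Flavors offered in two consecutive months (taken from the example data)
jan <- c("Almond", "Apple", "Apricot")
feb <- c("Almond", "Maple", "Mint")

added   <- setdiff(feb, jan)  # in February but not in January
removed <- setdiff(jan, feb)  # in January but not in February

added             # "Maple" "Mint"
removed           # "Apple" "Apricot"
length(added)     # 2
length(removed)   # 2
```

The order of the arguments matters: `setdiff(x, y)` keeps the elements of `x` that are missing from `y`, so swapping them turns "added" into "removed".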

Using data.table:

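library(data.table)

# One row per month, with that month's flavors collected in a list-column
# (setDT() converts df to a data.table in place)
df2 = setDT(df)[, .(flavors = list(flavor)), by = date]

# For each month after the first, count the set differences with the previous month;
# set() creates the two new columns and leaves NA in the rows it does not touch (row 1)
for (i in 2:nrow(df2))
  set(
    df2, i = i,
    j = c('flavors_added', 'flavors_removed'),
    list(length(setdiff(df2$flavors[[i]], df2$flavors[[i-1]])),  # added: new this month
         length(setdiff(df2$flavors[[i-1]], df2$flavors[[i]])))  # removed: gone since last month
  )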

df2

#          date                       flavors flavors_added flavors_removed
#        <Date>                        <list>         <int>           <int>
# 1: 2022-01-01          Almond,Apple,Apricot            NA              NA
# 2: 2022-02-01             Almond,Maple,Mint             2               2
# 3: 2022-03-01 Apricot,Pecan,Praline,Pumpkin             4               3
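A package-free variant is also possible. As a rough sketch, assuming base R is enough (no data.table or dplyr), `split()` groups the flavors by date and `Map()` compares each month with the one before it. The names `by_month`, `cur`, `prev` and `summary_df` are illustrative only, and the first month gets `NA` because it has no predecessor:

```r
# Rebuild the example data so this block runs on its own
df <- data.frame(date = as.Date(c(rep("2022-01-01", 3),
                                  rep("2022-02-01", 3),
                                  rep("2022-03-01", 4))),
                 flavor = c("Almond", "Apple", "Apricot",
                            "Almond", "Maple", "Mint",
                            "Apricot", "Pecan", "Praline", "Pumpkin"))

# One character vector of flavors per month; split() orders the groups by date
by_month <- split(df$flavor, df$date)
n <- length(by_month)

# Pair each month (from the 2nd on) with the month before it
cur  <- by_month[-1]
prev <- by_month[-n]
added   <- lengths(Map(setdiff, cur, prev))   # new this month
removed <- lengths(Map(setdiff, prev, cur))   # gone since last month

# The first month has nothing to compare against, so its counts are NA
summary_df <- data.frame(date            = as.Date(names(by_month)),
                         flavors.added   = c(NA, unname(added)),
                         flavors.removed = c(NA, unname(removed)))
summary_df
```

This should reproduce the counts in the desired summary; it relies on the `Date` values sorting chronologically, which they do.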

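In dplyr:

library(dplyr)
df %>% 
  group_by(date) %>% 
  summarise(flavors = list(flavor)) %>%   # one list of flavors per month
  # Map() always returns a list; mapply() can simplify to a matrix and break lengths()
  mutate(flavors.added   = lengths(Map(setdiff, flavors, lag(flavors))),
         flavors.removed = lengths(Map(setdiff, lag(flavors), flavors))) %>% 
  # the first month has no previous month, so report NA as in the desired output
  mutate(across(c(flavors.added, flavors.removed), ~ replace(.x, 1, NA)))

Output

# A tibble: 3 × 4
  date       flavors   flavors.added flavors.removed
  <date>     <list>            <int>           <int>
1 2022-01-01 <chr [3]>            NA              NA
2 2022-02-01 <chr [3]>             2               2
3 2022-03-01 <chr [4]>             4               3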
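One caveat when counting set differences with `mapply()`: with its default `SIMPLIFY = TRUE`, it collapses the results into a matrix whenever every call returns a vector of the same length greater than one, and `lengths()` then counts individual cells instead of months. A small illustration with made-up data (the `months` list below is hypothetical, not taken from the question):

```r
# Hypothetical data: every month adds exactly two new flavors
months <- list(c("A", "B"), c("C", "D"), c("E", "F"))
prev   <- c(list(NULL), months[-length(months)])  # previous month; NULL for the first

simplified <- mapply(setdiff, months, prev)                    # 2 x 3 character matrix
as_list    <- mapply(setdiff, months, prev, SIMPLIFY = FALSE)  # list of 3 vectors

dim(simplified)      # 2 3
lengths(simplified)  # six 1s: one per matrix cell, not one per month
lengths(as_list)     # 2 2 2
```

`Map(f, ...)` is defined as `mapply(FUN = f, ..., SIMPLIFY = FALSE)`, which is why it is the safer choice for this kind of per-row set difference.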