R 列表中添加或删除的滚动计数
R Rolling Counts of Additions, or Deletions, to a List
df <- data.frame(date = as.Date(c(rep("2022-01-01", 3),
rep("2022-02-01", 3),
rep("2022-03-01", 4))),
flavor = c("Almond", "Apple", "Apricot",
"Almond", "Maple", "Mint",
"Apricot", "Pecan", "Praline", "Pumpkin"))
#> date flavor
#> 1 2022-01-01 Almond
#> 2 2022-01-01 Apple
#> 3 2022-01-01 Apricot
#> 4 2022-02-01 Almond
#> 5 2022-02-01 Maple
#> 6 2022-02-01 Mint
#> 7 2022-03-01 Apricot
#> 8 2022-03-01 Pecan
#> 9 2022-03-01 Praline
#> 10 2022-03-01 Pumpkin
上面的 R 数据框逐月跟踪冰淇淋店的冰淇淋口味。在 2 月份,添加了 1 月份没有的两种口味(枫糖、薄荷),并移除了 1 月份出现的两种口味(苹果、杏)。在 3 月份,添加了 2 月份没有的四种口味(杏子、山核桃、果仁糖、南瓜),并移除了 2 月份出现的三种口味(杏仁、枫糖、薄荷)。
#> date flavors.added flavors.removed
#> 1 2022-01-01 <NA> <NA>
#> 2 2022-02-01 2 2
#> 3 2022-03-01 4 3
如何编写 R 脚本来计算上面的摘要数据框?也就是说,我想要滚动计数每月添加但上个月不存在的冰淇淋口味,以及每月移除的上个月存在的口味计数。
使用data.table
:
library(data.table)
df2 = setDT(df)[, .(flavors = list(flavor)), date]
for (i in 2:nrow(df2))
set(
df2, i = i,
j = c('flavors_added', 'flavors_removed'),
list(length(setdiff(df2$flavors[[i]], df2$flavors[[i-1]])), length(setdiff(df2$flavors[[i-1]], df2$flavors[[i]])))
)
df2
# date flavors flavors_added flavors_removed
# <Date> <list> <int> <int>
# 1: 2022-01-01 Almond,Apple,Apricot NA NA
# 2: 2022-02-01 Almond,Maple,Mint 2 2
# 3: 2022-03-01 Apricot,Pecan,Praline,Pumpkin 4 3
在dplyr
中:
library(dplyr)
df %>%
group_by(date) %>%
summarise(flavors = list(flavor)) %>%
mutate(flavors.added = lengths(mapply(setdiff, flavors, lag(flavors))),
flavors.removed = lengths(mapply(setdiff, lag(flavors), flavors)))
输出
# A tibble: 3 × 4
date flavors flavors.added flavors.removed
<date> <list> <int> <int>
1 2022-01-01 <chr [3]> 3 0
2 2022-02-01 <chr [3]> 2 2
3 2022-03-01 <chr [4]> 4 3
df <- data.frame(date = as.Date(c(rep("2022-01-01", 3),
rep("2022-02-01", 3),
rep("2022-03-01", 4))),
flavor = c("Almond", "Apple", "Apricot",
"Almond", "Maple", "Mint",
"Apricot", "Pecan", "Praline", "Pumpkin"))
#> date flavor
#> 1 2022-01-01 Almond
#> 2 2022-01-01 Apple
#> 3 2022-01-01 Apricot
#> 4 2022-02-01 Almond
#> 5 2022-02-01 Maple
#> 6 2022-02-01 Mint
#> 7 2022-03-01 Apricot
#> 8 2022-03-01 Pecan
#> 9 2022-03-01 Praline
#> 10 2022-03-01 Pumpkin
上面的 R 数据框逐月跟踪冰淇淋店的冰淇淋口味。在 2 月份,添加了 1 月份没有的两种口味(枫糖、薄荷),并移除了 1 月份出现的两种口味(苹果、杏)。在 3 月份,添加了 2 月份没有的四种口味(杏子、山核桃、果仁糖、南瓜),并移除了 2 月份出现的三种口味(杏仁、枫糖、薄荷)。
#> date flavors.added flavors.removed
#> 1 2022-01-01 <NA> <NA>
#> 2 2022-02-01 2 2
#> 3 2022-03-01 4 3
如何编写 R 脚本来计算上面的摘要数据框?也就是说,我想要滚动计数每月添加但上个月不存在的冰淇淋口味,以及每月移除的上个月存在的口味计数。
使用data.table
:
library(data.table)
df2 = setDT(df)[, .(flavors = list(flavor)), date]
for (i in 2:nrow(df2))
set(
df2, i = i,
j = c('flavors_added', 'flavors_removed'),
list(length(setdiff(df2$flavors[[i]], df2$flavors[[i-1]])), length(setdiff(df2$flavors[[i-1]], df2$flavors[[i]])))
)
df2
# date flavors flavors_added flavors_removed
# <Date> <list> <int> <int>
# 1: 2022-01-01 Almond,Apple,Apricot NA NA
# 2: 2022-02-01 Almond,Maple,Mint 2 2
# 3: 2022-03-01 Apricot,Pecan,Praline,Pumpkin 4 3
在dplyr
中:
library(dplyr)
df %>%
group_by(date) %>%
summarise(flavors = list(flavor)) %>%
mutate(flavors.added = lengths(mapply(setdiff, flavors, lag(flavors))),
flavors.removed = lengths(mapply(setdiff, lag(flavors), flavors)))
输出
# A tibble: 3 × 4
date flavors flavors.added flavors.removed
<date> <list> <int> <int>
1 2022-01-01 <chr [3]> 3 0
2 2022-02-01 <chr [3]> 2 2
3 2022-03-01 <chr [4]> 4 3