Counting negative values, stopping when there is a positive among the most recent observations, all done by group

Okay, say we have a data frame that looks like this:

Month       Product         sales   sales_lag YoY_change  chart_color
2021-12-01  Tshirt          82525   108748    -0.2411     Negative
2022-01-01  Tshirt          109411  138472    -0.2099     Negative
2022-02-01  Tshirt          106934  115497    -0.0741     Negative
2021-12-01  Pants           97863   78419      0.2480     Positive
2022-01-01  Pants           103296  100614     0.0267     Positive
2022-02-01  Pants           82306   76913      0.0701     Positive

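I'm trying to figure out a way to count the longest running number of months for which a product has had a negative YoY_change. So in this case, I'm looking to return something like this table: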

          Negative_run
Tshirt    3
Pants     0

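This tells us that Pants had positive YoY growth in the most recent month, but Tshirt has had 3 consecutive months of negative growth. Although it isn't shown in this dataset, the solution needs to account for a product being positive in one month and negative in the next. I'd like to do this in the tidyverse if possible, but I'm open to other options in the R universe.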

I would first create a column that flags whether the YoY change is negative, and then summarise.

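library(dplyr)

# Flag each month: 0 if YoY_change is positive, 1 otherwise
df1$NegativeY = as.numeric(ifelse(df1$YoY_change > 0, 0, 1))

# Sum the flags per product (this counts every flagged month, not only the latest streak)
df1 %>%
  group_by(Product) %>%
  summarise(Negative_run = sum(NegativeY))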
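
If only the most recent streak should count, one possible tweak to the sum is to reverse each product's flags and take a cumulative product first. Here is a minimal base R sketch of that idea. The `Socks` product and all of the numbers are made up to show a sign change mid-series, the flag uses `< 0` rather than the `ifelse()` above, and the rows are assumed to be sortable by `Month` within each product.

```r
# Hypothetical data: Socks is invented to show a series that changes sign.
df1 <- data.frame(
  Month = rep(c("2021-12-01", "2022-01-01", "2022-02-01", "2022-03-01"), 2),
  Product = rep(c("Socks", "Pants"), each = 4),
  YoY_change = c(-0.10, 0.05, -0.20, -0.03, -0.02, 0.25, 0.03, 0.07)
)

# Make sure rows are in chronological order within each product.
df1 <- df1[order(df1$Product, df1$Month), ]
df1$NegativeY <- as.numeric(df1$YoY_change < 0)

# Plain sum: every negative month counts, wherever it sits in the series.
total_negative <- tapply(df1$NegativeY, df1$Product, sum)

# Reverse each series so the latest month comes first. cumprod() stays at 1
# only until the first non-negative month, so the sum is the latest streak.
latest_streak <- tapply(df1$NegativeY, df1$Product,
                        function(v) sum(cumprod(rev(v))))

print(total_negative)  # Pants 1, Socks 3
print(latest_streak)   # Pants 0, Socks 2
```

The same expression, `sum(cumprod(rev(NegativeY)))`, should also work inside `summarise()` after `group_by(Product)`, as long as the rows are in month order.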

library(dplyr)  # data.table must also be installed for rleid()

df %>% 
  # calculate by product
  group_by(Product) %>% 
  # find runs of numbers
  mutate(g = data.table::rleid(chart_color)) %>% 
  # get the last run, and make sure that the last observation was negative
  filter(g == max(g), last(YoY_change < 0), .preserve = TRUE) %>% 
  # count the length of the run
  count()
# A tibble: 2 × 2
# Groups:   Product [2]
  Product     n
  <chr>   <int>
1 Pants       0
2 Tshirt      3
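
To see what the run ids in `g` look like, here is a rough base R sketch of the same idea for a single product. It uses the `rle()` idiom in place of `data.table::rleid()`, and the `signs` vector is made up for illustration.

```r
# Made-up sequence of chart_color values for one product, oldest month first.
signs <- c("Negative", "Positive", "Negative", "Negative")

# Run id: 1 for the first run of equal values, 2 for the second, and so on.
r <- rle(signs)
run_id <- rep(seq_along(r$lengths), r$lengths)
print(run_id)  # 1 2 3 3

# Keep only the last run, and count it only if it is a Negative run.
in_last_run <- run_id == max(run_id)
negative_run <- if (tail(signs, 1) == "Negative") sum(in_last_run) else 0L
print(negative_run)  # 2
```

A product whose latest month is Positive gets 0 here, which is the role that `.preserve = TRUE` plays in the pipeline: the empty group is kept, so `count()` can report it as 0.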

Since you want to count consecutive Negative values, we need some kind of rle function.

library(dplyr)

df %>% 
  group_by(Product, 
           grp = with(rle(chart_color), rep(seq_along(lengths), lengths))) %>%
  mutate(Negative_run = ifelse(chart_color == "Positive", 0, seq_along(grp))) %>% 
  group_by(Product) %>% 
  summarize(Negative_run = max(Negative_run))

# A tibble: 2 × 2
  Product Negative_run
  <chr>          <dbl>
1 Pants              0
2 Tshirt             3
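
One thing to be aware of: as far as I can tell, taking `max(Negative_run)` gives the longest Negative run anywhere in a product's history, which is not always the run that ends at the most recent month. A small base R sketch of the difference, using a made-up sequence for one product:

```r
# Made-up history for one product, oldest month first.
colors <- c("Negative", "Negative", "Negative", "Positive", "Negative")
r <- rle(colors)

# Longest Negative run anywhere in the history (0 if there is none).
longest_run <- max(c(0L, r$lengths[r$values == "Negative"]))

# Negative run ending at the most recent month (0 if that month is Positive).
n_runs <- length(r$lengths)
latest_run <- if (r$values[n_runs] == "Negative") r$lengths[n_runs] else 0L

print(longest_run)  # 3
print(latest_run)   # 1
```

On the example data in the question the two numbers are the same, so which one you want depends on whether "stopping at a positive" means counting back from the latest month only.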

Data

df <- structure(list(Month = c("2021-12-01", "2022-01-01", "2022-02-01", 
"2021-12-01", "2022-01-01", "2022-02-01"), Product = c("Tshirt", 
"Tshirt", "Tshirt", "Pants", "Pants", "Pants"), sales = c(82525L, 
109411L, 106934L, 97863L, 103296L, 82306L), sales_lag = c(108748L, 
138472L, 115497L, 78419L, 100614L, 76913L), YoY_change = c(-0.2411, 
-0.2099, -0.0741, 0.248, 0.0267, 0.0701), chart_color = c("Negative", 
"Negative", "Negative", "Positive", "Positive", "Positive")), class = "data.frame", row.names = c(NA, 
-6L))