Counting negative values, stopping when there is a positive among most recent observations, all done by group
Okay, suppose we have a data frame like this:
Month Product sales sales_lag YoY_change chart_color
2021-12-01 Tshirt 82525 108748 -0.2411 Negative
2022-01-01 Tshirt 109411 138472 -0.2099 Negative
2022-02-01 Tshirt 106934 115497 -0.0741 Negative
2021-12-01 Pants 97863 78419 0.2480 Positive
2022-01-01 Pants 103296 100614 0.0267 Positive
2022-02-01 Pants 82306 76913 0.0701 Positive
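I'm trying to find a way to count how many consecutive months, up to the most recent one, a product has had a negative YoY_change. So in this case, I'm looking for something that returns a table like this: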
Negative_run
Tshirt 3
Pants 0
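This tells us that Pants had positive YoY growth in the most recent month, while Tshirt has had negative growth for 3 consecutive months. Although it isn't shown in this dataset, the solution needs to handle a product that is positive one month and negative the next. I'd like to do this in the tidyverse if possible, but I'm open to other options in the R universe.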
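The core of the requirement can be stated on a single product's YoY_change vector, assumed to be sorted by Month: read the values backwards from the latest month and stop at the first non-negative one. A minimal base-R sketch, where trailing_negative_run() is a hypothetical helper name and the third input is made-up data for the positive-then-negative case:

```r
# Count how many of the most recent values in x are negative,
# stopping at the first non-negative value when reading backwards.
# Assumes x is in chronological order and contains no NA values.
trailing_negative_run <- function(x) {
  is_neg <- x < 0  # a YoY change of exactly 0 counts as non-negative
  # Reverse so the latest month comes first; cumprod() stays 1 while
  # values are negative and drops to 0 for good at the first non-negative one
  sum(cumprod(rev(is_neg)))
}

trailing_negative_run(c(-0.2411, -0.2099, -0.0741))  # Tshirt: 3
trailing_negative_run(c(0.2480, 0.0267, 0.0701))     # Pants: 0
trailing_negative_run(c(-0.10, 0.05, -0.02))         # hypothetical: 1
```

Because the reversed cumulative product can never climb back to 1, an older negative stretch before a positive month is ignored, which is the "stop at the most recent positive" behaviour the question asks for.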
I would first create a column that flags whether the YoY change is negative, and then summarise.
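library(dplyr)
# 1 if the YoY change is negative, 0 otherwise
df$NegativeY <- as.numeric(df$YoY_change < 0)
df %>%
  group_by(Product) %>%
  # note: sum() counts every negative month per product,
  # not only the consecutive run ending at the latest month
  summarise(Negative_run = sum(NegativeY))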
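With a 0/1 flag column in place, the choice of summary decides whether the count stops at the most recent positive month. A minimal base-R sketch using tapply() on hypothetical data (the Hats product is made up, and rows are assumed to be sorted by month within each product) contrasts sum() with a reversed cumprod():

```r
# Hypothetical data: "Hats" was negative, turned positive, then negative again
sales <- data.frame(
  Product    = c("Tshirt", "Tshirt", "Tshirt", "Hats", "Hats", "Hats"),
  YoY_change = c(-0.2411, -0.2099, -0.0741, -0.05, 0.03, -0.01)
)
# 1 for a negative month, 0 otherwise
sales$NegativeY <- as.numeric(sales$YoY_change < 0)

# Summing the flag counts every negative month in the group
total_neg <- tapply(sales$NegativeY, sales$Product, sum)

# Reversing and taking cumprod() keeps only the trailing run of 1s
trailing_neg <- tapply(sales$NegativeY, sales$Product,
                       function(flag) sum(cumprod(rev(flag))))

total_neg     # Hats 2, Tshirt 3
trailing_neg  # Hats 1, Tshirt 3
```

The two summaries agree on the question's example data and only diverge for a product like Hats, whose negative months are interrupted by a positive one.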
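library(dplyr)
df %>%
  # calculate by product
  group_by(Product) %>%
  # label each run of identical chart_color values (1, 2, 3, ...)
  mutate(g = data.table::rleid(chart_color)) %>%
  # keep the last run, and only if the last observation was negative
  # (.preserve = TRUE keeps products whose rows are all filtered out)
  filter(g == max(g), last(YoY_change < 0), .preserve = TRUE) %>%
  # count the length of the run (0 for products with no rows left)
  count()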
# A tibble: 2 × 2
# Groups: Product [2]
Product n
<chr> <int>
1 Pants 0
2 Tshirt 3
Because you want to count the number of consecutive Negative values, we need some kind of rle function.
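To make the rle() step concrete: rle() returns the length and value of each run of identical elements, and rep(seq_along(lengths), lengths), the expression this answer uses for grp, expands that back into one run id per element. A small sketch on a hypothetical chart_color vector:

```r
# Hypothetical chart_color history for one product, in month order
colors <- c("Negative", "Negative", "Positive", "Negative")

runs <- rle(colors)
runs$lengths  # 2 1 1
runs$values   # "Negative" "Positive" "Negative"

# Give every element the index of the run it belongs to
grp <- with(runs, rep(seq_along(lengths), lengths))
grp           # 1 1 2 3
```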
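library(dplyr)
df %>%
  # run id for each stretch of identical chart_color values
  group_by(Product,
           grp = with(rle(chart_color), rep(seq_along(lengths), lengths))) %>%
  # within a negative run, number the months 1, 2, 3, ...; positive months get 0
  mutate(Negative_run = ifelse(chart_color == "Positive", 0, seq_along(grp))) %>%
  group_by(Product) %>%
  # the maximum is the length of the longest negative run per product
  summarize(Negative_run = max(Negative_run))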
# A tibble: 2 × 2
Product Negative_run
<chr> <dbl>
1 Pants 0
2 Tshirt 3
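Because of the max() in its last step, this answer appears to measure the longest negative run anywhere in each product's history. For the example data that equals the run ending at the latest month, but the two can differ once a product recovers and then turns negative again. A minimal base-R sketch on a hypothetical chart_color history (assumed to be in chronological order) shows the difference:

```r
# Hypothetical history: a long negative run, a recovery, then a new negative month
colors <- c("Negative", "Negative", "Negative", "Positive", "Negative")
runs <- rle(colors)
neg_lengths <- runs$lengths[runs$values == "Negative"]

# Longest negative run anywhere in the history
longest <- if (length(neg_lengths) > 0) max(neg_lengths) else 0L

# Negative run ending at the latest month (0 if the latest month is positive)
n <- length(runs$values)
latest <- if (n > 0 && runs$values[n] == "Negative") runs$lengths[n] else 0L

longest  # 3
latest   # 1
```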
Data
df <- structure(list(Month = c("2021-12-01", "2022-01-01", "2022-02-01",
"2021-12-01", "2022-01-01", "2022-02-01"), Product = c("Tshirt",
"Tshirt", "Tshirt", "Pants", "Pants", "Pants"), sales = c(82525L,
109411L, 106934L, 97863L, 103296L, 82306L), sales_lag = c(108748L,
138472L, 115497L, 78419L, 100614L, 76913L), YoY_change = c(-0.2411,
-0.2099, -0.0741, 0.248, 0.0267, 0.0701), chart_color = c("Negative",
"Negative", "Negative", "Positive", "Positive", "Positive")), class = "data.frame", row.names = c(NA,
-6L))