Check, for each month, how many customers have bought a product within 7 days

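I am a beginner in programming. I have a table where each row is an order (variables: id_customer and a date). I would like to write a function that counts, for each month, the number of customers who placed an order within 7 days of their previous one. How can I do this?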

Here is a sample of my data:

id_customer jour_commande
7 12-05-2021
10 13-07-2021
18 17-07-2021


I tried the following, just to get the time difference between two consecutive orders for each customer:

data  %>%
  arrange(id_customer,jour_Commande) %>% 
  mutate(diff = jour_Commande - lag(jour_Commande)) %>% 
  group_by(id_customer,jour_Commande)

This works for the first customer, but for the other customers I get negative values.

Can someone help me fix this?

Thanks in advance!

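Here is an attempt. I am using the storms dataset from the dplyr package as a stand-in, because typing in all the data from the screenshot would be too much work. Each storm name plays the role of a customer, and each date plays the role of an order date.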

library(lubridate)
library(dplyr)

mydat <- storms %>%
  mutate(date = ymd(paste(year, month, day, sep = "-"))) %>%
  select(name, status, date) %>%
  distinct(name, status, .keep_all = TRUE)

mydat

# A tibble: 513 x 3
   name     status              date      
   <chr>    <chr>               <date>    
 1 Amy      tropical depression 1975-06-27
 2 Amy      tropical storm      1975-06-29
 3 Caroline tropical depression 1975-08-24
 4 Caroline tropical storm      1975-08-29
 5 Caroline hurricane           1975-08-30
 6 Doris    tropical storm      1975-08-29
 7 Doris    hurricane           1975-08-31
 8 Belle    tropical depression 1976-08-06
 9 Belle    tropical storm      1976-08-07
10 Belle    hurricane           1976-08-07
# ... with 503 more rows

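mydat %>% 
  arrange(name, date) %>% 
  group_by(name) %>%                 # compute gaps within each name (the "customer")
  mutate(diff = date - lag(date),    # days since the previous row of the same name
         within_7 = diff <= 7) %>%   # NA on the first row of each name
  ungroup() %>%
  filter(!is.na(within_7)) %>%       # drop rows that have no previous row
  count(within_7)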

# A tibble: 2 x 2
  within_7     n
  <lgl>    <int>
1 FALSE       50
2 TRUE       249
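
The negative values in the question most likely come from calling `lag()` before any `group_by()`: without grouping, the first order of each customer is compared with the last order of the previous customer. The sketch below illustrates this on a small made-up table that reuses the question's column names (`id_customer`, `jour_commande`). The rows and the day-month-year date format are assumptions, not the asker's real data.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data; only the column names come from the question.
orders <- tibble(
  id_customer   = c(7, 7, 10, 10, 18),
  jour_commande = dmy(c("12-05-2021", "15-05-2021",
                        "02-03-2021", "13-07-2021",
                        "17-07-2021"))
)

# Ungrouped: lag() looks at the previous row even when it belongs
# to a different customer, so some differences are negative.
ungrouped <- orders %>%
  arrange(id_customer, jour_commande) %>%
  mutate(diff = as.numeric(jour_commande - lag(jour_commande), units = "days"))

# Grouped: lag() restarts for each customer, so the first order of
# every customer gets NA and all other differences are >= 0.
grouped <- orders %>%
  arrange(id_customer, jour_commande) %>%
  group_by(id_customer) %>%
  mutate(diff = as.numeric(jour_commande - lag(jour_commande), units = "days")) %>%
  ungroup()

print(ungrouped)
print(grouped)
```

Moving `group_by(id_customer)` in front of `mutate()` is the only change needed relative to the code in the question.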

Edit: looping month by month

library(tidyr)
library(purrr)
mydat %>% 
  mutate(month = month(date),
         year = year(date)) %>%
  group_by(year, month) %>%
  nest() %>%
  mutate(count_data = map(data, ~ .x %>%
                            arrange(name, date) %>% 
                            group_by(name) %>%
                            mutate(diff = date - lag(date),
                                   within_7 = diff <= 7) %>% 
                            ungroup() %>%
                            filter(!is.na(within_7)) %>%
                            count(within_7))) %>%
  unnest(count_data) %>%
  ungroup()

# A tibble: 98 x 5
   month  year data             within_7     n
   <dbl> <dbl> <list>           <lgl>    <int>
 1     6  1975 <tibble [2 x 3]> TRUE         1
 2     8  1975 <tibble [5 x 3]> TRUE         3
 3     8  1976 <tibble [3 x 3]> TRUE         2
 4     9  1976 <tibble [3 x 3]> TRUE         2
 5     8  1977 <tibble [3 x 3]> TRUE         2
 6     9  1977 <tibble [3 x 3]> TRUE         2
 7    10  1977 <tibble [3 x 3]> TRUE         2
 8     7  1978 <tibble [2 x 3]> TRUE         1
 9     8  1978 <tibble [5 x 3]> TRUE         3
10    10  1978 <tibble [2 x 3]> TRUE         1
# ... with 88 more rows
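
Note that nesting by year and month computes the gaps inside each month only, so two orders that fall on either side of a month boundary are never compared. A possible variation, sketched below, computes the gap per customer over the whole table first and only then counts distinct customers per month. It uses made-up rows with the question's column names, and it assumes that a repeat order is attributed to the month of the later order. Both are assumptions, since the question does not say how such cases should be counted.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data; only the column names come from the question.
orders <- tibble(
  id_customer   = c(7, 7, 10, 10, 18, 18),
  jour_commande = dmy(c("12-05-2021", "15-05-2021",
                        "13-07-2021", "30-07-2021",
                        "29-06-2021", "02-07-2021"))
)

monthly <- orders %>%
  arrange(id_customer, jour_commande) %>%
  group_by(id_customer) %>%
  # Days since the same customer's previous order (NA for the first one)
  mutate(gap_days = as.numeric(jour_commande - lag(jour_commande),
                               units = "days")) %>%
  ungroup() %>%
  # Keep only repeat orders placed within 7 days of the previous order
  filter(!is.na(gap_days), gap_days <= 7) %>%
  # Assumption: attribute the repeat order to the month of the later order
  mutate(month = floor_date(jour_commande, "month")) %>%
  group_by(month) %>%
  # Count each customer at most once per month
  summarise(n_customers = n_distinct(id_customer), .groups = "drop")

print(monthly)
```

In this toy table, customer 18 orders at the end of June and again at the start of July. The pair is picked up here because the gap is computed before the data is split by month.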