Check, for each month, how many customers bought a product within 7 days
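I am a beginner in programming. I have a table in which each row is an order (variables: id_customer and the order date). I want to write a function that counts, for each month, the number of customers who placed an order within 7 days. How can I do that?
Here is a sample of my data: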
id_customer | jour_commande |
---|---|
7 | 12-05-2021 |
10 | 13-07-2021 |
18 | 17-07-2021 |
I tried the following, which only computes the time difference between two consecutive orders of each customer:
data %>%
  arrange(id_customer, jour_Commande) %>%
  mutate(diff = jour_Commande - lag(jour_Commande)) %>%
  group_by(id_customer, jour_Commande)
This works for the first customer, but for the other customers I get negative values.
Can anyone help me solve this?
Thanks in advance!
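Here is an attempt. I am using the storms dataset from the dplyr package as a stand-in, because retyping all the data from the screenshot would be too much work.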
library(lubridate)
library(dplyr)

# One row per storm name and status, with a proper Date column
mydat <- storms %>%
  mutate(date = ymd(paste(year, month, day, sep = "-"))) %>%
  select(name, status, date) %>%
  distinct(name, status, .keep_all = TRUE)

mydat
# A tibble: 513 x 3
   name     status              date
   <chr>    <chr>               <date>
 1 Amy      tropical depression 1975-06-27
 2 Amy      tropical storm      1975-06-29
 3 Caroline tropical depression 1975-08-24
 4 Caroline tropical storm      1975-08-29
 5 Caroline hurricane           1975-08-30
 6 Doris    tropical storm      1975-08-29
 7 Doris    hurricane           1975-08-31
 8 Belle    tropical depression 1976-08-06
 9 Belle    tropical storm      1976-08-07
10 Belle    hurricane           1976-08-07
# ... with 503 more rows
mydat %>%
  arrange(name, date) %>%
  group_by(name) %>%                      # group first so lag() stays within one name
  mutate(diff = date - lag(date),         # days since the previous row of the same name
         within_7 = diff <= 7) %>%
  ungroup() %>%
  filter(!is.na(within_7)) %>%            # drop the first row of each name (no previous date)
  count(within_7)
# A tibble: 2 x 2
  within_7     n
  <lgl>    <int>
1 FALSE       50
2 TRUE       249
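Applied to the data from the question, the same idea might look like the sketch below. The `orders` data frame is made up for illustration (only the column names come from the question), and it assumes `jour_commande` is stored as day-month-year text, as in the sample table. The negative differences in the question most likely come from calling `lag()` before any `group_by()`, so the first order of each customer is compared with the last order of the previous customer.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data; only the column names come from the question.
orders <- tibble(
  id_customer   = c(7, 7, 10, 10, 18),
  jour_commande = c("12-05-2021", "15-05-2021", "13-07-2021",
                    "30-07-2021", "17-07-2021")
)

order_gaps <- orders %>%
  mutate(jour_commande = dmy(jour_commande)) %>%  # parse day-month-year text into Date
  arrange(id_customer, jour_commande) %>%
  group_by(id_customer) %>%                       # lag() now stays within one customer
  mutate(diff = as.numeric(jour_commande - lag(jour_commande), units = "days"),
         within_7 = diff <= 7) %>%
  ungroup()

order_gaps
```

The first order of each customer has no previous order, so its `diff` and `within_7` are `NA` rather than a negative number.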
Edit: looping month by month
library(tidyr)
library(purrr)

mydat %>%
  mutate(month = month(date),
         year = year(date)) %>%
  group_by(year, month) %>%
  nest() %>%                               # one nested tibble per year and month
  mutate(count_data = map(data, ~ .x %>%   # repeat the count inside each month
    arrange(name, date) %>%
    group_by(name) %>%
    mutate(diff = date - lag(date),
           within_7 = diff <= 7) %>%
    ungroup() %>%
    filter(!is.na(within_7)) %>%
    count(within_7))) %>%
  unnest(count_data) %>%
  ungroup()
# A tibble: 98 x 5
   month  year data             within_7     n
   <dbl> <dbl> <list>           <lgl>    <int>
 1     6  1975 <tibble [2 x 3]> TRUE         1
 2     8  1975 <tibble [5 x 3]> TRUE         3
 3     8  1976 <tibble [3 x 3]> TRUE         2
 4     9  1976 <tibble [3 x 3]> TRUE         2
 5     8  1977 <tibble [3 x 3]> TRUE         2
 6     9  1977 <tibble [3 x 3]> TRUE         2
 7    10  1977 <tibble [3 x 3]> TRUE         2
 8     7  1978 <tibble [2 x 3]> TRUE         1
 9     8  1978 <tibble [5 x 3]> TRUE         3
10    10  1978 <tibble [2 x 3]> TRUE         1
# ... with 88 more rows
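If the goal is a single number per month (how many distinct customers reordered within 7 days), a grouped summary may be simpler than nesting. The sketch below is one possible reading of the question, not the only one. It computes the gap to the previous order per customer over the whole table, so a reorder that straddles a month boundary still counts, and it attributes each reorder to the month of the later order. The `orders` data is invented for illustration; only the column names come from the question.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data; only the column names come from the question.
orders <- tibble(
  id_customer   = c(7, 7, 7, 10, 10, 18, 18),
  jour_commande = dmy(c("12-05-2021", "15-05-2021", "17-05-2021",
                        "13-07-2021", "30-07-2021",
                        "29-06-2021", "02-07-2021"))
)

monthly_counts <- orders %>%
  arrange(id_customer, jour_commande) %>%
  group_by(id_customer) %>%
  # days since the same customer's previous order (NA for the first order)
  mutate(diff = as.numeric(jour_commande - lag(jour_commande), units = "days")) %>%
  ungroup() %>%
  filter(!is.na(diff), diff <= 7) %>%                  # keep reorders within 7 days
  mutate(month = floor_date(jour_commande, "month")) %>%
  group_by(month) %>%
  summarise(n_customers = n_distinct(id_customer),     # count each customer once per month
            .groups = "drop")

monthly_counts
```

Using `n_distinct()` means a customer with several quick reorders in the same month is counted once, which differs from the row counts produced by `count(within_7)`.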