Show customer spend per day and whether they have spent the previous day (SQL)
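I am trying to create one row per customer for each day on which they spend, plus a column indicating whether they also spent on the previous day. If a customer spends twice in one day, they should still have only one row in the table for that day. If the customer spent on the previous day, the column should show TRUE.
Here is the original table: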
+---------------------+-------------+-----------------+
| datetime            | customer_id | amount          |
+---------------------+-------------+-----------------+
| 2018-03-01 03:00:00 |        3786 |        14.00000 |
| 2018-03-02 17:00:00 |        5678 |        25.00000 |
| 2018-07-09 18:00:00 |        5647 |      1000.99000 |
| 2018-08-17 19:00:00 |        5267 |        45.00000 |
| 2018-08-25 08:00:00 |        3456 |        78.00000 |
| 2018-08-25 17:00:00 |        3456 |        25.00000 |
| 2018-08-26 03:00:00 |        3456 |        34.90000 |
| 2019-02-03 08:00:00 |        3468 |         0.00000 |
| 2019-03-09 06:00:00 |        1111 |       100.00000 |
| 2019-05-25 14:00:00 |        3456 |        15.00000 |
| 2019-07-02 14:00:00 |       88889 |        45.00000 |
| 2019-07-04 03:00:00 |        8979 |         9.00000 |
| 2019-07-09 14:00:00 |        4567 |         9.99000 |
| 2019-08-25 08:00:00 |        1234 |        88.00000 |
| 2019-08-30 09:31:00 |        1234 |        30.00000 |
| 2019-08-30 12:00:00 |        9876 |        55.00000 |
| 2019-09-01 13:00:00 |       88889 |        23.00000 |
+---------------------+-------------+-----------------+
Here are the CREATE and INSERT statements:
CREATE TABLE IF NOT EXISTS `spend` ( `datetime` datetime NOT NULL, `customer_id` int(11) NOT NULL, `amount` decimal(10, 5) NOT NULL, PRIMARY KEY (`datetime`)) DEFAULT CHARSET=utf8mb4;
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-03-01 03:00:00', 3786, 14.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-03-02 17:00:00', 5678, 25.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-07-09 18:00:00', 5647, 1000.99000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-17 19:00:00', 5267, 45.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-25 08:00:00', 3456, 78.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-25 17:00:00', 3456, 25.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-26 03:00:00', 3456, 34.90000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-02-03 08:00:00', 3468, 0.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-03-09 06:00:00', 1111, 100.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-05-25 14:00:00', 3456, 15.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-07-02 14:00:00', 88889, 45.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-07-04 03:00:00', 8979, 9.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-07-09 14:00:00', 4567, 9.99000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-08-25 08:00:00', 1234, 88.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-08-30 09:31:00', 1234, 30.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-08-30 12:00:00', 9876, 55.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-09-01 13:00:00', 88889, 23.00000);
Here is what I have so far:
-- Counts distinct spending customers per calendar day
SELECT CAST(datetime AS DATE) AS day,
       COUNT(DISTINCT customer_id) AS daily_spend
FROM spend
WHERE amount IS NOT NULL
GROUP BY day
ORDER BY day;
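This code does not yet produce what I need, but I am doing my best to fix it.
I have looked through several posts, but the closest I could find was:
I am trying to produce a table like this: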
+------------+-------------+--------------------+
| day        | customer_id | spent_previous_day |
+------------+-------------+--------------------+
| 2018-03-01 |        3786 | FALSE              |
| 2018-03-02 |        5678 | FALSE              |
| 2018-07-09 |        5647 | FALSE              |
| 2018-08-17 |        5267 | FALSE              |
| 2018-08-25 |        3456 | FALSE              |
| 2018-08-26 |        3456 | TRUE               |
| 2019-02-03 |        3468 | FALSE              |
| 2019-03-09 |        1111 | FALSE              |
| 2019-05-25 |        3456 | FALSE              |
| 2019-07-02 |       88889 | FALSE              |
| 2019-07-04 |        8979 | FALSE              |
| 2019-07-09 |        4567 | FALSE              |
| 2019-08-25 |        1234 | FALSE              |
| 2019-08-30 |        1234 | FALSE              |
| 2019-08-30 |        9876 | FALSE              |
| 2019-09-01 |       88889 | FALSE              |
+------------+-------------+--------------------+
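As a cross-check of the expected table, here is a minimal plain-Python sketch (not SQL, and not from the original post) that applies the stated rule directly: collapse purchases to distinct (customer, day) pairs, then flag a pair when the same customer also has a pair for the preceding calendar day. `SPEND` and `daily_flags` are illustrative names.

```python
from datetime import datetime, timedelta

# Sample purchases from the question: (datetime, customer_id, amount).
SPEND = [
    ("2018-03-01 03:00:00", 3786, 14.0), ("2018-03-02 17:00:00", 5678, 25.0),
    ("2018-07-09 18:00:00", 5647, 1000.99), ("2018-08-17 19:00:00", 5267, 45.0),
    ("2018-08-25 08:00:00", 3456, 78.0), ("2018-08-25 17:00:00", 3456, 25.0),
    ("2018-08-26 03:00:00", 3456, 34.9), ("2019-02-03 08:00:00", 3468, 0.0),
    ("2019-03-09 06:00:00", 1111, 100.0), ("2019-05-25 14:00:00", 3456, 15.0),
    ("2019-07-02 14:00:00", 88889, 45.0), ("2019-07-04 03:00:00", 8979, 9.0),
    ("2019-07-09 14:00:00", 4567, 9.99), ("2019-08-25 08:00:00", 1234, 88.0),
    ("2019-08-30 09:31:00", 1234, 30.0), ("2019-08-30 12:00:00", 9876, 55.0),
    ("2019-09-01 13:00:00", 88889, 23.0),
]


def daily_flags(rows):
    """Return sorted (day, customer_id, spent_previous_day) tuples, one per customer per day."""
    # Several purchases on the same day collapse into a single (customer, day) pair.
    pairs = {
        (customer_id, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date())
        for ts, customer_id, _amount in rows
    }
    # Flag a pair when the same customer also has a pair for the previous calendar day.
    return sorted(
        (day, customer_id, (customer_id, day - timedelta(days=1)) in pairs)
        for customer_id, day in pairs
    )


for day, customer_id, flag in daily_flags(SPEND):
    print(day, customer_id, "TRUE" if flag else "FALSE")
```

This yields the same 16 rows as the table above, with only customer 3456 on 2018-08-26 flagged.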
Edit:
Here is the code I am currently using, based on the suggestions I received:
select customer_id, CAST(datetime AS DATE) AS day,
       max(date(datetime)) over (partition by customer_id
                                 order by CAST(datetime AS DATE)
                                 range between interval 1 day preceding and interval 1 day preceding
                                ) is not null AS spent_previous_day
from spend;
Here is the resulting table:
+------------+-------------+--------------------+
| day        | customer_id | spent_previous_day |
+------------+-------------+--------------------+
| 2019-03-09 |        1111 |                  0 |
| 2019-08-25 |        1234 |                  0 |
| 2019-08-30 |        1234 |                  0 |
| 2018-08-25 |        3456 |                  0 |
| 2018-08-25 |        3456 |                  0 |
| 2018-08-26 |        3456 |                  1 |
| 2019-05-25 |        3456 |                  0 |
| 2019-02-03 |        3468 |                  0 |
+------------+-------------+--------------------+
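The duplicate `2018-08-25 | 3456` rows appear because a window function, unlike GROUP BY, returns one output row per input row. The sketch below is a SQLite port run through Python's `sqlite3`, not the MySQL query itself: MySQL's `RANGE ... INTERVAL 1 DAY` frame is emulated with an integer day number from `julianday()`, which assumes SQLite 3.28 or later. It returns 17 rows, one per purchase. Because the flag depends only on customer and day, wrapping the query in `SELECT DISTINCT` is one possible way to collapse the duplicates into 16 rows.

```python
import sqlite3

# Sample purchases from the question: (datetime, customer_id, amount).
SPEND = [
    ("2018-03-01 03:00:00", 3786, 14.0), ("2018-03-02 17:00:00", 5678, 25.0),
    ("2018-07-09 18:00:00", 5647, 1000.99), ("2018-08-17 19:00:00", 5267, 45.0),
    ("2018-08-25 08:00:00", 3456, 78.0), ("2018-08-25 17:00:00", 3456, 25.0),
    ("2018-08-26 03:00:00", 3456, 34.9), ("2019-02-03 08:00:00", 3468, 0.0),
    ("2019-03-09 06:00:00", 1111, 100.0), ("2019-05-25 14:00:00", 3456, 15.0),
    ("2019-07-02 14:00:00", 88889, 45.0), ("2019-07-04 03:00:00", 8979, 9.0),
    ("2019-07-09 14:00:00", 4567, 9.99), ("2019-08-25 08:00:00", 1234, 88.0),
    ("2019-08-30 09:31:00", 1234, 30.0), ("2019-08-30 12:00:00", 9876, 55.0),
    ("2019-09-01 13:00:00", 88889, 23.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (datetime TEXT NOT NULL, customer_id INTEGER NOT NULL, amount REAL NOT NULL)")
conn.executemany("INSERT INTO spend VALUES (?, ?, ?)", SPEND)

# SQLite port of the current query. MySQL's RANGE ... INTERVAL 1 DAY frame is
# emulated by ordering on an integer day number and looking exactly 1 unit back.
WINDOWED = """
    SELECT customer_id,
           date(datetime) AS day,
           (max(date(datetime)) OVER (
                PARTITION BY customer_id
                ORDER BY CAST(julianday(date(datetime)) AS INTEGER)
                RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING
           )) IS NOT NULL AS spent_previous_day
    FROM spend
"""

# A window function keeps every input row: one output row per purchase.
per_purchase = conn.execute(WINDOWED + " ORDER BY customer_id, day").fetchall()
print(len(per_purchase))  # 17

# Both 2018-08-25 rows for customer 3456 carry the same flag, so DISTINCT merges them.
per_day = conn.execute(
    "SELECT DISTINCT * FROM (" + WINDOWED + ") ORDER BY day, customer_id"
).fetchall()
print(len(per_day))  # 16
```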
I tried GROUP BY day, customer_id, but I got an error.
Answer: Assuming customers don't make more than one purchase on the same day, just use lag():
select t.*,
       coalesce(date(lag(datetime) over (partition by customer_id order by datetime)) = date(datetime) - interval 1 day,
                false  -- each customer's first purchase has no previous row (NULL), so report FALSE
               ) as prev_day_flag
from spend t;
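A small SQLite port of the lag() comparison, run through Python's `sqlite3` as an illustration (not tested against MySQL). Here `date(x, '-1 day')` stands in for MySQL's `date(datetime) - interval 1 day`. The query deliberately has no COALESCE: each customer's first purchase has no previous row, so the comparison yields NULL (`None` in Python) rather than FALSE, which is why a `coalesce(..., false)` wrapper is needed to match the expected output.

```python
import sqlite3

# Sample purchases from the question: (datetime, customer_id, amount).
SPEND = [
    ("2018-03-01 03:00:00", 3786, 14.0), ("2018-03-02 17:00:00", 5678, 25.0),
    ("2018-07-09 18:00:00", 5647, 1000.99), ("2018-08-17 19:00:00", 5267, 45.0),
    ("2018-08-25 08:00:00", 3456, 78.0), ("2018-08-25 17:00:00", 3456, 25.0),
    ("2018-08-26 03:00:00", 3456, 34.9), ("2019-02-03 08:00:00", 3468, 0.0),
    ("2019-03-09 06:00:00", 1111, 100.0), ("2019-05-25 14:00:00", 3456, 15.0),
    ("2019-07-02 14:00:00", 88889, 45.0), ("2019-07-04 03:00:00", 8979, 9.0),
    ("2019-07-09 14:00:00", 4567, 9.99), ("2019-08-25 08:00:00", 1234, 88.0),
    ("2019-08-30 09:31:00", 1234, 30.0), ("2019-08-30 12:00:00", 9876, 55.0),
    ("2019-09-01 13:00:00", 88889, 23.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (datetime TEXT NOT NULL, customer_id INTEGER NOT NULL, amount REAL NOT NULL)")
conn.executemany("INSERT INTO spend VALUES (?, ?, ?)", SPEND)

# lag() fetches the same customer's previous purchase; no COALESCE on purpose,
# so a customer's first purchase produces NULL instead of 0.
rows = conn.execute("""
    SELECT datetime,
           customer_id,
           date(lag(datetime) OVER (PARTITION BY customer_id ORDER BY datetime))
               = date(datetime, '-1 day') AS prev_day_flag
    FROM spend
    ORDER BY customer_id, datetime
""").fetchall()

for row in rows:
    print(row)
```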
If you can have duplicates (several purchases by the same customer on the same day), try this instead of lag():
select s.*,
       max(date(datetime)) over (partition by customer_id
                                 order by date(datetime)
                                 range between interval 1 day preceding and interval 1 day preceding
                                ) is not null as prev_day_flag
from spend s;
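To see why a RANGE frame is safer than lag() when duplicates are possible, the sketch below adds one hypothetical purchase (`2018-08-26 20:00:00` for customer 3456, not in the question's data). lag() only looks at the immediately preceding purchase, which for that row falls on the same day, so it reports 0. The RANGE frame looks at every purchase from the previous day and reports 1. As before, this is a SQLite port via Python's `sqlite3`, with the integer day-number emulation of MySQL's interval frame (SQLite 3.28 or later assumed).

```python
import sqlite3

# Customer 3456's purchases from the question plus ONE HYPOTHETICAL purchase
# (2018-08-26 20:00:00) so that a day following a spending day has two purchases.
ROWS = [
    ("2018-08-25 08:00:00", 3456, 78.0),
    ("2018-08-25 17:00:00", 3456, 25.0),
    ("2018-08-26 03:00:00", 3456, 34.9),
    ("2018-08-26 20:00:00", 3456, 10.0),  # hypothetical, not in the question's data
    ("2019-05-25 14:00:00", 3456, 15.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (datetime TEXT NOT NULL, customer_id INTEGER NOT NULL, amount REAL NOT NULL)")
conn.executemany("INSERT INTO spend VALUES (?, ?, ?)", ROWS)

rows = conn.execute("""
    SELECT datetime,
           -- lag(): compares with the immediately preceding purchase only
           coalesce(date(lag(datetime) OVER by_time) = date(datetime, '-1 day'), 0) AS lag_flag,
           -- RANGE frame: every purchase whose day number is exactly one less
           (max(date(datetime)) OVER by_day) IS NOT NULL AS range_flag
    FROM spend
    WINDOW by_time AS (PARTITION BY customer_id ORDER BY datetime),
           by_day AS (PARTITION BY customer_id
                      ORDER BY CAST(julianday(date(datetime)) AS INTEGER)
                      RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING)
    ORDER BY datetime
""").fetchall()

for row in rows:
    print(row)
# The hypothetical 20:00 purchase gets lag_flag 0 but range_flag 1.
```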
Edit:
If you want one row per customer per day:
select s.*,
       coalesce(date(lag(dte) over (partition by customer_id order by dte)) = dte - interval 1 day,
                false  -- a customer's first spending day has no previous row, so report FALSE
               ) as prev_day_flag
from (select customer_id, date(datetime) as dte, sum(amount) as amount
      from spend s
      group by customer_id, date(datetime)
     ) s
order by dte, customer_id;
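A SQLite port of the grouped query, again via Python's `sqlite3` as an illustration: `date(dte, '-1 day')` replaces `dte - interval 1 day`, and COALESCE maps each customer's first day to 0. On the question's sample data it reproduces the expected output: 16 rows, with only customer 3456 on 2018-08-26 flagged.

```python
import sqlite3

# Sample purchases from the question: (datetime, customer_id, amount).
SPEND = [
    ("2018-03-01 03:00:00", 3786, 14.0), ("2018-03-02 17:00:00", 5678, 25.0),
    ("2018-07-09 18:00:00", 5647, 1000.99), ("2018-08-17 19:00:00", 5267, 45.0),
    ("2018-08-25 08:00:00", 3456, 78.0), ("2018-08-25 17:00:00", 3456, 25.0),
    ("2018-08-26 03:00:00", 3456, 34.9), ("2019-02-03 08:00:00", 3468, 0.0),
    ("2019-03-09 06:00:00", 1111, 100.0), ("2019-05-25 14:00:00", 3456, 15.0),
    ("2019-07-02 14:00:00", 88889, 45.0), ("2019-07-04 03:00:00", 8979, 9.0),
    ("2019-07-09 14:00:00", 4567, 9.99), ("2019-08-25 08:00:00", 1234, 88.0),
    ("2019-08-30 09:31:00", 1234, 30.0), ("2019-08-30 12:00:00", 9876, 55.0),
    ("2019-09-01 13:00:00", 88889, 23.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (datetime TEXT NOT NULL, customer_id INTEGER NOT NULL, amount REAL NOT NULL)")
conn.executemany("INSERT INTO spend VALUES (?, ?, ?)", SPEND)

# 1) Inner query: one row per customer per day (several purchases are summed).
# 2) Outer query: compare each day with the same customer's previous spending day.
rows = conn.execute("""
    SELECT dte AS day,
           customer_id,
           coalesce(lag(dte) OVER (PARTITION BY customer_id ORDER BY dte)
                        = date(dte, '-1 day'), 0) AS spent_previous_day
    FROM (SELECT customer_id, date(datetime) AS dte, sum(amount) AS amount
          FROM spend
          GROUP BY customer_id, date(datetime)) AS s
    ORDER BY day, customer_id
""").fetchall()

for day, customer_id, flag in rows:
    print(day, customer_id, "TRUE" if flag else "FALSE")
```

Grouping before calling lag() is what makes the "one purchase per day" assumption of the plain lag() version hold.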