How to group data by month and year
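Before I get into the question, here are two seconds of background: I've been working on this RFM analysis and, thanks to our peers here, was finally able to output an RFM score for each customer_id in my dataset, along with their individual R, F, and M scores. Here it is, if you're curious or want to use it yourself: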
SELECT *,
    SUBSTRING(rfm_combined,1,1) AS recency_score,
    SUBSTRING(rfm_combined,2,1) AS frequency_score,
    SUBSTRING(rfm_combined,3,1) AS monetary_score
FROM (
    SELECT
        customer_id,
        rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
    FROM
        (SELECT
            customer_id,
            ntile(5) over (order by last_order_date) AS rfm_recency,
            ntile(5) over (order by count_order) AS rfm_frequency,
            ntile(5) over (order by total_spent) AS rfm_monetary
        FROM
            (SELECT
                customer_id,
                MAX(oms_order_date) AS last_order_date,
                COUNT(*) AS count_order,
                SUM(quantity_ordered * unit_price_amount) AS total_spent
            FROM
                l_dmw_order_report
            WHERE
                order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
                AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
                AND line_status NOT IN ('CANCELLED', 'HOLD')
                AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'
            GROUP BY customer_id))
    ORDER BY customer_id desc)
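Now, my problem is that I need to keep the output in this format, but also group the data by month and year. I originally grouped this data by customer_id because I wanted the RFM and individual scores shown only per unique customer_id, but now I need to group by month + year and customer_id (i.e. the first column is January 2018, followed by all of the unique customer_id rows for that month/year combination, then February 2018, and so on). Does anyone have any suggestions?
Thanks so much, and please let me know if you have any questions!!
Best,
Z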
If you want to group by year-month and customer_id, change your GROUP BY to use both, in that order:
SELECT *,
    SUBSTRING(rfm_combined,1,1) AS recency_score,
    SUBSTRING(rfm_combined,2,1) AS frequency_score,
    SUBSTRING(rfm_combined,3,1) AS monetary_score
FROM (
    SELECT
        YearMonth,
        customer_id,
        rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
    FROM
        (SELECT
            YearMonth,
            customer_id,
            ntile(5) over (order by last_order_date) AS rfm_recency,
            ntile(5) over (order by count_order) AS rfm_frequency,
            ntile(5) over (order by total_spent) AS rfm_monetary
        FROM
            (SELECT
                to_char(oms_order_date, 'YYYY-MM') AS YearMonth,
                customer_id,
                MAX(oms_order_date) AS last_order_date,
                COUNT(*) AS count_order,
                SUM(quantity_ordered * unit_price_amount) AS total_spent
            FROM
                l_dmw_order_report
            WHERE
                order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
                AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
                AND line_status NOT IN ('CANCELLED', 'HOLD')
                AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'
            GROUP BY to_char(oms_order_date, 'YYYY-MM'), customer_id))
    ORDER BY YearMonth, customer_id desc)
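One thing worth checking in the query above: the ntile(5) windows have no PARTITION BY, so the quintiles are computed over all month/customer rows together rather than within each month. If the scores are meant to rank customers against others in the same month (an assumption about the intent, not something the question states), a partitioned variant might look like the sketch below. The table and column names come from the question; the aliases year_month and monthly_totals are only illustrative, and the order_type, item_description_1, and line_status filters are left out for brevity.

```sql
-- Sketch: score each customer within their own month by partitioning
-- every NTILE window on the month key.
SELECT
    year_month,
    customer_id,
    NTILE(5) OVER (PARTITION BY year_month ORDER BY last_order_date) AS rfm_recency,
    NTILE(5) OVER (PARTITION BY year_month ORDER BY count_order)     AS rfm_frequency,
    NTILE(5) OVER (PARTITION BY year_month ORDER BY total_spent)     AS rfm_monetary
FROM (
    -- One row per (month, customer) with the three raw RFM inputs.
    SELECT
        to_char(oms_order_date, 'YYYY-MM') AS year_month,
        customer_id,
        MAX(oms_order_date) AS last_order_date,
        COUNT(*) AS count_order,
        SUM(quantity_ordered * unit_price_amount) AS total_spent
    FROM l_dmw_order_report
    WHERE oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'
    GROUP BY to_char(oms_order_date, 'YYYY-MM'), customer_id
) AS monthly_totals
ORDER BY year_month, customer_id;
```

With the partition in place, each month gets its own set of five buckets, so a customer's score in January does not depend on how other customers ordered in June.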
Per Antonio's request:
SELECT *,
    SUBSTRING(rfm_combined,1,1) AS recency_score,
    SUBSTRING(rfm_combined,2,1) AS frequency_score,
    SUBSTRING(rfm_combined,3,1) AS monetary_score
FROM (
    SELECT
        to_char(oms_order_date, 'YYYY-MM'),
        customer_id,
        rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
    FROM
        (SELECT
            customer_id,
            ntile(5) over (order by last_order_date) AS rfm_recency,
            ntile(5) over (order by count_order) AS rfm_frequency,
            ntile(5) over (order by total_spent) AS rfm_monetary
        FROM
            (SELECT
                customer_id,
                MAX(oms_order_date) AS last_order_date,
                COUNT(*) AS count_order,
                SUM(quantity_ordered * unit_price_amount) AS total_spent
            FROM
                l_dmw_order_report
            WHERE
                order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
                AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
                AND line_status NOT IN ('CANCELLED', 'HOLD')
                AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'
            GROUP BY to_char(oms_order_date, 'YYYY-MM'), customer_id))
    ORDER BY customer_id desc)
LIMIT 100
The error says:
"42703: column "oms_order_date" does not exist in derived_table2"
I know it is a column in the table. I confirmed that with:
SELECT oms_order_date FROM l_dmw_order_report
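The 42703 error is consistent with how derived tables scope their columns: an outer SELECT can only see what the inner SELECT list exposes. In the query above, the innermost subquery groups by to_char(oms_order_date, 'YYYY-MM') but never selects it, so oms_order_date is not visible to the outer levels even though it exists in l_dmw_order_report. A minimal sketch of the likely fix, using the table from the question and the hypothetical aliases year_month and monthly_orders:

```sql
-- Fails with "column does not exist": the derived table exposes only
-- customer_id and count_order, so the outer query cannot see oms_order_date.
-- SELECT to_char(oms_order_date, 'YYYY-MM'), customer_id
-- FROM (SELECT customer_id, COUNT(*) AS count_order
--       FROM l_dmw_order_report
--       GROUP BY to_char(oms_order_date, 'YYYY-MM'), customer_id) AS t;

-- Works: compute the month once in the innermost query, give it an alias,
-- and carry that alias through every outer level.
SELECT year_month, customer_id, count_order
FROM (
    SELECT
        to_char(oms_order_date, 'YYYY-MM') AS year_month,
        customer_id,
        COUNT(*) AS count_order
    FROM l_dmw_order_report
    GROUP BY to_char(oms_order_date, 'YYYY-MM'), customer_id
) AS monthly_orders
ORDER BY year_month, customer_id;
```

In the full RFM query, the same idea means selecting the month alias in the innermost subquery and in the ntile subquery, then referencing the alias (not oms_order_date) in the level that builds rfm_combined.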