RFM Analysis - Value keeps changing
I have the following query:
SELECT
customer_id,
NTILE(5) OVER (ORDER BY MAX(oms_order_date)) AS r_score
FROM
mdwh.us_raw.l_dmw_order_report
WHERE
quantity_ordered > 0
AND customer_id IS NOT NULL
AND customer_id != ('')
AND UPPER(line_status) NOT IN ('','RETURN', 'CANCELLED')
AND UPPER(item_description_1) NOT IN ('','FREIGHT', 'RETURN LABEL FEE', 'VISIBLE STITCH')
AND (quantity_ordered * unit_price_amount) > 0
AND extended_amount < 1000 --NO BULK ORDERS
AND oms_order_date BETWEEN '2020-01-01' AND '2020-01-01'
AND SUBSTRING(upc,1,6) IN (SELECT item_code FROM item_master_zs WHERE new_division BETWEEN '11' AND '39')
GROUP BY
customer_id
ORDER BY
customer_id
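All I'm doing here is getting the unique customer IDs that meet some conditions, bucketing each customer's most recent purchase date into quintiles, and returning that score in the second column. But every time I run the query, the r_score values change. What am I doing wrong? Here is a snippet of the table (again, the r_score values keep changing):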
The problem with ntile() is that it makes the groups exactly the same size, even if that means putting identical values into different groups.
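A minimal sketch of that behavior, using Python's built-in sqlite3 module as a stand-in for the real warehouse (the orders table, its columns and the customer IDs are hypothetical toy data; window functions need SQLite 3.25 or later). It mirrors the question's filter BETWEEN '2020-01-01' AND '2020-01-01': assuming oms_order_date is a plain DATE, that window covers a single day, so every customer's MAX(oms_order_date) is the same value. NTILE(5) still hands out buckets 1 through 5, so which customer gets which bucket depends only on the order in which the engine happens to read the tied rows, and on a parallel database that order may differ between runs.

```python
import sqlite3

# Hypothetical toy data: ten customers whose latest order falls on the same day,
# which is what a single-day date filter produces.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(f"c{i:02d}", "2020-01-01") for i in range(1, 11)],
)

rows = conn.execute(
    """
    WITH per_customer AS (
        SELECT customer_id, MAX(order_date) AS last_order
        FROM orders
        GROUP BY customer_id
    )
    -- every last_order is identical, yet NTILE still spreads rows over 5 buckets
    SELECT customer_id, last_order,
           NTILE(5) OVER (ORDER BY last_order) AS r_score
    FROM per_customer
    """
).fetchall()

for row in rows:
    print(row)
```

Every row has the same date, but the r_score column contains each of 1 through 5 twice.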
So I usually do the calculation manually, using rank():
ceil(rank() over (order by max(oms_order_date)) * 5.0 /
count(*) over ()
) as r_score
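A sketch of the rank() approach on hypothetical data, again using SQLite through Python as a stand-in. The per_customer table plays the role of the grouped query, with last_order standing in for MAX(oms_order_date) (ISO date strings, which sort chronologically). CEIL is applied in Python because SQLite only provides it in builds compiled with its math functions; in the real query it stays in SQL.

```python
import math
import sqlite3

# Hypothetical toy data: (customer_id, latest order date); c03-c05 are tied.
last_orders = [
    ("c01", "2020-01-01"), ("c02", "2020-01-01"),
    ("c03", "2020-01-02"), ("c04", "2020-01-02"), ("c05", "2020-01-02"),
    ("c06", "2020-01-03"), ("c07", "2020-01-04"), ("c08", "2020-01-05"),
    ("c09", "2020-01-06"), ("c10", "2020-01-07"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE per_customer (customer_id TEXT, last_order TEXT)")
conn.executemany("INSERT INTO per_customer VALUES (?, ?)", last_orders)

# rank() gives tied dates the same rank, so they end up in the same bucket.
query = """
    SELECT customer_id,
           RANK() OVER (ORDER BY last_order) * 5.0 / COUNT(*) OVER () AS raw_score
    FROM per_customer
"""
# CEIL done in Python, since not every SQLite build ships math functions.
r_score = {cid: math.ceil(raw) for cid, raw in conn.execute(query)}
print(sorted(r_score.items()))
```

Customers c03, c04 and c05 share a date and all get r_score 2, and the scores still run from 1 to 5. The trade-off is that the buckets are no longer the same size (here 2, 3, 1, 2 and 2 customers).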
If you use row_number() instead of rank(), you get essentially what ntile() gives you: equal-sized buckets, with ties split arbitrarily.
If you want to use ntile(), you can add an extra order by key so that the sort key is unique.
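A sketch of both suggestions on the same kind of hypothetical data, with SQLite via Python standing in for the real database: customer_id is added as a second ORDER BY key, and a row_number() version of the manual formula is computed next to NTILE(5). The table and column names are illustrative only.

```python
import math
import sqlite3

# Hypothetical toy data: c03-c05 share the same latest order date.
last_orders = [
    ("c01", "2020-01-01"), ("c02", "2020-01-01"),
    ("c03", "2020-01-02"), ("c04", "2020-01-02"), ("c05", "2020-01-02"),
    ("c06", "2020-01-03"), ("c07", "2020-01-04"), ("c08", "2020-01-05"),
    ("c09", "2020-01-06"), ("c10", "2020-01-07"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE per_customer (customer_id TEXT, last_order TEXT)")
conn.executemany("INSERT INTO per_customer VALUES (?, ?)", last_orders)

query = """
    SELECT customer_id,
           -- customer_id breaks ties, so the bucket assignment is repeatable
           NTILE(5) OVER (ORDER BY last_order, customer_id) AS ntile_score,
           ROW_NUMBER() OVER (ORDER BY last_order, customer_id) * 5.0
               / COUNT(*) OVER () AS row_number_raw
    FROM per_customer
    ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
ntile_scores = {cid: nt for cid, nt, _ in rows}
row_number_scores = {cid: math.ceil(raw) for cid, _, raw in rows}
print(ntile_scores)
print(row_number_scores)
```

The tiebreaker makes the result repeatable, but it does not keep ties together: c03 and c04 land in bucket 2 while c05, which has the same date, lands in bucket 3. The two columns agree here because 10 rows divide evenly into 5 buckets; with 7 rows, NTILE(5) gives 1, 1, 2, 2, 3, 4, 5 while CEIL(row_number() * 5.0 / 7) gives 1, 2, 3, 3, 4, 5, 5.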
===================
EDIT 2/17/20 5:18PM
Here is the new code I'm using:
SELECT
customer_id,
CEIL(RANK() OVER (ORDER BY MAX(oms_order_date)) * 5 / COUNT(*) OVER ()) AS r_score,
CEIL(RANK() OVER (ORDER BY COUNT(client_web_order_number)) * 5 / COUNT(*) OVER ()) AS f_score,
CEIL(RANK() OVER (ORDER BY AVG(extended_amount)) * 5 / COUNT(*) OVER ()) AS m_score,
(r_score || f_score || m_score) AS rfm_score
FROM
mdwh.us_raw.l_dmw_order_report t1
WHERE
quantity_ordered > 0
AND customer_id IS NOT NULL
AND customer_id != ('')
AND oms_order_date IS NOT NULL
AND UPPER(line_status) NOT IN ('','RETURN', 'CANCELLED')
AND UPPER(item_description_1) NOT IN ('','FREIGHT', 'RETURN LABEL FEE', 'VISIBLE STITCH')
AND (quantity_ordered * unit_price_amount) > 0
AND extended_amount < 1000 --NO BULK ORDERS
AND oms_order_date BETWEEN '2020-01-01' AND '2020-01-10'
AND SUBSTRING(upc,1,6) IN (SELECT item_code FROM item_master_zs WHERE new_division BETWEEN '11' AND '39')
GROUP BY
customer_id
ORDER BY
customer_id
Now the problem is that some rows have a blank r_score, and the maximum is 4 instead of 5.
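One difference between the edited query and the rank() formula from the answer is that the edit multiplies by 5 rather than 5.0. The sketch below assumes an engine that truncates division of two integers, as SQLite does; whether the real warehouse does the same is an assumption to check in its documentation. The data and names are hypothetical. When the latest date is tied (c09 and c10 below), the truncating version yields scores from 0 to 4, while the 5.0 version yields 1 to 5. That matches the shape of the symptom, so it is a plausible cause rather than a confirmed one.

```python
import math
import sqlite3

# Hypothetical toy data: c01-c08 have distinct dates; c09 and c10 share the latest one.
last_orders = [(f"c{i:02d}", f"2020-01-0{i}") for i in range(1, 9)]
last_orders += [("c09", "2020-01-10"), ("c10", "2020-01-10")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE per_customer (customer_id TEXT, last_order TEXT)")
conn.executemany("INSERT INTO per_customer VALUES (?, ?)", last_orders)

query = """
    SELECT customer_id,
           -- all-integer operands: SQLite truncates the division
           RANK() OVER (ORDER BY last_order) * 5 / COUNT(*) OVER () AS int_div,
           -- 5.0 forces real division, as in the answer's formula
           RANK() OVER (ORDER BY last_order) * 5.0 / COUNT(*) OVER () AS real_div
    FROM per_customer
    ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
# math.ceil mirrors CEIL(); it cannot undo the truncation already applied to int_div.
int_scores = [math.ceil(r[1]) for r in rows]
ceil_scores = [math.ceil(r[2]) for r in rows]
print(int_scores)
print(ceil_scores)
```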