How to get the last-n-day score change for each user, along with ranks, for multiple columns using MySQL?
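I have a MySQL database that tracks each user's daily total score, along with a few other similar score/count-type metrics such as "badgesEarned". I've included only 2 of the 5 fields I need to track here. The table contains data only for days on which a user was active (earned points or badges), so it does not have a row for every date.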
Here is a toy example:
Example Database Table: "User"
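My goal is to get each user's score change over the last 7 days. I also need the last 30 and 365 days, but for this example we'll use only 7.

The table stores a snapshot of each user's total score for every active day. So I wrote a SQL query that finds the two appropriate rows/snapshots and takes the difference in score between them. The two rows are:
- the row for the current date, or, if none exists, the latest row before it;
- the row for (current_date - 7), or, if none exists, the latest row before it.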
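As a rough illustration of that rule, here is a Python sketch of the lookup. It assumes one user's snapshots are held as a date-sorted list of (date, cumulative score) pairs. The helper names `value_as_of` and `change_over` and the fixed "today" of 2020-08-14 are hypothetical. The sample values are user 1234's rows from the test data used in the answer below.

```python
from bisect import bisect_right
from datetime import date, timedelta

# One user's snapshots: (activity date, cumulative score), sorted by date.
# Values are user 1234's sample rows from this thread.
SNAPSHOTS_1234 = [
    (date(2020, 8, 6), 100),
    (date(2020, 8, 7), 120),
    (date(2020, 8, 8), 130),
    (date(2020, 8, 12), 140),
    (date(2020, 8, 14), 150),
]


def value_as_of(snapshots, day):
    """Value from the latest snapshot dated on or before `day`, or None if there is none."""
    dates = [d for d, _ in snapshots]
    i = bisect_right(dates, day)  # number of snapshots with date <= day
    return snapshots[i - 1][1] if i > 0 else None


def change_over(snapshots, today, days):
    """Value as of `today` minus value as of `today - days`; None if either is missing."""
    now = value_as_of(snapshots, today)
    before = value_as_of(snapshots, today - timedelta(days=days))
    if now is None or before is None:
        return None
    return now - before


today = date(2020, 8, 14)  # hypothetical stand-in for CURDATE()
print(change_over(SNAPSHOTS_1234, today, 7))  # 30
```

Returning None when no snapshot exists on or before the cutoff corresponds to the NULL that a scalar subquery produces in that case. The 30- and 365-day variants only change the `days` argument.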
To make things worse, I also have to track each player's "rank" using the SQL dense_rank() function and add it as a column to the final result table.
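For reference, the numbering that dense_rank() produces can be sketched in a few lines of Python: tied values share a rank, and no rank number is skipped. The function name `dense_rank_desc` is hypothetical, and the sketch assumes there are no NULL/None deltas.

```python
def dense_rank_desc(values):
    """Return the dense rank of each value, largest value = rank 1.

    Equal values share a rank and no rank number is skipped, the same
    numbering as DENSE_RANK() OVER (ORDER BY value DESC). Assumes no None values.
    """
    distinct_desc = sorted(set(values), reverse=True)
    rank_of = {value: i + 1 for i, value in enumerate(distinct_desc)}
    return [rank_of[value] for value in values]


score_deltas = [30, 100, 30, 10]  # hypothetical per-user score changes
print(dense_rank_desc(score_deltas))  # [2, 1, 2, 3]
```

With RANK() instead, the last value in this example would get 4 rather than 3, because RANK() skips a number after a tie.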
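So far I can achieve this with 2 different SQL queries.
My main question is: in terms of performance, good practice and efficiency, is one of them "better" than the other? Or are both terrible, meaning I started down the wrong path entirely and missed a much more efficient approach? I'm not very good with SQL, so apologies in advance if the question and code samples are horrifying: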
First approach: using only multiple nested subqueries (no joins).
SELECT *, dense_rank() OVER (ORDER BY t3.score DESC) AS ranking
FROM
(
    SELECT t1.userId,
        (SELECT t2.score
         FROM User t2
         WHERE t2.date <= CURDATE() AND t2.userId = t1.userId
         ORDER BY t2.date DESC LIMIT 1)
        -
        (SELECT t2.score
         FROM User t2
         WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL -7 DAY) AND t2.userId = t1.userId
         ORDER BY t2.date DESC LIMIT 1) AS score,
        (SELECT t2.badgesEarned
         FROM User t2
         WHERE t2.date <= CURDATE() AND t2.userId = t1.userId
         ORDER BY t2.date DESC LIMIT 1)
        -
        (SELECT t2.badgesEarned
         FROM User t2
         WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL -7 DAY) AND t2.userId = t1.userId
         ORDER BY t2.date DESC LIMIT 1) AS badgesEarned
    FROM User t1
    GROUP BY t1.userId
) t3
Second approach: get 2 separate tables, one for each date point, then do an inner join to subtract the relevant columns.
SELECT *, dense_rank() OVER (ORDER BY T0.score_delta DESC) AS ranking
FROM
(
    SELECT T1.userId,
        -- aliases so the outer query can reference score_delta
        (T1.score - T2.score) AS score_delta,
        (T1.badgesEarned - T2.badgesEarned) AS badgesEarned_delta
    FROM
        (select *
         from (
             select *, row_number() over (partition by userId order by date desc) as ranking
             from User
             where date <= date_add(CURDATE(), interval -7 day)
         ) t
         where t.ranking = 1) as T2
    INNER JOIN
        (select *
         from (
             select *, row_number() over (partition by userId order by date desc) as ranking
             from User
             where date <= CURDATE()
         ) t
         where t.ranking = 1) as T1
    ON T1.userId = T2.userId
) T0
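The `row_number() ... = 1` filter used twice above keeps, for each userId, the newest row on or before a cutoff. Here is a Python sketch of the same "latest row per group" selection. It assumes rows arrive as (userId, date, score, badgesEarned) tuples, with at most one row per user per date. The tuple layout and the name `latest_per_user` are hypothetical; the data is the answer's sample input.

```python
from datetime import date

# (userId, date, score, badgesEarned); same sample rows as the answer's `usr` CTE.
ROWS = [
    (1234, date(2020, 8, 6), 100, 10),
    (1234, date(2020, 8, 7), 120, 12),
    (1234, date(2020, 8, 8), 130, 13),
    (1234, date(2020, 8, 12), 140, 14),
    (1234, date(2020, 8, 14), 150, 15),
    (100, date(2020, 8, 5), 100, 10),
    (100, date(2020, 8, 10), 100, 10),
    (100, date(2020, 8, 14), 200, 10),
    (1, date(2020, 8, 5), 140, 14),
    (1, date(2020, 8, 8), 145, 14),
    (1, date(2020, 8, 12), 150, 15),
]


def latest_per_user(rows, cutoff):
    """For each userId, keep the row with the greatest date <= cutoff.

    Same result as WHERE date <= cutoff followed by keeping
    row_number() OVER (PARTITION BY userId ORDER BY date DESC) = 1.
    """
    best = {}
    for row in rows:
        user_id, day = row[0], row[1]
        if day > cutoff:
            continue  # the WHERE date <= cutoff filter
        if user_id not in best or day > best[user_id][1]:
            best[user_id] = row  # newer than the row kept so far
    return best


week_ago = latest_per_user(ROWS, date(2020, 8, 7))
today = latest_per_user(ROWS, date(2020, 8, 14))
# INNER JOIN on userId: only users present in both snapshots.
for user_id in sorted(today.keys() & week_ago.keys()):
    print(user_id, today[user_id][2] - week_ago[user_id][2])
```

This is a single pass over the rows. In MySQL, whether the window-function version is cheaper than the four correlated `LIMIT 1` subqueries of the first approach depends on the available indexes and on the optimizer. Comparing the `EXPLAIN ANALYZE` output of both queries on realistic data is a safer way to decide.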
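Side question: a colleague suggested that I do the column subtraction in the application code instead. I would call the database twice to get two tables (one for CURDATE(), the other for CURDATE() - 7). Then I would iterate over all the user objects and subtract the relevant fields to build my final result list. I'm not sure whether that is a better approach. Should I do it that way instead of handling it in SQL?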
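Here is a Python sketch of the colleague's suggestion. It assumes each query's result has already been loaded into a dict keyed by userId; the dict layout, the sample values and the helper names are hypothetical. It subtracts every tracked metric for users present in both result sets, which matches the INNER JOIN in the second approach. It then adds a dense rank on the score change.

```python
# Hypothetical results of the two queries, keyed by userId: the latest snapshot
# on or before 2020-08-14 and on or before 2020-08-07 for the sample users.
CURRENT = {
    1234: {"score": 150, "badgesEarned": 15},
    100: {"score": 200, "badgesEarned": 10},
    1: {"score": 150, "badgesEarned": 15},
}
WEEK_AGO = {
    1234: {"score": 120, "badgesEarned": 12},
    100: {"score": 100, "badgesEarned": 10},
    1: {"score": 140, "badgesEarned": 14},
}
METRICS = ("score", "badgesEarned")  # extend with the other tracked columns


def subtract_snapshots(current, earlier, metrics):
    """Per-user differences, only for users present in both results (inner-join semantics)."""
    return {
        user_id: {m: current[user_id][m] - earlier[user_id][m] for m in metrics}
        for user_id in current.keys() & earlier.keys()
    }


def add_dense_rank(deltas, metric):
    """Attach `<metric>_rank`, a dense rank where 1 is the largest change."""
    distinct_desc = sorted({d[metric] for d in deltas.values()}, reverse=True)
    rank_of = {value: i + 1 for i, value in enumerate(distinct_desc)}
    for d in deltas.values():
        d[metric + "_rank"] = rank_of[d[metric]]
    return deltas


result = add_dense_rank(subtract_snapshots(CURRENT, WEEK_AGO, METRICS), "score")
for user_id in sorted(result, key=lambda u: result[u]["score_rank"]):
    print(user_id, result[user_id])
```

The main trade-off is data transfer and round trips. This version runs two queries and fetches two full result sets. Doing the subtraction and DENSE_RANK() in SQL returns only the final rows from a single query.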
If you want to play with dummy data, here is an SQL Fiddle of the database: http://sqlfiddle.com/#!9/86c58f0/1
Also, both snippets above run fine on my MySQL 8.0 Workbench.
I don't quite understand your expected result. But couldn't you just use window functions combined with a RANGE clause?
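I'm only creating the central backbone table. It's then up to you to subtract what needs to be subtracted, and finally to apply whatever dense_rank() you need. Basically, I think your final SELECT, the one containing the DENSE_RANK(), needs to select from my with_a_week_before in-line table.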
WITH
-- your input
usr(userid, dt, score, badgesearned) AS (
          SELECT 1234, DATE '2020-08-06', 100, 10
UNION ALL SELECT 1234, DATE '2020-08-07', 120, 12
UNION ALL SELECT 1234, DATE '2020-08-08', 130, 13
UNION ALL SELECT 1234, DATE '2020-08-12', 140, 14
UNION ALL SELECT 1234, DATE '2020-08-14', 150, 15
UNION ALL SELECT  100, DATE '2020-08-05', 100, 10
UNION ALL SELECT  100, DATE '2020-08-10', 100, 10
UNION ALL SELECT  100, DATE '2020-08-14', 200, 10
UNION ALL SELECT    1, DATE '2020-08-05', 140, 14
UNION ALL SELECT    1, DATE '2020-08-08', 145, 14
UNION ALL SELECT    1, DATE '2020-08-12', 150, 15
)
,
with_a_week_before AS (
  SELECT
    *
  , FIRST_VALUE(score) OVER(
      PARTITION BY userid ORDER BY dt
      RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW
    ) AS score_a_week
  , FIRST_VALUE(badgesearned) OVER(
      PARTITION BY userid ORDER BY dt
      RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW
    ) AS badgesearned_a_week
  , FIRST_VALUE(dt) OVER( -- date of the row the *_a_week values come from
      PARTITION BY userid ORDER BY dt
      RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW
    ) AS dt_a_week
  FROM usr
)
SELECT * FROM with_a_week_before ORDER BY userid, dt
-- out userid | dt | score | badgesearned | score_a_week | badgesearned_a_week | dt_a_week
-- out --------+------------+-------+--------------+--------------+---------------------+------------
-- out 1 | 2020-08-05 | 140 | 14 | 140 | 14 | 2020-08-05
-- out 1 | 2020-08-08 | 145 | 14 | 140 | 14 | 2020-08-05
-- out 1 | 2020-08-12 | 150 | 15 | 140 | 14 | 2020-08-05
-- out 100 | 2020-08-05 | 100 | 10 | 100 | 10 | 2020-08-05
-- out 100 | 2020-08-10 | 100 | 10 | 100 | 10 | 2020-08-05
-- out 100 | 2020-08-14 | 200 | 10 | 100 | 10 | 2020-08-10
-- out 1234 | 2020-08-06 | 100 | 10 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-07 | 120 | 12 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-08 | 130 | 13 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-12 | 140 | 14 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-14 | 150 | 15 | 120 | 12 | 2020-08-07
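To make the frame semantics concrete, here is a Python sketch of what each FIRST_VALUE(...) column computes. For every row, it takes the earliest row of the same user dated within the preceding 7 days, including the current row. The name `first_in_range` is hypothetical, and the rows are those of the `usr` CTE above.

This picks the earliest snapshot inside the window, which is not always the question's "latest snapshot on or before date - 7". For user 100 on 2020-08-14, it returns the 2020-08-10 row. The question's rule would return the 2020-08-05 row, the latest one on or before 2020-08-07; both happen to have a score of 100.

```python
from datetime import date, timedelta

# (userid, dt, score, badgesearned): the rows of the `usr` CTE.
ROWS = [
    (1234, date(2020, 8, 6), 100, 10),
    (1234, date(2020, 8, 7), 120, 12),
    (1234, date(2020, 8, 8), 130, 13),
    (1234, date(2020, 8, 12), 140, 14),
    (1234, date(2020, 8, 14), 150, 15),
    (100, date(2020, 8, 5), 100, 10),
    (100, date(2020, 8, 10), 100, 10),
    (100, date(2020, 8, 14), 200, 10),
    (1, date(2020, 8, 5), 140, 14),
    (1, date(2020, 8, 8), 145, 14),
    (1, date(2020, 8, 12), 150, 15),
]


def first_in_range(rows, days):
    """Emulate FIRST_VALUE(...) OVER (PARTITION BY userid ORDER BY dt
    RANGE BETWEEN INTERVAL <days> DAY PRECEDING AND CURRENT ROW)
    for score, badgesearned and dt. Quadratic, fine for a toy example.
    """
    out = []
    for user_id, dt, score, badges in sorted(rows, key=lambda r: (r[0], r[1])):
        lower = dt - timedelta(days=days)
        # Frame: same user, dated from `lower` up to the current row's date.
        frame = [r for r in rows if r[0] == user_id and lower <= r[1] <= dt]
        first = min(frame, key=lambda r: r[1])  # earliest row in the frame
        out.append((user_id, dt, score, badges, first[2], first[3], first[1]))
    return out


for row in first_in_range(ROWS, 7):
    print(row)
```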