How to create an average per partition containing a maximum of 5 time-dependent members?
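My goal is to select the average of exactly 5 records, provided they meet the left-join criteria against another table.
Suppose we have table one (left) with these records: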
RECNUM | ID | DATE | JOB
1 | cat | 2019.01.01 | meow
2 | dog | 2019.01.01 | bark
And table two (right) with these records:
RECNUM | ID | Action_ID | DATE | REWARD
1 | cat | 1 | 2019.01.02 | 20
2 | cat | 99 | 2018.12.30 | 1
3 | cat | 23 | 2019.12.28 | 20
4 | cat | 54 | 2018.01.01 | 20
5 | cat | 32 | 2018.01.02 | 20
6 | cat | 21 | 2018.01.03 | 20
7 | cat | 43 | 2018.12.28 | 1
8 | cat | 65 | 2018.12.29 | 1
9 | cat | 87 | 2018.09.12 | 1
10 | cat | 98 | 2018.10.11 | 1
11 | dog | 56 | 2018.09.01 | 99
12 | dog | 42 | 2019.09.02 | 99
The result should return:
ID | AVG(Reward_from_latest_5_jobs)
cat | 1
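The conditions to satisfy are:
For each JOB in the left table, try to find the 5 most recent earlier records in the right table and compute their average.
In other words: the dog barked, we don't know what reward to give him, so we try to compute the average of the last five rewards he received.
If fewer than 5 are found, return nothing (or put NULL); if more are found, discard the oldest.
The way I wanted to do it is something like this: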
SELECT a."ID", COUNT(b."Action_ID"), AVG(b."REWARD")
FROM
(
SELECT "ID", "DATE"
FROM :left_table
) a
LEFT JOIN
(
SELECT "ID", "Action_ID", "DATE", "REWARD"
FROM :right_table
) b
ON(
a."ID" = b."ID"
)
WHERE a."DATE" > b."DATE"
GROUP BY a."ID"
HAVING COUNT(b."Action_ID") >= 5;
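But this averages all matching Action_IDs, not just the five most recent. Can you tell me how to achieve the expected result? I can use sub-tables, and it does not have to be done in a single SQL statement. Procedures are not allowed for this use case.
Any input is much appreciated.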
You can use window functions, then aggregate:
select
id,
avg(reward) avg_reward
from (
select
t1.id,
t2.reward,
count(*) over(partition by t1.id) cnt,
rank() over(partition by t1.id order by t2.date desc) rn
from leftable t1
inner join righttable t2 on t1.id = t2.id and t2.date < t1.date
) t
where cnt >= 5 and rn <= 5
group by id
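The inner query joins the tables according to your requirement, does a window count of the total number of available records per id, and ranks the records of each id by descending date.
The outer query then filters on ids that have at least 5 records and computes the average of the top 5 records for each id.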
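To check this against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). It assumes the join keeps only right-table rows dated before the left-table row (t2.date < t1.date, matching the question's a."DATE" > b."DATE"). The lowercase column names and the in-memory database are assumptions made for illustration, not part of the original setup.

```python
import sqlite3

# Sample rows copied from the question; column names are assumed.
LEFT_ROWS = [
    (1, "cat", "2019.01.01", "meow"),
    (2, "dog", "2019.01.01", "bark"),
]
RIGHT_ROWS = [
    (1, "cat", 1, "2019.01.02", 20),
    (2, "cat", 99, "2018.12.30", 1),
    (3, "cat", 23, "2019.12.28", 20),
    (4, "cat", 54, "2018.01.01", 20),
    (5, "cat", 32, "2018.01.02", 20),
    (6, "cat", 21, "2018.01.03", 20),
    (7, "cat", 43, "2018.12.28", 1),
    (8, "cat", 65, "2018.12.29", 1),
    (9, "cat", 87, "2018.09.12", 1),
    (10, "cat", 98, "2018.10.11", 1),
    (11, "dog", 56, "2018.09.01", 99),
    (12, "dog", 42, "2019.09.02", 99),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leftable   (recnum INTEGER, id TEXT, date TEXT, job TEXT);
    CREATE TABLE righttable (recnum INTEGER, id TEXT, action_id INTEGER,
                             date TEXT, reward INTEGER);
""")
conn.executemany("INSERT INTO leftable VALUES (?, ?, ?, ?)", LEFT_ROWS)
conn.executemany("INSERT INTO righttable VALUES (?, ?, ?, ?, ?)", RIGHT_ROWS)

# The question's approach: averages every earlier record, not only the latest 5.
all_earlier = conn.execute("""
    SELECT a.id, COUNT(b.action_id), AVG(b.reward)
    FROM leftable a
    LEFT JOIN righttable b ON a.id = b.id
    WHERE a.date > b.date
    GROUP BY a.id
    HAVING COUNT(b.action_id) >= 5
""").fetchall()

# Window-function approach: count per id, rank by descending date, keep top 5.
latest_five = conn.execute("""
    SELECT id, AVG(reward) AS avg_reward
    FROM (
        SELECT t1.id AS id, t2.reward AS reward,
               COUNT(*) OVER (PARTITION BY t1.id) AS cnt,
               RANK() OVER (PARTITION BY t1.id ORDER BY t2.date DESC) AS rn
        FROM leftable t1
        INNER JOIN righttable t2 ON t1.id = t2.id AND t2.date < t1.date
    ) t
    WHERE cnt >= 5 AND rn <= 5
    GROUP BY id
""").fetchall()

print(all_earlier)
print(latest_five)
```

On this data the first query averages all 8 earlier cat records and returns 8.125, while the second returns 1.0 for cat. Both omit dog, which has only one earlier record. The dotted date strings compare correctly as text only because they are zero-padded year.month.day values.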
Use a window function to get the top 5:
select id, avg(reward)
from (select r.*,
row_number() over (partition by l.id order by r.date desc) as seqnum
from table1 l join
table2 r
on l.id = r.id and l.date > r.date
) r
where seqnum <= 5
group by id
having count(*) >= 5;
The having clause then filters out the ids that do not have five rows.
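One difference between rank() and row_number() may matter here: rank() gives tied dates the same rank, so a filter like rn <= 5 can keep more than five rows, whereas row_number() keeps at most five but picks arbitrarily among ties unless the ordering is unique. The sketch below illustrates this with Python's sqlite3 module (SQLite 3.25 or later). The data is hypothetical, and the tie-breaker on action_id is an assumption added for determinism, not something the answers specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id TEXT, date TEXT);
    CREATE TABLE table2 (id TEXT, action_id INTEGER, date TEXT, reward INTEGER);
    INSERT INTO table1 VALUES ('cat', '2019.01.01');
    -- Hypothetical data: the two oldest rows share the same date.
    INSERT INTO table2 VALUES
        ('cat', 1, '2018.12.30', 1),
        ('cat', 2, '2018.12.29', 1),
        ('cat', 3, '2018.12.28', 1),
        ('cat', 4, '2018.12.27', 1),
        ('cat', 5, '2018.12.26', 1),
        ('cat', 6, '2018.12.26', 7);
""")

# Same shape as the query above; only the numbering function and the
# optional tie-breaker in ORDER BY vary.
QUERY = """
    SELECT id, COUNT(*) AS kept, AVG(reward) AS avg_reward
    FROM (
        SELECT l.id AS id, r.reward AS reward,
               {fn}() OVER (PARTITION BY l.id
                            ORDER BY r.date DESC{tiebreak}) AS seqnum
        FROM table1 l
        JOIN table2 r ON l.id = r.id AND l.date > r.date
    ) x
    WHERE seqnum <= 5
    GROUP BY id
    HAVING COUNT(*) >= 5
"""

# RANK(): both rows dated 2018.12.26 get rank 5, so six rows are averaged.
with_rank = conn.execute(QUERY.format(fn="RANK", tiebreak="")).fetchall()

# ROW_NUMBER() with an assumed tie-breaker: exactly five rows are averaged.
with_row_number = conn.execute(
    QUERY.format(fn="ROW_NUMBER", tiebreak=", r.action_id")
).fetchall()

print(with_rank)
print(with_row_number)
```

With rank() six rows pass the filter and the average is 2.0. With row_number() and the tie-breaker, five rows pass and the average is 1.0. If the dates are guaranteed unique per id, the two functions behave the same.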
Here is a join approach (if you want more joins, just repeat this approach for each join):
SELECT ONE.ID,
CASE WHEN MAX(J1.RN) < 5 THEN NULL ELSE AVG(J1.REWARD) END AS REWARD_AVG
-- we could also use count
--CASE WHEN COUNT(*) = 5 THEN AVG(J1.REWARD) ELSE NULL END AS REWARD_AVG
FROM TABLE_ONE ONE
JOIN LATERAL ( -- LATERAL (CROSS APPLY on SQL Server) is needed so the subquery can reference ONE.DATE
SELECT
ID,
REWARD,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATE DESC) AS RN
FROM TABLE_TWO
WHERE TABLE_TWO.DATE < ONE.DATE
) AS J1 ON J1.ID = ONE.ID and RN <= 5 -- take first five only
GROUP BY ONE.ID