Count based on the max status on a given date, with grouped data

My example is a ticket system that stores one row for each ticket creation and each status update.
Fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=a5ff4600adbab185eb14b08586f1bd29

| ID | TICKETID | STATUS | TICKET_CREATED | STATUS_CHANGED |
|---|---|---|---|---|
| 1 | 1 | other_error | 01-JAN-20 | 01-JAN-20 08.00.00 |
| 2 | 2 | tech_error | 01-JAN-20 | 01-JAN-20 09.00.00 |
| 3 | 3 | unknown | 01-JAN-20 | 01-JAN-20 09.10.00 |
| 4 | 4 | unknown | 01-JAN-20 | 01-JAN-20 09.20.00 |
| 5 | 4 | tech_error | 01-JAN-20 | 02-JAN-20 09.30.00 |
| 6 | 1 | solved | 01-JAN-20 | 02-JAN-20 10.00.00 |
| 7 | 2 | solved | 01-JAN-20 | 02-JAN-20 07.00.00 |
| 8 | 5 | tech_error | 02-JAN-20 | 02-JAN-20 08.00.00 |
| 9 | 6 | unknown | 02-JAN-20 | 02-JAN-20 08.30.00 |
| 10 | 6 | solved | 02-JAN-20 | 02-JAN-20 09.30.00 |
| 11 | 5 | solved | 02-JAN-20 | 03-JAN-20 08.00.00 |
| 12 | 4 | unknown | 01-JAN-20 | 03-JAN-20 09.00.00 |

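I want to evaluate the data by ticket creation date and get three things for a given date:
- (Done) How many tickets were created in total on the given date
- (Done) How many tickets were created in status 'unknown' on the given date
- (Not done) How many tickets were in status 'unknown' on the given date? This is the tricky one, because what counts is the status with the max `STATUS_CHANGED` before midnight at the end of the given date.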
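Expected result for 1 January 2020:

| TICKET_CREATED | Total Created | Tickets created in Unknown status | Total tickets in Unknown status |
|---|---|---|---|
| 01-JAN-20 | 4 | 2 | 2 |

Explanation: on 1 January 2020, tickets 3 and 4 are in 'unknown' status at the end of the day.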
Expected result for 2 January 2020:

| TICKET_CREATED | Total Created | Tickets created in Unknown status | Total tickets in Unknown status |
|---|---|---|---|
| 02-JAN-20 | 2 | 1 | 1 |

Explanation: on 2 January 2020, only ticket 3 is in 'unknown' status at the end of the day.
Current solution for parts 1 and 2:

    select ticket_created,
           count(*) as "Total Created",
           sum(case when status = 'unknown' then 1 else 0 end) as "Unknown tickets created",
           '?' as "Total tickets in Unknown status"
    from myTable
    where id in
          -- the lowest id per ticket is the row written when the ticket was created
          (select min(id) as id
           from myTable
           where ticket_created = to_date('01.01.2020', 'DD.MM.YYYY')
           group by ticketid)
    group by ticket_created

Can you give me some hints on how to approach part 3?
Assuming I've understood your logic correctly, this is how I would do it:

    with ticket_info as (select id,
                                ticketid,
                                status,
                                ticket_created,
                                status_changed,
                                -- 1 = the ticket's last change on that calendar day
                                row_number() over (partition by ticketid, trunc(status_changed) order by status_changed desc) rn_per_id_day_desc,
                                -- 1 = the ticket's first row, i.e. its creation
                                row_number() over (partition by ticketid order by status_changed) rn_per_id_asc
                         from mytable)
    select ticket_created,
           count(distinct case when trunc(ticket_created) = to_date('01/01/2020', 'dd/mm/yyyy') then ticketid end) as "Total Created",
           count(case when rn_per_id_asc = 1 and status = 'unknown' then 1 end) as "Unknown tickets created",
           count(case when rn_per_id_day_desc = 1 and status = 'unknown' then 1 end) as "Total tickets in Unknown status"
    from ticket_info
    where status_changed >= to_timestamp('01/01/2020', 'dd/mm/yyyy')
    and   status_changed < to_timestamp('01/01/2020', 'dd/mm/yyyy') + interval '1' day
    group by ticket_created;

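As you can see, I first used a couple of `row_number()` analytic functions to label the rows. One numbers each ticket's rows in the order they changed, which identifies the first row per ticket id (the row where the ticket was created). The other numbers the rows per ticket id and day in descending order, which identifies each ticket's last row of the day.
Using that information, we can work out all three of your cases:
- Tickets created on the day: here I used a distinct count, but you could change that to `count(case when rn_per_id_asc = 1 then 1 end)`, which is probably more efficient and easier to understand.
- Tickets created as 'unknown' on the day: here I used a conditional count. If it is the first row and the status is unknown, count it.
- Tickets in 'unknown' status at the end of the day: here I used another conditional count. If it is the last row of the day and the status is unknown, count it.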
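
To see what the two `row_number()` labels look like on the sample data, here is a small sketch that loads the question's rows into an in-memory SQLite database from Python. SQLite stands in for Oracle purely so the example is self-contained (that substitution is my assumption, not part of the answer): `date(status_changed)` replaces `trunc(status_changed)`, and the ISO date strings are my own encoding of the sample values.

```python
import sqlite3

# Sample rows from the question: (id, ticketid, status, ticket_created, status_changed)
ROWS = [
    (1, 1, "other_error", "2020-01-01", "2020-01-01 08:00:00"),
    (2, 2, "tech_error", "2020-01-01", "2020-01-01 09:00:00"),
    (3, 3, "unknown", "2020-01-01", "2020-01-01 09:10:00"),
    (4, 4, "unknown", "2020-01-01", "2020-01-01 09:20:00"),
    (5, 4, "tech_error", "2020-01-01", "2020-01-02 09:30:00"),
    (6, 1, "solved", "2020-01-01", "2020-01-02 10:00:00"),
    (7, 2, "solved", "2020-01-01", "2020-01-02 07:00:00"),
    (8, 5, "tech_error", "2020-01-02", "2020-01-02 08:00:00"),
    (9, 6, "unknown", "2020-01-02", "2020-01-02 08:30:00"),
    (10, 6, "solved", "2020-01-02", "2020-01-02 09:30:00"),
    (11, 5, "solved", "2020-01-02", "2020-01-03 08:00:00"),
    (12, 4, "unknown", "2020-01-01", "2020-01-03 09:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (id integer, ticketid integer, status text, "
    "ticket_created text, status_changed text)"
)
conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", ROWS)

# Same two labels as in the answer, written in SQLite's dialect.
LABELS_SQL = """
select id,
       ticketid,
       status,
       row_number() over (partition by ticketid
                          order by status_changed) as rn_per_id_asc,
       row_number() over (partition by ticketid, date(status_changed)
                          order by status_changed desc) as rn_per_id_day_desc
from mytable
order by id
"""
labels = conn.execute(LABELS_SQL).fetchall()

# Rows labelled as the ticket's first row (its creation).
creation_ids = {row[0] for row in labels if row[3] == 1}
# Rows that were replaced by a later change on the same calendar day.
superseded_ids = {row[0] for row in labels if row[4] > 1}

for row in labels:
    print(row)
print("creation rows:", sorted(creation_ids))
print("superseded within the day:", sorted(superseded_ids))
```

In this data only the earlier of ticket 6's two changes on 2 January gets a label above 1 in `rn_per_id_day_desc`. Note also that ticket 3 has no row dated 2 January at all, so a per-day label on its own cannot say what state that ticket was in that evening.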
ETA (edited to add): I've revised the logic for the third part so that it counts every ticket whose current status is unknown at the end of the day, not only tickets that changed that day. I think this should do the trick:

    with date_of_interest as (select start_date + level - 1 dt,
                                     start_date + level next_dt
                              from   (select to_date('01/01/2020', 'dd/mm/yyyy') start_date,
                                             to_date('03/01/2020', 'dd/mm/yyyy') end_date
                                      from   dual)
                              -- one row per day from start_date to end_date inclusive
                              connect by level <= (end_date - start_date) + 1),
         ticket_info as (select mt.id,
                                mt.ticketid,
                                mt.status,
                                mt.ticket_created,
                                mt.status_changed,
                                row_number() over (partition by mt.ticketid, doi.dt order by mt.status_changed) rn_per_id_asc,
                                row_number() over (partition by mt.ticketid, doi.dt order by mt.status_changed desc) rn_per_id_desc,
                                doi.dt,
                                doi.next_dt
                         from   mytable mt
                                -- every status row up to the end of each date of interest
                                inner join date_of_interest doi on mt.status_changed < doi.next_dt
                        )
    select dt,
           count(case when ticket_created = dt and rn_per_id_asc = 1 then 1 end) as "Total Created",
           count(case when ticket_created = dt and rn_per_id_asc = 1 and status = 'unknown' then 1 end) as "Unknown tickets created",
           count(case when rn_per_id_desc = 1 and status = 'unknown' then 1 end) as "Total tickets in Unknown status"
    from   ticket_info
    group by dt
    order by dt;

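You'll notice I've updated the query to run across multiple days. If the query will only ever run for one date at a time, you could replace the `date_of_interest` subquery with something like this:

    with date_of_interest as (select dt,
                                     dt + 1 next_dt
                              from   (select to_date('03/01/2020', 'dd/mm/yyyy') dt
                                      from   dual)),

Updated db<>fiddle

N.B. This won't be the most efficient way of doing things. Each date of interest is joined to every status row before it, so the query will get slower over time as more records accumulate. If you can come up with a way to easily identify active tickets, especially if you can get that information into an index, so much the better.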
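
The revised query boils down to "for each date, take each ticket's latest row before the next midnight". Below is a minimal sketch of that idea, ported to SQLite through Python's `sqlite3` so that it runs standalone. The port is my assumption rather than part of the answer: a recursive CTE stands in for Oracle's `connect by level`, and dates are stored as ISO text, which compares correctly as plain strings.

```python
import sqlite3

# Sample rows from the question: (id, ticketid, status, ticket_created, status_changed)
ROWS = [
    (1, 1, "other_error", "2020-01-01", "2020-01-01 08:00:00"),
    (2, 2, "tech_error", "2020-01-01", "2020-01-01 09:00:00"),
    (3, 3, "unknown", "2020-01-01", "2020-01-01 09:10:00"),
    (4, 4, "unknown", "2020-01-01", "2020-01-01 09:20:00"),
    (5, 4, "tech_error", "2020-01-01", "2020-01-02 09:30:00"),
    (6, 1, "solved", "2020-01-01", "2020-01-02 10:00:00"),
    (7, 2, "solved", "2020-01-01", "2020-01-02 07:00:00"),
    (8, 5, "tech_error", "2020-01-02", "2020-01-02 08:00:00"),
    (9, 6, "unknown", "2020-01-02", "2020-01-02 08:30:00"),
    (10, 6, "solved", "2020-01-02", "2020-01-02 09:30:00"),
    (11, 5, "solved", "2020-01-02", "2020-01-03 08:00:00"),
    (12, 4, "unknown", "2020-01-01", "2020-01-03 09:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (id integer, ticketid integer, status text, "
    "ticket_created text, status_changed text)"
)
conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", ROWS)

PER_DAY_SQL = """
with recursive date_of_interest(dt, next_dt) as (
    -- one row per day from 2020-01-01 to 2020-01-03 inclusive
    select '2020-01-01', date('2020-01-01', '+1 day')
    union all
    select next_dt, date(next_dt, '+1 day')
    from date_of_interest
    where next_dt <= '2020-01-03'
),
ticket_info as (
    select mt.id, mt.ticketid, mt.status, mt.ticket_created, doi.dt,
           row_number() over (partition by mt.ticketid, doi.dt
                              order by mt.status_changed) as rn_per_id_asc,
           row_number() over (partition by mt.ticketid, doi.dt
                              order by mt.status_changed desc) as rn_per_id_desc
    from mytable mt
    -- every status row written before the end of each date of interest
    join date_of_interest doi on mt.status_changed < doi.next_dt
)
select dt,
       count(case when ticket_created = dt and rn_per_id_asc = 1
                  then 1 end) as total_created,
       count(case when ticket_created = dt and rn_per_id_asc = 1
                   and status = 'unknown' then 1 end) as unknown_created,
       count(case when rn_per_id_desc = 1 and status = 'unknown'
                  then 1 end) as total_unknown
from ticket_info
group by dt
order by dt
"""
per_day = conn.execute(PER_DAY_SQL).fetchall()

for row in per_day:
    print(row)
```

On the sample data this should give 4 / 2 / 2 for 1 January and 2 / 1 / 1 for 2 January, matching the expected results in the question. The join also shows where the cost comes from: the row for 3 January is built from all twelve status rows.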
Here is a solution that calculates the third metric separately and then joins it to the metrics you already know how to calculate.

    with cte_ranges as (
        select id, status, ticketid, ticket_created
             , status_changed as started
               -- a status stays active until the ticket's next change (or until now)
             , coalesce(
                   lead(status_changed) over (partition by ticketid order by status_changed)
                 , current_timestamp) as ended
        from myTable
        where trunc(ticket_created) between DATE'2020-01-01' and DATE'2020-01-02'
    )
    select q.ticket_date as "Ticket Created"
         , q.total_tickets as "Total Created"
         , q.total_unknown as "Unknown tickets created"
         , endofday.total_unknown "Total tickets in Unknown status"
    from
    (
        select trunc(t.ticket_created) as ticket_date
             , count(distinct t.ticketid) as total_tickets
             , count(distinct case when t.status = 'unknown' then t.ticketid end) as total_unknown
        from cte_ranges t
        group by trunc(t.ticket_created)
    ) q
    left join (
        select trunc(cast(dt as date)) as ticket_date
             , count(distinct case when status = 'unknown' then ticketid end) as total_unknown
        from cte_ranges
        join (
            -- cutoff = last second of each creation day
            select distinct
                   cast(trunc(ticket_created)+1 as timestamp) - interval '1' second as dt
            from cte_ranges
        ) cutoff on dt between started and ended
        group by cast(dt as date)
    ) endofday
    on endofday.ticket_date = q.ticket_date;


| Ticket Created | Total Created | Unknown tickets created | Total tickets in Unknown status |
|---|---|---|---|
| 01-JAN-20 | 4 | 2 | 2 |
| 02-JAN-20 | 2 | 1 | 1 |

db<>fiddle here
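The trick is to first use `LEAD` to calculate the range during which each status is active.
Then join the cutoff times (the last second of each day) to those ranges.
That way you get the statuses that are still active at the end of each day.
Both subqueries read from the CTE, so you only need to change the date criteria in one place.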
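
The range idea can be tried out on the sample data with the following sketch, again using SQLite from Python so that it runs on its own. Two details are my assumptions and differ from the Oracle query above: an open-ended status gets a far-future sentinel in place of `current_timestamp`, and the cutoff is tested against a half-open range (`started <= cutoff < ended`) in place of `between`.

```python
import sqlite3

# Sample rows from the question: (id, ticketid, status, ticket_created, status_changed)
ROWS = [
    (1, 1, "other_error", "2020-01-01", "2020-01-01 08:00:00"),
    (2, 2, "tech_error", "2020-01-01", "2020-01-01 09:00:00"),
    (3, 3, "unknown", "2020-01-01", "2020-01-01 09:10:00"),
    (4, 4, "unknown", "2020-01-01", "2020-01-01 09:20:00"),
    (5, 4, "tech_error", "2020-01-01", "2020-01-02 09:30:00"),
    (6, 1, "solved", "2020-01-01", "2020-01-02 10:00:00"),
    (7, 2, "solved", "2020-01-01", "2020-01-02 07:00:00"),
    (8, 5, "tech_error", "2020-01-02", "2020-01-02 08:00:00"),
    (9, 6, "unknown", "2020-01-02", "2020-01-02 08:30:00"),
    (10, 6, "solved", "2020-01-02", "2020-01-02 09:30:00"),
    (11, 5, "solved", "2020-01-02", "2020-01-03 08:00:00"),
    (12, 4, "unknown", "2020-01-01", "2020-01-03 09:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (id integer, ticketid integer, status text, "
    "ticket_created text, status_changed text)"
)
conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", ROWS)

UNKNOWN_AT_SQL = """
with cte_ranges as (
    select ticketid,
           status,
           status_changed as started,
           -- the next change of the same ticket closes the range;
           -- the sentinel (an assumption of this sketch) keeps the last one open
           coalesce(lead(status_changed) over (partition by ticketid
                                               order by status_changed),
                    '9999-12-31 23:59:59') as ended
    from mytable
)
select count(distinct ticketid)
from cte_ranges
where status = 'unknown'
  and started <= :cutoff
  and :cutoff < ended
"""


def unknown_at(cutoff):
    """Count tickets whose active status at `cutoff` is 'unknown'."""
    return conn.execute(UNKNOWN_AT_SQL, {"cutoff": cutoff}).fetchone()[0]


# Cutoff = last second of each day, as in the answer.
days = ("2020-01-01", "2020-01-02", "2020-01-03")
counts = {day: unknown_at(day + " 23:59:59") for day in days}
print(counts)
```

Unlike the query above, this sketch does not filter on `ticket_created`, so it can also be asked about 3 January, where tickets 3 and 4 are both in 'unknown' status again.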