Count based on the max status on a given date, with grouped data

My example is a ticketing system that stores status-update entries together with each ticket's creation date.

Fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=a5ff4600adbab185eb14b08586f1bd29

ID | TICKETID | STATUS      | TICKET_CREATED | STATUS_CHANGED
1  | 1        | other_error | 01-JAN-20      | 01-JAN-20 08.00.00
2  | 2        | tech_error  | 01-JAN-20      | 01-JAN-20 09.00.00
3  | 3        | unknown     | 01-JAN-20      | 01-JAN-20 09.10.00
4  | 4        | unknown     | 01-JAN-20      | 01-JAN-20 09.20.00
5  | 4        | tech_error  | 01-JAN-20      | 02-JAN-20 09.30.00
6  | 1        | solved      | 01-JAN-20      | 02-JAN-20 10.00.00
7  | 2        | solved      | 01-JAN-20      | 02-JAN-20 07.00.00
8  | 5        | tech_error  | 02-JAN-20      | 02-JAN-20 08.00.00
9  | 6        | unknown     | 02-JAN-20      | 02-JAN-20 08.30.00
10 | 6        | solved      | 02-JAN-20      | 02-JAN-20 09.30.00
11 | 5        | solved      | 02-JAN-20      | 03-JAN-20 08.00.00
12 | 4        | unknown     | 01-JAN-20      | 03-JAN-20 09.00.00

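I want to evaluate the data by ticket creation date and get three things for a given date:

  1. (done) How many tickets were created in total on the given date?
  2. (done) How many tickets were created in status 'unknown' on the given date?
  3. (not done) How many tickets were in status 'unknown' on the given date? Tricky, because what matters is the status with the latest STATUS_CHANGED before midnight at the end of the given date.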

Expected result for 1 January 2020:

TICKET_CREATED | Total Created | Tickets created in Unknown status | Total tickets in Unknown status
01-JAN-20      | 4             | 2                                 | 2

Explanation: on 1 January 2020, tickets 3 and 4 were in status 'unknown' at the end of the day.

Expected result for 2 January 2020:

TICKET_CREATED | Total Created | Tickets created in Unknown status | Total tickets in Unknown status
02-JAN-20      | 2             | 1                                 | 1

Explanation: on 2 January 2020, only ticket 3 was in status 'unknown' at the end of the day.
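
The expected "in Unknown status" values can be cross-checked outside the database. The following is a minimal Python sketch and not part of the original question: ROWS is a hand-copied version of the sample table, and status_at_end_of and unknown_tickets are hypothetical helper names. For each ticket it keeps the row with the latest STATUS_CHANGED before the following midnight.

```python
from datetime import datetime, timedelta

# Sample rows copied by hand from the question's table (an assumption of this
# sketch): (id, ticketid, status, ticket_created, status_changed)
ROWS = [
    (1, 1, "other_error", "2020-01-01", "2020-01-01 08:00"),
    (2, 2, "tech_error", "2020-01-01", "2020-01-01 09:00"),
    (3, 3, "unknown", "2020-01-01", "2020-01-01 09:10"),
    (4, 4, "unknown", "2020-01-01", "2020-01-01 09:20"),
    (5, 4, "tech_error", "2020-01-01", "2020-01-02 09:30"),
    (6, 1, "solved", "2020-01-01", "2020-01-02 10:00"),
    (7, 2, "solved", "2020-01-01", "2020-01-02 07:00"),
    (8, 5, "tech_error", "2020-01-02", "2020-01-02 08:00"),
    (9, 6, "unknown", "2020-01-02", "2020-01-02 08:30"),
    (10, 6, "solved", "2020-01-02", "2020-01-02 09:30"),
    (11, 5, "solved", "2020-01-02", "2020-01-03 08:00"),
    (12, 4, "unknown", "2020-01-01", "2020-01-03 09:00"),
]


def status_at_end_of(day):
    """Map ticketid -> status of its latest change before the next midnight."""
    cutoff = datetime.strptime(day, "%Y-%m-%d") + timedelta(days=1)
    latest = {}  # ticketid -> (changed_at, status)
    for _id, ticketid, status, _created, changed in ROWS:
        changed_at = datetime.strptime(changed, "%Y-%m-%d %H:%M")
        if changed_at >= cutoff:
            continue  # this change happened after the day of interest
        if ticketid not in latest or changed_at > latest[ticketid][0]:
            latest[ticketid] = (changed_at, status)
    return {ticketid: status for ticketid, (_, status) in latest.items()}


def unknown_tickets(day):
    """Sorted ticket ids whose status is 'unknown' at the end of the given day."""
    return sorted(t for t, s in status_at_end_of(day).items() if s == "unknown")


print(unknown_tickets("2020-01-01"))  # tickets 3 and 4
print(unknown_tickets("2020-01-02"))  # ticket 3 only
```

Ticket 4 drops out on 2 January because its latest change that day is 'tech_error', and ticket 6 drops out because it was solved an hour after it was opened.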

My current solution for parts 1 and 2:

select ticket_created, 
count(*) as "Total Created",
sum(case when status = 'unknown' then 1 else 0 end) as "Unknown tickets created",
'?' as "Total tickets in Unknown status"
from myTable
where id in
    (select min(id) as id
    from myTable
    where ticket_created = to_date('01.01.2020', 'DD.MM.YYYY')
    group by ticketid)
group by ticket_created

Could you give me a hint on how to approach point 3?

Assuming I have understood your logic correctly, this is how I would go about it:

with ticket_info as (select id,
                            ticketid,
                            status,
                            ticket_created,
                            status_changed,
                            row_number() over (partition by ticketid, trunc(status_changed) order by status_changed desc) rn_per_id_day_desc,
                            row_number() over (partition by ticketid order by status_changed) rn_per_id_asc
                     from   mytable)
select ticket_created,
       count(distinct case when trunc(ticket_created) = to_date('01/01/2020', 'dd/mm/yyyy') then ticketid end) as "Total Created",
       count(case when rn_per_id_asc = 1 and status = 'unknown' then 1 end) as "Unknown tickets created",
       count(case when rn_per_id_day_desc = 1 and status = 'unknown' then 1 end) as "Total tickets in Unknown status"
from   ticket_info
where  status_changed >= to_timestamp('01/01/2020', 'dd/mm/yyyy')
and    status_changed < to_timestamp('01/01/2020', 'dd/mm/yyyy') + interval '1' day
group by ticket_created;

db<>fiddle

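As you can see, I first used a couple of row_number() analytic functions to label the rows. One labels the rows of each ticket in the order they were changed (this lets us identify the first row per ticket, i.e. the row that created the ticket). The other labels the rows per ticket and day in descending order (this lets us identify the last row per ticket for that day).

With that information, we can work out all three of your cases:

  1. Tickets created on the day: here I used a count distinct, but you could change it to count(case when rn_per_id_asc = 1 then 1 end), which is probably more efficient and easier to understand.
  2. Tickets created as 'unknown' on the day: here I used a conditional count: if it is the first row and the status is unknown, count it.
  3. Tickets in 'unknown' status at the end of the day: here I used another conditional count: if it is the last row of the day and the status is unknown, count it.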
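
To make the two row_number() labels concrete, here is a small sketch that runs the same window expressions against an in-memory SQLite database from Python. It is a stand-in under stated assumptions, not the Oracle query itself: SQLite (3.25 or later, for window functions) replaces Oracle, timestamps are stored as ISO text, date(status_changed) plays the role of trunc(status_changed), and only the sample rows for tickets 4 and 6 are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table mytable (
        id integer, ticketid integer, status text,
        ticket_created text, status_changed text)
""")
# Subset of the sample data (tickets 4 and 6), with timestamps as ISO text.
conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", [
    (4, 4, "unknown", "2020-01-01", "2020-01-01 09:20:00"),
    (5, 4, "tech_error", "2020-01-01", "2020-01-02 09:30:00"),
    (9, 6, "unknown", "2020-01-02", "2020-01-02 08:30:00"),
    (10, 6, "solved", "2020-01-02", "2020-01-02 09:30:00"),
    (12, 4, "unknown", "2020-01-01", "2020-01-03 09:00:00"),
])

# Same two labels as in the answer; date() stands in for Oracle's trunc().
labelled = conn.execute("""
    select id, ticketid, status,
           row_number() over (partition by ticketid, date(status_changed)
                              order by status_changed desc) as rn_per_id_day_desc,
           row_number() over (partition by ticketid
                              order by status_changed) as rn_per_id_asc
    from   mytable
    order by id
""").fetchall()

for row in labelled:
    print(row)  # (id, ticketid, status, rn_per_id_day_desc, rn_per_id_asc)
```

In this subset, row 9 has rn_per_id_asc = 1, so ticket 6 counts as created in 'unknown', while row 10 has rn_per_id_day_desc = 1, so the same ticket ends that day as 'solved'.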

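ETA (edited to add): the logic for the third part has been amended to count active tickets whose status is unknown at the end of the day, which I think should do the trick: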

with date_of_interest as (select start_date + level -1 dt,
                                 start_date + level next_dt
                          from   (select to_date('01/01/2020', 'dd/mm/yyyy') start_date,
                                         to_date('03/01/2020', 'dd/mm/yyyy') end_date
                                  from   dual)
                          connect by level <= (end_date - start_date) + 1),
          ticket_info as (select mt.id,
                                 mt.ticketid,
                                 mt.status,
                                 mt.ticket_created,
                                 mt.status_changed,
                                 row_number() over (partition by mt.ticketid, doi.dt order by mt.status_changed) rn_per_id_asc,
                                 row_number() over (partition by mt.ticketid, doi.dt order by mt.status_changed desc) rn_per_id_desc,
                                 doi.dt,
                                 doi.next_dt
                          from   mytable mt
                                 inner join date_of_interest doi on mt.status_changed < doi.next_dt
                          )
select dt,
       count(case when ticket_created = dt and rn_per_id_asc = 1 then 1 end) as "Total Created",
       count(case when ticket_created = dt and rn_per_id_asc = 1 and status = 'unknown' then 1 end) as "Unknown tickets created",
       count(case when rn_per_id_desc = 1 and status = 'unknown' then 1 end) as "Total tickets in Unknown status"
from   ticket_info
group by dt
order by dt;

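You will notice that I have updated the query to run across multiple days. If the query only ever runs for one date at a time, you can replace the date_of_interest subquery with something like this: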

with date_of_interest as (select dt,
                                 dt + 1 next_dt
                          from   (select to_date('03/01/2020', 'dd/mm/yyyy') dt
                                  from   dual)),

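Updated db<>fiddle

N.B. This will not be the most efficient way of doing things; the query will get slower over time as more and more records accumulate. If you can come up with a way to easily identify active tickets, especially if you can get that information into an index, so much the better.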

Here is a solution that calculates the third metric separately,
and then joins it to the metrics you already know how to get.

with cte_ranges as (
  select id, status, ticketid, ticket_created
  , status_changed as started
  , coalesce(
     lead(status_changed) over (partition by ticketid order by status_changed)
    , current_timestamp) as ended
  from myTable
  where trunc(ticket_created) between DATE'2020-01-01' and DATE'2020-01-02'
)
select q.ticket_date   as "Ticket Created"
     , q.total_tickets as "Total Created"
     , q.total_unknown as "Unknown tickets created"
     , endofday.total_unknown "Total tickets in Unknown status"
from
(
  select trunc(t.ticket_created) as ticket_date
  , count(distinct t.ticketid) as total_tickets
  , count(distinct case when t.status = 'unknown' then t.ticketid end) as total_unknown
  from cte_ranges t
  group by trunc(t.ticket_created) 
) q
left join (
  select trunc(cast(dt as date)) as ticket_date
  , count(distinct case when status = 'unknown' then ticketid end) as total_unknown
  from cte_ranges
  join (
    select distinct 
     cast(trunc(ticket_created)+1 as timestamp) - interval '1' second as dt 
    from cte_ranges
  ) cutoff on dt between started and ended
  group by cast(dt as date)
) endofday 
on endofday.ticket_date = q.ticket_date;

Ticket Created | Total Created | Unknown tickets created | Total tickets in Unknown status
01-JAN-20      | 4             | 2                       | 2
02-JAN-20      | 2             | 1                       | 1

db<>fiddle here

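The trick is to first use LEAD to calculate the ranges during which each status was active.

Then join the cutoff times (the last second of each day) to those ranges.
That way you get the days on which a status was still active.

Both subqueries use the CTE, so you only need to change the date criteria in the CTE.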
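
The range idea can be tried in isolation with a small sketch that runs a reduced version of the CTE in an in-memory SQLite database from Python. Several things here are assumptions made for illustration and are not in the answer: SQLite (3.25 or later) stands in for Oracle, timestamps are ISO text, a far-future sentinel replaces current_timestamp so the result is repeatable, only tickets 3, 4 and 6 are loaded, and unknown_at is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table mytable (
        id integer, ticketid integer, status text,
        ticket_created text, status_changed text)
""")
# Subset of the sample data (tickets 3, 4 and 6), with timestamps as ISO text.
conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", [
    (3, 3, "unknown", "2020-01-01", "2020-01-01 09:10:00"),
    (4, 4, "unknown", "2020-01-01", "2020-01-01 09:20:00"),
    (5, 4, "tech_error", "2020-01-01", "2020-01-02 09:30:00"),
    (9, 6, "unknown", "2020-01-02", "2020-01-02 08:30:00"),
    (10, 6, "solved", "2020-01-02", "2020-01-02 09:30:00"),
    (12, 4, "unknown", "2020-01-01", "2020-01-03 09:00:00"),
])

# Each status is active from its own change until the ticket's next change.
# The sentinel end date keeps the latest status open-ended (assumption).
SQL = """
    with cte_ranges as (
        select ticketid, status,
               status_changed as started,
               coalesce(lead(status_changed) over (partition by ticketid
                                                   order by status_changed),
                        '9999-12-31 23:59:59') as ended
        from   mytable
    )
    select distinct ticketid
    from   cte_ranges
    where  status = 'unknown'
    and    :cutoff between started and ended
    order by ticketid
"""


def unknown_at(cutoff):
    """Ticket ids whose 'unknown' range contains the cutoff timestamp."""
    return [row[0] for row in conn.execute(SQL, {"cutoff": cutoff})]


print(unknown_at("2020-01-01 23:59:59"))  # end of 1 January
print(unknown_at("2020-01-02 23:59:59"))  # end of 2 January
```

Because between is inclusive at both ends, a status change landing exactly on the cutoff second would match both the old and the new range, which is one reason to count distinct ticket ids as the answer does.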