SQL query to get date pairs within id

I have a table with the following rows:

    | item_id | change_type | change_date | change_id | other columns...
    | :------ | :---------- | :---------- | :-------- |
    |     123 |         off |  2019-06-04 |       321 |
    |     123 |          on |  2019-07-11 |       741 |
    |     123 |         off |  2019-07-13 |       987 |
    |     123 |          on |  2019-08-01 |       951 |
    |     123 |         off |  2019-08-07 |       357 |
    |     456 |         off |  2019-08-01 |       125 |
    |     456 |          on |  2019-11-18 |       878 |
    |     789 |          on |  2019-12-18 |       373 |
    |     012 |         off |  2019-12-25 |       654 |
    |     698 |         off |  2019-08-01 |       741 |
    |     698 |          on |  2018-01-03 |       147 |
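
For anyone who wants to reproduce this, a minimal setup sketch follows. The table name item_changes comes from the queries in this question; the column types are assumptions (item_id is treated as a string so that 012 keeps its leading zero), and the "other columns" are left out.

```sql
-- Hypothetical schema: the column types are assumed, not copied from the real table.
-- change_id is not declared unique because 741 appears twice in the sample rows.
CREATE TABLE item_changes (
    item_id     VARCHAR(10) NOT NULL,  -- string, so '012' keeps its leading zero
    change_type VARCHAR(3)  NOT NULL,  -- 'on' or 'off'
    change_date DATE        NOT NULL,
    change_id   INT         NOT NULL
);

-- The sample rows, exactly as listed in the table above.
INSERT INTO item_changes (item_id, change_type, change_date, change_id) VALUES
    ('123', 'off', '2019-06-04', 321),
    ('123', 'on',  '2019-07-11', 741),
    ('123', 'off', '2019-07-13', 987),
    ('123', 'on',  '2019-08-01', 951),
    ('123', 'off', '2019-08-07', 357),
    ('456', 'off', '2019-08-01', 125),
    ('456', 'on',  '2019-11-18', 878),
    ('789', 'on',  '2019-12-18', 373),
    ('012', 'off', '2019-12-25', 654),
    ('698', 'off', '2019-08-01', 741),
    ('698', 'on',  '2018-01-03', 147);
```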

I'm trying to run a query that produces the following result:

    | item_id | on_date    | off_date   | on_id | off_id | other columns...
    | :------ | :--------- | :--------- | :---- | :----- |
    |     123 |            | 2019-06-04 |       |    321 |
    |     123 | 2019-07-11 | 2019-07-13 |   741 |    987 |
    |     123 | 2019-08-01 | 2019-08-07 |   951 |    357 |
    |     456 |            | 2019-08-01 |       |    125 |
    |     456 | 2019-11-18 |            |   878 |        |
    |     789 | 2019-12-18 |            |   373 |        |
    |     012 |            | 2019-12-25 |       |    654 |
    |     698 | 2018-01-03 | 2019-08-01 |   147 |    741 |

The result I need is a table in which the "on" and "off" dates are listed in descending order (grouped by item_id), with each "off" date on the same row as the preceding (in time) "on" date.

The closest I have come is the following variations:

Attempt one:

SELECT
    changes_main.item_id,
    `on_date`,
    `off_date`,
    `on_id`,
    `off_id`
FROM (
    SELECT DISTINCT `item_id`
    FROM item_changes
) AS changes_main
LEFT OUTER JOIN (
    SELECT
        `item_id`, -- for joining purposes only
        `change_date` AS `on_date`,
        `change_id` AS `on_id`
    FROM item_changes
    WHERE `change_type` = 'on'
) AS changes_ons ON changes_ons.item_id = changes_main.item_id
RIGHT OUTER JOIN ( -- although LEFT or RIGHT doesn't seem to matter
    SELECT
        `item_id`, -- for joining purposes only
        `change_date` AS `off_date`,
        `change_id` AS `off_id`
    FROM item_changes
    WHERE `change_type` = 'off'
) AS changes_offs ON changes_offs.item_id = changes_main.item_id
;

However, this effectively produces a CROSS JOIN between on_date and off_date.
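
One way to see the effect is to count the "on" and "off" rows per item, since attempt one pairs every "on" with every "off" of the same item_id. This is only a rough diagnostic sketch against the item_changes table used above; the aliases on_rows and off_rows are made up for illustration.

```sql
-- Count 'on' and 'off' rows per item. Attempt one returns on_rows * off_rows
-- rows for every item that has at least one row of each type.
SELECT
    item_id,
    SUM(CASE WHEN change_type = 'on'  THEN 1 ELSE 0 END) AS on_rows,
    SUM(CASE WHEN change_type = 'off' THEN 1 ELSE 0 END) AS off_rows
FROM item_changes
GROUP BY item_id;
-- For item 123 in the sample data: on_rows = 2 and off_rows = 3,
-- so attempt one yields 2 * 3 = 6 rows instead of the 3 wanted.
```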

The only change in the second attempt is an added WHERE clause. I got the idea from this question.

Attempt two:

-- Same exact query as the above, however with the following
-- WHERE statement placed where the semicolon is above:
WHERE
    `off_date` = (
        SELECT MIN(offs2.change_date)
        FROM item_changes AS offs2
        WHERE offs2.change_type = 'off' AND
        offs2.change_date > changes_ons.on_date
    )
;

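The problem is that when an item_id does not have an even number of "on"/"off" rows, the extra "on" or "off" is filtered out.

I have tried variations of the WHERE clause above, including OR off_date IS NULL and OR on_date IS NULL.

Update:

A third attempt uses a UNION and some scalar subqueries. This is the closest I have come to the result I need. However, it still falls short (for example, it does not include the change_id, and it does not produce perfect matches).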

SELECT
    changes_on.item_id,
    changes_on.change_date AS `on_date`,
    (SELECT MIN(offs2.change_date)
        FROM item_changes AS offs2
        WHERE offs2.change_type = 'off' AND
        offs2.change_date > changes_on.change_date
    ) AS `off_date`,
    changes_on.change_id AS `on_id`,
    NULL AS `off_id` -- odd
FROM item_changes AS changes_on
WHERE `change_type` = 'on'

UNION

SELECT
    changes_offs.item_id,
    (SELECT MIN(ons2.change_date)
        FROM item_changes AS ons2
        WHERE ons2.change_type = 'on' AND
        ons2.change_date < changes_offs.change_date
    ) AS `on_date`,
    changes_offs.change_date AS `off_date`,
    NULL AS `on_id`, -- odd
    changes_offs.change_id AS `off_id`
FROM item_changes AS changes_offs
WHERE `change_type` = 'off'
;

Any assistance/input/guidance would be much appreciated.

Assign a group to each row based on the number of "on" rows up to and including that row. Then use conditional aggregation:

select item_id,
       max(case when change_type = 'on' then change_date end) as on_date,
       max(case when change_type = 'on' then change_id end) as on_change_id,
       max(case when change_type = 'off' then change_date end) as off_date,
       max(case when change_type = 'off' then change_id end) as off_change_id
from (select t.*,
             -- running count of 'on' rows per item: each 'on' starts a new group
             sum(case when change_type = 'on' then 1 else 0 end) over (partition by item_id order by change_date) as grp
      from item_changes t
     ) t
group by item_id, grp;
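
To see why the grouping works, it can help to look at the intermediate grp value on its own. The sketch below assumes the table is named item_changes, as in the question, and that item_id compares equal to '123'. It needs window-function support (MySQL 8.0 or later).

```sql
-- Show the running count of 'on' rows for a single item.
select item_id, change_type, change_date,
       sum(case when change_type = 'on' then 1 else 0 end)
           over (partition by item_id order by change_date) as grp
from item_changes
where item_id = '123'
order by change_date;
-- grp for the sample rows of item 123:
--   off 2019-06-04 -> 0   (an 'off' with no earlier 'on' stays alone)
--   on  2019-07-11 -> 1
--   off 2019-07-13 -> 1   (paired with the 'on' from 2019-07-11)
--   on  2019-08-01 -> 2
--   off 2019-08-07 -> 2   (paired with the 'on' from 2019-08-01)
```

Each "on" increments the counter, so an "off" row shares its grp with the most recent "on" before it, and the outer group by collapses each pair into one row.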

EDIT:

In earlier versions of MySQL (before 8.0, which lack window functions), you can express this as:

select item_id,
       max(case when change_type = 'on' then change_date end) as on_date,
       max(case when change_type = 'on' then change_id end) as on_change_id,
       max(case when change_type = 'off' then change_date end) as off_date,
       max(case when change_type = 'off' then change_id end) as off_change_id
from (select t.*,
             -- correlated subquery: count the 'on' rows up to and including this row's date
             (select count(*)
              from item_changes t2
              where t2.item_id = t.item_id and
                    t2.change_date <= t.change_date and
                    t2.change_type = 'on'
             ) as grp
      from item_changes t
     ) t
group by item_id, grp;

The performance is not as good as with window functions, but an index on (item_id, change_type, change_date) would help.
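
A sketch of such an index, assuming the table is named item_changes as in the question (the index name is made up for illustration):

```sql
-- Hypothetical index name. The column order puts the subquery's equality
-- filters (item_id, change_type) first and the range filter (change_date) last.
create index idx_item_changes_item_type_date
    on item_changes (item_id, change_type, change_date);
```

With this column order the correlated count(*) can likely be answered from the index alone, since all three columns it filters on are covered.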