SQL: convert from "record per month" to "record from/until"

We have a database that stores one value per employee per month (for example, the part-time percentage):

+-----+------+-------+----------+
| emp | year | month | parttime |
+-----+------+-------+----------+
|   1 | 2015 |     1 |      100 |
|   1 | 2015 |     2 |      100 |
|   1 | 2015 |     3 |      100 |
|   1 | 2015 |     4 |      100 |
|   2 | 2015 |     1 |       80 |
|   2 | 2015 |     2 |      100 |
|   2 | 2015 |     3 |      100 |
|   2 | 2015 |     4 |       80 |
|   3 | 2015 |     1 |       60 |
|   3 | 2015 |     2 |       60 |
|   3 | 2015 |     3 |       80 |
|   3 | 2015 |     4 |      100 |
+-----+------+-------+----------+

For reporting purposes, I need to show the values in from/until form:

+-----+---------+---------+----------+
| emp |  from   |   to    | parttime |
+-----+---------+---------+----------+
|   1 | 2015.01 | 2015.04 |      100 |
|   2 | 2015.01 | 2015.01 |       80 |
|   2 | 2015.02 | 2015.03 |      100 |
|   2 | 2015.04 | 2015.04 |       80 |
|   3 | 2015.01 | 2015.02 |       60 |
|   3 | 2015.03 | 2015.03 |       80 |
|   3 | 2015.04 | 2015.04 |      100 |
+-----+---------+---------+----------+

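My first attempt was to solve this with a simple min/max approach, but the recurring value 80 for employee no. 2 makes that tricky.

Any ideas or examples? The database is either DB2 or Microsoft SQL Server.

Thanks

Philipp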

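I tested this solution on Postgres with your sample data, and I'm fairly sure it will also work on DB2. It may need some small changes, but I'm not certain.

To understand how it works step by step, start by running the innermost block on its own.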

SELECT 
    emp, 
    (year||'.'||CASE WHEN length(min_month::text) = 1 THEN '0'||min_month::text ELSE min_month::text END) AS from, 
    (year||'.'||CASE WHEN length(max_month::text) = 1 THEN '0'||max_month::text ELSE max_month::text END) AS to, 
    parttime 
FROM(
    SELECT 
        emp,
        year,
        parttime,
        first_different,
        min(month) AS min_month,
        max(month) AS max_month 
    FROM( 
        SELECT 
            a.*,
            b.* 
        FROM(
            SELECT *
            FROM tablename 
            ) a,
            LATERAL 
            (
            SELECT 
                min(CASE WHEN a.parttime IS DISTINCT FROM b.parttime THEN b.month END) AS first_different
            FROM 
                tablename b 
            WHERE 
                a.emp = b.emp 
                AND a.year = b.year 
                AND a.month < b.month 
            ) b 
        ) foo 
    GROUP BY 1,2,3,4
    ORDER BY 1 
    ) goo 
ORDER BY 1,2;

Result:

 emp |  from   |   to    | parttime 
-----+---------+---------+----------
   1 | 2015.01 | 2015.04 |      100
   2 | 2015.01 | 2015.01 |       80
   2 | 2015.02 | 2015.03 |      100
   2 | 2015.04 | 2015.04 |       80
   3 | 2015.01 | 2015.02 |       60
   3 | 2015.03 | 2015.03 |       80
   3 | 2015.04 | 2015.04 |      100

This is known as a gaps-and-islands problem. One quick solution:

DECLARE @Employee TABLE
(emp int, year int, month int, parttime int)

INSERT INTO @Employee
VALUES
(1, 2015, 1, 100),
(1, 2015, 2, 100),
(1, 2015, 3, 100),
(1, 2015, 4, 100),
(2, 2015, 1,  80),
(2, 2015, 2, 100),
(2, 2015, 3, 100),
(2, 2015, 4,  80),
(3, 2015, 1,  60),
(3, 2015, 2,  60),
(3, 2015, 3,  80),
(3, 2015, 4, 100)


;WITH cte
AS 
(
    SELECT *
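        -- Restart the row number for each employee and value, ordered by date:
        -- within a run of consecutive months, month minus row number stays constant.
        ,e.[month] - ROW_NUMBER() OVER (PARTITION BY e.emp, e.[parttime] ORDER BY e.[year], e.[month]) AS Grp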
    FROM @Employee e
)
SELECT 
    emp, 
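    -- Zero-pad the month so the output reads 2015.01 rather than 2015.1
    CAST([year] AS varchar(50)) + '.' + RIGHT('0' + CAST(MIN([month]) AS varchar(2)), 2) AS [from],
    CAST([year] AS varchar(50)) + '.' + RIGHT('0' + CAST(MAX([month]) AS varchar(2)), 2) AS [to],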
    parttime 
FROM cte
GROUP BY emp, parttime, year, Grp
ORDER BY emp, [from]
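
To see why subtracting a row number from the month identifies the runs, here is a small Python sketch of the same idea. It assumes the row number restarts for every (emp, parttime) pair and follows year/month order. The names `ROWS`, `month_minus_row_number` and `to_ranges` are made up for this illustration, and like the query it only handles runs within a single year.

```python
from collections import defaultdict

# Sample rows from the question: (emp, year, month, parttime)
ROWS = [
    (1, 2015, 1, 100), (1, 2015, 2, 100), (1, 2015, 3, 100), (1, 2015, 4, 100),
    (2, 2015, 1, 80), (2, 2015, 2, 100), (2, 2015, 3, 100), (2, 2015, 4, 80),
    (3, 2015, 1, 60), (3, 2015, 2, 60), (3, 2015, 3, 80), (3, 2015, 4, 100),
]

def month_minus_row_number(rows):
    """Return {(emp, year, parttime, grp): [months]} with grp = month - row number.

    The row number is counted per (emp, parttime) in (year, month) order,
    which is an assumption of this sketch.
    """
    counters = defaultdict(int)
    groups = defaultdict(list)
    for emp, year, month, parttime in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        counters[(emp, parttime)] += 1
        grp = month - counters[(emp, parttime)]
        groups[(emp, year, parttime, grp)].append(month)
    return dict(groups)

def to_ranges(rows):
    """Collapse every group to (emp, from, to, parttime), sorted by emp and from."""
    out = []
    for (emp, year, parttime, _grp), months in month_minus_row_number(rows).items():
        out.append((emp, f"{year}.{min(months):02d}", f"{year}.{max(months):02d}", parttime))
    return sorted(out)
```

For employee 2, the two rows with 80 get the differences 0 and 2, so they land in separate groups, while the two rows with 100 both get 1 and are merged.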

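Step 1: detect where the employee or the part-time value changes (1 = change, 0 = same value as the previous row). You can use the analytic function LAG for this.

Step 2: use the analytic function SUM as a running total over the change flag to build groups.

Step 3: show one record per group, with the minimum and maximum year/month found in that group.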

+-----+------+-------+----------+-------+-------+
| emp | year | month | parttime | step1 | step2 |
|     |      |       |          |  chg  |  grp  |
+-----+------+-------+----------+-------+-------+
|   1 | 2015 |     1 |      100 |     1 |     1 |
|   1 | 2015 |     2 |      100 |     0 |     1 |
|   1 | 2015 |     3 |      100 |     0 |     1 |
|   1 | 2015 |     4 |      100 |     0 |     1 |
|   2 | 2015 |     1 |       80 |     1 |     2 |
|   2 | 2015 |     2 |      100 |     1 |     3 |
|   2 | 2015 |     3 |      100 |     0 |     3 |
|   2 | 2015 |     4 |       80 |     1 |     4 |
|   3 | 2015 |     1 |       60 |     1 |     5 |
|   3 | 2015 |     2 |       60 |     0 |     5 |
|   3 | 2015 |     3 |       80 |     1 |     6 |
|   3 | 2015 |     4 |      100 |     1 |     7 |
+-----+------+-------+----------+-------+-------+

select
  emp,
  min(format(year, '0000') + '.' + format(month, '00')) as from_month,
  max(format(year, '0000') + '.' + format(month, '00')) as to_month,
  parttime
from
(
  select
    emp, year, month, parttime,
    sum(chg) over (order by emp, year, month) as grp
  from
  (
    select 
      emp, year, month, parttime, 
      case when lag(emp) over (order by emp, year, month) = emp 
           and lag(parttime) over (order by emp, year, month) = parttime 
        then 0
        else 1
      end as chg
    from mytable
  ) changes
) groups
group by grp, emp, parttime
order by grp;
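
The three steps can also be traced outside the database. The following Python sketch is only an illustration of the logic, not a replacement for the query; `ROWS`, `flag_and_group` and `collapse` are names invented here.

```python
from itertools import groupby

# Sample rows from the question: (emp, year, month, parttime)
ROWS = [
    (1, 2015, 1, 100), (1, 2015, 2, 100), (1, 2015, 3, 100), (1, 2015, 4, 100),
    (2, 2015, 1, 80), (2, 2015, 2, 100), (2, 2015, 3, 100), (2, 2015, 4, 80),
    (3, 2015, 1, 60), (3, 2015, 2, 60), (3, 2015, 3, 80), (3, 2015, 4, 100),
]

def flag_and_group(rows):
    """Steps 1 and 2: add chg (comparison with the previous row, like LAG)
    and grp (running total of chg, like SUM ... OVER)."""
    result = []
    grp = 0
    prev = None  # (emp, parttime) of the previous row
    for emp, year, month, parttime in sorted(rows):  # emp, year, month order
        chg = 0 if prev == (emp, parttime) else 1
        grp += chg
        result.append((emp, year, month, parttime, chg, grp))
        prev = (emp, parttime)
    return result

def collapse(rows):
    """Step 3: one record per grp with the smallest and largest year.month."""
    out = []
    for _grp, members in groupby(flag_and_group(rows), key=lambda r: r[5]):
        members = list(members)
        emp, parttime = members[0][0], members[0][3]
        labels = [f"{r[1]:04d}.{r[2]:02d}" for r in members]
        out.append((emp, min(labels), max(labels), parttime))
    return out
```

Calling `flag_and_group(ROWS)` reproduces the chg and grp columns of the table above, and `collapse(ROWS)` yields the seven from/until records asked for in the question.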

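This would be easier if your database stored a full date rather than just year/month (or at least an equivalent combined type), or if you could work on the original underlying data instead. With the table as given: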

SELECT emp, partTime, MIN(monthStart) AS monthStart, MAX(monthNext) AS monthEnd
FROM (SELECT emp, partTime,
             DATEADD(month, month - 1, DATEADD(year, year - 1, CAST('00010101' AS DATE))) AS monthStart,
             DATEADD(month, month, DATEADD(year, year - 1, CAST('00010101' AS DATE))) AS monthNext,
             ROW_NUMBER() OVER(PARTITION BY emp ORDER BY year, month)  -
             ROW_NUMBER() OVER(PARTITION BY emp, partTime ORDER BY year, month) AS groupId
      FROM Monthly_Hours) AS Grouping
GROUP BY emp, partTime, groupId
ORDER BY emp, monthStart

SQL Fiddle Example

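Note that I'm deliberately using an exclusive upper bound for the range. Date/time/timestamp types, like all positive, contiguous-range types (anything other than an explicit integer count), should always be handled this way; it makes them easier to reason about and to query.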
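
As a rough illustration of why the exclusive upper bound helps, here is a short Python sketch. The helper names `month_range` and `merge_adjacent` are hypothetical. With half-open ranges, two ranges are adjacent exactly when one ends where the next starts, so no "last day of the month" arithmetic is needed.

```python
from datetime import date

def month_range(year, month):
    """Return (start, end) for one calendar month; end is exclusive."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

def merge_adjacent(ranges):
    """Merge half-open ranges that touch (one ends exactly where the next starts)."""
    merged = []
    for start, end in sorted(ranges):
        if merged and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end)  # extend the previous range
        else:
            merged.append((start, end))
    return merged
```

For example, merging February and March 2015 gives a single range from 2015-02-01 up to, but not including, 2015-04-01, whereas January and March stay separate because February is missing between them.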

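This answer falls slightly short in that missing months are not reported directly (they do not show up as 0). That can be corrected in several ways if necessary, although it takes more work.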
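
One possible correction, sketched in Python under the assumption that a missing month should split a range rather than be reported as 0: replace the first row number with a running month index (here `year * 12 + month - 1`), so that a gap changes the group key. The function name `islands_with_gaps` is invented for this sketch.

```python
from collections import defaultdict

def islands_with_gaps(rows):
    """Group (emp, year, month, parttime) rows into runs of consecutive months
    with the same parttime; a missing month starts a new run."""
    counters = defaultdict(int)
    groups = defaultdict(list)
    for emp, year, month, parttime in sorted(rows):
        counters[(emp, parttime)] += 1
        month_index = year * 12 + (month - 1)  # consecutive months differ by 1
        key = (emp, parttime, month_index - counters[(emp, parttime)])
        groups[key].append(month_index)
    out = []
    for (emp, parttime, _), idx in groups.items():
        first, last = min(idx), max(idx)
        out.append((emp,
                    f"{first // 12}.{first % 12 + 1:02d}",
                    f"{last // 12}.{last % 12 + 1:02d}",
                    parttime))
    return sorted(out)
```

Because the month index keeps counting across year boundaries, a run from December into January stays in one range, while a skipped month produces two.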