Correlate Sequences of Independent Events - Calculate Time Intersection

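We are building a Power BI reporting solution. I (well, Stack) solved an earlier problem, and the business has now come up with a new reporting idea. I'm not sure of the best way to approach it, since I know very little about Power BI and the business seems to want quite complex reports.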

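We have two sequences of events from different data sources. Both contain independent events that happen to vehicles. One describes the location a vehicle is at; the other describes incident events, each with a reason code. The business wants to report on the time spent at each location for each reason. A vehicle can change location completely independently of the incident events that are occurring; the events are really just datetimes and occur randomly throughout the day. Each type of event has a start time, an end time and a VehicleID.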

Vehicle location events

+------------------+-----------+------------+-----------------+----------------+
| LocationDetailID | VehicleID | LocationID |  StartDateTime  |  EndDateTime   |
+------------------+-----------+------------+-----------------+----------------+
|                1 |         1 |          1 |        2012-1-1 |      2016-1-1  |
|                2 |         1 |          2 |        2016-1-1 |      2016-4-1  |
|                3 |         1 |          1 |        2016-4-1 |      2016-11-1 |
|                4 |         2 |          1 |        2011-1-1 |      2016-11-1 |
+------------------+-----------+------------+-----------------+----------------+

Vehicle status events

+---------+---------------+-------------+-----------+--------------+
| EventID | StartDateTime | EndDateTime | VehicleID | ReasonCodeID |
+---------+---------------+-------------+-----------+--------------+
|       1 | 2012-1-1      | 2013-1-1    |         1 |            1 |
|       2 | 2013-1-1      | 2015-1-1    |         1 |            3 |
|       3 | 2015-1-1      | 2016-5-1    |         1 |            4 |
|       4 | 2016-5-1      | 2016-11-1   |         1 |            2 |
|       5 | 2015-9-1      | 2016-2-1    |         2 |            1 |
+---------+---------------+-------------+-----------+--------------+

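Can I correlate these two streams and calculate the total time per vehicle, per ReasonCode, per location? This seems to require relating the two kinds of event to each other, because a change of location can happen partway through a given ReasonCode.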

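Example calculation for ReasonCodeID 4

The first period in location 1 intersects ReasonCodeID 4 for 365 days (2015-1-1 to 2016-1-1). The second period in location 1 intersects it for 30 days (2016-4-1 to 2016-5-1). The period in location 2 intersects ReasonCodeID 4 for 91 days (2016-1-1 to 2016-4-1).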
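
These three figures can be checked with a short Python sketch. The helper name `overlap_days` is hypothetical and only illustrates the arithmetic: the intersection of two ranges runs from the later start date to the earlier end date.

```python
from datetime import date

def overlap_days(a_start, a_end, b_start, b_end):
    """Days shared by the ranges a and b; 0 if they do not overlap."""
    start = max(a_start, b_start)   # later of the two start dates
    end = min(a_end, b_end)         # earlier of the two end dates
    return max((end - start).days, 0)

# ReasonCodeID 4 for vehicle 1 runs from 2015-1-1 to 2016-5-1.
reason4 = (date(2015, 1, 1), date(2016, 5, 1))

# Location periods for vehicle 1: (LocationID, start, end).
locations = [
    (1, date(2012, 1, 1), date(2016, 1, 1)),
    (2, date(2016, 1, 1), date(2016, 4, 1)),
    (1, date(2016, 4, 1), date(2016, 11, 1)),
]

days = [(loc, overlap_days(s, e, *reason4)) for loc, s, e in locations]
print(days)  # [(1, 365), (2, 91), (1, 30)]
```

Adding the two location 1 values gives the 395 days expected for location 1 under ReasonCodeID 4.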

The desired output is shown below.

+-----------+--------------+------------+------------+
| VehicleID | ReasonCodeID | LocationID | Total Days |
+-----------+--------------+------------+------------+
|         1 |            1 |          1 |        366 |
|         1 |            3 |          1 |        730 |
|         1 |            4 |          1 |        395 |
|         1 |            4 |          2 |         91 |
|         1 |            2 |          1 |        184 |
|         2 |            1 |          1 |        153 |
+-----------+--------------+------------+------------+

I have created a SQL Fiddle that shows the structure.

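There are related tables for the vehicles, and I'm sure the business will want the results grouped by vehicle class and so on, but if I can understand how to calculate the intersections in this case, that will give me the basis for the rest of the reporting.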

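I think this solution requires a CROSS JOIN. The relationship between the tables is many-to-many, which would mean creating a third table to bridge the LocationEvents and VehicleStatusEvents tables, so I think it is easier to specify the relationship in the expression itself.

I use a CROSS JOIN between the two tables, then filter the result to keep only the rows where the VehicleID column is the same in both tables. I also keep only the rows where the VehicleStatusEvents date range intersects the LocationEvents date range.

Once the filtering is done, I add a column that calculates the number of days in each intersection. Finally, the measure sums the days for each VehicleID, ReasonCodeID and LocationID.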
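
The same four steps can be sketched outside Power BI. The Python below is only an illustration using the sample rows from the question; the tuple layout and the function name `total_days` are assumptions made for this example, not part of the DAX measure.

```python
from collections import defaultdict
from datetime import date
from itertools import product

# Sample rows from the question: (VehicleID, LocationID, start, end)
location_events = [
    (1, 1, date(2012, 1, 1), date(2016, 1, 1)),
    (1, 2, date(2016, 1, 1), date(2016, 4, 1)),
    (1, 1, date(2016, 4, 1), date(2016, 11, 1)),
    (2, 1, date(2011, 1, 1), date(2016, 11, 1)),
]
# Sample rows from the question: (VehicleID, ReasonCodeID, start, end)
status_events = [
    (1, 1, date(2012, 1, 1), date(2013, 1, 1)),
    (1, 3, date(2013, 1, 1), date(2015, 1, 1)),
    (1, 4, date(2015, 1, 1), date(2016, 5, 1)),
    (1, 2, date(2016, 5, 1), date(2016, 11, 1)),
    (2, 1, date(2015, 9, 1), date(2016, 2, 1)),
]

def total_days(location_events, status_events):
    totals = defaultdict(int)
    # Step 1: cross join, pairing every location row with every status row.
    for loc, st in product(location_events, status_events):
        l_vehicle, location_id, l_start, l_end = loc
        s_vehicle, reason_id, s_start, s_end = st
        # Step 2: keep only pairs for the same vehicle...
        if l_vehicle != s_vehicle:
            continue
        # ...whose date ranges intersect.
        if not (l_start <= s_end and l_end >= s_start):
            continue
        # Step 3: days in the intersection (later start to earlier end).
        days = (min(l_end, s_end) - max(l_start, s_start)).days
        # Step 4: sum per VehicleID, ReasonCodeID and LocationID.
        totals[(l_vehicle, reason_id, location_id)] += days
    return dict(totals)

totals = total_days(location_events, status_events)
for key, days in sorted(totals.items()):
    print(key, days)
```

For vehicle 1 this gives 366, 730, 395, 91 and 184 days for the five ReasonCodeID and LocationID combinations. For vehicle 2, the day difference from 2015-9-1 to 2016-2-1 works out to 153.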

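To implement the CROSS JOIN, you have to rename the VehicleID, StartDateTime and EndDateTime columns in at least one of the two tables. This is required to avoid an ambiguous column name error.

I renamed the columns in both tables as follows:

VehicleID     : LocationVehicleID (LocationEvents), StatusVehicleID (VehicleStatusEvents)
StartDateTime : LocationStartDateTime (LocationEvents), StatusStartDateTime (VehicleStatusEvents)
EndDateTime   : LocationEndDateTime (LocationEvents), StatusEndDateTime (VehicleStatusEvents)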

After this, you can use CROSSJOIN in the Total Days measure:

Total Days =
SUMX (
    FILTER (
        ADDCOLUMNS (
            FILTER (
                CROSSJOIN ( LocationEvents, VehicleStatusEvents ),
                LocationEvents[LocationVehicleID] = VehicleStatusEvents[StatusVehicleID]
                    && LocationEvents[LocationStartDateTime] <= VehicleStatusEvents[StatusEndDateTime]
                    && LocationEvents[LocationEndDateTime] >= VehicleStatusEvents[StatusStartDateTime]
            ),
            "CountOfDays", IF (
                [LocationStartDateTime] <= [StatusStartDateTime]
                    && [LocationEndDateTime] >= [StatusEndDateTime],
                DATEDIFF ( [StatusStartDateTime], [StatusEndDateTime], DAY ),
                IF (
                    [LocationStartDateTime] > [StatusStartDateTime]
                        && [LocationEndDateTime] >= [StatusEndDateTime],
                    DATEDIFF ( [LocationStartDateTime], [StatusEndDateTime], DAY ),
                    IF (
                        [LocationStartDateTime] <= [StatusStartDateTime]
                            && [LocationEndDateTime] <= [StatusEndDateTime],
                        DATEDIFF ( [StatusStartDateTime], [LocationEndDateTime], DAY ),
                        IF (
                            [LocationStartDateTime] >= [StatusStartDateTime]
                                && [LocationEndDateTime] <= [StatusEndDateTime],
                            DATEDIFF ( [LocationStartDateTime], [LocationEndDateTime], DAY ),
                            BLANK ()
                        )
                    )
                )
            )
        ),
        LocationEvents[LocationID] = [LocationID]
            && VehicleStatusEvents[ReasonCodeID] = [ReasonCodeID]
    ),
    [CountOfDays]
)

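Then in Power BI you can build a matrix (or any other visualization) using this measure.

If you don't fully understand the measure expression, here is the T-SQL translation: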

SELECT
    dt.VehicleID,
    dt.ReasonCodeID,
    dt.LocationID,
    SUM(dt.Diff) [Total Days]
FROM 
(
    SELECT
        CASE
            WHEN a.StartDateTime <= b.StartDateTime AND a.EndDateTime >= b.EndDateTime  -- Inside range
               THEN DATEDIFF(DAY, b.StartDateTime, b.EndDateTime)
            WHEN a.StartDateTime > b.StartDateTime AND a.EndDateTime >= b.EndDateTime  -- |-----|*****|....|
               THEN DATEDIFF(DAY, a.StartDateTime, b.EndDateTime)
            WHEN a.StartDateTime <= b.StartDateTime AND a.EndDateTime <= b.EndDateTime  -- |...|****|-----|
               THEN DATEDIFF(DAY, b.StartDateTime, a.EndDateTime)
            WHEN a.StartDateTime >= b.StartDateTime AND a.EndDateTime <= b.EndDateTime  -- |---|****|-----
               THEN DATEDIFF(DAY, a.StartDateTime, a.EndDateTime)
        END Diff,
        a.VehicleID,
        b.ReasonCodeID,
        a.LocationID --a.StartDateTime, a.EndDateTime, b.StartDateTime, b.EndDateTime
    FROM LocationEvents a
        CROSS JOIN VehicleStatusEvents b
    WHERE a.VehicleID = b.VehicleID
        AND 
        (
            (a.StartDateTime <= b.EndDateTime)
                AND (a.EndDateTime >= b.StartDateTime)
        )
) dt
GROUP BY dt.VehicleID,
         dt.ReasonCodeID,
         dt.LocationID

Note that in T-SQL you could also use the INNER JOIN operator.

Let me know if this helps.
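
The four CASE branches in the query all reduce to the same thing: the number of days from the later of the two start dates to the earlier of the two end dates. As a rough check, the Python sketch below compares the two formulations on overlapping pairs taken from the sample data. Both function names are hypothetical, and `a` stands for the location row and `b` for the status row, as in the query.

```python
from datetime import date

def diff_four_cases(a_start, a_end, b_start, b_end):
    """Mirror of the CASE expression; assumes the two ranges already overlap."""
    if a_start <= b_start and a_end >= b_end:   # status inside location
        return (b_end - b_start).days
    if a_start > b_start and a_end >= b_end:    # location starts later
        return (b_end - a_start).days
    if a_start <= b_start and a_end <= b_end:   # location ends earlier
        return (a_end - b_start).days
    if a_start >= b_start and a_end <= b_end:   # location inside status
        return (a_end - a_start).days
    return None

def diff_max_min(a_start, a_end, b_start, b_end):
    """Same result written as later start to earlier end."""
    return (min(a_end, b_end) - max(a_start, b_start)).days

# Overlapping (location, status) pairs from the sample data.
pairs = [
    (date(2012, 1, 1), date(2016, 1, 1), date(2015, 1, 1), date(2016, 5, 1)),
    (date(2016, 4, 1), date(2016, 11, 1), date(2015, 1, 1), date(2016, 5, 1)),
    (date(2016, 1, 1), date(2016, 4, 1), date(2015, 1, 1), date(2016, 5, 1)),
    (date(2011, 1, 1), date(2016, 11, 1), date(2015, 9, 1), date(2016, 2, 1)),
]

for p in pairs:
    print(diff_four_cases(*p), diff_max_min(*p))
```

Each pair prints the same value twice, so the CASE expression could be shortened to a single DATEDIFF between the later start and the earlier end if that reads better.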

select      coalesce(l.VehicleID,s.VehicleID)   as VehicleID
           ,s.ReasonCodeID
           ,l.LocationID

           ,sum
            (
                datediff
                (
                    day
                   ,case when s.StartDateTime > l.StartDateTime then s.StartDateTime else l.StartDateTime end
                   ,case when s.EndDateTime   < l.EndDateTime   then s.EndDateTime   else l.EndDateTime   end
                )
            )   as TotalDays

from                    VehicleLocationEvents   as l

            full join   VehicleStatusEvents     as s

            on          s.VehicleID =
                        l.VehicleID

                    and case when s.StartDateTime > l.StartDateTime then s.StartDateTime else l.StartDateTime end   <=
                        case when s.EndDateTime   < l.EndDateTime   then s.EndDateTime   else l.EndDateTime   end   

group by    coalesce(l.VehicleID,s.VehicleID)
           ,s.ReasonCodeID
           ,l.LocationID

select      VehicleID
           ,ReasonCodeID
           ,LocationID
           ,sum (datediff (day,max_StartDateTime,min_EndDateTime))  as TotalDays

from       (select      coalesce(l.VehicleID,s.VehicleID)   as VehicleID
                       ,s.ReasonCodeID
                       ,l.LocationID

                       ,case when s.StartDateTime > l.StartDateTime then s.StartDateTime else l.StartDateTime end   as max_StartDateTime
                       ,case when s.EndDateTime   < l.EndDateTime   then s.EndDateTime   else l.EndDateTime   end   as min_EndDateTime

            from                    VehicleLocationEvents   as l

                        full join   VehicleStatusEvents     as s

                        on          s.VehicleID =
                                    l.VehicleID
            ) ls

where       max_StartDateTime <= min_EndDateTime

group by    VehicleID
           ,ReasonCodeID
           ,LocationID