Collapsing SQL rows into one row based on match of sequential row pattern

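I'm working on a project that involves grouping a list of actions into sequences. The specific programming problem: when a run of rows matches a given sequence, I want to use SQL (preferably MySQL) to collapse those rows into a single row. Note that the table is grouped by two other columns: who performed the action and on which day.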

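For this example, we are working with a list of actions from people's morning routines. We want to collapse the list into two action sequences: "woke up and snoozed alarm" and "started day". The woke up/snoozed alarm sequence is (woke up row) -> (snoozed alarm row). The started day sequence is (woke up row) -> (ate breakfast row) -> (brushed teeth row).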
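To make the intended collapsing rule concrete, here is a minimal sketch in plain Python (not SQL). It assumes the rows are already sorted by person, day and time, scans each person/day group left to right, and replaces every matched run with one label. The label strings, the `PATTERNS` list and the sample rows are hypothetical, not taken from the asker's table.

```python
from itertools import groupby

# Hypothetical encoding of the two sequences described above.
PATTERNS = [
    (("wokeUp", "snoozedAlarm"), "wokeup and snoozed"),
    (("wokeUp", "ateBreakfast", "brushedTeeth"), "started day"),
]


def collapse(actions):
    """Collapse one person's actions for one day, scanning left to right."""
    out, i = [], 0
    while i < len(actions):
        for seq, label in PATTERNS:
            if tuple(actions[i:i + len(seq)]) == seq:
                out.append(label)   # the whole matched run becomes one row
                i += len(seq)
                break
        else:
            out.append(actions[i])  # no sequence starts here; keep the action
            i += 1
    return out


def collapse_rows(rows):
    """rows: (person, day, action) tuples sorted by person, day, then time."""
    result = []
    for (person, day), group in groupby(rows, key=lambda r: (r[0], r[1])):
        for action in collapse([r[2] for r in group]):
            result.append((person, day, action))
    return result


# Hypothetical sample rows for illustration only.
rows = [
    ("ann", "mon", "wokeUp"),
    ("ann", "mon", "snoozedAlarm"),
    ("ann", "mon", "wokeUp"),
    ("ann", "mon", "ateBreakfast"),
    ("ann", "mon", "brushedTeeth"),
    ("bob", "mon", "wokeUp"),
    ("bob", "mon", "ateBreakfast"),
]
print(collapse_rows(rows))
```

Both sequences start with wokeUp but differ at the second action, so the order of `PATTERNS` does not matter here; with patterns that share a longer prefix you would have to decide which one wins.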

Our raw actions table looks like this:

The output we want looks like this:

So far I have looked into window functions and iterating in MySQL, but neither approach seemed to scale to the fact that I have multiple sequences to match.

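I suspect this may not be possible in SQL, but I figured I'd post here in case someone knows how to solve this data-processing problem. The end goal is to store the results in a view, so that whenever we later query the actions table we get the "processed" actions (that is, the actions after they have been grouped into sequences).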

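If I understand correctly, you can use lead() followed by some filtering logic (window functions and CTEs require MySQL 8.0 or later). First, assign the new actions: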

select t.*,
       -- label the row if it starts one of the two sequences
       (case when (actionid, next_actionid) = ('wokeUp', 'snoozedAlarm')
             then 'wokeup and snoozed'
             when (actionid, next_actionid, next2_actionid) = ('wokeUp', 'ateBreakfast', 'brushedTeeth')
             then 'started day'
             else actionid
         end) as action
from (select t.*,
             -- next and next-but-one action for the same person on the same day
             lead(actionId, 1) over (partition by person, day order by id) as next_actionid,
             lead(actionId, 2) over (partition by person, day order by id) as next2_actionid
      from t
     ) t;
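The labeling step can be tried without a MySQL server. Below is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25+ supports lead()), assuming a hypothetical table t(id, person, day, actionId) where id orders the actions in time; the sample rows are made up. The row-value comparisons are written as AND chains here for portability, which is equivalent for this query.

```python
import sqlite3

# Hypothetical schema and data; id gives the order of actions.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, person text, day text, actionId text)")
conn.executemany(
    "insert into t (person, day, actionId) values (?, ?, ?)",
    [
        ("ann", "mon", "wokeUp"),
        ("ann", "mon", "snoozedAlarm"),
        ("ann", "mon", "wokeUp"),
        ("ann", "mon", "ateBreakfast"),
        ("ann", "mon", "brushedTeeth"),
        ("bob", "mon", "wokeUp"),
        ("bob", "mon", "ateBreakfast"),
    ],
)

# Step 1: look one and two rows ahead within each (person, day) and label matches.
STEP1 = """
select t.id, t.person, t.day, t.actionId,
       case when actionId = 'wokeUp' and next_actionid = 'snoozedAlarm'
            then 'wokeup and snoozed'
            when actionId = 'wokeUp' and next_actionid = 'ateBreakfast'
                 and next2_actionid = 'brushedTeeth'
            then 'started day'
            else actionId
       end as action
from (select t.*,
             lead(actionId, 1) over (partition by person, day order by id) as next_actionid,
             lead(actionId, 2) over (partition by person, day order by id) as next2_actionid
      from t) t
order by t.id
"""
for row in conn.execute(STEP1):
    print(row)
```

Only the first row of each match gets the new label; the rows it covers (snoozedAlarm, ateBreakfast, brushedTeeth) still appear, which is why a second filtering step is needed.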

Next, use this information to drop the rows that were absorbed into a matched sequence:

with newactions as (
      select t.*,
             (case when (actionid, next_actionid) = ('wokeUp', 'snoozedAlarm')
                   then 'wokeup and snoozed'
                   when (actionid, next_actionid, next2_actionid) = ('wokeUp', 'ateBreakfast', 'brushedTeeth')
                   then 'started day'
                   else actionid
               end) as action,
             -- number of following rows that this match absorbs
             (case when (actionid, next_actionid) = ('wokeUp', 'snoozedAlarm')
                   then 1   -- the snoozedAlarm row
                   when (actionid, next_actionid, next2_actionid) = ('wokeUp', 'ateBreakfast', 'brushedTeeth')
                   then 2   -- the ateBreakfast and brushedTeeth rows
                   else 0
               end) as duration
      from (select t.*,
                   lead(actionId, 1) over (partition by person, day order by id) as next_actionid,
                   lead(actionId, 2) over (partition by person, day order by id) as next2_actionid
            from t
           ) t
      )
    select na.*
    from (select na.*,
                 -- default 0 so the first rows of each group are not compared with NULL
                 lag(duration, 1, 0) over (partition by person, day order by id) as prev_duration,
                 lag(duration, 2, 0) over (partition by person, day order by id) as prev2_duration
          from newactions na
         ) na
    -- drop a row if the previous row absorbs 1+ rows or the row two back absorbs 2
    where not (prev_duration >= 1 or
               prev2_duration >= 2
              );
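Since the question asks for the processed actions to live in a view, here is a minimal sketch that stores the whole two-step query as one, using Python's sqlite3 module as a stand-in for MySQL. It assumes the same hypothetical t(id, person, day, actionId) table and sample rows as before; the view name processed_actions is made up, and lag() is given a default of 0 so the first rows of each person/day group are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, person text, day text, actionId text)")
conn.executemany(
    "insert into t (person, day, actionId) values (?, ?, ?)",
    [
        ("ann", "mon", "wokeUp"), ("ann", "mon", "snoozedAlarm"),
        ("ann", "mon", "wokeUp"), ("ann", "mon", "ateBreakfast"),
        ("ann", "mon", "brushedTeeth"),
        ("bob", "mon", "wokeUp"), ("bob", "mon", "ateBreakfast"),
    ],
)

# Hypothetical view: label matches, count absorbed rows, then filter them out.
conn.execute("""
create view processed_actions as
with newactions as (
  select t.*,
         case when actionId = 'wokeUp' and next_actionid = 'snoozedAlarm'
              then 'wokeup and snoozed'
              when actionId = 'wokeUp' and next_actionid = 'ateBreakfast'
                   and next2_actionid = 'brushedTeeth'
              then 'started day'
              else actionId
         end as action,
         case when actionId = 'wokeUp' and next_actionid = 'snoozedAlarm' then 1
              when actionId = 'wokeUp' and next_actionid = 'ateBreakfast'
                   and next2_actionid = 'brushedTeeth' then 2
              else 0
         end as duration
  from (select t.*,
               lead(actionId, 1) over (partition by person, day order by id) as next_actionid,
               lead(actionId, 2) over (partition by person, day order by id) as next2_actionid
        from t) t
)
select id, person, day, action
from (select na.*,
             lag(duration, 1, 0) over (partition by person, day order by id) as prev_duration,
             lag(duration, 2, 0) over (partition by person, day order by id) as prev2_duration
      from newactions na) na
where not (prev_duration >= 1 or prev2_duration >= 2)
""")

for row in conn.execute("select * from processed_actions order by id"):
    print(row)
```

Querying the view should return one row per collapsed sequence (keeping the id of its first action) plus any actions that did not match. Adding a third sequence means adding one branch to each CASE and, if it is longer than three actions, another lead()/lag() offset.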