Collapsing SQL rows into one row based on a match of a sequential row pattern
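I am working on a project that involves grouping a series of actions into sequences. The specific programming problem is that I want to figure out how to use SQL (preferably MySQL) to collapse multiple rows of a table into one row when those rows match a certain sequence. Note that the table is grouped by two other columns (who performed the action and on which day they performed it).
For this example, we are working with a list of actions from people's morning routines. We want to group our action list into two action sequences: wakeup/snoozed alarm and started day. The wakeup/snoozed alarm sequence is (wokeUp row) -> (snoozedAlarm row). The started day sequence is (wokeUp row) -> (ateBreakfast row) -> (brushedTeeth row).
Our raw actions table looks like this:
Our desired output looks like this:
So far I have looked into window functions and iteration with MySQL, but neither seems to scale to the fact that I have multiple sequences to match.
I have a feeling this may not be possible in SQL, but I figured I would post here in case anyone else knows how to tackle this data-processing problem. The end goal is to store these results in a view, so that whenever we query the actions table in the future, we query the "processed" actions (i.e. the actions after they have been grouped into sequences).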
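If I understand correctly, you can use lead() (a window function, so this needs MySQL 8.0 or later) followed by some filtering logic. First, assign the new actions: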
-- Label each row with the sequence it starts, if any; otherwise keep its own action.
select t.*,
       (case when (actionid, next_actionid) = ('wokeUp', 'snoozedAlarm')
             then 'wokeup and snoozed'
             when (actionid, next_actionid, next2_actionid) = ('wokeUp', 'ateBreakfast', 'brushedTeeth')
             then 'started day'
             else actionid
        end) as action
from (select t.*,
             -- the next two actions by the same person on the same day
             lead(actionId, 1) over (partition by person, day order by id) as next_actionid,
             lead(actionId, 2) over (partition by person, day order by id) as next2_actionid
      from t
     ) t;
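To see what the labeling step produces, here is a minimal sketch that runs an equivalent query from Python against an in-memory SQLite database (assumed to be version 3.25 or later, as a stand-in for MySQL 8). The sample rows, the names `ann` and `bob`, the `leftHouse` action, and the `label_actions` helper are all hypothetical and not from the original post; the `case` conditions are spelled out with `and` rather than row constructors purely for portability.

```python
import sqlite3

# Hypothetical sample data: (id, person, day, actionId). Not from the original post.
ROWS = [
    (1, "ann", "2020-01-01", "wokeUp"),
    (2, "ann", "2020-01-01", "snoozedAlarm"),
    (3, "ann", "2020-01-01", "wokeUp"),
    (4, "ann", "2020-01-01", "ateBreakfast"),
    (5, "ann", "2020-01-01", "brushedTeeth"),
    (6, "bob", "2020-01-01", "wokeUp"),
    (7, "bob", "2020-01-01", "leftHouse"),
]

# Same idea as the query above: look ahead one and two rows per (person, day).
LABEL_SQL = """
select id, person, actionId,
       case when actionId = 'wokeUp' and next_actionid = 'snoozedAlarm'
            then 'wokeup and snoozed'
            when actionId = 'wokeUp' and next_actionid = 'ateBreakfast'
                 and next2_actionid = 'brushedTeeth'
            then 'started day'
            else actionId
       end as action
from (select t.*,
             lead(actionId, 1) over (partition by person, day order by id) as next_actionid,
             lead(actionId, 2) over (partition by person, day order by id) as next2_actionid
      from t) as t
order by id
"""


def label_actions(rows):
    """Load rows into a throwaway table and return (id, person, actionId, action)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (id integer primary key, person text, day text, actionId text)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", rows)
    result = conn.execute(LABEL_SQL).fetchall()
    conn.close()
    return result


labeled = label_actions(ROWS)
for row in labeled:
    print(row)
```

Note that at this stage every original row is still present: only the first row of each matched sequence is relabeled, so the rows that the sequence absorbed still have to be filtered out.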
Next, use this information to filter out the rows that were absorbed into a sequence:
with newactions as (
      select t.*,
             (case when (actionid, next_actionid) = ('wokeUp', 'snoozedAlarm')
                   then 'wokeup and snoozed'
                   when (actionid, next_actionid, next2_actionid) = ('wokeUp', 'ateBreakfast', 'brushedTeeth')
                   then 'started day'
                   else actionid
              end) as action,
             -- duration = number of following rows absorbed by the sequence starting here
             (case when (actionid, next_actionid) = ('wokeUp', 'snoozedAlarm')
                   then 1
                   when (actionid, next_actionid, next2_actionid) = ('wokeUp', 'ateBreakfast', 'brushedTeeth')
                   then 2
                   else 0
              end) as duration
      from (select t.*,
                   lead(actionId, 1) over (partition by person, day order by id) as next_actionid,
                   lead(actionId, 2) over (partition by person, day order by id) as next2_actionid
            from t
           ) t
     )
select na.*
from (select na.*,
             -- default of 0 so the first rows of each partition are not lost to NULL comparisons
             lag(duration, 1, 0) over (partition by person, day order by id) as prev_duration,
             lag(duration, 2, 0) over (partition by person, day order by id) as prev2_duration
      from newactions na
     ) na
-- drop a row if the previous row started any sequence,
-- or if the row two back started a three-row sequence
where not (prev_duration >= 1 or
           prev2_duration >= 2
          );
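Since the stated goal is to store the processed actions in a view, the two steps can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database (3.25 or later) as a stand-in for MySQL 8, the sample rows are hypothetical, and the names `processed_actions`, `skip_next`, `prev_skip`, `prev2_skip`, and `query_processed` are made up for this example. Here `skip_next` is the number of following rows a sequence absorbs (1 for the two-row sequence, 2 for the three-row one), and the `lag` calls pass a default of 0 so the first rows of each person/day group are kept.

```python
import sqlite3

# Hypothetical sample data: (id, person, day, actionId). Not from the original post.
ROWS = [
    (1, "ann", "2020-01-01", "wokeUp"),
    (2, "ann", "2020-01-01", "snoozedAlarm"),
    (3, "ann", "2020-01-01", "wokeUp"),
    (4, "ann", "2020-01-01", "ateBreakfast"),
    (5, "ann", "2020-01-01", "brushedTeeth"),
    (6, "bob", "2020-01-01", "wokeUp"),
    (7, "bob", "2020-01-01", "leftHouse"),
]

SETUP_SQL = """
create table t (id integer primary key, person text, day text, actionId text);

create view processed_actions as
with looked_ahead as (
    select t.*,
           lead(actionId, 1) over (partition by person, day order by id) as next_actionid,
           lead(actionId, 2) over (partition by person, day order by id) as next2_actionid
    from t
),
newactions as (
    select id, person, day,
           case when actionId = 'wokeUp' and next_actionid = 'snoozedAlarm'
                then 'wokeup and snoozed'
                when actionId = 'wokeUp' and next_actionid = 'ateBreakfast'
                     and next2_actionid = 'brushedTeeth'
                then 'started day'
                else actionId
           end as action,
           -- how many following rows the sequence starting here absorbs
           case when actionId = 'wokeUp' and next_actionid = 'snoozedAlarm'
                then 1
                when actionId = 'wokeUp' and next_actionid = 'ateBreakfast'
                     and next2_actionid = 'brushedTeeth'
                then 2
                else 0
           end as skip_next
    from looked_ahead
),
flagged as (
    select na.*,
           lag(skip_next, 1, 0) over (partition by person, day order by id) as prev_skip,
           lag(skip_next, 2, 0) over (partition by person, day order by id) as prev2_skip
    from newactions na
)
select id, person, day, action
from flagged
where prev_skip < 1 and prev2_skip < 2;
"""


def query_processed(rows):
    """Create the table and view, load rows, and return the processed actions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    conn.executemany("insert into t values (?, ?, ?, ?)", rows)
    result = conn.execute(
        "select id, person, action from processed_actions order by id"
    ).fetchall()
    conn.close()
    return result


result = query_processed(ROWS)
for row in result:
    print(row)
```

With these sample rows the view keeps four rows: ann's `wokeup and snoozed` and `started day`, and bob's unmatched `wokeUp` and `leftHouse`. One caveat of this approach is that each additional pattern needs its own `case` branch, and a pattern longer than three rows needs a further `lead`/`lag` offset.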