How to identify entities which have repeated values in sequence using MySQL?
I have a table:
UNIT_ID | YEAR | MONTH | VAR
---------+------+-------+------
1 | 2015 | 1 | 0
1 | 2015 | 2 | 0
1 | 2015 | 3 | 0
2 | 2015 | 1 | 10
2 | 2015 | 2 | 10
2 | 2015 | 3 | 10
1 | 2015 | 4 | 5
1 | 2015 | 5 | 5
1 | 2015 | 6 | 5
2 | 2015 | 4 | 10
2 | 2015 | 5 | 3
2 | 2015 | 6 | 3
3 | 2016 | 1 | 3
3 | 2016 | 2 | 3
3 | 2016 | 3 | 3
3 | 2016 | 4 | 3
2 | 2016 | 6 | 0
2 | 2016 | 7 | 0
2 | 2016 | 8 | 0
I want to know which units have a sequence of more than 3 zeros or more than 4 repeated values, grouped by year. So my result table would look like this:
1 | 2015 | true
2 | 2015 | true
2 | 2016 | true
I found this solution, but unfortunately I couldn't adapt it to my case. I also need the query to work in MySQL.
You can join the table to itself 4 times. The last join is a left join, to allow for the case of three 0s.
select a.unit_id, a.year, 'true'
from tbl a
join tbl b on a.unit_id = b.unit_id and a.year = b.year and a.month+1 = b.month and a.var = b.var
join tbl c on b.unit_id = c.unit_id and b.year = c.year and b.month+1 = c.month and b.var = c.var
left join tbl d on c.unit_id = d.unit_id and c.year = d.year and c.month+1 = d.month and c.var = d.var
where a.var = 0 or d.var is not null;
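To check what the self-join returns on the sample data, here is a minimal sketch that loads the rows from the question into an in-memory SQLite database and runs the same query. SQLite stands in for MySQL here only because the query is plain SQL; the helper name run_self_join is hypothetical and not part of the answer.

```python
import sqlite3

# Sample rows from the question: (unit_id, year, month, var)
ROWS = [
    (1, 2015, 1, 0), (1, 2015, 2, 0), (1, 2015, 3, 0),
    (2, 2015, 1, 10), (2, 2015, 2, 10), (2, 2015, 3, 10),
    (1, 2015, 4, 5), (1, 2015, 5, 5), (1, 2015, 6, 5),
    (2, 2015, 4, 10), (2, 2015, 5, 3), (2, 2015, 6, 3),
    (3, 2016, 1, 3), (3, 2016, 2, 3), (3, 2016, 3, 3), (3, 2016, 4, 3),
    (2, 2016, 6, 0), (2, 2016, 7, 0), (2, 2016, 8, 0),
]

# The self-join query from the answer, unchanged.
QUERY = """
select a.unit_id, a.year, 'true'
from tbl a
join tbl b on a.unit_id = b.unit_id and a.year = b.year
          and a.month + 1 = b.month and a.var = b.var
join tbl c on b.unit_id = c.unit_id and b.year = c.year
          and b.month + 1 = c.month and b.var = c.var
left join tbl d on c.unit_id = d.unit_id and c.year = d.year
               and c.month + 1 = d.month and c.var = d.var
where a.var = 0 or d.var is not null
"""


def run_self_join(rows=ROWS):
    """Hypothetical helper: run the query and return sorted (unit_id, year) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tbl (unit_id integer, year integer, month integer, var integer)"
    )
    conn.executemany("insert into tbl values (?, ?, ?, ?)", rows)
    # Longer runs can match more than once, so collapse duplicates.
    pairs = sorted({(unit_id, year) for unit_id, year, _ in conn.execute(QUERY)})
    conn.close()
    return pairs


if __name__ == "__main__":
    print(run_self_join())
```

On this data the query also returns unit 3 for 2016 (four consecutive 3s in months 1 to 4), which the expected output in the question does not list, so the thresholds may need adjusting depending on how "more than" is meant.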
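A faster and more general solution. It scans the table only once and uses user-defined variables (@pu for the previous unit_id, @py for the previous year, and so on) to remember the previous row: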
select distinct unit_id, year
from (
select unit_id, `year`, `month`, `var`,
if(unit_id=@pu and `year`=@py and `month`=@pm+1 and `var`=@pv, @i:=@i+1, @i:=1)*
if(@pu:=unit_id,1,1)*if(@py:=`year`,1,1)*if(@pm:=`month`,1,1)*if(@pv:=`var`,1,1) as c
from table1 a
join (select @pu:=null, @py:=null, @pm:=null, @pv:=null, @i:=1) b
order by unit_id, `year`, `month`, `var`) a
group by unit_id, `year`, `var`
having (`var` = 0 and max(c) >= 3) or (`var` != 0 and max(c) >= 4);
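The bookkeeping done by the user-defined variables can be easier to follow outside SQL. The sketch below mirrors the same idea in Python, under the assumption that a run means consecutive months with the same var inside one unit and year. The function name find_units_single_pass and the parameters zero_run and other_run are hypothetical, chosen to match the thresholds in the having clause.

```python
# Sample rows from the question: (unit_id, year, month, var)
ROWS = [
    (1, 2015, 1, 0), (1, 2015, 2, 0), (1, 2015, 3, 0),
    (2, 2015, 1, 10), (2, 2015, 2, 10), (2, 2015, 3, 10),
    (1, 2015, 4, 5), (1, 2015, 5, 5), (1, 2015, 6, 5),
    (2, 2015, 4, 10), (2, 2015, 5, 3), (2, 2015, 6, 3),
    (3, 2016, 1, 3), (3, 2016, 2, 3), (3, 2016, 3, 3), (3, 2016, 4, 3),
    (2, 2016, 6, 0), (2, 2016, 7, 0), (2, 2016, 8, 0),
]


def find_units_single_pass(rows, zero_run=3, other_run=4):
    """Return sorted (unit_id, year) pairs that contain a long enough run.

    zero_run is the minimum run length for var == 0, other_run the minimum
    for any other value (both are assumptions taken from the having clause).
    """
    hits = set()
    prev = None  # previous row, playing the role of @pu, @py, @pm, @pv
    run = 0      # current run length, playing the role of @i
    # Sorting tuples orders by unit_id, year, month, var, like the order by.
    for unit_id, year, month, var in sorted(rows):
        if prev == (unit_id, year, month - 1, var):
            run += 1  # same unit, year and value, next month: extend the run
        else:
            run = 1   # anything else starts a new run
        prev = (unit_id, year, month, var)
        if run >= (zero_run if var == 0 else other_run):
            hits.add((unit_id, year))
    return sorted(hits)


if __name__ == "__main__":
    print(find_units_single_pass(ROWS))
```

A gap in the months resets the counter, so three zeros in months 1, 2 and 4 would not count as a run of three.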