Group by IDs and TIMESTAMPDIFF on one column in the same table
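I am trying to find out how many unique messages have been sent to a person on a specific boat within a timeframe, and what the minimum number of days between those messages is, and to display that together with the count.
The person is represented by 'id', the boat by 'id2', and the message by 'text'.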
CREATE TABLE `stacktable` (
`timestamp` DATETIME NOT NULL,
`id` VARCHAR(15) NOT NULL,
`id2` VARCHAR(3) NULL DEFAULT NULL,
`text` VARCHAR(255) NULL DEFAULT NULL,
`id3` INT(10) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id3`)
);
insert into stacktable (timestamp,id,id2,text) VALUES
('2015-01-01 00:00:01',1,10,'ABC'),
('2015-01-01 00:00:01',2,11,'ABC'),
('2015-01-01 00:00:01',3,12,'ABC'),
('2015-01-01 00:00:02',3,12,'ABC'),
('2015-01-01 00:00:02',1,10,'ABC'),
('2015-01-04 00:00:01',1,10,'ABC'),
('2015-01-04 00:00:01',1,10,'BCD'),
('2015-01-04 00:00:01',2,11,'ABC'),
('2015-01-04 00:00:01',2,11,'BCD'),
('2015-01-04 00:00:01',3,12,'ABC'),
('2015-01-04 00:00:01',3,12,'BCD'),
('2015-01-04 00:00:01',3,13,'CDE'),
('2015-01-07 00:00:01',2,11,'BCD'),
('2015-01-07 00:00:01',3,12,'BCD'),
('2015-01-07 00:00:01',3,13,'CDE'),
('2015-01-07 00:00:01',3,13,'DEF'),
('2015-01-08 00:00:01',3,12,'ABC'),
('2015-01-08 00:00:01',4,14,'EFG'),
('2015-01-09 00:00:01',4,14,'EFG'),
('2015-01-09 00:00:02',4,15,'FGH'),
('2015-01-10 00:00:01',4,14,'EFG'),
('2015-01-10 00:00:01',4,14,'FGH'),
('2015-01-10 00:00:01',4,15,'FGH'),
('2015-01-11 00:00:01',4,14,'EFG'),
('2015-01-15 00:00:01',4,14,'EFG');
To show what I am trying to achieve:
select * from stacktable where id = 1
timestamp            id  id2  text  id3
2015-01-01 00:00:01  1   10   ABC   1    First entry for id+id2+text (ABC)
2015-01-01 00:00:02  1   10   ABC   5    Second entry for same keys id+id2+text, 1 second later
2015-01-04 00:00:01  1   10   ABC   6    Third entry for same keys id+id2+text, 2 days later
2015-01-04 00:00:01  1   10   BCD   7    First entry for id+id2+text (BCD)
I only want to count records that have the same id, id2 and text within a period of 2 days, and also show the minimum date difference, in days, between the hits.
The output I want is:
id  id2  text  count(*)  mindiffdatebetweenhits
-----------------------------------------------
1   10   ABC   3         0    count id3s 1, 5 and 6; minimumdaydiff is between id3 1 and 5 = 0 days
3   12   ABC   3         0    count id3s 3, 4 and 10; minimumdaydiff is between id3 3 and 4 = 0 days
4   14   EFG   4         1    count id3s 18, 19, 21 and 24; minimumdaydiff is equal between all hits = 1 day
4   15   FGH   2         0    count id3s 20 and 23; minimumdaydiff is between id3 20 and 23 = 0 days
How can I get the desired output?
This should do it, assuming that sequences consisting of a single row are to be discarded:
select id, id2, text, seq, count(id) as total, min(diff) as mindiff
from (
    -- pair every row (t1) with the next row (t2) of the same id, id2, text
    select t1.row, t2.row row2, t1.id, t1.id2, t1.text, t1.id3,
           TIMESTAMPDIFF(DAY, t1.timestamp, t2.timestamp) as diff,
           -- a gap of more than 2 days makes the next row start a new sequence
           IF (TIMESTAMPDIFF(DAY, t1.timestamp, t2.timestamp) > 2, @seq * (1 and @seq := @seq +1), @seq) as seq
    from (select (@row := @row + 1) as row, id, id2, text, id3, timestamp
          from (select id, id2, text, id3, timestamp
                from stacktable
                -- order within each group by time so adjacent rows are consecutive messages
                order by id, id2, text, timestamp, id3) sorted,
               (select @row := 0) setup) t1
    left join (select (@row2 := @row2 + 1) as row, id, id2, text, id3, timestamp
               from (select id, id2, text, id3, timestamp
                     from stacktable
                     order by id, id2, text, timestamp, id3) sorted,
                    (select @row2 := 0) setup) t2
        on (t1.id = t2.id and t1.id2 = t2.id2 and t1.text=t2.text and t1.row = t2.row - 1),
    (select @seq := 1) setup_sequence
) t3
group by id, id2, text, seq
having total > 1
The query uses the same subquery twice, as t1 and t2 (apart from the name of the counter variable). All it does is sort and number the rows of the table:
select (@row := @row + 1) as row, id, id2, text, id3, timestamp
from (select id, id2, text, id3, timestamp
      from stacktable
      order by id, id2, text, timestamp, id3) sorted,
     (select @row := 0) setup
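For comparison, here is a minimal sketch of the same sort-and-number step written with a window function. It assumes MySQL 8.0 or later, which the query above does not require. The alias rn is my own choice: ROW is a reserved word in MySQL 8.0, so the alias row used above would have to be quoted there. The extra sort keys timestamp and id3 are an assumption that makes the numbering deterministic.

```sql
-- Sketch only, assuming MySQL 8.0+ (window functions are not available in 5.x).
-- rn is a hypothetical alias standing in for "row" in the query above.
SELECT ROW_NUMBER() OVER (ORDER BY id, id2, `text`, `timestamp`, id3) AS rn,
       id, id2, `text`, id3, `timestamp`
FROM stacktable;
```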
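See the fiddle. Note that the sequence counter is not unique across all sequences. This is not a bug: it is only unique among sequences with the same id, id2, text.
The sequence counter update is a bit tricky: @seq * (1 and @seq := @seq +1). It relies on the first @seq being evaluated for the multiplication before the update takes place. I am not sure this is deterministic or consistent across engines. However, the query could also be changed to avoid it, by joining each record of t1 with the previous record rather than the next one (in t2). (Not tried.)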
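The "compare with the previous record" idea can be sketched without any user variables if window functions are available. This is an alternative under stated assumptions, not the query from the fiddle. It assumes MySQL 8.0 or later, where assigning user variables inside expressions is deprecated anyway. The CTE names gaps and runs are made up for illustration. A row starts a new sequence when it has no predecessor in its id, id2, text group or when the gap to the predecessor is more than 2 days. A running sum of those "start" flags then plays the role of seq.

```sql
-- Sketch only, assuming MySQL 8.0+ (CTEs and window functions).
-- "gaps" and "runs" are hypothetical names chosen for this example.
WITH gaps AS (
    -- day difference to the previous message of the same id, id2, text
    SELECT id, id2, `text`, id3, `timestamp`,
           TIMESTAMPDIFF(DAY,
                         LAG(`timestamp`) OVER (PARTITION BY id, id2, `text`
                                                ORDER BY `timestamp`, id3),
                         `timestamp`) AS diff
    FROM stacktable
),
runs AS (
    -- running count of sequence starts = sequence number within the group
    SELECT id, id2, `text`, id3, diff,
           SUM(CASE WHEN diff IS NULL OR diff > 2 THEN 1 ELSE 0 END)
               OVER (PARTITION BY id, id2, `text`
                     ORDER BY `timestamp`, id3) AS seq
    FROM gaps
)
SELECT id, id2, `text`, seq,
       COUNT(*) AS total,
       -- ignore the diff of the row that opens a sequence (NULL or > 2)
       MIN(CASE WHEN diff <= 2 THEN diff END) AS mindiff
FROM runs
GROUP BY id, id2, `text`, seq
HAVING COUNT(*) > 1
ORDER BY id, id2, `text`, seq;
```

Tracing this by hand against the sample data gives the four rows of the desired output. I have not run it against a live server. As with the original query, seq is only unique within one id, id2, text group.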