How can I group rows based on a field value? A RANK() case, maybe?
Because of a bad implementation, we ended up with data like this:
+-----------+---------+-----------+-----------+-----------+---------+------------------------+---------------------+---------------------+-----------+-----------+---------+--------
| id        | event_n | reader_id | entity_id | school_id | root_id | event_value            | created_at          | updated_at          | event_flo | event_int | user_id | teacher
+-----------+---------+-----------+-----------+-----------+---------+------------------------+---------------------+---------------------+-----------+-----------+---------+--------
| 345678270 | search  | 2511678   | 193765    | <null>    | 65478   | Du mmy                 | 2021-04-05 19:24:11 | 2021-04-05 19:24:11 | <null>    | <null>    | 2634876 | <null>
| 345678286 | search  | 2511678   | 193765    | <null>    | 65478   | Du mmyntex             | 2021-04-05 19:24:13 | 2021-04-05 19:24:13 | <null>    | <null>    | 2634876 | <null>
| 345678366 | search  | 2511678   | 193765    | <null>    | 65478   | Du mmyntexas           | 2021-04-05 19:24:17 | 2021-04-05 19:24:17 | <null>    | <null>    | 2634876 | <null>
| 345678370 | search  | 2511678   | 193765    | <null>    | 65478   | Du mmyntexas de ma     | 2021-04-05 19:24:17 | 2021-04-05 19:24:17 | <null>    | <null>    | 2634876 | <null>
| 345678388 | search  | 2511678   | 193765    | <null>    | 65478   | Du mmyntexas de make   | 2021-04-05 19:24:18 | 2021-04-05 19:24:18 | <null>    | <null>    | 2634876 | <null>
| 345678823 | search  | 2587432   | 61567     | <null>    | 65478   | du mmyntexas do clock  | 2021-04-05 19:24:52 | 2021-04-05 19:24:52 | <null>    | <null>    | 2713377 | <null>
| 345678315 | search  | 2511678   | 193765    | <null>    | 65478   | Du mmyntexasd          | 2021-04-05 19:24:14 | 2021-04-05 19:24:14 | <null>    | <null>    | 2634876 | <null>
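Only the longest event_value should have been recorded. We have already fixed the cause. But can anyone see a way to group these records, so that I can delete the partial ones and keep only the longest value in each group?
I thought about using the RANK() function, but I could not find a way to group these values.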
Something like this:
DELETE t1
FROM tablename t1
JOIN tablename t2
-- adjust joining columns list
USING (event_n, reader_id, entity_id, user_id)
WHERE LOCATE(t1.event_value, t2.event_value) = 1
AND t1.created_at < t2.created_at
-- adjust time gap length
AND t1.created_at + INTERVAL 1 MINUTE > t2.created_at
Or the same with WHERE EXISTS.
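For what it's worth, here is a rough sketch of the WHERE EXISTS form, run against the sample rows from the question. It is only an illustration under a few assumptions, not a drop-in statement: it uses SQLite through Python's standard sqlite3 module so that it can be executed end to end, instr(haystack, needle) stands in for MySQL's LOCATE(needle, haystack), datetime(..., '+1 minute') stands in for + INTERVAL 1 MINUTE, and the table holds only the columns the condition needs. Also note that instr() is case-sensitive, while LOCATE() follows the column collation. As far as I know, MySQL does not let a DELETE read its own target table in a subquery, so on MySQL the JOIN form is probably the simpler one.

```python
import sqlite3

# Sample rows from the question, reduced to the columns the condition uses:
# (id, event_n, reader_id, entity_id, user_id, event_value, created_at)
ROWS = [
    (345678270, "search", 2511678, 193765, 2634876, "Du mmy", "2021-04-05 19:24:11"),
    (345678286, "search", 2511678, 193765, 2634876, "Du mmyntex", "2021-04-05 19:24:13"),
    (345678366, "search", 2511678, 193765, 2634876, "Du mmyntexas", "2021-04-05 19:24:17"),
    (345678370, "search", 2511678, 193765, 2634876, "Du mmyntexas de ma", "2021-04-05 19:24:17"),
    (345678388, "search", 2511678, 193765, 2634876, "Du mmyntexas de make", "2021-04-05 19:24:18"),
    (345678823, "search", 2587432, 61567, 2713377, "du mmyntexas do clock", "2021-04-05 19:24:52"),
    (345678315, "search", 2511678, 193765, 2634876, "Du mmyntexasd", "2021-04-05 19:24:14"),
]

# Delete a row when a later row of the same reader/entity/user, written
# less than one minute afterwards, starts with this row's event_value.
DELETE_SQL = """
DELETE FROM tablename
WHERE EXISTS (
    SELECT 1
    FROM tablename AS t2
    -- adjust joining columns list
    WHERE t2.event_n   = tablename.event_n
      AND t2.reader_id = tablename.reader_id
      AND t2.entity_id = tablename.entity_id
      AND t2.user_id   = tablename.user_id
      -- this row's value is a prefix of the later value
      AND instr(t2.event_value, tablename.event_value) = 1
      AND tablename.created_at < t2.created_at
      -- adjust time gap length
      AND datetime(tablename.created_at, '+1 minute') > t2.created_at
)
"""


def remaining_ids():
    """Load the sample rows, run the DELETE, and return the surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename ("
        "id INTEGER PRIMARY KEY, event_n TEXT, reader_id INTEGER, "
        "entity_id INTEGER, user_id INTEGER, event_value TEXT, created_at TEXT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    conn.execute(DELETE_SQL)
    ids = [row[0] for row in conn.execute("SELECT id FROM tablename ORDER BY id")]
    conn.close()
    return ids


if __name__ == "__main__":
    print(remaining_ids())
```

On this sample the statement keeps 345678388 and 345678823, and it also keeps 345678315: "Du mmyntexasd" is not a prefix of anything typed later (the trailing "d" was apparently deleted again), so a pure prefix test will not catch corrected typos. It may be worth running the same condition as a SELECT first to see which rows would go.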