Need solution to avoid repeated scanning in huge table

Question

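I have an events table with 40 columns that holds up to 2 billion records. In that table I want to query for combinations of events, i.e. event A together with event B. Sometimes I may want to find larger combinations, such as event A with B and C. A combination may contain up to 5 or 6 events.

I don't want to scan the table once for every event in the combination, i.e. one scan for event A and another scan for event B. I also need a generic approach that extends to larger combinations.

Note: the 2 billion records are partitioned by event date, and the data is evenly distributed across the partitions.

Example:

I need to find the IDs that have events A, B and C, and I also need to find the IDs that have only A and B.

The number of events in a combination is dynamic. I don't want to scan the table once per event and then intersect the results.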

Answer 1

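-- [table] stands in for the events table; add one self-join per extra event.
SELECT *
FROM [table] AS A
JOIN [table] AS B
    ON A.Id = B.Id AND A.Date = B.Date
WHERE A.Date = '1-Jan'  -- qualify Date, otherwise the column reference is ambiguous
AND A.Event = 'A'
AND B.Event = 'B'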

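This gives you the rows where the date is '1-Jan' and both events share the same Id. If you want to filter on more events, you can join the table again for each additional event.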
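
As a rough sketch of what "joining again" looks like for a third event, the query below adds one more self-join for event C. The table variable @Events and its sample rows are assumptions made up for illustration; they are not the asker's schema. Note that every extra self-join is another read of the table, so on a very large table this only stays cheap if each branch can use a suitable index or partition elimination.

```sql
-- Hypothetical sample data standing in for the real events table.
DECLARE @Events TABLE (Id int, [Date] date, [Event] varchar(1));

INSERT INTO @Events (Id, [Date], [Event]) VALUES
    (1, '2017-01-01', 'A'),
    (1, '2017-01-01', 'B'),
    (1, '2017-01-01', 'C'),
    (2, '2017-01-01', 'A'),
    (2, '2017-01-01', 'B');

-- One alias per required event; each alias is joined on Id and Date.
SELECT A.Id, A.[Date]
FROM @Events AS A
JOIN @Events AS B
    ON B.Id = A.Id AND B.[Date] = A.[Date]
JOIN @Events AS C
    ON C.Id = A.Id AND C.[Date] = A.[Date]
WHERE A.[Date] = '2017-01-01'
  AND A.[Event] = 'A'
  AND B.[Event] = 'B'
  AND C.[Event] = 'C';
```

With this sample data only Id 1 has all three events on that date, so Id 2 is filtered out by the join to C.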

Answer 2

The HAVING clause allows you to filter using the result of an aggregate function. I've used a regular COUNT, but you may need a COUNT(DISTINCT ...) instead, depending on your table design.

Example:

-- Returns ids with 3 or more events.
SELECT
    x.Id,
    COUNT(*) AS EventCount
FROM
(
    VALUES
        (1, '2017-01-01', 'A'),
        (1, '2017-01-01', 'B'),
        (1, '2017-01-03', 'C'),
        (1, '2017-01-04', 'C'),
        (1, '2017-01-05', 'E'),
        (2, '2017-01-01', 'A'),
        (2, '2017-01-01', 'B'),
        (3, '2017-01-01', 'A')
) AS x(Id, [Date], [Event])
GROUP BY
    x.Id
HAVING 
    COUNT(*) > 2
;

Returns

Id  EventCount
1   5
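
The same GROUP BY / HAVING idea can be pointed at specific event combinations instead of a plain row count. The sketch below is one possible way to do that, assuming the same sample rows as above loaded into a hypothetical table variable @Events. Each query reads the rows once and decides per Id in the HAVING clause, which is the property the question asks for.

```sql
-- Hypothetical sample data; same rows as the VALUES list above.
DECLARE @Events TABLE (Id int, [Date] date, [Event] varchar(1));

INSERT INTO @Events (Id, [Date], [Event]) VALUES
    (1, '2017-01-01', 'A'),
    (1, '2017-01-01', 'B'),
    (1, '2017-01-03', 'C'),
    (1, '2017-01-04', 'C'),
    (1, '2017-01-05', 'E'),
    (2, '2017-01-01', 'A'),
    (2, '2017-01-01', 'B'),
    (3, '2017-01-01', 'A');

-- Ids that have all of A, B and C (other events are allowed).
-- The distinct count guards against repeated events such as the two C rows.
SELECT e.Id
FROM @Events AS e
WHERE e.[Event] IN ('A', 'B', 'C')
GROUP BY e.Id
HAVING COUNT(DISTINCT e.[Event]) = 3;

-- Ids that have exactly A and B and nothing else.
SELECT e.Id
FROM @Events AS e
GROUP BY e.Id
HAVING COUNT(DISTINCT e.[Event]) = 2
   AND SUM(CASE WHEN e.[Event] NOT IN ('A', 'B') THEN 1 ELSE 0 END) = 0;
```

On the sample rows the first query returns Id 1 and the second returns Id 2. For a dynamic combination, the IN list and the expected distinct count are the only parts that change.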

Answer 3

There may be some benefit in using the SQL Server equivalent of MySQL's GROUP_CONCAT function. For example:

drop table t
create table t (id int, dt date, event varchar(1))
insert into t values
(1,'2017-01-01','a'),(1,'2017-01-01','b'),(1,'2017-01-01','c'),(1,'2017-01-02','c'),(1,'2017-01-03','d'),
(2,'2017-02-01','a'),(2,'2017-02-01','b')

select id,
        stuff(
    (
    select cast(',' as varchar(max)) + t1.event
    from t as t1
    WHERE t1.id  = t.id
    order by t1.dt, t1.event
    for xml path('')
    ), 1, 1, '') AS groupconcat
from t
group by t.id

Results in

id          groupconcat
----------- -----------
1           a,b,c,c,d
2           a,b

If you then add a PATINDEX filter:

select * from
(
select id,
        stuff(
    (
    select cast(',' as varchar(max)) + t1.event
    from t as t1
    WHERE t1.id  = t.id
    order by t1.dt, t1.event
    for xml path('')
    ), 1, 1, '') AS groupconcat
from t
group by t.id
) s
where patindex('a,b,c%',groupconcat) > 0

you get

id          groupconcat
----------- ------------
1           a,b,c,c,d
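
As a possible alternative to the FOR XML PATH construction, SQL Server 2017 and later provide STRING_AGG. The sketch below assumes that version is available and reuses the table t created above. It removes duplicate events first, because STRING_AGG has no DISTINCT option, and sorts the events so the concatenated string has a predictable order to match against.

```sql
-- Assumes SQL Server 2017+ (STRING_AGG) and the table t from the example above.
SELECT s.id, s.groupconcat
FROM
(
    SELECT
        d.id,
        STRING_AGG(d.[event], ',') WITHIN GROUP (ORDER BY d.[event]) AS groupconcat
    FROM
    (
        -- One row per id and event, so repeated events appear only once.
        SELECT DISTINCT id, [event]
        FROM t
    ) AS d
    GROUP BY d.id
) AS s
WHERE s.groupconcat LIKE 'a,b,c%';
```

With the sample rows this should keep only id 1. Because the events are sorted alphabetically, a prefix match such as 'a,b,c%' only works when the wanted events sort before any others; comparing against the full string (for example = 'a,b') is the stricter test for "only these events".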
