Losing information due to limit in query

Question

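I have a table tablename with columns col1 through col10. Not every row has col4 populated, but every row has col1, col2 and col3 populated. I want to get all {col1, col2, col3} tuples for which col4 satisfies a condition, and then get all rows from tablename that match those {col1, col2, col3} tuples.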

I have this query:

select t.*
from mytable t
where exists (
    select 1
    from mytable t1
    where 
        t1.col1 = t.col1 
        and t1.col2 = t.col2 
        and t1.col3 = t.col3 
        and t1.col4 >= 1000
)
LIMIT 1000

The table is very large, so I have to add a limit. Because of the limit, the result set is missing some rows for certain {col1, col2, col3} tuples, whereas I want all rows from tablename that match each returned tuple.

I don't mind having fewer {col1, col2, col3} tuples in my result, but I want complete information for the tuples I do get.

How can I achieve this?

Answer 1

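You don't mention which database you are using, but the following query should run faster. It applies the limit to the distinct {col1, col2, col3} tuples rather than to the final rows, so every tuple that is returned comes with all of its rows: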

select t.*
from t
join (
  select distinct col1, col2, col3 
  from t
  where col4 >= 1000
  limit 100
) x on t.col1 = x.col1 and t.col2 = x.col2 and t.col3 = x.col3;
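As a rough check of the idea, here is a minimal sketch that runs the same join against an in-memory SQLite database from Python. The sample rows, the column types and the helper name `complete_groups` are made up for illustration; only the SQL shape comes from the query above, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical sample data: (col1, col2, col3, col4); col4 may be NULL.
ROWS = [
    ("a", 1, "x", 1500),
    ("a", 1, "x", None),
    ("a", 1, "x", 20),
    ("b", 2, "y", 2000),
    ("b", 2, "y", None),
    ("c", 3, "z", 10),  # no row of this tuple reaches 1000
]


def complete_groups(max_tuples):
    """Return every row of at most `max_tuples` qualifying tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (col1 text, col2 integer, col3 text, col4 integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        select t.*
        from t
        join (
          select distinct col1, col2, col3
          from t
          where col4 >= 1000
          limit ?              -- caps the number of tuples, not of rows
        ) x on t.col1 = x.col1 and t.col2 = x.col2 and t.col3 = x.col3
        """,
        (max_tuples,),
    ).fetchall()
    conn.close()
    return rows
```

Calling `complete_groups(1)` returns all rows of a single qualifying tuple. Which tuple is picked is not specified, because the subquery has no `order by`.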

If your database supports indexes, the following ones should make the query faster still:

create index ix1 on t (col4, col1, col2, col3);

create index ix2 on t (col1, col2, col3);

Answer 2

A potentially more efficient approach is to use a window function:

select t.*
from (select t.*,
             sum(case when col4 >= 1000 then 1 else 0 end) over (partition by col1, col2, col3) as cnt_matches
      from mytable t
     ) t
where cnt_matches > 0;
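The window-function approach can be tried out the same way. The sketch below assumes an SQLite build with window-function support (3.25 or later) and uses made-up sample rows and a hypothetical helper name, `rows_in_matching_groups`. It uses `>=` to match the condition in the question. Note that this query has no limit, so it returns every qualifying tuple with all of its rows.

```python
import sqlite3

# Hypothetical sample data: (col1, col2, col3, col4); col4 may be NULL.
ROWS = [
    ("a", 1, "x", 1500),
    ("a", 1, "x", None),
    ("b", 2, "y", 1000),  # exactly at the threshold, kept because of >=
    ("b", 2, "y", None),
    ("b", 2, "y", 5),
    ("c", 3, "z", 10),    # no row of this tuple reaches 1000
]


def rows_in_matching_groups(threshold=1000):
    """Return all rows whose tuple has at least one col4 >= threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (col1 text, col2 integer, col3 text, col4 integer)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        select col1, col2, col3, col4
        from (select t.*,
                     -- per-tuple count of rows that satisfy the condition
                     sum(case when col4 >= ? then 1 else 0 end)
                       over (partition by col1, col2, col3) as cnt_matches
              from mytable t
             ) t
        where cnt_matches > 0
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return rows
```

Rows with a NULL col4 fall into the `else 0` branch, so they do not count as matches but are still returned when another row of the same tuple matches.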


sql

presto