Slow query with HAVING clause - can I speed it up?
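I have the following query, which produces the expected result but is slow (it takes about 10 seconds; the gstats table has about 130k rows in my development environment and is much bigger in production):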
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
AND s.id IN(
SELECT g.site_id
FROM gstats g
WHERE g.start_date > '2015-04-30'
GROUP BY g.site_id
HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC
Am I doing something wrong? How can I speed this up?
Would adding indexes or using views help?
A quick fix is to add a filter inside the subquery:
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
AND s.id IN(
SELECT g.site_id
FROM gstats g
WHERE g.start_date > '2015-04-30' AND g.site_id = s.id
GROUP BY g.site_id
HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC
Otherwise you run that grouped subquery for every possible candidate site. We can make it more elegant with EXISTS:
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
AND EXISTS (
SELECT 1
FROM gstats g
WHERE g.site_id = s.id AND g.start_date > '2015-04-30'
HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC
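But we are not done yet: the EXISTS is now evaluated for every row. That is odd, because the subquery depends only on s.id, so it depends on the group and not on the individual rows. What might therefore speed things up, although it depends on table sizes and so on, is moving the condition into the HAVING clause: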
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
GROUP BY s.id
HAVING EXISTS (
SELECT 1
FROM gstats g
WHERE g.site_id = s.id AND g.start_date > '2015-04-30'
HAVING SUM(g.results) > 100
)
ORDER BY dcount ASC
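Before trusting any of these rewrites, it is worth checking that they return the same rows as the original. Here is a minimal sketch of such a check. It uses Python's sqlite3 module as a stand-in for the real database, and the sample rows are made up for illustration; only the table and column names come from the question. One deliberate difference: the EXISTS subqueries carry an extra `GROUP BY g.site_id` because older SQLite versions reject HAVING without GROUP BY. Since the subquery is already restricted to a single site, this should not change the result.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites  (id INTEGER PRIMARY KEY, name TEXT, is_active INTEGER);
CREATE TABLE deals  (id INTEGER PRIMARY KEY, site_id INTEGER, is_active INTEGER);
CREATE TABLE gstats (id INTEGER PRIMARY KEY, site_id INTEGER,
                     start_date TEXT, results INTEGER);
INSERT INTO sites VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 0), (4, 'd', 1);
INSERT INTO deals VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 3, 1);
INSERT INTO gstats VALUES
  (1, 1, '2015-05-01', 60), (2, 1, '2015-05-02', 50),   -- site 1: 110 after cutoff
  (3, 2, '2015-05-01', 90), (4, 2, '2015-04-01', 500),  -- site 2: only 90 after cutoff
  (5, 3, '2015-05-01', 300),                            -- site 3: inactive
  (6, 4, '2015-06-01', 101);                            -- site 4: 101, no deals
""")

# Correlated subquery shared by both EXISTS variants.
# GROUP BY is added only so that older SQLite versions accept the HAVING.
EXISTS_SUBQUERY = """
EXISTS (
  SELECT 1
  FROM gstats g
  WHERE g.site_id = s.id AND g.start_date > '2015-04-30'
  GROUP BY g.site_id
  HAVING SUM(g.results) > 100
)
"""

HEAD = """
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
"""

queries = {
    # The original query from the question.
    "in": HEAD + """
AND s.id IN (
  SELECT g.site_id
  FROM gstats g
  WHERE g.start_date > '2015-04-30'
  GROUP BY g.site_id
  HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC
""",
    # EXISTS in the WHERE clause.
    "where_exists": HEAD + "AND " + EXISTS_SUBQUERY + """
GROUP BY s.id
ORDER BY dcount ASC
""",
    # EXISTS moved to HAVING; note that ORDER BY has to come last.
    "having_exists": HEAD + "GROUP BY s.id\nHAVING " + EXISTS_SUBQUERY + """
ORDER BY dcount ASC
""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

If the three lists differ on your own data, the rewrite changed the meaning of the query and any timing comparison is moot.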
Try moving the subquery to the FROM clause:
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s JOIN
(SELECT g.site_id
FROM gstats g
WHERE g.start_date > '2015-04-30'
GROUP BY g.site_id
HAVING SUM(g.results) > 100
) g
ON g.site_id = s.id LEFT JOIN
deals d
ON s.id = d.site_id AND d.is_active = 1
WHERE s.is_active = 1
GROUP BY s.id
ORDER BY dcount ASC;
I assume you have indexes on the join columns. You might also find that this helps performance:
SELECT s.id, s.name,
(SELECT COUNT(*)
FROM deals d
WHERE d.site_id = s.id AND d.is_active = 1
) as dcount
FROM sites s JOIN
(SELECT g.site_id
FROM gstats g
WHERE g.start_date > '2015-04-30'
GROUP BY g.site_id
HAVING SUM(g.results) > 100
) g
ON g.site_id = s.id
WHERE s.is_active = 1
ORDER BY dcount ASC;
For this version, you want an index on deals(site_id, is_active).
The query looks fine. I would suggest the following indexes:
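As a rough sanity check of the derived-table version with the scalar COUNT subquery, the sketch below runs it next to the original query and compares the rows. The assumptions are mine: SQLite (through Python's sqlite3) stands in for the real database, the sample rows are invented, the index name `idx_deals_site_active` is hypothetical, and the join condition uses `s.id` because that is the key the question's query joins on.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites  (id INTEGER PRIMARY KEY, name TEXT, is_active INTEGER);
CREATE TABLE deals  (id INTEGER PRIMARY KEY, site_id INTEGER, is_active INTEGER);
CREATE TABLE gstats (id INTEGER PRIMARY KEY, site_id INTEGER,
                     start_date TEXT, results INTEGER);
-- Index suggested for this version (the name is made up for the example).
CREATE INDEX idx_deals_site_active ON deals(site_id, is_active);
INSERT INTO sites VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 0), (4, 'd', 1);
INSERT INTO deals VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 3, 1);
INSERT INTO gstats VALUES
  (1, 1, '2015-05-01', 60), (2, 1, '2015-05-02', 50),
  (3, 2, '2015-05-01', 90), (4, 2, '2015-04-01', 500),
  (5, 3, '2015-05-01', 300),
  (6, 4, '2015-06-01', 101);
""")

ORIGINAL = """
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
AND s.id IN (
  SELECT g.site_id
  FROM gstats g
  WHERE g.start_date > '2015-04-30'
  GROUP BY g.site_id
  HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC
"""

# Derived table for the gstats filter, scalar subquery for the deal count.
REWRITTEN = """
SELECT s.id, s.name,
       (SELECT COUNT(*)
        FROM deals d
        WHERE d.site_id = s.id AND d.is_active = 1
       ) AS dcount
FROM sites s JOIN
     (SELECT g.site_id
      FROM gstats g
      WHERE g.start_date > '2015-04-30'
      GROUP BY g.site_id
      HAVING SUM(g.results) > 100
     ) g
     ON g.site_id = s.id
WHERE s.is_active = 1
ORDER BY dcount ASC
"""

# The two queries list their columns in a different order, so normalise
# both to (id, name, dcount) before comparing.
original_rows = {(sid, name, dcount)
                 for dcount, sid, name in conn.execute(ORIGINAL)}
rewritten_rows = set(conn.execute(REWRITTEN).fetchall())

print(original_rows == rewritten_rows, sorted(rewritten_rows))
```

Matching rows only show that the rewrite is equivalent on this data; whether it is faster has to be measured on the real tables.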
create index idx_gstats on gstats(start_date, results, site_id);
create index idx_deals1 on deals(is_active, site_id);
create index idx_deals2 on deals(site_id, is_active);
Then look at the execution plan of the query and drop whichever deals index is not used.
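To illustrate the "check the plan, then drop the unused index" step, here is a small sketch. It is an assumption-heavy stand-in: it uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3, and SQLite's planner is not the one your server runs, so the index it picks says nothing about production. The helper `indexes_in_plan` is a hypothetical name, and the probe query is just the deals lookup from the versions above. On your own database you would run that system's EXPLAIN on the full query instead.

```python
import sqlite3

DEALS_INDEXES = ("idx_deals1", "idx_deals2")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deals (id INTEGER PRIMARY KEY, site_id INTEGER, is_active INTEGER);
CREATE INDEX idx_deals1 ON deals(is_active, site_id);
CREATE INDEX idx_deals2 ON deals(site_id, is_active);
""")


def indexes_in_plan(conn, sql, params=()):
    """Return the deals index names that appear in SQLite's plan for `sql`."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the human-readable step.
    details = " ".join(row[3] for row in plan)
    return {name for name in DEALS_INDEXES if name in details}


probe = "SELECT COUNT(*) FROM deals d WHERE d.site_id = ? AND d.is_active = 1"
used = indexes_in_plan(conn, probe, (1,))
unused = set(DEALS_INDEXES) - used

print("used by the plan:", sorted(used))
print("candidates to drop:", sorted(unused))
```

An index that never shows up in the plans of the queries you care about only costs write time and storage, which is why the answer suggests dropping it.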