DBMS optimizer - best execution plan, no matter the query's formulation

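If I write a query Q in a relational DBMS, won't the optimizer choose the best way to execute it (depending on multiple factors) no matter how one formulates Q? I'm curious about SQL Server and Oracle.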

For example, let Q be:

SELECT * 
FROM t1, t2
WHERE t1.some_column = t2.some_column

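If the right indexes exist (with the right selectivity), we should see an index seek, possibly followed by a key lookup. We won't see a cross product chosen in the execution plan.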

Then why does https://technet.microsoft.com/en-us/library/ms189575(v=sql.105).aspx state "In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not. However, in some cases where existence must be checked, a join yields better performance."? Regardless of how you write query Q, and regardless of Q's query class (SPJ, SPJ + UNION, SPJ + subqueries, etc.), won't the optimizer find the best semantically equivalent version?

Thanks!

won't the optimizer choose the best way to execute it (depending on multiple factors) no matter how one formulates Q?

I'd like to quote Itzik Ben-Gan from the book Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions:

There are several reasons for this.

For one, SQL Server’s optimizer is not perfect. I don’t want to sound unappreciative—SQL Server’s optimizer is truly a marvel when you think of what this software component can achieve. But it’s a fact that it doesn’t have all possible optimization rules encoded within it.

Two, the optimizer has to limit the amount of time spent on optimization; otherwise, it could spend a much longer time optimizing a query than the amount of time the optimization shaves off from the run time of the query.

The situation could be as absurd as producing a plan in a matter of several dozen milliseconds without going over all possible plans and getting a run time of only seconds, but producing all possible plans in hopes of shaving off a couple of seconds might take a year or even several. You can see that, for practical reasons, the optimizer needs to limit the time spent on optimization.

Based on factors like the sizes of the tables involved in the query, SQL Server calculates two values: one is a cost considered good enough for the query, and the other is the maximum amount of time to spend on optimization before stopping. If either threshold is reached, optimization stops, and SQL Server uses the best plan found at that point.
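To make the two-threshold idea in the quote concrete, here is a minimal toy sketch in Python. It is not SQL Server's actual algorithm: the function names `bounded_plan_search` and `toy_cost`, the table sizes, and the fixed 1% join selectivity are all assumptions made up for illustration. The only thing it mirrors from the quote is that the search stops as soon as either a "good enough" cost or a time budget is reached, and returns the best plan seen so far.

```python
import itertools
import time

# Hypothetical table sizes (row counts), chosen only for this illustration.
SIZES = {"a": 1000, "b": 10, "c": 100}


def toy_cost(order):
    """Toy cost of a left-deep join order: the sum of intermediate row counts,
    assuming (arbitrarily) that every join keeps 1% of the cross product."""
    rows = SIZES[order[0]]
    total = 0
    for table in order[1:]:
        rows = rows * SIZES[table] // 100
        total += rows
    return total


def bounded_plan_search(candidate_plans, cost_of, good_enough_cost, time_budget_s):
    """Scan candidate plans, stopping on whichever threshold is hit first.

    Returns (best_plan, best_cost, reason), where reason is one of
    "good enough", "timeout" or "exhausted".
    """
    deadline = time.monotonic() + time_budget_s
    best_plan, best_cost = None, float("inf")
    for plan in candidate_plans:
        cost = cost_of(plan)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
        if best_cost <= good_enough_cost:
            return best_plan, best_cost, "good enough"
        if time.monotonic() >= deadline:
            return best_plan, best_cost, "timeout"
    return best_plan, best_cost, "exhausted"


if __name__ == "__main__":
    plans = list(itertools.permutations(SIZES))
    # A generous "good enough" threshold: the first acceptable plan wins,
    # even though a cheaper join order exists.
    print(bounded_plan_search(plans, toy_cost, good_enough_cost=250, time_budget_s=1.0))
    # No cost threshold and plenty of time: the whole space is searched.
    print(bounded_plan_search(plans, toy_cost, good_enough_cost=-1, time_budget_s=60.0))
```

With the generous threshold the search settles for the first join order it sees, which is exactly the trade-off the quote describes: a plan that is acceptable, not necessarily the cheapest.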

To sum up: the less there is to optimize, the less is left unoptimized.

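Absolutely not. Most of the time it will be one of the best ways, yes, but always the best? No. The optimizer has to deal with any statement applied to any schema containing any data. Two different queries with exactly the same logic (always returning the same result for the same data) may get different execution plans.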
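One way to check this for yourself is to write the same logic twice and compare both the results and the plans. The sketch below uses Python's built-in sqlite3 module purely as a convenient stand-in; SQLite's planner is not SQL Server's or Oracle's, so the plans it prints say nothing about those systems. The table contents, the index name `ix_t2_some_column`, and the function name `compare_formulations` are assumptions for illustration. The two queries are equivalent here only because `t1.id` is a primary key, which makes the `DISTINCT` join return the same rows as the `EXISTS` version.

```python
import sqlite3

# Formulation 1: existence check written as a correlated subquery.
EXISTS_QUERY = """
SELECT t1.id FROM t1
WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.some_column = t1.some_column)
"""

# Formulation 2: the same logic written as a join (DISTINCT removes the
# duplicates that multiple matching t2 rows would otherwise produce).
JOIN_QUERY = """
SELECT DISTINCT t1.id FROM t1
JOIN t2 ON t2.some_column = t1.some_column
"""


def compare_formulations():
    """Run both formulations on made-up data; return (results, plans) dicts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER PRIMARY KEY, some_column INTEGER);
        CREATE TABLE t2 (id INTEGER PRIMARY KEY, some_column INTEGER);
        CREATE INDEX ix_t2_some_column ON t2 (some_column);
    """)
    conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                     [(i, i % 50) for i in range(1, 201)])
    conn.executemany("INSERT INTO t2 VALUES (?, ?)",
                     [(i, i % 20) for i in range(1, 401)])
    results, plans = {}, {}
    for name, sql in (("exists", EXISTS_QUERY), ("join", JOIN_QUERY)):
        results[name] = sorted(row[0] for row in conn.execute(sql))
        # The last column of each EXPLAIN QUERY PLAN row is the readable step.
        plans[name] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    conn.close()
    return results, plans


if __name__ == "__main__":
    results, plans = compare_formulations()
    print("same rows:", results["exists"] == results["join"])
    for name, steps in plans.items():
        print(name, "->", steps)
```

On SQL Server or Oracle the equivalent experiment would be to compare the actual execution plans of the two statements; whether they coincide depends on the product, version, schema, and statistics.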

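For non-trivial queries, it most likely will not give you the optimal execution plan. One reason is that finding the optimal query rewrite is an NP-hard problem. For example, cost-minimal join ordering is known to be NP-hard (the number of trees that can be built from n nodes is n^(n-2), by Cayley's formula), and the cost function is heuristic (based on properties such as cardinality, sparsity, storage model, ...). Join ordering is only a subset of the join optimization work, which is itself a subset of the overall query optimization work.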
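To get a feel for how quickly this search space grows, here is a small Python sketch. The function names are made up for illustration. `cayley_trees` is the n^(n-2) count mentioned above; the other two are standard combinatorial counts added here for comparison, not something the answer itself states: n! orderings for left-deep plans, and (2n-2)!/(n-1)! for ordered binary ("bushy") join trees over n tables.

```python
from math import factorial


def cayley_trees(n):
    """Number of labeled trees on n nodes (Cayley's formula, n^(n-2))."""
    return 1 if n <= 2 else n ** (n - 2)


def left_deep_orders(n):
    """Number of left-deep join orders: one per permutation of n tables."""
    return factorial(n)


def bushy_join_trees(n):
    """Number of ordered binary join trees with n labeled leaves:
    n! * Catalan(n-1), which simplifies to (2n-2)! / (n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


if __name__ == "__main__":
    print(f"{'n':>3} {'n^(n-2)':>15} {'left-deep':>15} {'bushy':>20}")
    for n in (2, 4, 6, 8, 10, 12):
        print(f"{n:>3} {cayley_trees(n):>15} {left_deep_orders(n):>15} "
              f"{bushy_join_trees(n):>20}")
```

Already at 10 tables there are millions of left-deep orders and billions of bushy trees, before physical operator and index choices are even considered, which is why optimizers rely on heuristics and search limits instead of enumerating everything.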