In a HANA query of a CTE, why does type conversion fail in the WHERE clause (though it succeeds in the SELECT clause)?

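The HANA query below works without the WHERE clause and returns a total of 60. With the WHERE clause, it fails with this error: SQL Error [339] [HY000]: SAP DBTech JDBC: [339]: invalid number: not a valid number string '10,30' at "to_decimal" function (at pos 538)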

WITH details AS (
  SELECT 2 AS n_vals, '10,30' AS vals FROM dummy UNION
  SELECT 1 AS n_vals, '10'    AS vals FROM dummy UNION
  SELECT 1 AS n_vals, '20'    AS vals FROM dummy UNION
  SELECT 1 AS n_vals, '30'    AS vals FROM dummy
)
,one_value AS (
  SELECT n_vals, vals
  FROM details
  WHERE n_vals=1
)
SELECT sum(to_number(vals)) AS total FROM one_value
WHERE to_number(vals)>=20
;

This question has also been cross-posted to, and answered in, the SAP Community.

Summary

The example statement fails because of a bug in HANA.
The bug occurs when the statement is executed by the HANA ROW engine.

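(This bug is not directly related to using CTEs in HANA: as the counter-example below shows, the successful query that runs in the HEX engine also uses a CTE. The failing query from the OP ends up in the ROW engine because its values are selected from the DUMMY table, which is implemented as a ROWSTORE table in HANA.)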

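If the same values are stored in a column store table, the query is executed by the HEX (or COLUMN) engine.

In that case the statement runs fine. The execution plan of the successful run shows a step that applies the conversion only after filtering; this step is absent when the ROW engine is used.

If the ROW engine is used, as is the case in the provided example, query execution runs into the conversion error.

This bug has been reported for HANA 2.00.54.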


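Analysis/background info

To reproduce both the failing and the working version, it is important to see that the example query uses the ROW engine in HANA. This can be seen by checking the execution plan or a PlanViz trace.

By storing the source data in a column store table, the query can be processed by the HEX and/or COLUMN engine.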

CURRENT_DATE    SYSTEM_ID   DATABASE_NAME   HOST    START_TIME                      VERSION                 USAGE      
26 July 2021    HXE         HXE             hxehost 26 July 2021, 8:35:10.037 pm    2.00.054.00.1611906357  DEVELOPMENT

create row table r_dat as (
SELECT 2 AS n_vals, '10,30' AS vals FROM dummy UNION
  SELECT 1 AS n_vals, '10'    AS vals FROM dummy UNION
  SELECT 1 AS n_vals, '20'    AS vals FROM dummy UNION
  SELECT 1 AS n_vals, '30'    AS vals FROM dummy);
  
create column table c_dat as (
SELECT 2 AS n_vals, '10,30' AS vals FROM dummy UNION
  SELECT 1 AS n_vals, '10'    AS vals FROM dummy UNION
  SELECT 1 AS n_vals, '20'    AS vals FROM dummy UNION
  SELECT 1 AS n_vals, '30'    AS vals FROM dummy);
  
with one_value AS (
  SELECT n_vals, vals
  FROM r_dat
  WHERE n_vals=1
)
SELECT sum(to_number(vals)) AS total FROM one_value
WHERE to_number(vals)>=20
;

This query uses a row store table and the ROW engine, and it fails:

SAP DBTech JDBC: [339]: invalid number: not a valid number string '10,30' at "to_decimal" function (at pos 132) 


OPERATOR_NAME   OPERATOR_DETAILS                                       OPERATOR_PROPERTIES EXECUTION_ENGINE
ROW SEARCH      SUM(TO_DECIMAL(ONE_VALUE.VALS))                                            ROW
  AGGREGATION   AGGREGATION: SUM(TO_DECIMAL(ONE_VALUE.VALS))                               ROW
    TABLE SCAN  FILTER CONDITION: 
                R_DAT.N_VALS = 1 AND TO_DECIMAL(ONE_VALUE.VALS) >= 20                      ROW
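
A plan listing like the one above can be produced without PlanViz by using EXPLAIN PLAN and reading the result from EXPLAIN_PLAN_TABLE. The following is a minimal sketch; the statement name 'row_case' is an arbitrary label chosen for this illustration, and the column selection simply mirrors the columns shown in the listing.

```sql
-- Compile the failing query and store its plan under an illustrative name.
-- 'row_case' is a made-up label, not part of the original question.
EXPLAIN PLAN SET STATEMENT_NAME = 'row_case' FOR
WITH one_value AS (
  SELECT n_vals, vals
  FROM r_dat
  WHERE n_vals = 1
)
SELECT sum(to_number(vals)) AS total
FROM one_value
WHERE to_number(vals) >= 20;

-- Read back the operators and the engine chosen for each of them.
SELECT operator_name,
       operator_details,
       operator_properties,
       execution_engine
FROM explain_plan_table
WHERE statement_name = 'row_case'
ORDER BY operator_id;

-- Remove the stored plan rows once they are no longer needed.
DELETE FROM explain_plan_table WHERE statement_name = 'row_case';
```

Running the same steps against c_dat instead of r_dat makes it easy to compare the EXECUTION_ENGINE column of both variants side by side.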

The version with the data in a column store table, on the other hand, works:

with one_value AS (
  SELECT n_vals, vals
  FROM c_dat
  WHERE n_vals=1
)
SELECT sum(to_number(vals)) AS total FROM one_value
WHERE to_number(vals)>=20

This works, with a neat "post-filter" conversion in the HEX engine:

OPERATOR_NAME       OPERATOR_DETAILS                                        OPERATOR_PROPERTIES EXECUTION_ENGINE
PROJECT             TOTAL                                                                       HEX 
  AGGREGATION       AGGREGATION: SUM(TO_DECIMAL(ONE_VALUE.VALS))                                HEX
    COLUMN TABLE    FILTER CONDITION: 
                    C_DAT.N_VALS = 1 AND TO_DECIMAL(ONE_VALUE.VALS) >= 20 
                    (DETAIL: ([SCAN] C_DAT.N_VALS = 1) 
                    AND ([POST-FILTER] TO_DECIMAL(ONE_VALUE.VALS) >= 20))                       HEX

Comparing this with the execution in Oracle

The OP reported that the query works fine in Oracle. This can be verified by checking the Oracle execution plan.

(http://sqlfiddle.com/#!4/e2ac5e/2350)

WITH details AS (
  SELECT 2 AS n_vals, '10,30' AS vals FROM dual UNION
  SELECT 1 AS n_vals, '10'    AS vals FROM dual UNION
  SELECT 1 AS n_vals, '20'    AS vals FROM dual UNION
  SELECT 1 AS n_vals, '30'    AS vals FROM dual
)
, one_value AS (
  SELECT n_vals, vals
  FROM details
  WHERE n_vals=1
)
SELECT sum(to_number(vals)) AS total 
FROM one_value
WHERE to_number(vals)>=20;


 Plan Hash Value  : 2738139054 

------------------------------------------------------------------
| Id  | Operation        | Name | Rows | Bytes | Cost | Time     |
------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |    1 |     7 |    8 | 00:00:01 |
|   1 |   SORT AGGREGATE |      |    1 |     7 |      |          |
|   2 |    VIEW          |      |    4 |    28 |    8 | 00:00:01 |
|   3 |     SORT UNIQUE  |      |    4 |       |    8 | 00:00:01 |
|   4 |      UNION-ALL   |      |      |       |      |          |
| * 5 |       FILTER     |      |      |       |      |          |
|   6 |        FAST DUAL |      |    1 |       |    2 | 00:00:01 |
| * 7 |       FILTER     |      |      |       |      |          |
|   8 |        FAST DUAL |      |    1 |       |    2 | 00:00:01 |
|   9 |       FAST DUAL  |      |    1 |       |    2 | 00:00:01 |
|  10 |       FAST DUAL  |      |    1 |       |    2 | 00:00:01 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(NULL IS NOT NULL AND TO_NUMBER('10,30')>=20)
* 7 - filter(NULL IS NOT NULL)

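The execution plan shows that operation 5 (FILTER) does in fact try to apply the TO_NUMBER conversion to the 10,30 value.

But it also shows a NULL IS NOT NULL expression in the same operation.
This expression evaluates to FALSE and, through short-circuit evaluation, makes the whole filter expression evaluate to FALSE. In effect, this avoids the conversion of the string.

The NULL IS NOT NULL expression is not part of the original SQL; it is introduced by query rewriting in Oracle. This is similar to the [POST-FILTER] operation in the successful HANA plan.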


Aside: HANA text-to-decimal conversion

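When this question was first answered, it looked as if converting strings that contain a comma (,) to decimal numbers was actually intended. That turned out not to be the case, but the following explains how HANA handles this kind of conversion.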

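HANA type conversion from character types to numeric types accepts only the full stop/dot (.) as the decimal separator.

A bit of experimenting shows that the permitted patterns look roughly like this: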

(((\+|\-)?([[:digit:]])*(\.([[:digit:]])*)?))|((([[:digit:]])*(\.([[:digit:]])*)?(\+|\-)?))

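So the string may carry a single plus or minus sign either at the beginning or at the end (but not both), together with the digits.

There can be one dot between the digits, and the leading zero can be omitted.

However, multiple dots, other separators (such as commas), and currency symbols are not allowed.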
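
To find the strings that would trip the conversion before running it, the pattern above can be used as a pre-check. The sketch below assumes the LIKE_REGEXPR predicate is available on the system in use and runs against the c_dat table from the analysis section; the regular expression is a condensed, anchored form of the experimentally derived pattern, so it is an approximation rather than HANA's documented rule.

```sql
-- List values that do not look like a number HANA would accept.
-- The pattern is an approximation derived from experiments, not an
-- official specification of the conversion rules.
SELECT n_vals, vals
FROM c_dat
WHERE NOT (
    vals LIKE_REGEXPR
      '^([+-]?[0-9]*(\.[0-9]*)?|[0-9]*(\.[0-9]*)?[+-]?)$'
);
```

With the sample data, only the '10,30' row should fail the pattern, because the comma is not part of any allowed alternative.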

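If the data uses a comma as the decimal separator (and possibly dots as thousands separators), these characters need to be replaced before the string-to-number conversion: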

SELECT 
    to_decimal (
        REPLACE (
                REPLACE ('1.219.323,34', '.', '')
                        , ',' , '.'
                )
    )
FROM DUMMY

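This double replacement first removes the dots completely and then turns the remaining comma into a dot (.).

If this conversion has to happen many times, it may be worth checking whether it can be wrapped in a function or whether the source data can be changed accordingly.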
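
Wrapping the double replacement in a function could look roughly like the following SQLScript scalar function. The function name eu_to_decimal, the parameter name and length, and the DECIMAL(34, 2) result type are all assumptions made for this sketch; they would need to be adjusted to the actual data.

```sql
-- Hypothetical helper: converts text such as '1.219.323,34'
-- (dot = thousands separator, comma = decimal separator) to a decimal.
-- Name, parameter length, precision and scale are illustrative choices.
CREATE FUNCTION eu_to_decimal (IN txt NVARCHAR(100))
RETURNS num DECIMAL(34, 2)
LANGUAGE SQLSCRIPT AS
BEGIN
    -- Inner REPLACE drops the thousands separators,
    -- outer REPLACE turns the decimal comma into a dot.
    num := TO_DECIMAL(
               REPLACE(REPLACE(:txt, '.', ''), ',', '.'),
               34, 2
           );
END;

-- Usage with the sample value from above.
SELECT eu_to_decimal('1.219.323,34') AS amount FROM DUMMY;
```

Note that this helper only handles the one input format described here; a string that already uses a dot as the decimal separator would lose that dot in the inner REPLACE and yield a wrong number.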