Impala regexp_extract query returning null when regexp_like and regexp_replace work fine

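I need to use regexp_extract to pull a number out of a string in a column. I'm using Impala on an external table.

I have already checked the regular expression, and to test it I also used regexp_like and regexp_replace. Both of them work perfectly.

Here is the query: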

select 
    sucursal,
    regexp_like(sucursal,'^[0-9]{1,3}') as match,
    regexp_extract(sucursal,'^[0-9]{1,3}',1) as CodSucusal,
    regexp_replace(sucursal,'^[0-9]{1,3}','lala') as RepCodSucusal
from jdv.stg_devoluciones limit 5;

This is the result:

+-------------------+-------+------------+--------------------+
| sucursal          | match | codsucusal | repcodsucusal      |
+-------------------+-------+------------+--------------------+
| 124 NAVOJOA       | true  |            | lala NAVOJOA       |
| 73 BOCA DEL RIO   | true  |            | lala BOCA DEL RIO  |
| 964 JIUTEPEC      | true  |            | lala JIUTEPEC      |
| 456 TEQUISQUIAPAN | true  |            | lala TEQUISQUIAPAN |
| 212 LANDIN        | true  |            | lala LANDIN        |
+-------------------+-------+------------+--------------------+

codsucusal should be the sucursal number, but regexp_extract is returning null instead.

Expected result:

+-------------------+-------+------------+--------------------+
| sucursal          | match | codsucusal | repcodsucusal      |
+-------------------+-------+------------+--------------------+
| 124 NAVOJOA       | true  |   124      | lala NAVOJOA       |
| 73 BOCA DEL RIO   | true  |   73       | lala BOCA DEL RIO  |
| 964 JIUTEPEC      | true  |   964      | lala JIUTEPEC      |
| 456 TEQUISQUIAPAN | true  |   456      | lala TEQUISQUIAPAN |
| 212 LANDIN        | true  |   212      | lala LANDIN        |
+-------------------+-------+------------+--------------------+

What am I doing wrong?

The Impala regexp_extract function takes the index of a capturing group, or 0 for the whole match, as its third argument.

regexp_extract(string subject, string pattern, int index)
Purpose: Returns the specified () group from a string based on a regular expression pattern. Group 0 refers to the entire extracted string, while group 1, 2, and so on refers to the first, second, and so on (...) portion.

Since your regular expression, ^[0-9]{1,3}, does not define a capturing group, you should use 0 as the third argument to refer to the whole match:

regexp_extract(sucursal,'^[0-9]{1,3}', 0) as CodSucusal
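
If you would rather keep 1 as the index, another option is to wrap the digits in parentheses so that group 1 exists. The query below is a minimal sketch comparing the two approaches; the literal '124 NAVOJOA' stands in for the sucursal column, and the aliases whole_match and first_group are made up for illustration.

```sql
-- Sketch only: a literal value replaces the sucursal column so the
-- statement can be run without the jdv.stg_devoluciones table.
select
    -- index 0: the entire text matched by the pattern
    regexp_extract('124 NAVOJOA', '^[0-9]{1,3}', 0)   as whole_match,
    -- index 1: the first (...) group, which this pattern now defines
    regexp_extract('124 NAVOJOA', '^([0-9]{1,3})', 1) as first_group;
```

Both columns should come back as 124. The original query returned an empty value because it asked for group 1 from a pattern that has no groups.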