Impala regexp_extract query returning null when regexp_like and regexp_replace work fine
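I need to use regexp_extract to extract a number from a string in a column. I'm using Impala on an external table.
I've checked the regular expression, and to test it I also used regexp_like and regexp_replace. Both of them work perfectly.
Here is the query: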
select
sucursal,
regexp_like(sucursal,'^[0-9]{1,3}') as match,
regexp_extract(sucursal,'^[0-9]{1,3}',1) as CodSucusal,
regexp_replace(sucursal,'^[0-9]{1,3}','lala') as RepCodSucusal
from jdv.stg_devoluciones limit 5;
This is the result:
+-------------------+-------+------------+--------------------+
| sucursal          | match | codsucusal | repcodsucusal      |
+-------------------+-------+------------+--------------------+
| 124 NAVOJOA       | true  |            | lala NAVOJOA       |
| 73 BOCA DEL RIO   | true  |            | lala BOCA DEL RIO  |
| 964 JIUTEPEC      | true  |            | lala JIUTEPEC      |
| 456 TEQUISQUIAPAN | true  |            | lala TEQUISQUIAPAN |
| 212 LANDIN        | true  |            | lala LANDIN        |
+-------------------+-------+------------+--------------------+
codsucusal should be the sucursal number, but regexp_extract is returning null instead.
Expected result:
+-------------------+-------+------------+--------------------+
| sucursal          | match | codsucusal | repcodsucusal      |
+-------------------+-------+------------+--------------------+
| 124 NAVOJOA       | true  | 124        | lala NAVOJOA       |
| 73 BOCA DEL RIO   | true  | 73         | lala BOCA DEL RIO  |
| 964 JIUTEPEC      | true  | 964        | lala JIUTEPEC      |
| 456 TEQUISQUIAPAN | true  | 456        | lala TEQUISQUIAPAN |
| 212 LANDIN        | true  | 212        | lala LANDIN        |
+-------------------+-------+------------+--------------------+
What am I doing wrong?
Impala's regexp_extract function takes the index of a capture group, or 0 for the whole match, as its third argument.
regexp_extract(string subject, string pattern, int index)
Purpose: Returns the specified () group from a string based on a regular expression pattern. Group 0 refers to the entire extracted string, while group 1, 2, and so on refers to the first, second, and so on (...) portion.
Since your regular expression, ^[0-9]{1,3}, does not define a capture group, you should use 0 as the third argument to refer to the whole matched value:
regexp_extract(sucursal,'^[0-9]{1,3}', 0) as CodSucusal
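If you would rather keep index 1, the other option is to wrap the digits in parentheses so that group 1 exists. The query below is only a sketch against the table from the question; the aliases cod_whole_match and cod_group_1 are made up here for illustration. Both columns should come back with the leading number.

```sql
-- Sketch only: same table and column as in the question.
-- The two aliases below are hypothetical names chosen for this example.
select
  sucursal,
  -- index 0: the whole match, no capture group required
  regexp_extract(sucursal, '^[0-9]{1,3}', 0) as cod_whole_match,
  -- index 1: the first parenthesized group, so the pattern needs ( )
  regexp_extract(sucursal, '^([0-9]{1,3})', 1) as cod_group_1
from jdv.stg_devoluciones
limit 5;
```

The group form is handy when the pattern has to match more text than you want returned, for example the number together with the space that follows it.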