How to write a fuzzy multi-substring match with RLIKE in Hive
For example:
df.select('category').show()
+---------------------------+
|                   category|
+---------------------------+
|            money,insurance|
|            life, housework|
|           game,FPS,network|
|            game,fight,jump|
|                      hotel|
|                 trip,hotel|
|                       null|
+---------------------------+
I want to use RLIKE to write a regular expression that fuzzily matches any one of the substrings in the list ['money', 'life'].
-- This is an exact match
SELECT *
FROM tb_name
WHERE col_name RLIKE '(money|life)'
-- This is a fuzzy match
SELECT *
FROM tb_name
WHERE col_name RLIKE '*.(money|life)'
But the fuzzy-match snippet fails with an AST tree error:
06-11 16:59:17-fatal filter ast tree
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TAB tb_name))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR "hdfs://XXXX/XX")) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (RLIKE (TOK_TABLE_OR_COL col_name ) '*.(money|life)')) (TOK_LIMIT 2000)))
06-11 16:59:17-fatal Filter feature: .TOK_TAB \S tdw_inter_db.*|.TOK_(CUBE|ROLLUP) .
I can't see what is wrong with the fuzzy-match snippet.
Can anyone help me?
Thanks in advance.
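One likely cause, separate from whatever the platform's own filter log is reporting: `*.(money|life)` is not a valid regular expression, because a leading `*` has nothing to repeat (Java's regex engine, which Hive's RLIKE uses, rejects it). The intended form was probably `.*(money|life)`. However, Hive documents RLIKE as TRUE when any substring of the value matches, so `(money|life)` is already a substring match rather than an exact one. The sketch below uses Python's `re` module as a stand-in for Java regex (the two agree on these particular patterns); the helpers `rlike` and `is_valid_regex` are hypothetical and only mimic the documented RLIKE behaviour.

```python
import re
from typing import Optional


def rlike(value: Optional[str], pattern: Optional[str]) -> Optional[bool]:
    """Hypothetical helper mimicking Hive's `A RLIKE B`:
    NULL (None) if either side is NULL, otherwise TRUE when any substring matches."""
    if value is None or pattern is None:
        return None
    # re.search looks for a match anywhere in the string, not only a full match
    return re.search(pattern, value) is not None


def is_valid_regex(pattern: str) -> bool:
    """Return False when the pattern fails to compile."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The pattern from the question: the leading '*' has nothing to repeat
print(is_valid_regex("*.(money|life)"))          # False
# What was probably intended: any characters, then money or life
print(is_valid_regex(".*(money|life)"))          # True
# A leading '.*' is unnecessary, because matching is already substring-based
print(rlike("money,insurance", "(money|life)"))  # True
print(rlike("trip,hotel", "(money|life)"))       # False
```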
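'(?i)money|life'
This regular expression matches any string that contains money or life, ignoring case; the (?i) prefix is what makes the match case-insensitive.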
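To check this pattern against the sample rows from the question, here is a small Python sketch. It builds the alternation from the list ['money', 'life'], escaping each item with `re.escape` in case a substring ever contains regex metacharacters (for these two words nothing changes), and prefixes `(?i)`. The names `build_pattern` and `filter_rows` are hypothetical, and Python's `re` stands in for the Java regex engine behind RLIKE. Null rows are dropped, mirroring the fact that RLIKE on NULL yields NULL, which a WHERE clause filters out.

```python
import re
from typing import List, Optional

# Sample rows from the question's category column (None stands for null)
ROWS: List[Optional[str]] = [
    "money,insurance",
    "life, housework",
    "game,FPS,network",
    "game,fight,jump",
    "hotel",
    "trip,hotel",
    None,
]


def build_pattern(substrings: List[str]) -> str:
    """Join the substrings into one case-insensitive alternation."""
    return "(?i)" + "|".join(re.escape(s) for s in substrings)


def filter_rows(rows: List[Optional[str]], pattern: str) -> List[str]:
    """Keep rows where any substring matches, like WHERE col RLIKE pattern.
    Null rows are dropped because RLIKE on NULL yields NULL, not TRUE."""
    regex = re.compile(pattern)
    return [r for r in rows if r is not None and regex.search(r)]


pattern = build_pattern(["money", "life"])
print(pattern)                                   # (?i)money|life
print(filter_rows(ROWS, pattern))                # ['money,insurance', 'life, housework']
print(filter_rows(["LIFE insurance"], pattern))  # ['LIFE insurance']
```

In the Hive query the pattern goes straight into the string literal, e.g. WHERE col_name RLIKE '(?i)money|life'.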