SQL: RegEx to grab table names in metadata on Snowflake
I have a table of queries in Snowflake that I'm trying to parse with a RegEx to get all the table names:
PARENTID | TABLE_NAME |
---|---|
01 | Select ... From V820LF.IIM as IIM Join V820LUF.IIME as IIME on IIM.IPROD = IIME.IMPROD Left join (Select CCCODE, CCDESC, CCNOT1 from V820LF.ZCC |
02 | Select .. As From V820LF.IIM as IIM Join V820LUF.IIME as IIME on IIM.IPROD = IIME.IMPROD Left join (Select CCCODE, CCDESC From V820LF.ZCC Where CCTABL = 'SIRF1') as C On IIM.IREF01 = C.CCCODE Left outer join (Select CCCODE, CCDESC From V820LF.ZCC Where CCTABL = 'SIRF2') as SC On IIM.IREF02 = SC.CCCODE Left outer join (Select CCCODE, CCDESC From V820LF.ZCC Where CCTABL = 'SIRF3') as ISC On IIM.IREF03 = ISC.CCCODE Left outer join (Select CCCODE, CCDESC From V820LF.ZCC Where CCTABL = 'SIRF4') as PG On IIM.IREF04 = PG.CCCODE Left outer join (Select CCCODE, CCDESC From V820LF.ZCC Where CCTABL = 'USER0107') as PT On IIM.IFCI = PT.CCCODE Left outer join (Select CCCODE, CCDESC from V820LF.ZCC Where CCTABL = 'PRDBRAND') as B On IIME.IMBRND = B.CCCODE Left outer join (Select CCCODE, CCDESC from V820LF.ZCC Where CCTABL = 'PRDBRGRP') as BG On IIME.IMBRGP = BG.CCCODE Left outer join (Select CCCODE, CCDESC, CCSDSC from V820LF.ZCC Where CCTABL = 'ITEMDISC') as DC On IIM.IDISC = DC.CCCODE Left outer join (Select CCCODE, CCDESC, CCNOT1 from V820LF.ZCC Where CCTABL = 'PRDSSTRM') as SS On IIME.IMSSTM = SS.CCCODE |
03 | Select ... |
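```sql
with data as (
    select
        parentid,
        table_name                      -- full query text
    from prod.log_analytics.metadata
),
froms as (
    select
        any_value(data.parentid) parentid,
        listagg(regexp_substr(value, '\.[^\.]+\.'), ' ') dependencies
    -- split each query on 'FROM ' so every piece after the first starts with a table name
    from data, table(split_to_table(upper(table_name), 'FROM '))
    where index > 1                     -- skip the text before the first FROM
    group by seq                        -- one group per original row
)
SELECT * FROM froms;
```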
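To see exactly what the `split_to_table` step hands to `regexp_substr`, it can help to run it on a single literal. The string below is row 01 from the table above, trimmed; nothing else is assumed. It should return two rows: `SELECT ... ` at index 1 and a piece starting with `V820LF.IIM` at index 2, which is why the query keeps only `index > 1`.

```sql
-- Inspect the pieces SPLIT_TO_TABLE produces for one query string.
-- INDEX is 1-based; piece 1 is the text before the first 'FROM '.
select t.seq, t.index, t.value
from table(split_to_table(
    upper('Select ... From V820LF.IIM as IIM Join V820LUF.IIME as IIME on IIM.IPROD = IIME.IMPROD'),
    'FROM '
)) t;
```

Note that only the name right after `FROM` starts a piece; `V820LUF.IIME` sits in the middle of piece 2, after `JOIN`.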
But my RegEx isn't right; the output is:
PARENTID | DEPENDENCIES |
---|---|
01 | V820LF. V820LUF. V820LF. |
02 | V820LF. V820LUF. V820LF. V820LF. V820LF. V820LF. V820LF. V820LF. V820LF. V820LF. |
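How can I adjust the regex to grab the text both before and after the period?
I plan to extend this to capture more complex cases, such as implicit joins and joins on subqueries, but I'd like to know whether this can all be done with a few regexes rather than several queries.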
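On the "few regexes instead of several queries" point, one option is to skip the split entirely and pull every name that follows `FROM` or `JOIN` with a single pattern. The sketch below is an alternative, not something from the answers. It assumes `REGEXP_SUBSTR_ALL` (returns an array of matches; the `'e'` parameter plus group number 2 keeps only the captured name) and `LATERAL FLATTEN`, and it assumes names are unquoted two- or three-part identifiers made of letters, digits, and underscores.

```sql
-- Sketch: collect schema-qualified names after FROM or JOIN with one regex.
with data as (
    select parentid, table_name
    from prod.log_analytics.metadata
),
refs as (
    select
        d.parentid,
        f.value::string as table_ref
    from data d,
         lateral flatten(
             input => regexp_substr_all(
                 upper(d.table_name),
                 -- group 1 = keyword, group 2 = name like V820LF.IIM or DB.SCHEMA.TABLE
                 $$(FROM|JOIN)[[:space:]]+([A-Z0-9_]+(\.[A-Z0-9_]+){1,2})$$,
                 1, 1, 'e', 2
             )
         ) f
)
select
    parentid,
    listagg(distinct table_ref, ' ') within group (order by table_ref) as dependencies
from refs
group by parentid;
```

`JOIN (SELECT ...` simply yields no match, because `(` is not in the name character class; the table inside the subquery is still caught by its own `FROM`. Still unhandled: comma-separated implicit joins (only the first name is captured), quoted identifiers, unqualified names such as CTEs, and keywords that appear inside string literals or comments.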
I think using the access_history view to work out the dependencies would be a better approach:
https://docs.snowflake.com/en/user-guide/access-history.html
PS: This requires Enterprise Edition.
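A minimal sketch of the access_history approach, assuming Enterprise Edition and a role that can read `SNOWFLAKE.ACCOUNT_USAGE`. Each row's `BASE_OBJECTS_ACCESSED` is an array of objects with keys such as `objectName` and `objectDomain`. This only covers queries that actually ran in this Snowflake account, and mapping `query_id` back to PARENTID depends on whether the metadata table stores it, which the question doesn't show. The 7-day window is an arbitrary example.

```sql
-- Sketch: tables read by each query over the last 7 days, from ACCESS_HISTORY.
select
    ah.query_id,
    obj.value:"objectName"::string   as object_name,
    obj.value:"objectDomain"::string as object_domain
from snowflake.account_usage.access_history ah,
     lateral flatten(input => ah.base_objects_accessed) obj
where ah.query_start_time >= dateadd(day, -7, current_timestamp())
  and obj.value:"objectDomain"::string = 'Table'
order by ah.query_id, object_name;
```

Because Snowflake resolves the objects itself, implicit joins, subqueries, and views are handled without any parsing. Account usage views are not real-time, so very recent queries may not appear yet.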
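Used as the pattern in `regexp_substr`, this will grab everything before and after the period following `'FROM '`:
`'\.[^\.]+\.[^.][^ ;\n()]*'`
where `[^ ;\n()]*` matches any run of characters that are not a space, semicolon, parenthesis, or newline (thanks to this post). The output: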
PARENTID | DEPENDENCIES |
---|---|
01 | V820LF.IIM V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC |
02 | V820LF.IIM V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC |
03 | V820LF.RCM V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC |
04 | V820LF.RCM V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC V820LF.ZCC |
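The repeated `V820LF.ZCC` entries can be collapsed with `LISTAGG(DISTINCT ...)`. The sketch below is the question's query with the answer's pattern substituted and that one change added. Because the query still splits on `'FROM '`, names reached only through `JOIN`, such as `V820LUF.IIME`, will still be missing, as they are in the output above.

```sql
with data as (
    select parentid, table_name
    from prod.log_analytics.metadata
),
froms as (
    select
        any_value(data.parentid) as parentid,
        -- answer's pattern; inside single quotes Snowflake turns \n into a newline
        listagg(distinct regexp_substr(value, '\.[^\.]+\.[^.][^ ;\n()]*'), ' ') as dependencies
    from data, table(split_to_table(upper(table_name), 'FROM '))
    where index > 1   -- skip the text before the first FROM
    group by seq      -- one group per original row
)
select * from froms
order by parentid;
```

`LISTAGG` skips NULLs, so pieces where the pattern finds nothing do not leave empty entries in the list.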