Snowflake SQL query takes too much time to run when applying multiple joins
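I am trying to run a SQL query on Snowflake that joins multiple tables, but the query takes forever to run, and I am not sure whether the problem is the query itself or my overall approach.
I have the following tables in Snowflake -
1) RR_SUMM, 2) YY_TEXTENTR, 3) KK_SUBEVT, 4) LG_NBETR, 5) XX_RPOPO, 6) VV_KJIU, 7) LL_JJHHHIP, 8) UU_GHGGHJ,
9) QQ_BHBHGGG, 10) TT_HJHHSY
RR_SUMM is my main table,
and every table contains a common column named "_ID".
My goal is to join the other 9 tables to the main table RR_SUMM using the _ID column,
because I want to pull certain fields from each table and merge them with the fields of the main table.
My approach is to apply a left outer join from each of the other tables to the main table RR_SUMM,
but this approach runs forever, as most of the tables are around 25 GB in size.
The SQL query I wrote in Snowflake is below -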
INSERT INTO "PRD"."POST"."_ALL_EVENTS"
SELECT
DISTINCT SUMMARY._ID,
SUMMARY.GEP_ID,
SUMMARY.AK_CD,
SUMMARY.AK1_CD,
SUMMARY.AK2_CD,
SUMMARY.JJ_DT,
SUMMARY.IL_OVRD,
SUMMARY.STRT_DT,
SUMMARY.EVENT_DT,
SUMMARY.PUNLICATION_DT,
SUMMARY.END_DT,
SUMMARY.END_1DT,
SUMMARY.OO_IND,
SUMMARY.EXPN_DT,
SUMMARY.STATHJJ_CD,
SUMMARY.STATHJJ_DT,
SUMMARY.ERSK_CD,
SUMMARY.DSRP_NBR,
SUMMARY.LNBR,
SUMMARY.LK_REF,
SUMMARY.OOLDESC_CD,
SUMMARY.LMN_CD,
TEXT.UTXCT,
TEXT.GL_CD,
SB.MN_CD,
SB.MN_DT,
SB.EVTEXT,
SB._START_DATE,
SB._END_DATE,
RE.RRONBT,
RE.NN_CD,
RP.RP_CD,
RP.RP_T_CD,
RP.RNME,
PP.FNBR,
PP.FDESC_CD,
IP.FL_DT,
IP.FL_DTTYPCD,
XP.JJ_DT,
XP.OO_CD,
OP.ORG_REF,
OP.FL_NBR,
KP.EVK_CD,
KP.EVJK_DT
FROM
"PRD"."POST"."RR_SUMM" SUMMARY
LEFT OUTER JOIN
"PRD"."POST"."YY_TEXTENTR" TEXT
ON TEXT._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."KK_SUBEVT" SB
ON SB._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."LG_NBETR" RE
ON RE._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."XX_RPOPO" RP
ON RP._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."VV_KJIU" PP
ON PP._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."LL_JJHHHIP" IP
ON IP._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."UU_GHGGHJ" XP
ON XP._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."QQ_BHBHGGG" OP
ON OP._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."TT_HJHHSY" KP
ON KP._ID = SUMMARY._ID
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44;
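Please let me know if there is any other approach I can follow so that my query runs quickly. I cannot limit my data, as I need all of it for the analysis.
Any help would be appreciated.
Thanks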
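At one level, a GROUP BY on all columns is the same as DISTINCT, so the query does not need both.
But given that you are rolling everything up to get just one copy of each row, you can push the DISTINCT down into a subquery per table, so that the joins themselves should not produce duplicate rows: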
SELECT
summary._id,
summary.gep_id,
summary.ak_cd,
summary.ak1_cd,
summary.ak2_cd,
summary.jj_dt,
summary.il_ovrd,
summary.strt_dt,
summary.event_dt,
summary.punlication_dt,
summary.end_dt,
summary.end_1dt,
summary.oo_ind,
summary.expn_dt,
summary.stathjj_cd,
summary.stathjj_dt,
summary.ersk_cd,
summary.dsrp_nbr,
summary.lnbr,
summary.lk_ref,
summary.ooldesc_cd,
summary.lmn_cd,
text.utxct,
text.gl_cd,
sb.mn_cd,
sb.mn_dt,
sb.evtext,
sb._start_date,
sb._end_date,
re.rronbt,
re.nn_cd,
rp.rp_cd,
rp.rp_t_cd,
rp.rnme,
pp.fnbr,
pp.fdesc_cd,
ip.fl_dt,
ip.fl_dttypcd,
xp.jj_dt,
xp.oo_cd,
op.org_ref,
op.fl_nbr,
kp.evk_cd,
kp.evjk_dt
FROM (
SELECT DISTINCT
summary._id,
summary.gep_id,
summary.ak_cd,
summary.ak1_cd,
summary.ak2_cd,
summary.jj_dt,
summary.il_ovrd,
summary.strt_dt,
summary.event_dt,
summary.punlication_dt,
summary.end_dt,
summary.end_1dt,
summary.oo_ind,
summary.expn_dt,
summary.stathjj_cd,
summary.stathjj_dt,
summary.ersk_cd,
summary.dsrp_nbr,
summary.lnbr,
summary.lk_ref,
summary.ooldesc_cd,
summary.lmn_cd
FROM prd.post.rr_summ AS summary
) AS summary
LEFT OUTER JOIN (
SELECT DISTINCT
text._id,
text.utxct,
text.gl_cd
FROM prd.post.yy_textentr AS text
) AS text
ON text._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
sb._id,
sb.mn_cd,
sb.mn_dt,
sb.evtext,
sb._start_date,
sb._end_date
FROM prd.post.kk_subevt AS sb
) AS sb
ON sb._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
re._id,
re.rronbt,
re.nn_cd
FROM prd.post.lg_nbetr AS re
) AS re
ON re._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
rp._id,
rp.rp_cd,
rp.rp_t_cd,
rp.rnme
FROM
prd.post.xx_rpopo AS rp
) AS rp
ON rp._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
pp._id,
pp.fnbr,
pp.fdesc_cd
FROM prd.post.vv_kjiu AS pp
) AS pp
ON pp._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
ip._id,
ip.fl_dt,
ip.fl_dttypcd
FROM prd.post.ll_jjhhhip AS ip
) AS ip
ON ip._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
xp._id,
xp.jj_dt,
xp.oo_cd
FROM prd.post.uu_ghgghj AS xp
) AS xp
ON xp._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
op._id,
op.org_ref,
op.fl_nbr
FROM prd.post.qq_bhbhggg AS op
) AS op
ON op._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
kp._id,
kp.evk_cd,
kp.evjk_dt
FROM prd.post.tt_hjhhsy AS kp
) AS kp
ON kp._id = summary._id;
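So this should be much faster, because each join now works on deduplicated rows instead of multiplying duplicates that are only removed at the very end.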
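To see why deduplicating before the joins can matter, here is a minimal self-contained sketch. The CTE names (`child_a`, `child_b`), their columns, and the inline values are made up for illustration; only the `_id` join key and the `prd.post.yy_textentr` table come from the question. It assumes the child tables can hold several rows per `_id`, which is the situation in which the joins multiply rows.

```sql
-- Hypothetical toy data: one summary row, two child tables with repeated rows.
WITH summary AS (
    SELECT column1 AS _id, column2 AS gep_id
    FROM VALUES (1, 'G1')
),
child_a AS (
    SELECT column1 AS _id, column2 AS a_cd
    FROM VALUES (1, 'A'), (1, 'A'), (1, 'A')
),
child_b AS (
    SELECT column1 AS _id, column2 AS b_cd
    FROM VALUES (1, 'B'), (1, 'B'), (1, 'B'), (1, 'B')
)
SELECT
    -- Joining the raw tables: every child_a row pairs with every child_b row.
    (SELECT COUNT(*)
     FROM summary AS s
     LEFT OUTER JOIN child_a AS a ON a._id = s._id
     LEFT OUTER JOIN child_b AS b ON b._id = s._id) AS rows_joined_raw,
    -- Deduplicating each child first keeps the intermediate result small.
    (SELECT COUNT(*)
     FROM summary AS s
     LEFT OUTER JOIN (SELECT DISTINCT _id, a_cd FROM child_a) AS a ON a._id = s._id
     LEFT OUTER JOIN (SELECT DISTINCT _id, b_cd FROM child_b) AS b ON b._id = s._id) AS rows_joined_dedup;

-- On the real tables, a check like this shows whether a child table
-- actually has more than one row per _id (repeat per table as needed).
SELECT _id, COUNT(*) AS rows_per_id
FROM prd.post.yy_textentr
GROUP BY _id
HAVING COUNT(*) > 1
ORDER BY rows_per_id DESC
LIMIT 20;
```

With the toy data, the raw join builds 1 x 3 x 4 = 12 rows while the deduplicated join builds 1. With nine child tables the same multiplication applies to every `_id`, so even modest duplication per table can inflate the intermediate result considerably. If the second query returns no rows for any child table, duplication is probably not the bottleneck and the cause lies elsewhere.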
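There are many reasons a query can run slowly: join order, data skew, bad cardinality estimates, warehouse size, and so on.
With this many JOINs, it is hard to say which one applies without looking at the query profile.
The best approach is to open a support ticket to have it reviewed.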
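Before opening a ticket, it may be worth collecting the plan and the per-operator statistics yourself. The following is a sketch, not a definitive diagnostic procedure: the query is cut down to two of the nine joins for brevity, and it assumes your Snowflake account exposes the `GET_QUERY_OPERATOR_STATS` table function and that the slow statement was the last one run in the current session.

```sql
-- Show the execution plan without running the query
-- (reduced to two joins here; substitute the full statement).
EXPLAIN USING TEXT
SELECT summary._id, text.utxct, sb.mn_cd
FROM prd.post.rr_summ AS summary
LEFT OUTER JOIN prd.post.yy_textentr AS text
    ON text._id = summary._id
LEFT OUTER JOIN prd.post.kk_subevt AS sb
    ON sb._id = summary._id;

-- After running the real statement in the same session, list the
-- statistics for each operator of that query (joins, aggregates, scans).
SELECT
    operator_id,
    operator_type,
    operator_statistics,
    execution_time_breakdown
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()))
ORDER BY operator_id;
```

A join operator whose output row count is far larger than its inputs points to row multiplication on `_id`. Operators that report spilling to local or remote storage suggest the warehouse is too small for the intermediate result.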