Snowflake SQL Query taking too much time to run when trying to apply multiple joins

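I'm trying to run a SQL query on Snowflake that joins multiple tables, but the query takes forever to run, and I'm not sure whether the problem is my query or the approach I'm taking.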

I have the following tables in Snowflake:

1) RR_SUMM, 2) YY_TEXTENTR, 3) KK_SUBEVT, 4) LG_NBETR, 5) XX_RPOPO, 6) VV_KJIU, 7) LL_JJHHHIP, 8) UU_GHGGHJ, 9) QQ_BHBHGGG, 10) TT_HJHHSY

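RR_SUMM is my main table, and every table contains a common column named _ID.

My goal is to join the other 9 tables to the main table RR_SUMM on _ID, because I need to pull certain fields from each table and combine them with the main table.

My approach is to LEFT OUTER JOIN each of the other tables to RR_SUMM, but the query runs forever because most of the tables are about 25 GB in size.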

The SQL query I wrote in Snowflake is as follows:

INSERT INTO  "PRD"."POST"."_ALL_EVENTS" 
SELECT
DISTINCT SUMMARY._ID,
SUMMARY.GEP_ID,
SUMMARY.AK_CD,
SUMMARY.AK1_CD,
SUMMARY.AK2_CD,
SUMMARY.JJ_DT,
SUMMARY.IL_OVRD,
SUMMARY.STRT_DT,
SUMMARY.EVENT_DT,
SUMMARY.PUNLICATION_DT,
SUMMARY.END_DT,
SUMMARY.END_1DT,
SUMMARY.OO_IND,
SUMMARY.EXPN_DT,
SUMMARY.STATHJJ_CD,
SUMMARY.STATHJJ_DT,
SUMMARY.ERSK_CD,
SUMMARY.DSRP_NBR,
SUMMARY.LNBR,
SUMMARY.LK_REF,
SUMMARY.OOLDESC_CD,
SUMMARY.LMN_CD,
TEXT.UTXCT,
TEXT.GL_CD,
SB.MN_CD,
SB.MN_DT,
SB.EVTEXT,
SB._START_DATE,
SB._END_DATE,
RE.RRONBT,
RE.NN_CD,
RP.RP_CD,
RP.RP_T_CD,
RP.RNME,
PP.FNBR,
PP.FDESC_CD,
IP.FL_DT,
IP.FL_DTTYPCD,
XP.JJ_DT,
XP.OO_CD,
OP.ORG_REF,
OP.FL_NBR,
KP.EVK_CD,
KP.EVJK_DT

FROM "PRD"."POST"."RR_SUMM" SUMMARY
LEFT OUTER JOIN "PRD"."POST"."YY_TEXTENTR" TEXT
    ON TEXT._ID = SUMMARY._ID
LEFT OUTER JOIN "PRD"."POST"."KK_SUBEVT" SB
    ON SB._ID = SUMMARY._ID
LEFT OUTER JOIN "PRD"."POST"."LG_NBETR" RE
    ON RE._ID = SUMMARY._ID
LEFT OUTER JOIN "PRD"."POST"."XX_RPOPO" RP
    ON RP._ID = SUMMARY._ID
LEFT OUTER JOIN "PRD"."POST"."VV_KJIU" PP
    ON PP._ID = SUMMARY._ID
LEFT OUTER JOIN "PRD"."POST"."LL_JJHHHIP" IP
    ON IP._ID = SUMMARY._ID
LEFT OUTER JOIN "PRD"."POST"."UU_GHGGHJ" XP
    ON XP._ID = SUMMARY._ID
LEFT OUTER JOIN "PRD"."POST"."QQ_BHBHGGG" OP
    ON OP._ID = SUMMARY._ID
LEFT OUTER JOIN "PRD"."POST"."TT_HJHHSY" KP
    ON KP._ID = SUMMARY._ID
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44;

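Please let me know if there is another way to approach this so that my query runs quickly. I can't limit my data, because I need all of it for my analysis.

Any help would be greatly appreciated.

Thanks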

Grouping by every selected column is, in effect, the same as SELECT DISTINCT, so using both in one query is redundant.
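The equivalence is easy to check by running both forms side by side. The sketch below uses Python's built-in sqlite3 module as a stand-in for Snowflake. That substitution is an assumption, and the table `t` and its rows are made up for illustration. Both forms also treat NULLs as equal, so the two (1, 'B', NULL) rows collapse into one either way.

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (_id INTEGER, cd TEXT, dt TEXT);
INSERT INTO t VALUES
    (1, 'A', '2020-01-01'), (1, 'A', '2020-01-01'),
    (1, 'B', NULL), (1, 'B', NULL),
    (2, 'C', '2021-05-05');
""")

# Form 1: SELECT DISTINCT over every output column.
distinct_rows = conn.execute(
    "SELECT DISTINCT _id, cd, dt FROM t ORDER BY _id, cd, dt"
).fetchall()

# Form 2: GROUP BY every output column, with no aggregate functions.
grouped_rows = conn.execute(
    "SELECT _id, cd, dt FROM t GROUP BY _id, cd, dt ORDER BY _id, cd, dt"
).fetchall()

print(distinct_rows == grouped_rows)  # True
print(distinct_rows)
# [(1, 'A', '2020-01-01'), (1, 'B', None), (2, 'C', '2021-05-05')]
```

Because the two forms return the same rows, the original query can keep either DISTINCT or the GROUP BY 1..44 and drop the other.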

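But since you are collapsing the result down to one copy of each row anyway, you can push the DISTINCT down into each joined table. That way the joins have no duplicate rows to multiply: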

SELECT
    summary._id,
    summary.gep_id,
    summary.ak_cd,
    summary.ak1_cd,
    summary.ak2_cd,
    summary.jj_dt,
    summary.il_ovrd,
    summary.strt_dt,
    summary.event_dt,
    summary.punlication_dt,
    summary.end_dt,
    summary.end_1dt,
    summary.oo_ind,
    summary.expn_dt,
    summary.stathjj_cd,
    summary.stathjj_dt,
    summary.ersk_cd,
    summary.dsrp_nbr,
    summary.lnbr,
    summary.lk_ref,
    summary.ooldesc_cd,
    summary.lmn_cd,
    text.utxct,
    text.gl_cd,
    sb.mn_cd,
    sb.mn_dt,
    sb.evtext,
    sb._start_date,
    sb._end_date,
    re.rronbt,
    re.nn_cd,
    rp.rp_cd,
    rp.rp_t_cd,
    rp.rnme,
    pp.fnbr,
    pp.fdesc_cd,
    ip.fl_dt,
    ip.fl_dttypcd,
    xp.jj_dt,
    xp.oo_cd,
    op.org_ref,
    op.fl_nbr,
    kp.evk_cd,
    kp.evjk_dt
FROM ( 
    SELECT DISTINCT
        summary._id,
        summary.gep_id,
        summary.ak_cd,
        summary.ak1_cd,
        summary.ak2_cd,
        summary.jj_dt,
        summary.il_ovrd,
        summary.strt_dt,
        summary.event_dt,
        summary.punlication_dt,
        summary.end_dt,
        summary.end_1dt,
        summary.oo_ind,
        summary.expn_dt,
        summary.stathjj_cd,
        summary.stathjj_dt,
        summary.ersk_cd,
        summary.dsrp_nbr,
        summary.lnbr,
        summary.lk_ref,
        summary.ooldesc_cd,
        summary.lmn_cd
    FROM prd.post.rr_summ AS summary
) AS summary
-- Each joined table is de-duplicated on _id plus the selected columns before the join.
LEFT OUTER JOIN (
    SELECT DISTINCT
        text._id,
        text.utxct,
        text.gl_cd
    FROM prd.post.yy_textentr AS text
) AS text
    ON text._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        sb._id,
        sb.mn_cd,
        sb.mn_dt,
        sb.evtext,
        sb._start_date,
        sb._end_date
    FROM prd.post.kk_subevt AS sb
) AS sb
    ON sb._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        re._id,
        re.rronbt,
        re.nn_cd
    FROM prd.post.lg_nbetr AS re
) AS re
    ON re._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        rp._id,
        rp.rp_cd,
        rp.rp_t_cd,
        rp.rnme
    FROM prd.post.xx_rpopo AS rp
) AS rp
    ON rp._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        pp._id,
        pp.fnbr,
        pp.fdesc_cd
    FROM prd.post.vv_kjiu AS pp
) AS pp
    ON pp._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        ip._id,
        ip.fl_dt,
        ip.fl_dttypcd
    FROM prd.post.ll_jjhhhip AS ip
) AS ip
    ON ip._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        xp._id,
        xp.jj_dt,
        xp.oo_cd
    FROM prd.post.uu_ghgghj AS xp
) AS xp
    ON xp._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        op._id,
        op.org_ref,
        op.fl_nbr
    FROM prd.post.qq_bhbhggg AS op
) AS op
    ON op._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        kp._id,
        kp.evk_cd,
        kp.evjk_dt
    FROM prd.post.tt_hjhhsy AS kp
) AS kp
    ON kp._id = summary._id;

This should be much faster, especially if the joined tables contain many duplicate rows.
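The effect of the push-down can be seen on a toy dataset. This is a minimal sketch that uses Python's sqlite3 module in place of Snowflake, with invented data that borrows a few of the question's table and column names. The naive join produces 19 rows before de-duplication, while the pushed-down version produces the same 3 distinct rows directly.

The example also shows the limit of the technique. `_id` 1 still yields two rows, because kk_subevt holds two genuinely different mn_cd values for it. DISTINCT removes duplicate rows, but it cannot remove a real one-to-many fan-out.

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake; the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rr_summ (_id INTEGER, gep_id TEXT);
CREATE TABLE yy_textentr (_id INTEGER, utxct TEXT);
CREATE TABLE kk_subevt (_id INTEGER, mn_cd TEXT);
INSERT INTO rr_summ VALUES (1, 'a'), (1, 'a'), (2, 'b');
INSERT INTO yy_textentr VALUES (1, 'x'), (1, 'x'), (1, 'x'), (2, 'y');
INSERT INTO kk_subevt VALUES (1, 'p'), (1, 'p'), (1, 'q');
""")

# Original shape: join the raw tables, de-duplicate only at the end.
naive_join = """
    SELECT s._id, s.gep_id, t.utxct, k.mn_cd
    FROM rr_summ AS s
    LEFT JOIN yy_textentr AS t ON t._id = s._id
    LEFT JOIN kk_subevt AS k ON k._id = s._id
"""

# Pushed-down shape: de-duplicate each input before it is joined.
pushed_down = """
    SELECT s._id, s.gep_id, t.utxct, k.mn_cd
    FROM (SELECT DISTINCT _id, gep_id FROM rr_summ) AS s
    LEFT JOIN (SELECT DISTINCT _id, utxct FROM yy_textentr) AS t ON t._id = s._id
    LEFT JOIN (SELECT DISTINCT _id, mn_cd FROM kk_subevt) AS k ON k._id = s._id
"""

def count(sql):
    """Number of rows the given query produces, before any final DISTINCT."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

naive_distinct = set(conn.execute(f"SELECT DISTINCT * FROM ({naive_join})").fetchall())
pushed_rows = conn.execute(pushed_down).fetchall()

print(count(naive_join), count(pushed_down))  # 19 3
print(naive_distinct == set(pushed_rows))      # True
```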

There are many possible reasons a query runs slowly: join order, data skew, bad cardinality estimates, warehouse size, and more.
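Data skew on the join key is one of the cheaper causes to check yourself. If a few `_ID` values account for a large share of a table's rows, the join work for those keys can end up concentrated, and it is also multiplied by the matches in the other tables.

The sketch below is a hedged illustration that uses Python's sqlite3 module rather than Snowflake. `top_keys` is a hypothetical helper name and the sample data is invented. The `GROUP BY ... ORDER BY COUNT(*) DESC` query inside it is plain SQL, so you could run it against each table directly.

```python
import sqlite3

def top_keys(conn, table, key="_id", limit=5):
    """Return (key, row_count, share_of_table) for the most frequent key values.

    `table` and `key` are interpolated into the SQL, so they must be trusted names.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if total == 0:
        return []
    rows = conn.execute(
        f"SELECT {key}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {key} ORDER BY n DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [(k, n, n / total) for k, n in rows]

# Invented data: _id 1 owns 70% of the rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kk_subevt (_id INTEGER, mn_cd TEXT)")
conn.executemany(
    "INSERT INTO kk_subevt VALUES (?, ?)",
    [(1, "p")] * 7 + [(2, "q")] * 2 + [(3, "r")],
)
print(top_keys(conn, "kk_subevt", limit=2))  # [(1, 7, 0.7), (2, 2, 0.2)]
```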

Because you have so many JOINs, it's hard to say which of these applies without looking at the Query Profile.
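Before or alongside opening the Query Profile, you can also estimate how many rows the chained LEFT JOINs will produce. Each summary row comes out once per combination of matching rows across the joined tables. The total is therefore the sum, over summary rows, of the product of the per-table match counts. A table with no match still contributes one NULL row.

Below is a minimal sketch using Python's sqlite3 module as a stand-in for Snowflake. `expected_left_join_rows` is a hypothetical helper, the data is made up, and table names are interpolated into the SQL, so they must come from trusted input.

```python
import sqlite3
from math import prod

def expected_left_join_rows(conn, main, children, key="_id"):
    """Predict how many rows `main` LEFT JOIN child_1 ... child_n ON key returns.

    A main row with key k comes out prod(max(1, n_i(k))) times, where n_i(k) is
    the number of rows in child i with that key (no match still gives one row).
    Table and column names are interpolated into SQL, so they must be trusted.
    """
    per_child = []
    for child in children:
        rows = conn.execute(f"SELECT {key}, COUNT(*) FROM {child} GROUP BY {key}")
        per_child.append(dict(rows.fetchall()))
    total = 0
    for (k,) in conn.execute(f"SELECT {key} FROM {main}"):
        # A NULL key never satisfies the ON condition, so it yields one row.
        total += 1 if k is None else prod(max(1, c.get(k, 0)) for c in per_child)
    return total

# Made-up data: two summary rows, non-unique _id in both child tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rr_summ (_id INTEGER, gep_id TEXT);
CREATE TABLE yy_textentr (_id INTEGER, utxct TEXT);
CREATE TABLE kk_subevt (_id INTEGER, mn_cd TEXT);
INSERT INTO rr_summ VALUES (1, 'a'), (2, 'b');
INSERT INTO yy_textentr VALUES (1, 'x'), (1, 'y'), (1, 'z');
INSERT INTO kk_subevt VALUES (1, 'p'), (1, 'q'), (2, 'r');
""")

predicted = expected_left_join_rows(conn, "rr_summ", ["yy_textentr", "kk_subevt"])
actual = conn.execute("""
    SELECT COUNT(*) FROM rr_summ AS s
    LEFT JOIN yy_textentr AS t ON t._id = s._id
    LEFT JOIN kk_subevt AS k ON k._id = s._id
""").fetchone()[0]
print(predicted, actual)  # 7 7
```

Here two summary rows already turn into seven. The per-table `GROUP BY _id` counts are much cheaper to run than the full nine-table join. If the predicted total is far larger than the RR_SUMM row count, that suggests the joins themselves, not the final DISTINCT, are generating most of the work.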

Your best option is to open a support ticket so that Snowflake can review the query.