Improve the performance

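I have two sets of data from an external source: the purchase dates of customers, and the dates of each customer's last email click/open. These are stored in two tables, PURCHASE_INTER and ACTIVITY_INTER. A customer can have multiple purchase rows, and I need to extract the latest purchase date, whereas the activity data is unique per customer. The two data sets are independent of each other, so a customer may have rows in one table but not in the other. We wrote the query below. It combines the two tables, groups them by person_id (the customer's ID from the external source) to get the latest dates, joins our customer table to get the customer's email, and then joins the table where this data is finally stored, so that we know whether the operation is an insert or an update. Can you suggest how I can improve the performance of this query? It is very slow and takes more than 10 hours. There are millions of records in the PURCHASE_INTER and ACTIVITY_INTER tables.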

SELECT INTER.*, C.ID AS CUSTOMER_ID, C.EMAIL AS CUSTOMER_EMAIL, LSI.ID AS INTERACTION_ID, ROW_NUMBER() OVER (ORDER BY PERSON_ID ASC) AS RN FROM (
   SELECT PERSON_ID               AS PERSON_ID,
        MAX(LAST_CLICK_DATE)    AS LAST_CLICK_DATE,
        MAX(LAST_OPEN_DATE)     AS LAST_OPEN_DATE,
        MAX(LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
   FROM (
     SELECT ACT.PERSON_ID AS PERSON_ID,
          ACT.LAST_CLICK_DATE AS LAST_CLICK_DATE,
          ACT.LAST_OPEN_DATE AS LAST_OPEN_DATE,
          NULL AS LAST_PURCHASE_DATE
     FROM ACTIVITY_INTER ACT
     WHERE ACT.JOB_ID = 77318317
     UNION
     SELECT PUR.PERSON_ID AS PERSON_ID,
          NULL AS LAST_CLICK_DATE,
          NULL AS LAST_OPEN_DATE,
          PUR.LAST_PURCHASE_DATE AS LAST_PURCHASE_DATE
     FROM PURCHASE_INTER PUR
     WHERE PUR.JOB_ID = 77318317
   ) GROUP BY PERSON_ID
 ) INTER LEFT JOIN CUSTOMER C ON INTER.PERSON_ID = C.PERSON_ID
         LEFT JOIN INTERACTION LSI ON C.ID = LSI.CUSTOMER_ID;

Your query suggests the following indexes:

  • ACTIVITY_INTER(JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE)
  • PURCHASE_INTER(JOB_ID, PERSON_ID, LAST_PURCHASE_DATE)
  • CUSTOMER(PERSON_ID)
  • INTERACTION(CUSTOMER_ID)

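(For the first two indexes, the first column is more important than the remaining ones, unless the number of matching rows is very large.)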
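The index list above could be created with statements along these lines. This is a minimal sketch in Oracle syntax; the index names (IX_ACT_JOB_PERSON and so on) are hypothetical, and any storage or tablespace options are left at their defaults.

```sql
-- Index names are hypothetical; the column lists follow the suggestions above.
-- JOB_ID leads so the WHERE filter can use the index; the trailing columns
-- let the aggregation be answered from the index alone.
CREATE INDEX IX_ACT_JOB_PERSON
    ON ACTIVITY_INTER (JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE);

CREATE INDEX IX_PUR_JOB_PERSON
    ON PURCHASE_INTER (JOB_ID, PERSON_ID, LAST_PURCHASE_DATE);

-- Support the two LEFT JOINs in the outer query.
CREATE INDEX IX_CUSTOMER_PERSON
    ON CUSTOMER (PERSON_ID);

CREATE INDEX IX_INTERACTION_CUSTOMER
    ON INTERACTION (CUSTOMER_ID);
```

Some of these indexes may already exist (for example as part of a key constraint), so it is worth checking the existing indexes before adding them.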

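Also, change the UNION to UNION ALL. UNION incurs the overhead of removing duplicates, and duplicates are not possible here (at least not between the two subqueries), because each subquery returns different columns.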
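As a sketch, the inner subquery from the question would then look like this. Only the set operator changes; the MAX() aggregates in the enclosing GROUP BY are unaffected by duplicate rows, so the result should be the same.

```sql
SELECT ACT.PERSON_ID       AS PERSON_ID,
       ACT.LAST_CLICK_DATE AS LAST_CLICK_DATE,
       ACT.LAST_OPEN_DATE  AS LAST_OPEN_DATE,
       NULL                AS LAST_PURCHASE_DATE
FROM ACTIVITY_INTER ACT
WHERE ACT.JOB_ID = 77318317
UNION ALL  -- no duplicate-elimination step, unlike plain UNION
SELECT PUR.PERSON_ID          AS PERSON_ID,
       NULL                   AS LAST_CLICK_DATE,
       NULL                   AS LAST_OPEN_DATE,
       PUR.LAST_PURCHASE_DATE AS LAST_PURCHASE_DATE
FROM PURCHASE_INTER PUR
WHERE PUR.JOB_ID = 77318317
```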

In addition, you might want to replace the first subquery with a full outer join:

SELECT COALESCE(a.PERSON_ID, p.PERSON_ID) as PERSON_ID,
       a.LAST_CLICK_DATE, a.LAST_OPEN_DATE, p.LAST_PURCHASE_DATE
FROM (SELECT ACT.PERSON_ID AS PERSON_ID,
             MAX(ACT.LAST_CLICK_DATE) AS LAST_CLICK_DATE,
             MAX(ACT.LAST_OPEN_DATE) AS LAST_OPEN_DATE
      FROM ACTIVITY_INTER ACT
      WHERE ACT.JOB_ID = 77318317
      GROUP BY ACT.PERSON_ID
     ) a FULL OUTER JOIN
     (SELECT PUR.PERSON_ID AS PERSON_ID,
             MAX(PUR.LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
      FROM PURCHASE_INTER PUR
      WHERE PUR.JOB_ID = 77318317
      GROUP BY PUR.PERSON_ID
     ) p
     ON a.PERSON_ID = p.PERSON_ID

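This gives Oracle more options for optimization, because the aggregations are done directly on the tables, which makes indexes and better statistics available for the processing.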
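Putting it together, the full outer join can be dropped into the original query in place of the UNION-based subquery. The following is a sketch only, reusing the table and column names from the question; it has not been run against the actual schema, so compare its output and execution plan with the original before relying on it.

```sql
SELECT INTER.*,
       C.ID    AS CUSTOMER_ID,
       C.EMAIL AS CUSTOMER_EMAIL,
       LSI.ID  AS INTERACTION_ID,
       ROW_NUMBER() OVER (ORDER BY INTER.PERSON_ID ASC) AS RN
FROM (
    -- One row per PERSON_ID found in either table for this job.
    SELECT COALESCE(a.PERSON_ID, p.PERSON_ID) AS PERSON_ID,
           a.LAST_CLICK_DATE,
           a.LAST_OPEN_DATE,
           p.LAST_PURCHASE_DATE
    FROM (SELECT ACT.PERSON_ID            AS PERSON_ID,
                 MAX(ACT.LAST_CLICK_DATE) AS LAST_CLICK_DATE,
                 MAX(ACT.LAST_OPEN_DATE)  AS LAST_OPEN_DATE
          FROM ACTIVITY_INTER ACT
          WHERE ACT.JOB_ID = 77318317
          GROUP BY ACT.PERSON_ID
         ) a
    FULL OUTER JOIN
         (SELECT PUR.PERSON_ID               AS PERSON_ID,
                 MAX(PUR.LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
          FROM PURCHASE_INTER PUR
          WHERE PUR.JOB_ID = 77318317
          GROUP BY PUR.PERSON_ID
         ) p
      ON a.PERSON_ID = p.PERSON_ID
) INTER
LEFT JOIN CUSTOMER C      ON INTER.PERSON_ID = C.PERSON_ID
LEFT JOIN INTERACTION LSI ON C.ID = LSI.CUSTOMER_ID;
```

The column order of INTER matches the original subquery (PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE, LAST_PURCHASE_DATE), so code that consumes INTER.* by position should not need to change.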