Improve the performance
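I have two sets of data from an external source: customers' purchase dates and the dates of each customer's last email click/open. These are stored in two tables, PURCHASE_INTER and ACTIVITY_INTER. There can be multiple purchase rows per customer, and I need the last purchase date; the activity data is unique per customer. The two data sets are independent of each other, and a customer may be missing from either one. We wrote the query below, which combines the two tables, groups them by person_id (the customer's ID in the external source) and takes the latest dates, joins our CUSTOMER table to get the customer email, and then joins another table, where this data is finally stored, to determine whether the operation will be an insert or an update. Can you suggest how to improve the performance of this query? It is very slow and takes more than 10 hours. There are millions of records in the PURCHASE_INTER and ACTIVITY_INTER tables.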
SELECT INTER.*, C.ID AS CUSTOMER_ID, C.EMAIL AS CUSTOMER_EMAIL, LSI.ID AS INTERACTION_ID, ROW_NUMBER() OVER (ORDER BY PERSON_ID ASC) AS RN FROM (
SELECT PERSON_ID AS PERSON_ID,
MAX(LAST_CLICK_DATE) AS LAST_CLICK_DATE,
MAX(LAST_OPEN_DATE) AS LAST_OPEN_DATE,
MAX(LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
FROM (
SELECT ACT.PERSON_ID AS PERSON_ID,
ACT.LAST_CLICK_DATE AS LAST_CLICK_DATE,
ACT.LAST_OPEN_DATE AS LAST_OPEN_DATE,
NULL AS LAST_PURCHASE_DATE
FROM ACTIVITY_INTER ACT
WHERE ACT.JOB_ID = 77318317
UNION
SELECT PUR.PERSON_ID AS PERSON_ID,
NULL AS LAST_CLICK_DATE,
NULL AS LAST_OPEN_DATE,
PUR.LAST_PURCHASE_DATE AS LAST_PURCHASE_DATE
FROM PURCHASE_INTER PUR
WHERE PUR.JOB_ID = 77318317
) GROUP BY PERSON_ID
) INTER LEFT JOIN CUSTOMER C ON INTER.PERSON_ID = C.PERSON_ID
LEFT JOIN INTERACTION LSI ON C.ID = LSI.CUSTOMER_ID;
For your query, I would suggest the following indexes:
ACTIVITY_INTER(JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE)
PURCHASE_INTER(JOB_ID, PERSON_ID, LAST_PURCHASE_DATE)
CUSTOMER(PERSON_ID)
INTERACTION(CUSTOMER_ID)
(For the first two indexes, the first column matters more than the others, unless the number of matching rows is very large.)
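As a rough sketch, the indexes listed above could be created as follows. The index names are made up for illustration and are not part of the original schema; tablespace and storage options are left at their defaults.

```sql
-- Hypothetical index names (IX_...); rename to match your own conventions.
-- JOB_ID leads the first two indexes because it is the equality filter in the WHERE clause.
CREATE INDEX IX_ACTIVITY_INTER_JOB
    ON ACTIVITY_INTER (JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE);

CREATE INDEX IX_PURCHASE_INTER_JOB
    ON PURCHASE_INTER (JOB_ID, PERSON_ID, LAST_PURCHASE_DATE);

-- These two support the LEFT JOIN lookups in the outer query.
CREATE INDEX IX_CUSTOMER_PERSON
    ON CUSTOMER (PERSON_ID);

CREATE INDEX IX_INTERACTION_CUSTOMER
    ON INTERACTION (CUSTOMER_ID);
```

Because the first two indexes contain every column their subquery reads, Oracle may be able to answer those subqueries from the index alone, without visiting the table.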
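In addition, change the UNION to UNION ALL. UNION incurs the overhead of removing duplicates, which cannot occur here (at least between the two subqueries) because each subquery populates different columns. The outer MAX() with GROUP BY returns the same result either way.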
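For illustration, this is the inner aggregation from the question with only that one change applied. The alias U on the inline view is my addition for readability; everything else is taken from the original query.

```sql
SELECT PERSON_ID,
       MAX(LAST_CLICK_DATE)    AS LAST_CLICK_DATE,
       MAX(LAST_OPEN_DATE)     AS LAST_OPEN_DATE,
       MAX(LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
FROM (
    SELECT ACT.PERSON_ID,
           ACT.LAST_CLICK_DATE,
           ACT.LAST_OPEN_DATE,
           NULL AS LAST_PURCHASE_DATE
    FROM ACTIVITY_INTER ACT
    WHERE ACT.JOB_ID = 77318317
    UNION ALL  -- no duplicate-elimination step; MAX() below ignores repeated rows anyway
    SELECT PUR.PERSON_ID,
           NULL AS LAST_CLICK_DATE,
           NULL AS LAST_OPEN_DATE,
           PUR.LAST_PURCHASE_DATE
    FROM PURCHASE_INTER PUR
    WHERE PUR.JOB_ID = 77318317
) U
GROUP BY PERSON_ID;
```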
You might also want to replace the first subquery with a full outer join:
SELECT COALESCE(a.PERSON_ID, p.PERSON_ID) as PERSON_ID,
a.LAST_CLICK_DATE, a.LAST_OPEN_DATE,p.LAST_PURCHASE_DATE
FROM (SELECT ACT.PERSON_ID AS PERSON_ID,
MAX(ACT.LAST_CLICK_DATE) AS LAST_CLICK_DATE,
MAX(ACT.LAST_OPEN_DATE) AS LAST_OPEN_DATE
FROM ACTIVITY_INTER ACT
WHERE ACT.JOB_ID = 77318317
GROUP BY ACT.PERSON_ID
) a FULL OUTER JOIN
(SELECT PUR.PERSON_ID AS PERSON_ID,
MAX(PUR.LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
FROM PURCHASE_INTER PUR
WHERE PUR.JOB_ID = 77318317
GROUP BY PUR.PERSON_ID
) p
ON a.PERSON_ID = p.PERSON_ID
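This gives Oracle more options for optimization, because the aggregation is done directly on the tables, making the indexes and better statistics available for the processing.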
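Putting it together, one way the full outer join might slot into the original outer query is sketched below. This is an untested outline against the table and column names from the question; I have qualified PERSON_ID in the ROW_NUMBER() ordering as INTER.PERSON_ID, on the assumption that the unqualified name could be ambiguous once CUSTOMER (which also has a PERSON_ID column) is joined in.

```sql
SELECT INTER.*,
       C.ID     AS CUSTOMER_ID,
       C.EMAIL  AS CUSTOMER_EMAIL,
       LSI.ID   AS INTERACTION_ID,
       ROW_NUMBER() OVER (ORDER BY INTER.PERSON_ID ASC) AS RN
FROM (
    -- One row per person, taken from whichever side(s) have data for this job.
    SELECT COALESCE(a.PERSON_ID, p.PERSON_ID) AS PERSON_ID,
           a.LAST_CLICK_DATE,
           a.LAST_OPEN_DATE,
           p.LAST_PURCHASE_DATE
    FROM (SELECT ACT.PERSON_ID,
                 MAX(ACT.LAST_CLICK_DATE) AS LAST_CLICK_DATE,
                 MAX(ACT.LAST_OPEN_DATE)  AS LAST_OPEN_DATE
          FROM ACTIVITY_INTER ACT
          WHERE ACT.JOB_ID = 77318317
          GROUP BY ACT.PERSON_ID
         ) a
    FULL OUTER JOIN
         (SELECT PUR.PERSON_ID,
                 MAX(PUR.LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
          FROM PURCHASE_INTER PUR
          WHERE PUR.JOB_ID = 77318317
          GROUP BY PUR.PERSON_ID
         ) p
      ON a.PERSON_ID = p.PERSON_ID
) INTER
LEFT JOIN CUSTOMER C      ON INTER.PERSON_ID = C.PERSON_ID
LEFT JOIN INTERACTION LSI ON C.ID = LSI.CUSTOMER_ID;
```

Comparing the execution plan of this version with the original (for example with EXPLAIN PLAN) should show whether the new indexes are actually being used.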