Databricks and MERGE INTO: How to use two columns as the merge key?
In this example: https://docs.databricks.com/_static/notebooks/merge-in-scd-type-2.html, a single natural key is used to perform the MERGE INTO logic, as follows:
MERGE INTO customers
USING (
  -- These rows will either UPDATE the current addresses of existing customers or INSERT the new addresses of new customers
  SELECT updates.customerId as mergeKey, updates.*
  FROM updates
  UNION ALL
  -- These rows will INSERT new addresses of existing customers
  -- Setting the mergeKey to NULL forces these rows to NOT MATCH and be INSERTed.
  SELECT NULL as mergeKey, updates.*
  FROM updates JOIN customers
    ON updates.customerid = customers.customerid
  WHERE customers.current = true AND updates.address <> customers.address
) staged_updates
ON customers.customerId = mergeKey
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  UPDATE SET current = false, endDate = staged_updates.effectiveDate -- Set current to false and endDate to source's effective date.
WHEN NOT MATCHED THEN
  INSERT(customerid, address, current, effectivedate, enddate)
  VALUES(staged_updates.customerId, staged_updates.address, true, staged_updates.effectiveDate, null) -- Set current to true along with the new address and its effective date.
In this case, how can I use a second column as the mergeKey, in addition to customerId?
Just combine them with AND:
ON customers.customerId = staged_updates.customerId
AND customers.<second_column> = staged_updates.<second_column>
As with a JOIN between two tables, you need to supply the full join condition.
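One detail to watch when applying this to the SCD Type 2 example: its ON clause compares against the mergeKey alias rather than staged_updates.customerId, because mergeKey is set to NULL for the rows that must be force-inserted. A minimal sketch of one way to keep that behavior with two keys is below. It assumes a hypothetical second key column named region in both customers and updates, and the aliases mergeKey1 and mergeKey2 are made up for illustration; it has not been run against the original notebook.

```sql
-- Sketch only: `region`, `mergeKey1`, and `mergeKey2` are hypothetical names,
-- not part of the original Databricks example.
MERGE INTO customers
USING (
  -- Rows that close out the current record of an existing key,
  -- or insert the first record of a new key
  SELECT updates.customerId AS mergeKey1, updates.region AS mergeKey2, updates.*
  FROM updates
  UNION ALL
  -- Rows that insert the new version of an existing key.
  -- NULL merge keys never compare equal, so these rows always fall into NOT MATCHED.
  SELECT NULL AS mergeKey1, NULL AS mergeKey2, updates.*
  FROM updates JOIN customers
    ON updates.customerId = customers.customerId
   AND updates.region = customers.region
  WHERE customers.current = true AND updates.address <> customers.address
) staged_updates
ON customers.customerId = staged_updates.mergeKey1
   AND customers.region = staged_updates.mergeKey2
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  UPDATE SET current = false, endDate = staged_updates.effectiveDate
WHEN NOT MATCHED THEN
  INSERT (customerId, region, address, current, effectiveDate, endDate)
  VALUES (staged_updates.customerId, staged_updates.region, staged_updates.address,
          true, staged_updates.effectiveDate, null)
```

The inner JOIN also uses both columns, so that a row is staged for force-insert only when the same (customerId, region) pair already has a current record with a different address.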